Re: Designing an analysis friendly Stockfish?
Posted: Sun Feb 06, 2011 12:12 am
Thanks Jeremy, there are already results of improvement not just for analysis!
Independent Computer Chess Discussion Forum
https://open-chess.org/

Peter C wrote:
I added a UCI option to control the depth in ply it's used up to. With my default setting of 20 (which could/should be tweaked a bit), it's a tad better than SF default at 40 moves/1 second.

Nice!

Peter C wrote:
I've noticed that our SF tends to lose on time more often than the default at hyperbullet (40 moves/1 second), especially in the first 4-5 games. Has anyone else noticed this behavior?
Peter

I use the UCI options

Jeremy Bernstein wrote:
Here's a game:
...
Note how the PA_J+GTB finds a forced mate around move 80 or so and then loses its way by move 99. What happened?

Peter C wrote:
I'd guess it's probably a bug in the GTB probing. Is the Decemberists' patch applied?

Yes, that patch is applied. I've updated to the most recent version of GTB probing code locally, but haven't built it yet. I can't find a changelog, so I don't know what's new. It would be great to figure out what didn't work there, though.

gaard wrote:
Any results with the longer time control game for this fix? These updates to history/killers are in version I. Against the default with the new ok_to_use_tt() SF I have a score of +235 -140 =156 from the perspective of the default version at g/24"

fruity wrote:
I'm still fiddling around. According to my tests at 20'' all the versions so far are 10-15 Elo below SF default. The best result I've found so far is
(1) remove the history/killer tweak. It seems to work well only for low depths. It could be reintroduced later with tweaking for low depths only, but at the moment it doesn't help to find the best PV TT code, so leave it out for now.
(2) use template and inline for ok_to_use_TT() and the following code
Code:
    template <NodeType PvNode>
    inline bool ok_to_use_TT(const TTEntry* tte, Depth depth, Value alpha, Value beta, int ply) {

      Value v = value_from_tt(tte->value(), ply);

      return
        // PV node: accept only sufficiently deep, exact scores strictly inside (alpha, beta)
        PvNode ?    tte->depth() >= depth
                 && tte->type() == VALUE_TYPE_EXACT
                 && v < beta
                 && v > alpha
               :
        // Non-PV node: sufficient depth (or a mate score beyond the bounds)
        // and a bound that produces a cutoff
                  (   tte->depth() >= depth
                   || v >= Max(value_mate_in(PLY_MAX), beta)
                   || v <= Min(value_mated_in(PLY_MAX), alpha))

               && (   ((tte->type() & VALUE_TYPE_LOWER) && v >= beta)
                   || ((tte->type() & VALUE_TYPE_UPPER) && v <= alpha));
    }
This seems to be about -5 Elo compared to SF default at 20'', but needs more testing, also at longer tc.
It seems there is still work left to find a really neat solution regarding the playing strength.

Does the lagging 10-15 Elo include version G?
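
To make the ok_to_use_TT() condition quoted above easier to follow, here is a minimal self-contained sketch of the same idea. Entry, Bound and can_use_entry are hypothetical stand-ins, not Stockfish's own types, and the mate-score shortcut from the posted code is left out. It also assumes, as the bit tests in the posted code suggest, that the exact type has both the lower and upper bits set.

```cpp
#include <cassert>

// Hypothetical stand-ins for the engine's types (not Stockfish's real definitions).
// Assumption: an exact score carries both the lower-bound and upper-bound bits.
enum Bound {
    BOUND_NONE  = 0,
    BOUND_UPPER = 1,
    BOUND_LOWER = 2,
    BOUND_EXACT = BOUND_UPPER | BOUND_LOWER
};

struct Entry {
    int   depth;  // search depth the entry was stored with
    int   value;  // stored score
    Bound bound;  // what kind of score it is
};

// Simplified version of the posted condition, without mate-score handling.
template <bool PvNode>
inline bool can_use_entry(const Entry& e, int depth, int alpha, int beta) {
    // Too shallow: never trust the stored score.
    if (e.depth < depth)
        return false;

    // PV node: only an exact score strictly inside the window is accepted.
    if (PvNode)
        return e.bound == BOUND_EXACT && e.value > alpha && e.value < beta;

    // Non-PV node: the stored bound must produce a cutoff.
    return ((e.bound & BOUND_LOWER) && e.value >= beta)
        || ((e.bound & BOUND_UPPER) && e.value <= alpha);
}
```

With alpha = 0 and beta = 100, an exact entry with value 50 is usable at a PV node but not at a non-PV node, while a lower-bound entry with value 150 is usable only at a non-PV node.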

Jeremy Bernstein wrote:
Here's a game:
...
Note how the PA_J+GTB finds a forced mate around move 80 or so and then loses its way by move 99. What happened?

Peter C wrote:
I'd guess it's probably a bug in the GTB probing. Is the Decemberists' patch applied?

Jeremy Bernstein wrote:
Yes, that patch is applied. I've updated to the most recent version of GTB probing code locally, but haven't built it yet. I can't find a changelog, so I don't know what's new. It would be great to figure out what didn't work there, though.

OK. 88.Kf5 is suboptimal, dropping us from +M37 after 87.Dd5 back to +M41. 90.Kg6 drops us back to +M52 from +M39. 95.Kg4 +M49 instead of Ke4 +M47 and so on. It's one step forward, five steps backward.
Here's where the trouble starts.
I'm not sure what's going on; in fact, Kf4 might be the right move (checking it on the Shredder online TB now, Kf4 is +M51). But at some point, Stockfish is making moves which are wrong, and it loses track of the forced mate.

gaard wrote:
Does the lagging 10-15 Elo include version G?
I have tested this at g/2' and could not find a significant difference from default,
and neither could Ingo using a time control of 5'+3", granted with only 600 games.

My results were:

fruity wrote:
Minimum Thinking Time=2
Emergency Base Time=20
Emergency Move Time=7

That did indeed help. Thanks.
Code:
    Score of Stockfish 2.0.1 PA GTB vs Stockfish 2.0.1: 110 - 188 - 202
    ELO difference: -55

gaard wrote:
Does the lagging 10-15 Elo include version G?
I have tested this at g/2' and could not find a significant difference from default,
and neither could Ingo using a time control of 5'+3", granted with only 600 games.

fruity wrote:
My results were:
Version G: 616 games/20'' 6 threads per engine: +93, -120, =403 (-15 Elo)
Version H: 1187 games/20'' 6 threads per engine: +210, -243, =734 (-10 Elo)

Can I suggest you run your tests until some minimum level of significance has been reached, for example,

gaard wrote:
Does the lagging 10-15 Elo include version G?
I have tested this at g/2' and could not find a significant difference from default,
and neither could Ingo using a time control of 5'+3", granted with only 600 games.

fruity wrote:
My results were:
Version G: 616 games/20'' 6 threads per engine: +93, -120, =403 (-15 Elo)
Version H: 1187 games/20'' 6 threads per engine: +210, -243, =734 (-10 Elo)

gaard wrote:
Can I suggest you run your tests until some minimum level of significance has been reached, for example, 2.6-sigma for 99% confidence? IMO, the Elo differences are not very useful with a single opponent at very fast time controls.
Version G: 616 games/20'' 6 threads per engine: +93, -120 (-1.85-sigma)
Version H: 1187 games/20'' 6 threads per engine: +210, -243 (-1.55-sigma)

You can, and you are right, but at that tc and thread usage I can't afford to spend all the necessary time on testing.
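
For anyone who wants to check the Elo and sigma figures quoted above, here is a small sketch that reproduces them from the win/loss/draw counts. The function names elo_diff and sigma are made up for this example. The sigma formula, (W - L) / sqrt(W + L) with draws ignored, is an assumption: it matches the numbers posted, but the thread does not say how they were actually computed.

```cpp
#include <cassert>
#include <cmath>

// Elo difference implied by a match result, from the first engine's point of view.
// Uses the standard logistic model: score = 1 / (1 + 10^(-elo / 400)).
inline double elo_diff(int wins, int losses, int draws) {
    const double games = wins + losses + draws;
    const double score = (wins + 0.5 * draws) / games;
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Win/loss margin in standard deviations, ignoring draws.
// Assumption: this is the formula behind the sigma values quoted in the thread.
inline double sigma(int wins, int losses) {
    return (wins - losses) / std::sqrt(static_cast<double>(wins + losses));
}
```

Calling elo_diff(93, 120, 403) and sigma(93, 120) gives roughly -15 Elo and -1.85 sigma for version G; elo_diff(210, 243, 734) and sigma(210, 243) give roughly -10 Elo and -1.55 sigma for version H. Both are short of the 2.6-sigma threshold suggested for 99% confidence.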

fruity wrote:
Minimum Thinking Time=2
Emergency Base Time=20
Emergency Move Time=7

Peter C wrote:
That did indeed help. Thanks.
I ran a 500 game 40/1 test using cutechess-cli between the latest Stockfish PA GTB git and the default Stockfish (my own compile, with the same settings as the SF git compile). ProbeOnlyAtRoot was on for SF PA GTB.
Results:
Code:
    Score of Stockfish 2.0.1 PA GTB vs Stockfish 2.0.1: 110 - 188 - 202
    ELO difference: -55
Not tooooo bad. This is probably statistically meaningless anyway, but at least we know we aren't like 200 elo weaker....
Peter

It's clear that we are not 200 Elo weaker, but the goal should be to be slightly better, not slightly weaker. Maybe it's not possible, or at least very hard, to reach that goal, mainly because TT entries don't contain path information, which is the main reason SF default doesn't use PV TT hits. I have a few ideas left to test before I give up. But the project at this point is a success anyway.