Re: More on similarity testing
Posted: Thu Dec 30, 2010 3:44 am
by BB+
The tester seems to very clearly identify strong correlations between the playing styles of programs and it does this better than I had hoped.
I quite agree. I discussed this with Larry (in PMs at the Rybka forum) back when you were first tossing this idea around. I thought I had a few ideas for how to tweak the search, but the robustness of the eval comparison remains. Actually, now that I think of it, the later IvanHoes have some sort of "randomiser", which merely seems to perturb the eval by some amount (I'd have to check the details). Maybe I can test eval versus perturbed-eval to see how much noise one needs to create to get an effect. I also think taking (at least the open-source) engines and cross-comparing the correlations from evaluate() with those from "go movetime 100" would be a useful experiment.
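The perturbed-eval experiment can be sketched in miniature: add bounded noise to candidate-move scores and see how often the top choice flips. This is a toy model, not engine code; the score spread, move count, and noise levels are all made up for illustration.

```python
import random

def match_rate(noise, trials=2000, moves=40, spread=100):
    """Fraction of simulated positions where a noisy eval picks the same
    move as the clean eval.  Scores are in centipawns; the Gaussian
    spread and uniform noise range are arbitrary assumptions."""
    random.seed(42)  # deterministic for repeatability
    same = 0
    for _ in range(trials):
        scores = [random.gauss(0, spread) for _ in range(moves)]
        best = max(range(moves), key=lambda i: scores[i])
        noisy = max(range(moves),
                    key=lambda i: scores[i] + random.uniform(-noise, noise))
        same += (best == noisy)
    return same / trials

for eps in (0, 5, 20, 80):
    print(eps, round(match_rate(eps), 3))
```

With zero noise the match rate is trivially 100%; the interesting question, as above, is how quickly it decays as the perturbation grows relative to the typical gap between the best two moves.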
One thing I like about fixed depth is that there's no dispute about what the "default" level of matching is (at least without SMP). I'm not sure this outweighs the negatives. Given that the time allotted appears to be a secondary factor, I would opt for whichever is easier. One issue with using "stop" (which does improve on "go movetime", I agree) is how the OS does time slicing with a "waiting" process (typically 1/100 of a second in Linux, I think). As noted in the Stockfish discussion, you can still hit "polling" discretisation behaviour when I/O is checked only every 30K nodes and the search takes maybe 5 times that. If nothing else, as with any experiment, there needs to be some quality control.
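The polling-latency point above is simple arithmetic: if input is checked every N nodes, a "stop" can go unnoticed for N/nps seconds. A minimal sketch (the node counts and speed are illustrative, not measured from any engine):

```python
def stop_latency_ms(poll_interval_nodes, nps):
    """Worst-case delay (ms) before a 'stop' command is noticed, when the
    engine polls input every poll_interval_nodes nodes and searches at
    nps nodes per second."""
    return 1000.0 * poll_interval_nodes / nps

# Polling every 30K nodes at an assumed 1M nps gives up to ~30 ms of
# slop, which is large when the whole search only lasts ~150K nodes.
print(stop_latency_ms(30_000, 1_000_000))
```

This is why short "go movetime"/"stop" searches can show discretised behaviour: the effective stop time quantises to multiples of the polling interval.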
One question I have about all of this: can this detect specific overlap in evaluation features, or is it more about evaluation numerology?
Re: More on similarity testing
Posted: Thu Dec 30, 2010 3:49 am
by Sentinel
BB+ wrote:One question I have about all of this: can this detect specific overlap in evaluation features, or is it more about evaluation numerology?
As I said in my previous post, it catches only material + PST.
Try the test with Ippo using only lazy eval and you'll see.
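For readers unfamiliar with the term: a "lazy" eval of the kind referred to above scores only material plus piece-square tables, skipping mobility, king safety, pawn structure, and the rest. A minimal sketch, with made-up piece values and a toy PST (this is not Ippolit's actual table):

```python
# Hypothetical piece values in centipawns.
PIECE_VALUE = {'P': 100, 'N': 320, 'B': 330, 'R': 500, 'Q': 900, 'K': 0}

def pst_bonus(square):
    """Toy piece-square table: a flat bonus for the four central squares
    of a 0..63 board (square = 8*rank + file)."""
    f, r = square % 8, square // 8
    return 10 if f in (3, 4) and r in (3, 4) else 0

def lazy_eval(board):
    """board: dict mapping square -> (piece letter, color +1/-1).
    Material + PST only; no positional terms beyond the table."""
    score = 0
    for sq, (piece, color) in board.items():
        score += color * (PIECE_VALUE[piece] + pst_bonus(sq))
    return score
```

The point of Sentinel's suggested experiment is that if the similarity tester still flags two engines when one is reduced to this, the correlation it measures lives mostly in material + PST, not in the finer eval features.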
Re: More on similarity testing
Posted: Thu Dec 30, 2010 6:50 am
by kingliveson
BB+ wrote: Actually, now that I think of it, the later IvanHoes have some sort of "randomiser", which merely seems to perturb the eval by some amount (I'd have to check the details). Maybe I can test eval versus perturbed-eval to see how much noise one needs to create to get an effect.
I have a little data on that. IvanHoe 0A.0C.1A (from the beta 999949j source), posted in the engine's sub-forum, actually uses the randomizer combined with slightly tweaked piece weights. It does cause it to play slightly differently, but nothing significant as far as playing-style similarity is concerned:
X:\chess\similar>similar -r 19
------ IvanHoe 0A.0C.1A x64 (time: 100 ms) ------
74.30 IvanhoeB49jAx64p (time: 100 ms)
73.95 IvanHoe 9.49b x64 (time: 100 ms)
73.55 RobboLito 0.09 x64 (time: 100 ms)
73.50 FireBird 1.01 x64 (time: 100 ms)
72.70 IvanHoe 9.70b x64 (time: 100 ms)
72.15 Houdini 1.01 x64 4_CPU (time: 100 ms)
67.35 Houdini 1.5 x64 (time: 100 ms)
66.25 Rybka 3 (time: 100 ms)
X:\chess\similar>similar -r 12
------ IvanHoe 9.49b x64 (time: 100 ms) ------
74.80 IvanhoeB49jAx64p (time: 100 ms)
74.45 FireBird 1.01 x64 (time: 100 ms)
74.30 IvanHoe 9.70b x64 (time: 100 ms)
73.95 IvanHoe 0A.0C.1A x64 (time: 100 ms)
73.70 RobboLito 0.09 x64 (time: 100 ms)
73.05 Houdini 1.01 x64 4_CPU (time: 100 ms)
68.95 Houdini 1.5 x64 (time: 100 ms)
67.25 Rybka 3 (time: 100 ms)
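For context on the numbers above: the similarity tool reports, for each engine pair, a percentage of test positions on which both engines chose the same move. A sketch of that metric as I understand it (this is a guess at what the tool computes, not its actual code):

```python
def similarity(moves_a, moves_b):
    """Percent of test positions where two engines chose the same move.
    Assumed to be the metric behind the 'similar' tool's output; the
    real tool runs each engine on a fixed position set first."""
    assert len(moves_a) == len(moves_b), "same position set required"
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)

# Hypothetical 4-position run: agreement on 3 of 4 moves -> 75.0
print(similarity(['e4', 'd4', 'c4', 'f4'], ['e4', 'd4', 'a3', 'f4']))
```

On that reading, the ~74% IvanHoe-family scores versus ~66-69% for Houdini 1.5 and Rybka 3 mean the randomiser shifts only a few percent of moves.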
Re: More on similarity testing
Posted: Sat Jan 01, 2011 2:31 pm
by Hood
Hi,
how would you answer the following question:
why are programs with different evals and different searches choosing the same move?
It is possible that, because of their different searches, they are estimating different future positions.