
Re: More on similarity testing

Posted: Thu Dec 30, 2010 3:44 am
by BB+
The tester seems to very clearly identify strong correlations between the playing styles of programs and it does this better than I had hoped.
I quite agree. I discussed this with Larry (in PMs at the Rybka forum) back when you were first tossing this idea around. I thought I had a few ideas for how to tweak the search, but the robustness of the eval correlation remains. Actually, now that I think of it, the later IvanHoes have some sort of "randomiser", which merely seems to perturb the eval by some amount (I'd have to check the details). Maybe I can test eval versus perturbed-eval to see how much noise one needs to create to get an effect. I also think taking (at least the open-source) engines and cross-comparing correlations from evaluate() with go movetime 100 would be a useful experiment.
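The eval-versus-perturbed-eval experiment can be sketched in a few lines. This is a toy model, not any engine's actual code: each position is a dict of hypothetical move scores in centipawns, uniform noise is added, and we measure how often the top move survives the perturbation.

```python
import random

def top_move(scores):
    """Return the move with the highest score."""
    return max(scores, key=scores.get)

def match_rate(position_scores, noise, trials=1000, seed=1):
    """Fraction of trials in which the perturbed eval still picks the
    same move as the unperturbed eval, averaged over positions."""
    rng = random.Random(seed)
    matches = total = 0
    for scores in position_scores:
        baseline = top_move(scores)
        for _ in range(trials):
            perturbed = {m: s + rng.uniform(-noise, noise)
                         for m, s in scores.items()}
            matches += (top_move(perturbed) == baseline)
            total += 1
    return matches / total

# Toy position: best two moves are 5 cp apart, a third trails far behind.
positions = [{"e2e4": 35, "d2d4": 30, "g1f3": 5}]
print(match_rate(positions, noise=2))    # a 5 cp gap cannot be flipped by +/-2 cp noise -> 1.0
print(match_rate(positions, noise=200))  # large noise: the match rate drops well below 1
```

Sweeping `noise` upward would show roughly where the randomiser starts to disturb the move-matching signal.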

One thing I like about fixed depth is that there's no dispute about what the "default" level of matching is (at least w/o SMP). I'm not sure this outweighs any negatives. Given that the time allotted appears to be a secondary factor, I would opt for whichever is easier. One issue with using "stop" (which, I agree, improves on "go movetime") is how the OS does time slicing with a "waiting" process (typically I think these slices are 1/100 of a second in Linux). As noted in the Stockfish discussion, you can still hit a "polling" discretisation behaviour when I/O is only checked every 30K nodes and the search is taking maybe 5 times this amount. If nothing else, as with any experiment, there needs to be some quality control.
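The polling discretisation is easy to quantify. A rough sketch, with an assumed search speed (the 30K-node polling interval is from the Stockfish example above, the nodes-per-second figure is illustrative): a "stop" sent at time t only takes effect at the next polling boundary.

```python
import math

POLL_NODES = 30_000   # nodes between input checks (as in the Stockfish example)
NPS = 1_000_000       # assumed search speed in nodes per second (illustrative)

def effective_stop_ms(stop_ms):
    """Time (ms) at which the search actually halts after a stop at stop_ms."""
    poll_ms = POLL_NODES / NPS * 1000     # ms between input polls (30 ms here)
    polls = math.ceil(stop_ms / poll_ms)  # first polling boundary at or after the stop
    return polls * poll_ms

print(effective_stop_ms(100))  # a stop at 100 ms lands on the 30 ms grid -> 120 ms
```

So with these numbers a nominal 100 ms search actually runs about 120 ms, and any search shorter than a few polling intervals is dominated by this granularity.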

One question I have about all of this: can this detect specific overlap in evaluation features, or is it more about evaluation numerology?

Re: More on similarity testing

Posted: Thu Dec 30, 2010 3:49 am
by Sentinel
BB+ wrote:One question I have about all of this: can this detect specific overlap in evaluation features, or is it more about evaluation numerology?
As I said in my previous post, it catches only material + PST.
Try the test with Ippo using only lazy eval and you'll see.
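For readers unfamiliar with the term: "lazy eval" in this context usually means computing only the cheap material + PST score and skipping the expensive evaluation terms when that cheap score is far outside the alpha-beta window. A minimal sketch of the usual scheme (not Ippolit's actual code; all names and margins are illustrative); with the margin set very high, the engine effectively plays on material + PST alone, which is the experiment suggested above:

```python
LAZY_MARGIN = 300  # centipawns; make it huge to force material+PST-only play

def evaluate(pos, alpha, beta, full_eval, material_pst):
    """Lazy evaluation: return the cheap score when it is hopelessly
    outside the (alpha, beta) window, else run the full evaluation."""
    lazy = material_pst(pos)
    # If even a LAZY_MARGIN-sized correction could not bring the score
    # back inside the window, skip the expensive terms.
    if lazy + LAZY_MARGIN < alpha or lazy - LAZY_MARGIN > beta:
        return lazy
    return full_eval(pos)
```

Comparing the similarity scores of a normal build against a lazy-only build would show how much of the matching comes from material + PST versus the rest of the eval.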

Re: More on similarity testing

Posted: Thu Dec 30, 2010 6:50 am
by kingliveson
BB+ wrote: Actually, now that I think of, the later IvanHoes have some sort of "randomiser", which merely seems to perturb the eval by some amount (I'd have to check the details). Maybe I can test eval versus perturbed-eval to see how much noise one needs to create to get an effect.
I have a little data on that. IvanHoe 0A.0C.1A (from the beta 999949j source) posted in the engine's sub-forum actually uses the randomizer, combined with the piece weights tweaked a little. It does cause it to play slightly differently, but nothing significant as far as playing-style similarity is concerned:
X:\chess\similar>similar -r 19
------ IvanHoe 0A.0C.1A x64 (time: 100 ms) ------
 74.30  IvanhoeB49jAx64p (time: 100 ms)
 73.95  IvanHoe 9.49b x64 (time: 100 ms)
 73.55  RobboLito 0.09 x64 (time: 100 ms)
 73.50  FireBird 1.01 x64 (time: 100 ms)
 72.70  IvanHoe 9.70b x64 (time: 100 ms)
 72.15  Houdini 1.01 x64 4_CPU (time: 100 ms)
 67.35  Houdini 1.5 x64 (time: 100 ms)
 66.25  Rybka 3  (time: 100 ms)
X:\chess\similar>similar -r 12
------ IvanHoe 9.49b x64 (time: 100 ms) ------
 74.80  IvanhoeB49jAx64p (time: 100 ms)
 74.45  FireBird 1.01 x64 (time: 100 ms)
 74.30  IvanHoe 9.70b x64 (time: 100 ms)
 73.95  IvanHoe 0A.0C.1A x64 (time: 100 ms)
 73.70  RobboLito 0.09 x64 (time: 100 ms)
 73.05  Houdini 1.01 x64 4_CPU (time: 100 ms)
 68.95  Houdini 1.5 x64 (time: 100 ms)
 67.25  Rybka 3  (time: 100 ms)
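The numbers above are move-match percentages. A hedged sketch of how such a figure can be computed (the real `similar` tool's internals may differ): run both engines over the same position set, record each engine's chosen move, and report the fraction of agreement.

```python
def similarity(moves_a, moves_b):
    """Percentage of positions in which the two engines chose the same move."""
    assert len(moves_a) == len(moves_b), "both engines must see the same positions"
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)

# Toy example: agreement on 3 of 4 positions.
a = ["e2e4", "d2d4", "g1f3", "c2c4"]
b = ["e2e4", "d2d4", "b1c3", "c2c4"]
print(similarity(a, b))  # 75.0
```

With a few thousand test positions, scores in the low 70s, as in the tables above, mean the two engines agree on roughly three moves out of four.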

Re: More on similarity testing

Posted: Sat Jan 01, 2011 2:31 pm
by Hood
Hi,

How would you answer the following question:

why do programs with different evals and searches choose the same move?

Conversely, it is possible that, because of their different searches, they are estimating different future positions.