Here are the results of my experiment in "bestmove" matching, à la Don Dailey. I used fixed-depth searches for a variety of reasons, notably that some engines screw up movetime, while others have polling behaviour that can sully the data in fast searches (with SMP another worry). While I much prefer the reproducibility of single-CPU fixed-depth searches, I don't think, however, that this should be seen as a great advance in "scientific" methodology.
I formed a suite of 8306 positions. I did this by taking a few hundred games and pruning all opening/endgame positions, then pruning those in which one move scored more than +0.25 (in a 1.0s search) above every other move, and those for which the eval exceeded 2.00 in absolute value. Whether this is a good way to generate positions is an open matter, but hopefully it gives some control over strength issues.
Then I tested 10 engines at various depths. Choosing the proper depth is not a science, but I intended each run to take between 2 and 4 hours (about 1 second per position). I restricted myself to the engine families listed below, as with them I understand how to ensure that the data obtained are what is desired. [With some non-negligible but feasible effort, I could also completely isolate the evaluate() function in each of these if desired, so as to see whether "bestmove" correlation and evaluate() correlation are themselves correlated.]
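As a rough check on that timing target: one search per position at about a second each, compared with the fastest and slowest runs in the table (2:35 and 4:06, read as h:mm), gives

$$
8306 \times 1\,\mathrm{s} = 8306\,\mathrm{s} \approx 2.3\,\mathrm{h},
\qquad
\frac{9300\,\mathrm{s}}{8306} \approx 1.1\,\mathrm{s},
\qquad
\frac{14760\,\mathrm{s}}{8306} \approx 1.8\,\mathrm{s},
$$

so the actual runs worked out to roughly 1.1 to 1.8 seconds per position.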
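Here are the bestmove-matching data (each entry is the number of the 8306 positions on which the two engines chose the same move; Time is the run time in h:mm):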
```
                FR10  FR21  IH47  Ryb1  Ry12  R232  Ryb3  Gla2  SF15  SF19  Time
FR10.at.dp9        0  3920  3290  3529  3600  3581  3381  3876  3611  3528  3:36
FR21.at.dp10    3920     0  3927  4551  4478  4436  4064  4330  4248  4127  4:06
IH47c.at.dp15   3290  3927     0  4333  4423  4641  4921  3885  4370  4411  3:09
R1.at.dp10      3529  4551  4333     0  5523  5259  4552  4264  4408  4283  2:45
R12.at.dp11     3600  4478  4423  5523     0  5464  4638  4272  4468  4379  3:18
R232.at.dp11    3581  4436  4641  5259  5464     0  4840  4206  4454  4378  3:21
R3.at.dp10      3381  4064  4921  4552  4638  4840     0  4057  4434  4380  2:51
GL2.at.dp12     3876  4330  3885  4264  4272  4206  4057     0  4735  4365  2:41
SF151.at.dp13   3611  4248  4370  4408  4468  4454  4434  4735     0  5238  3:57
SF191.at.dp14   3528  4127  4411  4283  4379  4378  4380  4365  5238     0  2:35
```
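To put these counts on a percentage scale, divide by the 8306 positions in the suite; for the largest and smallest off-diagonal entries this gives

$$
\frac{5523}{8306} \approx 66.5\%\ (\text{R1 vs. R12}),
\qquad
\frac{3290}{8306} \approx 39.6\%\ (\text{FR10 vs. IH47c}).
$$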
All data and programmes are in the attached 7zip archive, which is in a semi-usable form (for instance, I #define things to be 8306 in the C code, to match the data size). The search DEPTH must be given at compile time, while the engine name can be given as a command-line argument. As noted, the correlation data I obtained should be entirely reproducible, though it would likely be more useful to run a similar experiment on a different set of positions (possibly pruned as above). I would usually run this via commands like:
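```
# build the driver; the search depth is fixed at compile time
gcc -O3 -DDEPTH=\"11\" -o bestmove bestmove.c
# run one engine (named on the command line) over the position list, in the background
time ./bestmove LINKS/Ryb232 < PRUNE.LIST > R232.at.dp11 &
[...]
# tabulate bestmove matches between every pair of result files
./compare FR10.at.dp9 FR21.at.dp10 IH47c.at.dp15 R1.at.dp10 R12.at.dp11 \
  R232.at.dp11 R3.at.dp10 GL2.at.dp12 SF151.at.dp13 SF191.at.dp14
```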
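Since bestmove.c and compare are only in the archive, what follows is a hedged C sketch of what the two steps might involve, not the actual programs. It assumes that PRUNE.LIST holds one FEN per line, that each engine speaks UCI, and that each result file stores one move per line; format_search, parse_bestmove, load_moves and count_matches are hypothetical names, and spawning the engine process and piping text to and from it are omitted. Because the build line passes DEPTH as a string literal (-DDEPTH=\"11\"), it can be pasted straight into the go depth command by string-literal concatenation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Normally supplied at build time, e.g. gcc -DDEPTH=\"11\"; "11" is only a fallback. */
#ifndef DEPTH
#define DEPTH "11"
#endif

#define MOVELEN 16  /* room for a UCI move such as "e7e8q" plus terminator */

/* Build the UCI commands for a fixed-depth search of one position.
   fen is one line of the position list; a trailing newline is ignored.
   Returns the command length, or -1 if buf is too small. */
static int format_search(char *buf, size_t size, const char *fen)
{
    assert(buf != NULL && fen != NULL);
    int fenlen = (int)strcspn(fen, "\r\n");
    int n = snprintf(buf, size, "position fen %.*s\ngo depth " DEPTH "\n", fenlen, fen);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}

/* Extract the move from an engine reply like "bestmove e2e4 ponder e7e5".
   Returns 1 on success, 0 if the line is not a usable bestmove line. */
static int parse_bestmove(const char *line, char *out, size_t size)
{
    static const char prefix[] = "bestmove ";
    size_t plen = sizeof prefix - 1;
    if (strncmp(line, prefix, plen) != 0)
        return 0;
    line += plen;
    size_t len = strcspn(line, " \r\n");
    if (len == 0 || len >= size)
        return 0;
    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}

/* Read up to max moves from a result file, one move per line (assumed format). */
static int load_moves(FILE *fp, char moves[][MOVELEN], int max)
{
    char line[256];
    int n = 0;
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';
        strncpy(moves[n], line, MOVELEN - 1);
        moves[n][MOVELEN - 1] = '\0';
        n++;
    }
    return n;
}

/* Count positions on which two engines chose the same move. */
static int count_matches(char a[][MOVELEN], char b[][MOVELEN], int n)
{
    int same = 0;
    for (int i = 0; i < n; i++)
        if (strcmp(a[i], b[i]) == 0)
            same++;
    return same;
}
```

Calling count_matches on every pair of loaded result files, and leaving the diagonal at 0, would produce a matrix laid out like the one above.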