thorstenczub wrote:
the question for me was: does this incest testing really allow us to make progress?
in the old days of computer chess we generated games (with much longer time controls, of course)
and WATCHED the games, looking for errors.
then the programmer tried to fix them, and the new engine went back on the autoplayers.

Yes, IMHO it is the only way to reliably make progress, but I am a weak chess player, so my opinion could be biased.
The key point is that the position you look at and the positions your engine looks at are completely different.
I mean, you look at one position and evaluate its weak and strong points, attacking possibilities and so on. Then your engine makes a move that you judge weak, and perhaps the game proves you right because that move leads to a loss.
The point is that the position you looked at is _not_ the same one your engine looked at. Your engine looked at (and evaluated) millions or tens of millions of positions 20-25 plies deeper in the search. The funny thing is that none of those millions and millions of positions is _your_ starting position, the one you were looking at.
So the bottom line is that, just by looking at one (the starting) position, you have _no_ clue why your engine played a weak move.
I think the testing methodology of KOMODO is the right one and very similar to ours: we play only one game at a time, so as not to introduce noise from running many games on different CPUs at the same time, and with a slightly longer TC: 1 minute instead of 30 seconds. But we never look at the games and we _only_ trust test results, even though Joona is a very good chess player.
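
To make "only trust test results" a bit more concrete, here is a minimal sketch of how a match result can be turned into an Elo difference with an error margin. This is an illustration under assumptions, not the actual tooling used by either team: the function names `elo_from_score` and `match_elo` are hypothetical, the logistic Elo formula is the standard one, and the interval is a simple normal approximation.

```python
import math


def elo_from_score(score, eps=1e-9):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    # Clamp so that a 0% or 100% score does not divide by zero.
    s = min(max(score, eps), 1.0 - eps)
    return -400.0 * math.log10(1.0 / s - 1.0)


def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, from the new engine's point of view.

    z=1.96 gives roughly a 95% interval under a normal approximation.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    var = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    stderr = math.sqrt(var / n)
    return (
        elo_from_score(score),
        elo_from_score(score - z * stderr),
        elo_from_score(score + z * stderr),
    )


if __name__ == "__main__":
    # Hypothetical result of a test run: new version vs. old version.
    elo, low, high = match_elo(wins=520, draws=1010, losses=470)
    print(f"Elo difference: {elo:+.1f} (interval {low:+.1f} to {high:+.1f})")
```

If the interval still contains zero, the test has not shown that the change helps, no matter how convincing a single game looked. Playing more games narrows the interval.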