DiscoCheck 5.0 test

Posted: Sat Oct 26, 2013 12:51 pm
by lucasart
I measured +42 Elo in self-play (against the previous 4.3 version) by running 10,000 games at 10"+0.1". What I'm curious to see now is whether this improvement scales well at long time control and against foreign opponents.

First match against Gaviota 0.86:
* 1,000 games in 60"+0.6", hash=64, ponder=off.
* book = 8moves GM from Adam Hair (48491 starting positions), playing each position with both sides.
* single-threaded: playing 7 games concurrently with cutechess-cli on my 8-core CPU.
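For anyone wanting to reproduce these conditions, here is a rough sketch of how the cutechess-cli arguments could be assembled. The engine paths and book filename are hypothetical, both engines are assumed to speak UCI, and the book is assumed to be in PGN format. The flags shown exist in recent cutechess-cli releases, but the meaning of `-games`/`-rounds` has changed between versions, so check `cutechess-cli -help` for your build. This is not necessarily the exact command used for the match.

```python
def cutechess_args(engine_a, engine_b, book_pgn, games=1000, concurrency=7):
    """Build an argument list for cutechess-cli.

    engine_a, engine_b and book_pgn are hypothetical paths supplied by the caller.
    """
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine_a}",
        "-engine", f"cmd={engine_b}",
        # Settings shared by both engines: 60"+0.6", hash=64 (UCI assumed).
        # Pondering is left at its default, which is off.
        "-each", "proto=uci", "tc=60+0.6", "option.Hash=64",
        # Opening book; PGN format and sequential order are assumptions.
        "-openings", f"file={book_pgn}", "format=pgn", "order=sequential",
        # Play each opening twice with colours reversed.
        "-repeat",
        "-rounds", str(games // 2), "-games", "2",
        # Number of games running at the same time.
        "-concurrency", str(concurrency),
    ]


if __name__ == "__main__":
    print(" ".join(cutechess_args("./discocheck", "./gaviota", "8moves_GM.pgn")))
```

The list can be passed directly to `subprocess.run` once the paths point at real binaries.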

The result is good, and confirms that the Elo gain is the same at long TC against Gaviota as at short TC in self-play:

discocheck_5.0.1_sse4.2 vs gaviota-0.86-linux64: 446 - 290 - 264  [0.578] 1000
ELO difference: 55
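For reference, the Elo difference can be recovered from the win/loss/draw counts with the usual logistic formula. A minimal sketch, assuming the three numbers are wins, losses and draws from the first engine's point of view (the function name is mine):

```python
import math


def elo_diff(wins, losses, draws):
    """Elo difference implied by a match result, from the first engine's side."""
    games = wins + losses + draws
    # Score fraction: a draw counts as half a point.
    score = (wins + 0.5 * draws) / games
    # Invert the logistic expectation: score = 1 / (1 + 10^(-elo/400)).
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    print(round(elo_diff(446, 290, 264)))
```

For 446 - 290 - 264 the score is 0.578, which works out to roughly 55 Elo, in line with the figure reported above. The formula is undefined for a score of exactly 0 or 1.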
Here are the games decided before 25 moves (a nice selection of tactical shots):


Next opponent: Stockfish 1.4 :o

Re: DiscoCheck 5.0 test

Posted: Sun Oct 27, 2013 2:17 am
by lucasart
Second match against Stockfish 1.4, in the same conditions:
* 1,000 games in 60"+0.6", hash=64, ponder=off.
* book = 8moves GM from Adam Hair (48491 starting positions), playing positions sequentially with both sides.
* single-threaded: playing 7 games concurrently with cutechess-cli on my 8-core CPU.

Unsurprisingly, DC lost this match, but not ridiculously so:

discocheck_5.0.1_sse4.2 vs stockfish_1.4: 276 - 372 - 352  [0.452] 1000
ELO difference: -33
The results of these two matches indicate that DC 5.0 is about at the level of Nemo 1.01 and Texel 1.02 now.

Here are the miniatures of this match (games decided in fewer than 25 moves):

Re: DiscoCheck 5.0 test

Posted: Sun Oct 27, 2013 2:43 am
by BB+
One thing that I like that you are doing is having 60s+0.6, rather than some micro-increment. In general, I think an increment of about 1% of the base time is right, like 5m+3s. It seems that 1m+1s is a standard, and LK used 2m+1s, which are again in the same ballpark, unlike the 60s+0.05 that Stockfish uses. Obviously it is speculation as to whether the micro-increment makes time management more important, or de-emphasizes endgames (especially vis-a-vis TBs, maybe exaggerating their effect), or whatnot.

Re: DiscoCheck 5.0 test

Posted: Sun Oct 27, 2013 3:30 am
by lucasart
BB+ wrote: One thing that I like that you are doing is having 60s+0.6, rather than some micro-increment. In general, I think an increment of about 1% of the base time is right, like 5m+3s. It seems that 1m+1s is a standard, and LK used 2m+1s, which are again in the same ballpark, unlike the 60s+0.05 that Stockfish uses. Obviously it is speculation as to whether the micro-increment makes time management more important, or de-emphasizes endgames (especially vis-a-vis TBs, maybe exaggerating their effect), or whatnot.
Yes, I don't want engines to play stupidly in the endgame because of time pressure. The endgame is important, and playing fast games already lessens its importance (lots of games are decided in the middlegame by tactics), so I don't want to neglect the endgame even more by using a zero increment. And I want the time control to be comparable to rating lists (like 5'+3") because time management code is likely to be dependent on the time/inc ratio (I use time/inc=100 here).

That being said, this is based on personal taste. I don't know if results would be any different if I used 60"+0.05".
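To put numbers on the time/inc ratio for the time controls mentioned in this thread, here is a quick sketch. The function name and the dictionary layout are mine; only the time controls themselves come from the posts above.

```python
import math


def time_inc_ratio(base_seconds, inc_seconds):
    """Ratio of base time to increment, both in seconds."""
    return base_seconds / inc_seconds


# (base, increment) in seconds for each time control discussed above.
CONTROLS = {
    '60"+0.6"': (60, 0.6),
    "5'+3\"": (300, 3),
    "1m+1s": (60, 1),
    "2m+1s": (120, 1),
    '60"+0.05"': (60, 0.05),
}

if __name__ == "__main__":
    for name, (base, inc) in CONTROLS.items():
        print(f"{name}: time/inc = {time_inc_ratio(base, inc):.0f}")
```

This gives 100 for both 60"+0.6" and 5'+3", 60 for 1m+1s, 120 for 2m+1s, and 1200 for 60"+0.05", so the micro-increment is an order of magnitude away from the others.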

To mitigate the time wastage, I use the following adjudication rules:
  • draw: once move 50 has been passed, if the scores of both engines stay below 40cp in absolute value for 5 consecutive moves, the game is adjudicated as a draw. Of course, this rule makes some errors, but in practice such errors are rare enough not to modify the stats measurably.
  • resign: if the losing side shows scores below -700cp for 3 consecutive moves, it resigns. Again, this makes some errors, such as desperado themes (crazy rook draw), where the forced repetition can be pushed beyond the search horizon. These are very rare in practice, so the rule saves quite a bit of time and does not modify the stats measurably.
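The two rules above can be sketched in a few lines. This is only an illustration of the logic, not the code of any actual tournament manager. It assumes one (white, black) pair of scores per full move, each in centipawns from that engine's own point of view, and it assumes the draw rule fires as soon as the move number exceeds 50 with the last 5 moves all quiet.

```python
def adjudicate(scores, draw_move=50, draw_count=5, draw_cp=40,
               resign_count=3, resign_cp=700):
    """Return (move_number, verdict) at the first adjudication, or None.

    scores: list of (white_cp, black_cp), one pair per full move,
            each score seen from that engine's own side.
    """
    quiet = 0         # consecutive moves where both scores are near zero
    losing = [0, 0]   # consecutive hopeless scores for white, black
    for move_no, pair in enumerate(scores, start=1):
        # Draw counter: both engines must report |score| < draw_cp.
        quiet = quiet + 1 if all(abs(s) < draw_cp for s in pair) else 0
        # Resign counters: a side resigns after enough scores below -resign_cp.
        for side, s in enumerate(pair):
            losing[side] = losing[side] + 1 if s < -resign_cp else 0
            if losing[side] >= resign_count:
                return move_no, ("white resigns", "black resigns")[side]
        if move_no > draw_move and quiet >= draw_count:
            return move_no, "draw"
    return None


if __name__ == "__main__":
    # A dead-equal game is called a draw right after move 50.
    print(adjudicate([(10, -10)] * 60))
```

Note that a single score above the threshold resets the relevant counter, which is what keeps one-move evaluation spikes from triggering an adjudication.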