Re: value of LMR and null-move
Posted: Wed Jul 14, 2010 1:58 am
by hyatt
Sentinel wrote:
hyatt wrote: Now I am lost. Did you disable NM, FUTILITY and LMR in your test or just LMR?
My tests have been only without LMR (for stockfish)...
Both LMR and null move (but not futility pruning).
So it's a combined effect, and that's the reason the difference is even higher at larger depths.
The bottom line is that at 1'+1'' they should bring around 200 Elo combined in most of today's top programs (SF, Ippo, Rybka, even Crafty 23.3).
I will test LMR alone once the 4'+4'' test is finished (in 3 days' time).
OK, now explain how that belongs in the discussion being held? We are trying to measure the effect of LMR only. And you provide bigger numbers than anyone else. I don't see how that furthers the discussion at all when we are talking apples, oranges and lemons. To compare, we have to compare something that makes some sort of sense. I was clear as to what I removed from Stockfish. I believe Ed was also just as clear about what he removed from Stockfish for his experiments. Throwing in null-move, with a different engine to boot, doesn't really help in the debate about what LMR is providing. We could have a discussion about turning both off, or about turning null-move off by itself, which is fine. But mixing things up is only confusing. I had thought your original numbers were surprising in light of the numbers Ed and I are seeing, and at first thought "perhaps LMR is significantly different in ip* and friends." Wrong assumption.
Re: value of LMR and null-move
Posted: Wed Jul 14, 2010 4:45 pm
by Sentinel
hyatt wrote: We are trying to measure the effect of LMR only. And you provide bigger numbers than anyone else. I don't see how that furthers the discussion at all when we are talking apples, oranges and lemons. To compare, we have to compare something that makes some sort of sense. I was clear as to what I removed from Stockfish. I believe Ed was also just as clear about what he removed from Stockfish for his experiments. Throwing in null-move, with a different engine to boot, doesn't really help in the debate about what LMR is providing. We could have a discussion about turning both off, or about turning null-move off by itself, which is fine. But mixing things up is only confusing. I had thought your original numbers were surprising in light of the numbers Ed and I are seeing, and at first thought "perhaps LMR is significantly different in ip* and friends." Wrong assumption.
Well, the title of the thread (your thread) is "value of LMR and null-move". Even your first post measured both (plus the separate components).
I started my tests back then, not later in the thread when you suddenly decided to measure just LMR. And each time I stated clearly that the results are default vs. no LMR, no null move. I'm sorry you feel disappointed, but you should have read what I was writing more carefully...
And I will run an LMR-only test, but only once the current one is finished. I don't have enough computing power for more than that.
Re: value of LMR and null-move
Posted: Wed Jul 14, 2010 5:31 pm
by hyatt
Sentinel wrote:
hyatt wrote: We are trying to measure the effect of LMR only. And you provide bigger numbers than anyone else. I don't see how that furthers the discussion at all when we are talking apples, oranges and lemons. To compare, we have to compare something that makes some sort of sense. I was clear as to what I removed from Stockfish. I believe Ed was also just as clear about what he removed from Stockfish for his experiments. Throwing in null-move, with a different engine to boot, doesn't really help in the debate about what LMR is providing. We could have a discussion about turning both off, or about turning null-move off by itself, which is fine. But mixing things up is only confusing. I had thought your original numbers were surprising in light of the numbers Ed and I are seeing, and at first thought "perhaps LMR is significantly different in ip* and friends." Wrong assumption.
Well, the title of the thread (your thread) is "value of LMR and null-move". Even your first post measured both (plus the separate components).
I started my tests back then, not later in the thread when you suddenly decided to measure just LMR. And each time I stated clearly that the results are default vs. no LMR, no null move. I'm sorry you feel disappointed, but you should have read what I was writing more carefully...
And I will run an LMR-only test, but only once the current one is finished. I don't have enough computing power for more than that.
Yes, I had (in Crafty) measured each. Then we started to concentrate on LMR only, as per Ed's comments. Your tests are fine, but in every post I wrote I _clearly_ indicated what was being tested. The discussion started in another thread that was poorly titled for what was being discussed so the current discussion moved here (I think I started the thread).
For both LMR and NM, your numbers look pretty close to mine with Crafty. Why stockfish only drops off by 50 or so when LMR is removed is not known, yet...
Re: value of LMR and null-move
Posted: Wed Jul 14, 2010 8:52 pm
by Rebel
hyatt wrote: For both LMR and NM, your numbers look pretty close to mine with Crafty. Why stockfish only drops off by 50 or so when LMR is removed is not known, yet...
SF 1.8 vs SF 1.8 (no LMR) at 15-minute blitz ended in +37 =56 -7, way above 50 Elo.
You will argue, of course, but I think this kind of search-related test needs a decent TC so that LMR can have its real influence, since its strength comes from deeper searches.
Why not play 3,000 games instead of 30,000 and multiply the TC by 10? For the subject at hand we are not interested in an exact Elo to 2 decimals; an error margin of +5/-5 Elo is perfectly acceptable.
Ed
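For readers who want to check the "way above 50 Elo" reading of +37 =56 -7, here is a minimal sketch that converts a win/draw/loss result into an Elo difference. It assumes the usual logistic Elo model and a plain normal approximation for the interval; neither is stated in the thread, the function names are made up for illustration, and a rating tool may compute its margins differently.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    The interval is a rough normal approximation on the score fraction,
    an assumption made for this sketch.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - p) ** 2 + draws * (0.5 - p) ** 2 + losses * p ** 2) / n
    se = math.sqrt(var / n)
    return elo_from_score(p), elo_from_score(p - z * se), elo_from_score(p + z * se)

elo, low, high = match_elo(37, 56, 7)
print(f"{elo:+.0f} Elo (approx. 95% interval {low:+.0f} to {high:+.0f})")
```

With only 100 games the interval is wide, which is worth keeping in mind when comparing this figure with results from much longer runs.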
Re: value of LMR and null-move
Posted: Wed Jul 14, 2010 9:57 pm
by hyatt
Rebel wrote:
hyatt wrote: For both LMR and NM, your numbers look pretty close to mine with Crafty. Why stockfish only drops off by 50 or so when LMR is removed is not known, yet...
SF 1.8 vs SF 1.8 (no LMR) at 15-minute blitz ended in +37 =56 -7, way above 50 Elo.
You will argue, of course, but I think this kind of search-related test needs a decent TC so that LMR can have its real influence, since its strength comes from deeper searches.
Why not play 3,000 games instead of 30,000 and multiply the TC by 10? For the subject at hand we are not interested in an exact Elo to 2 decimals; an error margin of +5/-5 Elo is perfectly acceptable.
Ed
I can probably get away with a small number of games, since the two programs are not going to be within 20 Elo of each other. However, I am running 1m+1s, so this will turn into 10m+10s, which will take pretty long. Will start it right now...
However, last time I tried this, I tried 10s+0.1s, 1m+1s and 5m+5s and did not see any significant difference in the spread between the two versions. Let's see what happens to stockfish, again with no book or anything...
Re: value of LMR and null-move
Posted: Wed Jul 14, 2010 10:48 pm
by hyatt
OK, I am now playing 10m+10s games (a good bit slower than 15+0 blitz, but it avoids any time scrambles). It is slow going, but I am simply playing stockfish-normal against stockfish-noLMR + 4 other opponents, and then stockfish-noLMR against the other 4 opponents as well. Looks like this so far:
Rank Name                   Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2793   67   67    65   75%  2602   35%
   2 Stockfish 1.8a 64bit  2728   66   66    62   60%  2626   37%
   3 Toga2                 2649  114  114    20   35%  2763   30%
   4 Glaurung 2.2          2590  132  132    14   25%  2765   36%
   5 Fruit 2.1             2436  149  149    19   13%  2758    5%
   6 Glaurung 1.1 SMP      2404  146  146    22    9%  2760    9%
This is about 2 games per hour per node, and I am currently running on 1/2 the cluster less 6 nodes (someone else is using those), which leaves a good 50 nodes or so to run on. So about 100 games per hour. Should hopefully have this done by tomorrow, although I will post an update every now and then. Right now +65 separates the two.
Re: value of LMR and null-move
Posted: Thu Jul 15, 2010 12:56 am
by hyatt
results pretty stable, so far:
Rank Name                   Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2801   34   34   272   78%  2589   33%
   2 Stockfish 1.8a 64bit  2742   32   32   268   66%  2612   34%
   3 Glaurung 2.2          2579   58   58    86   22%  2771   36%
   4 Toga2                 2549   65   65    78   19%  2772   26%
   5 Fruit 2.1             2472   69   69    87   13%  2772   18%
   6 Glaurung 1.1 SMP      2457   66   66   103   12%  2772   17%
1.8a has no LMR, 1.8 is normal. +59 so far.
As I said, I don't find any difference between short and long time controls for this particular algorithm, except perhaps down in the +/-10 range.
About 4 or 5 hours into the test, played just over 500 games so far...
Should be done tomorrow around noonish or so...
Just remembered, the total games is a bit misleading, because each stockfish is playing everybody else. Some of the games are counted twice if you add up the total games played, since 1.8 plays 1.8a and each of those games shows up in the totals for both 1.8 and 1.8a. So it is not playing quite as many games per hour as the above totals would suggest, not that it matters.
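The double counting can be worked out from the games column alone. The sketch below uses the counts from the table in this post and assumes, as the description of the test suggests, that the four other engines only ever play one of the two Stockfish builds and never each other.

```python
# Games column from the table in this post.
games = {
    "Stockfish 1.8 64bit": 272,
    "Stockfish 1.8a 64bit": 268,
    "Glaurung 2.2": 86,
    "Toga2": 78,
    "Fruit 2.1": 87,
    "Glaurung 1.1 SMP": 103,
}
stockfish = ["Stockfish 1.8 64bit", "Stockfish 1.8a 64bit"]

# Assumption: every game of a non-Stockfish engine is against a Stockfish build.
vs_others = sum(n for name, n in games.items() if name not in stockfish)

# Whatever is left in the two Stockfish totals is head-to-head play,
# and each head-to-head game appears once in each Stockfish row.
stockfish_total = sum(games[name] for name in stockfish)
head_to_head = (stockfish_total - vs_others) // 2

unique_games = vs_others + head_to_head
print(f"naive Stockfish total: {stockfish_total}")
print(f"head-to-head games:    {head_to_head}")
print(f"unique games played:   {unique_games}")
```

Equivalently, every game appears in exactly two rows, so the unique count is half the sum of the whole games column.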
Re: value of LMR and null-move
Posted: Thu Jul 15, 2010 2:47 am
by hyatt
Next update:
Rank Name                   Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2807   25   25   481   78%  2589   34%
   2 Stockfish 1.8a 64bit  2742   24   24   470   66%  2613   37%
   3 Toga2                 2566   47   47   142   21%  2775   29%
   4 Glaurung 2.2          2560   45   45   152   19%  2775   32%
   5 Fruit 2.1             2475   52   52   151   13%  2775   20%
   6 Glaurung 1.1 SMP      2450   51   51   178   11%  2775   17%
now almost 1,000 games down, +65 for LMR over noLMR...
Re: value of LMR and null-move
Posted: Thu Jul 15, 2010 4:12 am
by hyatt
ok, now over 1,200 games into this. The error bars are down to +/-22, which means we are closing in on "the truth". Looks like the difference is now +62:
Rank Name                   Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2812   23   23   625   79%  2587   34%
   2 Stockfish 1.8a 64bit  2750   22   22   609   67%  2611   35%
   3 Toga2                 2571   41   41   187   21%  2781   29%
   4 Glaurung 2.2          2556   41   41   199   18%  2781   31%
   5 Fruit 2.1             2479   46   46   199   13%  2782   19%
   6 Glaurung 1.1 SMP      2433   47   47   231   10%  2782   15%
Re: value of LMR and null-move
Posted: Thu Jul 15, 2010 5:14 am
by hyatt
+61 now, it is simply not going to change much more. Realistically even a +10 jump now would be a big one. last update tonight:
Rank Name                   Elo    +    - games score oppo. draws
   1 Stockfish 1.8 64bit   2814   21   21   727   79%  2587   34%
   2 Stockfish 1.8a 64bit  2753   20   20   709   68%  2610   36%
   3 Toga2                 2567   38   38   223   20%  2783   30%
   4 Glaurung 2.2          2562   38   38   228   19%  2784   31%
   5 Fruit 2.1             2482   43   43   231   13%  2784   20%
   6 Glaurung 1.1 SMP      2423   44   44   270    9%  2784   14%
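The shrinking error bars across the updates can be checked against a simple rule of thumb. This is a rough sketch, assuming the margin scales as 1/sqrt(games) for a fixed engine and ignoring changes in draw rate and opponent mix; the rating tool's own margins are computed differently, so treat the projection as an order-of-magnitude estimate only. The data points are the Stockfish 1.8 rows from the five tables in this thread.

```python
import math

# (games, error margin) for Stockfish 1.8 in each of the five updates.
updates = [(65, 67), (272, 34), (481, 25), (625, 23), (727, 21)]

# If margin ~ k / sqrt(games), then margin * sqrt(games) should stay roughly constant.
ks = [margin * math.sqrt(n) for n, margin in updates]
k = sum(ks) / len(ks)

def games_for_margin(target):
    """Games needed for a given margin under the 1/sqrt(n) assumption."""
    return math.ceil((k / target) ** 2)

for (n, margin), value in zip(updates, ks):
    print(f"{n:4d} games, +/-{margin:2d}: margin * sqrt(n) = {value:.0f}")
print(f"estimated k = {k:.0f}")
print(f"games for +/-10: {games_for_margin(10)}")
```

Because halving the margin takes about four times as many games, late updates move the error bars much less than early ones did.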