Strange Stockfish behavior?

Code, algorithms, languages, construction...
Post Reply
hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Fri Apr 01, 2011 10:16 pm

zamar wrote:
Then as I said, one should simply take each fail high position in the tree, and re-search with a relaxed beta bound, to see how many fail low the second time around. Then you _know_ how incorrect this assumption is.
Usually there is just no reason to do this. Root is an exception. We need to know the exact score of the move, so we need to resolve the fail-high and sometimes it just happens that while resolving we get fail-low. And you suggest that we should just ignore that completely free extra information that move could be very bad and just trust on previous fail-high. Sound very bogus for me...
Sorry. If it sounds bogus, you simply don't understand what is going on... Handle it however you like. It is not a huge elo gain, but it will lose you an extra game here and there when you miss a fail high that you should have found... Far be it from me to tell you how to write your code, I am just telling you that what you are doing is theoretically (and practically) unsound...


If a fail-high is not correct in a majority of the cases, then fail-high is no good.
Of course fail-high is very good in majority of cases. Like having two aces is very good thing in Poker _in_majority_of_cases. But fail-low (after fail-high) is a signal that your opponent might have three kings
Or, more likely, an indication that a fail-high TT entry (that helped you find the fail high) is no longer valid after you change the beta bound, and your search can't see deeply enough to find the better move. Or that during the re-search, you overwrote something that would have helped you get a real score, had you kept it in the TT.

Etc...


This is yet another of those ideas that says you should treat the root differently from the rest of the tree.
We must resolve fail-highs at root, even Crafty does that. Resolving gives us extra information. For optimum decision, you want to use all available information.
My point exactly. The fail high is information you are throwing away.
To which I would always answer "why?" And I always wait for a sound theoretical explanation, or a sound experimental explanation. And all I usually get is "sound" (as in noise). :)
beep beep beep

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Fri Apr 01, 2011 10:22 pm

mcostalba wrote:
If a fail-high is not correct in a majority of the cases, then fail-high is no good.
Ok, here is the error in the argumentation.

You are talking of the general case, but you should focus on the small subset of cases where a fail high is strictly followed by a fail low. In the majority of cases fails high are good...simply because there is no fail low following ;-)
That is so badly flawed it is almost not worth commenting on, but here goes, anyway. Why don't you get fail lows at other fail highs? Because you stop the search immediately. At the root there is absolutely no requirement you resolve the score on a fail high. Belle worked like that, as did early versions of Cray Blitz when PVS was first mentioned, starting around 1978.

So you don't test fail highs anywhere else, except at the root, and you conclude that _only_ for the root, where you do test the fail-high with a re-search and it fails low, you should ignore it? :)

First, try the experiment. I did, already. "test" those search fail highs that are not at the root, just for fun, and count how many fail low vs how many produce a good score. Yes it will be way slower. But then you get _real_ data and you will see that some of those fail highs will _also_ fail low. You just don't normally notice because you don't do the re-search except at the root.

Again, why is the root different? It is a flawed idea...


Please let me add that approaching a case with the gross hammer of general principles makes you blind to details that are instead what really matters, what makes the difference between an idea that works and an useless one.
I agree 100%. But I am not blind to the details. I have actually _tested_ this, as I previously mentioned, tracking down an interesting and subtle bug years ago...

mcostalba
Posts: 91
Joined: Thu Jun 10, 2010 11:45 pm
Real Name: Marco Costalba

Re: Strange Stockfish behavior?

Post by mcostalba » Sat Apr 02, 2011 9:32 am

hyatt wrote:
mcostalba wrote:
If a fail-high is not correct in a majority of the cases, then fail-high is no good.
Ok, here is the error in the argumentation.

You are talking of the general case, but you should focus on the small subset of cases where a fail high is strictly followed by a fail low. In the majority of cases fails high are good...simply because there is no fail low following ;-)
That is so badly flawed it is almost not worth commenting on, but here goes, anyway. Why don't you get fail lows at other fail highs? Because you stop the search immediately.
I never mentioned other nodes apart from root. Did I ? You are introducing a new stuff that just blurs the discussion. Let's keep focused: we are talking of root nodes.

I have said that most of fail high in root nodes do not experience a strictly following fail low. So your statement:"fails high are good more then not" should be tested not against all the fail high that occurs in root nodes but only in the subset of fail high (at root node) strictly followed by a fail low.

I think this time you really cannot escape ;-) but actually you had already surprised me in the past many times with an incredible ability to steer the argumentations where it is more convenient for you :-)

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Sat Apr 02, 2011 4:30 pm

Did you not say "we only do this at the root?" That certainly, by direct implication, mentions "non-root moves".

If you read what I wrote, I was implicitly addressinng fail-highs followed by fail-lows since that is the topic of the thread.

What I suggested doing was to expand the test into the tree. At every fail high, rather than terminating the search, test to see how often you get a fail-low if you increase beta and re-search. It will happen. No more frequently than when you get a fail-high/fail-low at the root, but no less either. yet you take that fail-high inside the search every time it happens, no questions asked, no verification to determine if it will fail low. Why inside the tree but not at the root?

If you have a good way to actually measure the accuracy of the fail-high/fail-low cases, by doing something _really_ deep on the original and the fail-high move to see which is better, then run it. I don't see any theoretically sound way of doing that.

Again, if you fail high and are wrong, you have a bug, plain and simple. And that bug is affecting you throughout the tree, not just at the root. It just so happens that you _see_ it happening at the root.

I don't know of any other program that ignores a fail-high at the root, regardless of what happens on the re-search. There's a reason...

There is nothing to "escape" from here. If you want to use an unsound idea, that's your choice. Others have told you this is broken by observation, I have told you it is broken because I actually tested it a few years back chasing just such a bug that was called by a subtle bug in null-move searches. Run whatever tests you want. But at least _do_ run some tests. And not on a few positions. I suppose I could run the test myself, again, but doing just what you do. But I already know the result, although I don't remember the exact number in terms of Elo. It is small. It is non-zero. Take it or leave it...

mcostalba
Posts: 91
Joined: Thu Jun 10, 2010 11:45 pm
Real Name: Marco Costalba

Re: Strange Stockfish behavior?

Post by mcostalba » Sat Apr 02, 2011 7:51 pm

hyatt wrote: If you have a good way to actually measure the accuracy of the fail-high/fail-low cases, by doing something _really_ deep on the original and the fail-high move to see which is better, then run it. I don't see any theoretically sound way of doing that.

Again, if you fail high and are wrong, you have a bug, plain and simple. And that bug is affecting you throughout the tree, not just at the root. It just so happens that you _see_ it happening at the root.
If we accept that a fail high is good, or at least, it has to be considered good, then why do we research in aspiration window after a fail high ? We should just research only after a fail low, instead after a fail high we could just move to the next iteration....but it is not like this that aspiration window works.

BTW I _have_ tested, please see some post before, always in this thread.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: Strange Stockfish behavior?

Post by hyatt » Sun Apr 03, 2011 5:38 am

Again, and I do mean "again". There is _nothing_ that says you must re-search after the first fail high at the root. Original PVS as implemented in cray-blitz/blitz/belle did not re-search unless a _second_ move failed high. Now you have two moves and you have to research them to resolve which is best. Simple approach was to re-search first fail-high move to get a real score, then re-search second fail-high move on a null-window. If it fails low, the first fail-high move was better. If the second fails high, it is better although you don't know by how much. You then wait until you get _another_ fail-high move before you try to resolve the second, and you repeat this. the last fail-high move was never resolved...

The only problem is there is no PV. And unless you have some tricks up your sleeve (I try the PV, if there is none, which can happen on the fail-high/fail-low case, I try the hash move, if there is none, I do a short search for the opponent's side to find a move to ponder. Else you sit and wait.

There are other possible benefits to resolving the fail-high when you get it. At the next iteration, the move might legitimately fail low due to a deeper tactic. It is nice to know, when a move fails low, how much the score drops, so that you have some idea of how much extra time you are willing to commit to finding a better move...

Your last comment makes no sense (after the fail-high we could just go to the next iteration) unless you mean what I explained above. On a single fail-high, you are absolutely correct that you don't have to re-search if you don't want to. In Cray Blitz, if the first move failed high, and then failed low, I just kept the old PV from the previous search. If a later move failed high and then failed low, I just could not use the PV to order the next iteration since I didn't have one. I eventually changed to the "resolve at the fail-high" for several reasons..

(1) you just failed high. The TT is loaded with move ordering information from that search. Doing it again _now_ gives us the best chance to maximize move ordering accuracy before those entries get overwritten by searching other moves. (2) I always want a score, so that when I fail low on the first move at the next iteration, I need to know how far the score dropped to adjust how much time I was willing to commit to finding a better move; (3) I want a good move to ponder. (4) I want a good PV to order the next iteration...

Whether those are all worth the overhead is not something I have accurate data on. If you don't re-search, you save a lot of search space. But then you have less accurate ordering info for the next iteration (no PV), nor do you have an accurate idea of the actual score so that an aspiration window will be as high as possible...

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 10, 2011 1:03 am

mcostalba wrote:After 11068 games 1884 - 1934 - 7250 ELO -1 (+- 3.9)
When I created this thread, I didn't have elo in mind, but improving analysis friendliness like in the other thread.

Currently:

IF - A move fails high, and then fails low, but it is best, Stockfish will not find it unless the user forces the move.

This causes the user to either miss the move or have to interact, if the solution costs 1 elo then it's very reasonable! It would be a shame if a future version of Stockfish didn't show the fail high until it stood up, as then the user has nothing to force, and miss the move.

Anyway, I'm back, I'm going to try to find a position where the issue is reproducible at 1CPU and compare it with the version Jeremy sent me.

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 10, 2011 1:07 am

BB+ wrote:Is this a fair description?
1. At depth 20, move X is best with score 0.55.
2. At depth 21, move X "fails low" (returns a score worse than the aspiration window, taken here as [0.35,0.75]) with score less than 0.35.
3. Move Y then fails high with score above 0.75. The aspiration window is then changed to [0.45,1.05] or something.
4. Move Y now "fails low" with a score less than 0.45, due to search instability, hash interaction, or other assorted general nonsense.
5. And all the other moves X,W,V,U,... also score less than 0.45, so there is a general "fail low" at the root node.
The part "2. At depth 21, move X "fails low" (returns a score worse than the aspiration window, taken here as [0.35,0.75]) with score less than 0.35" is not necessary, I've seen instances when a move fails high and then is abandoned, even if the main move didn't fail low, and the abandoned move is best at same relative depth when forced (though this is really rare, I think I've seen 2 cases...)

User avatar
Uly
Posts: 838
Joined: Thu Jun 10, 2010 5:33 am

Re: Strange Stockfish behavior?

Post by Uly » Sun Apr 10, 2011 2:31 am

Turns out constructing a position that shows the problem in all its infamy isn't that hard: