Strange Stockfish behavior?

mcostalba · Post by **mcostalba** » Thu Mar 31, 2011 6:21 pm

Uly wrote: But the main difference in behavior here is that if 19.e4 fails low, the engine would go and search all the other alternative moves, it doesn't magically make 19.d4 the main move again like Stockfish does.

There is no magic here, you are misunderstanding what happens, please read again what Joona has written because this is the real point, you are confused by the kind of output, but what SF actually does is what all the other engine do.

For you information "to fail low in a node" it means to unsuccesfuly try _all_ the moves, not only the PV one, and this is what SF and all the other engines do when they fail low.

zamar · Post by **zamar** » Thu Mar 31, 2011 7:09 pm

Uly wrote: In this example, the resolved score won't fail low unless it's <0.45 (I've actually never seen a 0.75++ fail high with Rybka or Naum that fails that badly).

SF's score lives much more than Rybka's and Naum's, because of different pruning and reduction system. It's not a bug, it's a feature

But the main difference in behavior here is that if 19.e4 fails low, the engine would go and search all the other alternative moves.

This is exactly what SF does. There is just no clean way to report it back to user. (We just always report bestmove, but there is no clean way to tell that move which previously failed high is actually very bad, we just "magically" report the old move.)

hyatt · Post by **hyatt** » Thu Mar 31, 2011 9:49 pm

You are assuming that when a move fails high, and then fails low, that it is bad. That is a bogus assumption, as I have pointed out several times. If that is true, how can you trust _any_ of the fail highs that happen throughout the tree?

This is one of those discussions that make one go "hmmmm,..."

Do you have any data that proves that the fail low case shows the move is worse more often than if you took the fail-high at face value? This has been researched to death over the years. And the discussion always comes back to "either we trust a fail high, or we don't. If we don't, we might as well throw out alpha/beta completely since it is based explicitly on that 'trust'."

Everyone is trying to tell you that the fail high is correct far more often than it is not. Every move can appear worse as you go deeper, so there are no guarantees. But if the search fails high and you throw it away, you are no longer using "alpha/beta"...

BB+ · Post by **BB+** » Fri Apr 01, 2011 4:24 am

Is this a fair description?
1. At depth 20, move X is best with score 0.55.
2. At depth 21, move X "fails low" (returns a score worse than the aspiration window, taken here as [0.35,0.75]) with score less than 0.35.
3. Move Y then fails high with score above 0.75. The aspiration window is then changed to [0.45,1.05] or something.
4. Move Y now "fails low" with a score less than 0.45, due to search instability, hash interaction, or other assorted general nonsense.
5. And all the other moves X,W,V,U,... also score less than 0.45, so there is a general "fail low" at the root node.

Between events #1 and #2, move X is "best". Between #2 and #3, I think it still has that label, even though the score has dropped. Between #3 and #4, move Y is best. Between #4 and #5 seems to the area of dispute (go with X or Y?). And maybe the situation after #5 is also disputed.

Some engines might alternatively try to resolve the score of move X in step #2 (not a great idea IMO) before going on to other moves, though I'm not sure this solves every problem.

mcostalba · Post by **mcostalba** » Fri Apr 01, 2011 7:00 am

hyatt wrote: Do you have any data that proves that the fail low case shows the move is worse more often than if you took the fail-high at face value?

This is result of my patch that does not update best move when failing low (while insted is still updated after a fail high). It means that if a move say e4 fails high becomes the new best move, if at the research with wider window we have a fail low then we still keep e4 as best so that if we go out of time "bestmove e4" is sent to GUI.

After 11068 games 1884 - 1934 - 7250 ELO -1 (+- 3.9)

So there is no easy evidence of what is best.

Rein Halbersma · Post by **Rein Halbersma** » Fri Apr 01, 2011 8:27 am

hyatt wrote: Then perhaps "static eval" is not a good way to prune/reduce moves in the first place?

And nobody is suggesting that you "blindly play move X". If, at the start of depth N, move A is best, then B fails high and fails low on the re-search, B is the move to play. Unless C, D, ... fail high in which case _they_ become the best move. But I would _never_ reject a fail-high move. To do so rejects the entire idea behind alpha/beta. If some search idea breaks that, then the idea is not sound.

If you only trust "sound" schemes then pure alpha-beta is the way to go and any reduction scheme based on static eval or reduced searches are not sound. This of course includes aspiration windows, null-move, LMR and all the other ELO-improving heuristics. The point is not to distrust fail-high scores, but to distrust a fail-high score on an aspiration window [root_alpha, root_beta], followed by a fail-low score on [root_alpha, root_beta + more]. Aspiration windows combined with aggressive window-dependent pruning can falsely ignore deep defensive resources. That might not be "sound" in a 100% deterministic sense, but it is sound in a statistical sense.

hyatt · Post by **hyatt** » Fri Apr 01, 2011 5:41 pm

BB+ wrote:Is this a fair description?
1. At depth 20, move X is best with score 0.55.
2. At depth 21, move X "fails low" (returns a score worse than the aspiration window, taken here as [0.35,0.75]) with score less than 0.35.
3. Move Y then fails high with score above 0.75. The aspiration window is then changed to [0.45,1.05] or something.
4. Move Y now "fails low" with a score less than 0.45, due to search instability, hash interaction, or other assorted general nonsense.
5. And all the other moves X,W,V,U,... also score less than 0.45, so there is a general "fail low" at the root node.

Between events #1 and #2, move X is "best". Between #2 and #3, I think it still has that label, even though the score has dropped. Between #3 and #4, move Y is best. Between #4 and #5 seems to the area of dispute (go with X or Y?). And maybe the situation after #5 is also disputed.

Some engines might alternatively try to resolve the score of move X in step #2 (not a great idea IMO) before going on to other moves, though I'm not sure this solves every problem.

That's the general idea. There are a ton of ways to handle fail lows.

1. Instantly relax alpha and re-search (assume this is the first move at depth N+1 for now). You get a true score before you go on. Now, at least, you know how bad this move is and you can make some sort of judgement call on how much extra time you are willing to spend to find a better move (score drop = -.4, maybe 2x? score drop = -3.5 maybe 10x or you are going to lose, etc.) Crafty does it this way.

2. Keep searching other moves to see if any of them will trip over the alpha bound and produce a good score. This way you avoid the extra time to find a score for the first move when you would then spend more time to find a move better. The down side is that you might run thru the entire set of ply=1 moves with no best move, and now you have to relax beta and start over. The good news is that it is generally more efficient to fail low than to fail high. To fail low just requires one move search at ply=2, for each root move.

I have tried both, over the years, many times. I keep going back to 1. I want to know what is going on, not blindly search to see if there is something better, only to have to repeat the whole thing with a dropped alpha which will blow up the TT hits that now have a useless bound.

There are other alternatives.

3. Stanback (Zarkov) suggested that when a move fails low, you make sure that the ply=2 refutation move is stuffed back into the hash table, and then you start over at iteration 1. That hash entry will make that move fail low on each iteration, and will therefore try to find the second-best move. But with much better (iterated search) move ordering. The danger is that if that hash entry gets overwritten, you enter a loop since the original best move will pop to the top, only to fail low on the same iteration it initially failed low on. I have this on my to-do list, because when the best move fails low, it wrecks move ordering. All those hash entries suddenly become worthless since the bound is altered.

I've tried an alternative but at the time decided to not use it. Namely, use John's idea, but remove the original fail-low move from the ply-1 move list and search again from iteration 1. now that move can't fail high. Once you get back to the iteration where it failed low, you can add it back in, but after the new best move, which we hope will not fail low here. If it does, we might be in a really pathological position where everything is going to fail low. And that is always the troublesome case. For example, everyone has seen a really "deep" position where when you start your search, you see a score of +1.5 or +3.0 because it appears you can pick up material. But on the last iteration, you see what your opponent saw and realize that you can't take that safely. And your score is going to drop back to the +/- .3 range. Which means _all_ moves will fail low. And you probably don't want to have to do N iterated searches to the same depth to realize that _all_ mvoes are failing low. That might be a time killer...

But the idea of searching at the root, after you have found a best move, and another move fails high, then you relax beta and re-search and it fails low, so you throw that fail high away, is simply ridiculous. The TT can cause this. pruning/reduction decisions based on alpha/beta can cause this. If the fail high is bad at the root, why isn't it bad everywhere else?

hyatt · Post by **hyatt** » Fri Apr 01, 2011 5:46 pm

Rein Halbersma wrote:
hyatt wrote: Then perhaps "static eval" is not a good way to prune/reduce moves in the first place?

And nobody is suggesting that you "blindly play move X". If, at the start of depth N, move A is best, then B fails high and fails low on the re-search, B is the move to play. Unless C, D, ... fail high in which case _they_ become the best move. But I would _never_ reject a fail-high move. To do so rejects the entire idea behind alpha/beta. If some search idea breaks that, then the idea is not sound.
If you only trust "sound" schemes then pure alpha-beta is the way to go and any reduction scheme based on static eval or reduced searches are not sound. This of course includes aspiration windows, null-move, LMR and all the other ELO-improving heuristics. The point is not to distrust fail-high scores, but to distrust a fail-high score on an aspiration window [root_alpha, root_beta], followed by a fail-low score on [root_alpha, root_beta + more]. Aspiration windows combined with aggressive window-dependent pruning can falsely ignore deep defensive resources. That might not be "sound" in a 100% deterministic sense, but it is sound in a statistical sense.

Then as I said, one should simply take each fail high position in the tree, and re-search with a relaxed beta bound, to see how many fail low the second time around. Then you _know_ how incorrect this assumption is. I did this very test years ago, because I once had a very subtle bug that would let null-move searches cause this problem at the root. And Crafty would fail high, and then fail low, and make the move, and it was an outright blunder. But rather than just reject the move after the fail-low, I decided to debug, and found the problem. If a fail-high is not correct in a majority of the cases, then fail-high is no good. Yet we _know_ it is. This is yet another of those ideas that says you should treat the root differently from the rest of the tree. To which I would always answer "why?" And I always wait for a sound theoretical explanation, or a sound experimental explanation. And all I usually get is "sound" (as in noise).

zamar · Post by **zamar** » Fri Apr 01, 2011 6:28 pm

Then as I said, one should simply take each fail high position in the tree, and re-search with a relaxed beta bound, to see how many fail low the second time around. Then you _know_ how incorrect this assumption is.

Usually there is just no reason to do this. Root is an exception. We need to know the exact score of the move, so we need to resolve the fail-high and sometimes it just happens that while resolving we get fail-low. And you suggest that we should just ignore that completely free extra information that move could be very bad and just trust on previous fail-high. Sound very bogus for me...

If a fail-high is not correct in a majority of the cases, then fail-high is no good.

Of course fail-high is very good in majority of cases. Like having two aces is very good thing in Poker _in_majority_of_cases. But fail-low (after fail-high) is a signal that your opponent might have three kings

This is yet another of those ideas that says you should treat the root differently from the rest of the tree.

We must resolve fail-highs at root, even Crafty does that. Resolving gives us extra information. For optimum decision, you want to use all available information.

To which I would always answer "why?" And I always wait for a sound theoretical explanation, or a sound experimental explanation. And all I usually get is "sound" (as in noise).

beep beep beep

mcostalba · Post by **mcostalba** » Fri Apr 01, 2011 9:25 pm

If a fail-high is not correct in a majority of the cases, then fail-high is no good.

Ok, here is the error in the argumentation.

You are talking of the general case, but you should focus on the small subset of cases where a fail high is strictly followed by a fail low. In the majority of cases fails high are good...simply because there is no fail low following

Please let me add that approaching a case with the gross hammer of general principles makes you blind to details that are instead what really matters, what makes the difference between an idea that works and an useless one.

OpenChess

OpenChess

Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?

Re: Strange Stockfish behavior?