OpenChess

Posted: **Fri Dec 27, 2013 10:41 am**

The problem with all opening books and opening sets is that they are generally compiled from GM games

These include lots of blunders, and most books are not systematically comp filtered.
Human games have very poor diversity, but instead go deep along main lines.

What I want to test my engine as thouroughlly as possible, is an opening set, that is really shallow and really diversified. So I decided to make it using nothing more than my chess engine DiscoCheck. I just coded the following trivial algorithm:

Code: Select all

read input stream line by line (each line is assumed to be a FEN):
    for each FEN, generate legal moves from that position:
        for each legal move:
            play the move
            if the result of a 12 ply search is within acceptable bounds +/-70cp:
                print the resulting FEN into the output stream
            undo the move

That's it! It's all we need (and a little bit of Linux shell, of course).

We start with a file book0 containing a single line (starting position):

Code: Select all

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

We apply the algorithm and the resulting file book1 looks like this:

Code: Select all

rnbqkbnr/pppppppp/8/8/8/N7/PPPPPPPP/R1BQKBNR b KQkq - 1 1
rnbqkbnr/pppppppp/8/8/8/2N5/PPPPPPPP/R1BQKBNR b KQkq - 1 1
rnbqkbnr/pppppppp/8/8/8/5N2/PPPPPPPP/RNBQKB1R b KQkq - 1 1
rnbqkbnr/pppppppp/8/8/8/7N/PPPPPPPP/RNBQKB1R b KQkq - 1 1
rnbqkbnr/pppppppp/8/8/8/P7/1PPPPPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/1P6/P1PPPPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/2P5/PP1PPPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/3P4/PPP1PPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/4P3/PPPP1PPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/5P2/PPPPP1PP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/6P1/PPPPPP1P/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/7P/PPPPPPP1/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/P7/8/1PPPPPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/1P6/8/P1PPPPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/2P5/8/PP1PPPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/5P2/8/PPPPP1PP/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/6P1/8/PPPPPP1P/RNBQKBNR b KQkq - 0 1
rnbqkbnr/pppppppp/8/8/7P/8/PPPPPPP1/RNBQKBNR b KQkq - 0 1

There are 20 positions in the 1-ply opening set, because the 20 legal moves of the opening position all lead to an acceptable position (12 ply search within acceptable bounds). This is basically the same as perft(1).

We now iterate again, applying the algorithm to book1 and the resulting file book2 has 374 lines:

Code: Select all

$ wc -l ./book2
374 ./book2

Of course, 374 depends on the engine used. Some blunders have already been filtered, because perft(2)=400 (26 blunders filtered).

We iterate again, to obtain book3 with 7362 lines:

Code: Select all

$ wc -l ./book3
7362 ./book3

Now, for the first time, we have transpositions, so the book3 FENs are not unique. There are 5484 unique ones:

Code: Select all

$ sort -u ./book3 |wc -l
5484

So we put the unique enumeration of the FENs in book3.u and iterate on book3.u to produce book4. Then we uniquely enumerate book4 into book4.u. And so on, ad nauseam.

I'm currently generating book4. Not finished yet. I'll put it for download once finished. I'm expecting around 100k unique starting positions from 2 moves only (4 plies).

Posted: **Fri Dec 27, 2013 11:29 am**

Nice work Lucas

How many cores it can handle?
I mean to generate this opening set from 4 to 5 ,6 ,7 and 8plies does 100,200cores help?!

Is it like generating tablebases..

JP.

Posted: **Fri Dec 27, 2013 12:41 pm**

Vael Jean-Paul wrote: How many cores it can handle?

Only 1 core. DiscoCheck is not SMP capable. And the trivial algorithm described above is not parallelized either (although it should not be too hard to parralelize it).

I mean to generate this opening set from 4 to 5 ,6 ,7 and 8plies does 100,200cores help?!

You would have to parallelize the book generation algorithm. It shouldn't be too hard, but it would definitely take much more work than the few trivial lines of code I wrote for this. I'm not interested in pushing it further than 4 plies. All I'm looking for is a opening set as shallow and diversified as possible. Shallowness IS the aim here: an opening set that allows to really test an engine's opening ability without book, even in the most unorthodox positions.

Is it like generating tablebases..

Perhaps on a metaphoric level, but nothing more. Generating tablebase is technically very different, and much more complicated.

Posted: **Fri Dec 27, 2013 1:35 pm**

OK, I've finished generating the 4 ply opening set: 97782 positions

Code: Select all

$ wc -l ./book4
97782 ./book4

Now, I need to filter for uniqueness, and a simple unique sort isn't enough. The prolem is that we can have quasi-duplicates like so:

Code: Select all

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 4 3

These two positions are the same: take the starting position and play 1. Nf3 Nf6 2. Ng1 Ng8. In theory they are different because the 50-move counter isn't the same, but in practice that's not a meaningful difference, so we need to trim off these useless counters to properly get rid of duplicates:

Code: Select all

$ cut -d\  ./book4 -f1,2,3,4 | sort -u > ./book4.u

It appears that there are much more transpositions at this 4-th iteration:

Code: Select all

$ wc -l ./book4.u 
52967 ./book4.u

Anyway, we still have 52967 unique positions at ply=4. File attached (EPD format).

Posted: **Sat Dec 28, 2013 1:25 am**

lucasart wrote:Anyway, we still have 52967 unique positions at ply=4. File attached (EPD format).

By construction the positions appear in the order of move generation. This order is of course very artificial, so make sure to tell your GUI to load the openings in random order when using this opening set. If your GUI cannot do that, then just randomize the file directly, like so:

Code: Select all

$ shuf ./book4.u > ./book4.u.s

Posted: **Sat Dec 28, 2013 5:50 am**

I only have a small percentage of the positions analyzed, but quite a few have very polar scores.
A sample is attached.

Posted: **Sat Dec 28, 2013 7:53 am**

Indeed. I can only do a shallow search (12 plies) because there are so many positions to analyze. So I will do it again with a threshold of 50cp instead of 70cp.

But then I will have to double check the positions with a deeper search. Consider this a first draft. I'll get there, eventually

Posted: **Sat Dec 28, 2013 8:15 am**

I will run some analysis over the weekend.
I have your whole data set done to 18 plies, and I will have a significant portion done to around 28 plies by Monday.

Posted: **Sat Dec 28, 2013 12:16 pm**

I change the generation algorithm to prune quiet moves that do not improve the static eval. This condition ensures that moves are of a developping nature. For example you wouldn't see openings like:
1. Nc3 Nc6 2. Nb1 Nb8
I've generated the first 4 iterations using depth=13 and 45cp threshold (instead of the previous 70cp). In combination with the eval improving condition, this helps reduce significantly the branching factor:

Code: Select all

$ wc -l ./book*.epd
      1 ./book0.epd
     20 ./book1.epd
    230 ./book2.epd
   2041 ./book3.epd
  17926 ./book4.epd

I'm now running the 5-th iteration, which should get me 150k-200k unique positions. Then I'll filter those with Critter to get a second opinin engine (and a deeper search).

Posted: **Sat Dec 28, 2013 9:45 pm**

Suggestion:
Instead of continuing forward at each phase, filter out the crap at each prior step.
Since the expansion is exponential, filtering out things at stage 1 and 2 will be worth a lot by stage 5.

And if a position loses a queen along the way, and nobody tested for it, we would not really achieve the final position in real life anyway.

Just a thought.

Some polar positions from your reduced set are attached.

OpenChess

Computer generated Opening set

Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set

Re: Computer generated Opening set