Ultimate Database

Discussion about chess-playing software (engines, hosts, opening books, platforms, etc...)
EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Ultimate Database

Post by EmptikBest »

Greetings to all fellow members,

I gathered a bunch of databases (I merged some of them with "type", so there are probably a LOT of duplicates) to create what I call the "Ultimate Database", including:
  • Caissabase
  • CCRL 40/40
  • Chess.com Elite
  • "Complete-10min+6sec" from some website I can't remember :(
  • "Complete-60min+15sec" from some website I can't remember :(
  • Elgeance DB
  • PGN Mentor
  • lichess-bot-strong-games
  • Lichess Elite Database, thanks to nikonoel! (Note: it is 38 GB uncompressed because duplicates were not removed; I don't know how to do that.)
  • "Top40-1min-23.12.2022" from some website I can't remember :(
  • "Turnier-NN-60+0.6_gesamt-03.06.2022" from some website I can't remember :(
Link: https://pixeldrain.com/u/s2rtpS94

Do not be fooled by the 6.86 GB compressed size (it took ~40 minutes to compress at the maximum compression level using 7-Zip on 28 threads and 24 GB of RAM): it is 61.8 GB uncompressed...

P.S.: If somebody could DM me with instructions on how to remove duplicates from a PGN file, and how to merge files with something faster than "type", that would be great. Then I would upload a cleaned DB and probably add ICCF, FICS, etc.
deeds
Posts: 1493
Joined: Wed Oct 20, 2021 9:24 pm
Location: France

Re: Ultimate Database

Post by deeds »

Try pgn-extract to delete duplicate games, split PGN files, merge PGN files, etc.
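To make "removing duplicates" concrete, here is a naive Python sketch. It assumes two games are duplicates only when their movetext (moves, comments and result, with whitespace collapsed) is identical, and it ignores the header tags. This is an illustration under that assumption, not how pgn-extract's -D decides. Two genuinely different games with identical moves (e.g. a short miniature played twice) would be collapsed into one, and it keeps one hash per unique game in memory. The function names are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, List


def split_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield raw PGN games. Naive rule: a tag line ('[') seen after movetext starts a new game."""
    game: List[str] = []
    in_moves = False
    for line in lines:
        if line.startswith("[") and in_moves:
            yield "".join(game)
            game, in_moves = [], False
        if line.strip() and not line.startswith("["):
            in_moves = True  # first non-tag, non-blank line = movetext
        game.append(line)
    if "".join(game).strip():
        yield "".join(game)


def movetext_key(game: str) -> str:
    """SHA-1 of the movetext only: tag lines dropped, all whitespace collapsed."""
    moves = [ln for ln in game.splitlines() if not ln.startswith("[")]
    normalized = " ".join(" ".join(moves).split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def dedupe(in_path: str, out_path: str) -> int:
    """Stream games from in_path, write only first occurrences to out_path.
    Returns how many duplicate games were dropped."""
    seen = set()
    dropped = 0
    with open(in_path, encoding="utf-8", errors="replace") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for game in split_games(src):
            key = movetext_key(game)
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            dst.write(game.rstrip("\n") + "\n\n")
    return dropped
```

Usage would look like `dedupe("merged.pgn", "merged-dedup.pgn")`, which returns the number of games dropped.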

To merge PGN files manually under Windows, I have always used "copy /b *.pgn merged.pgn", and sometimes I load all the PGNs into SCID and export them into one big file.
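A minimal Python sketch of the same byte-for-byte merge that "copy /b" does, assuming the inputs are plain PGN files. It streams each file in chunks with shutil.copyfileobj, so nothing is loaded whole into RAM. It also writes a blank line after each file so the last game of one file cannot run into the first tag of the next (copy /b adds nothing between files). merge_pgns and its chunk size are hypothetical choices.

```python
import shutil
from pathlib import Path
from typing import Iterable


def merge_pgns(inputs: Iterable[Path], output: Path, chunk_size: int = 16 * 1024 * 1024) -> int:
    """Concatenate PGN files byte-for-byte into output, separated by blank lines.
    Returns the number of files merged."""
    count = 0
    with open(output, "wb") as dst:
        for path in inputs:
            with open(path, "rb") as src:
                # streamed copy in chunk_size pieces, like copy /b
                shutil.copyfileobj(src, dst, chunk_size)
            # separator: extra blank lines are harmless between PGN games
            dst.write(b"\n\n")
            count += 1
    return count
```

For example, `merge_pgns(sorted(Path("dbs").glob("*.pgn")), Path("merged.pgn"))`. Writing the output outside the globbed folder avoids merging it into itself on a second run.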

FICS games
EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Re: Ultimate Database

Post by EmptikBest »

deeds wrote: Sat Sep 09, 2023 9:39 am
Try pgn-extract to delete duplicate games, split PGN files, merge PGN files, etc.

To merge PGN files manually under Windows, I have always used "copy /b *.pgn merged.pgn", and sometimes I load all the PGNs into SCID and export them into one big file.

FICS games
Thanks, I will try pgn-extract first, and then something with SCID that someone on outskirts told me about :)

Also, thanks for the FICS games; I will add those too!
EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Re: Ultimate Database

Post by EmptikBest »

deeds wrote: Sat Sep 09, 2023 9:39 am
Try pgn-extract to delete duplicate games, split PGN files, merge PGN files, etc.

To merge PGN files manually under Windows, I have always used "copy /b *.pgn merged.pgn", and sometimes I load all the PGNs into SCID and export them into one big file.

FICS games
I ran "./pgn-extract -D -o Turnier-NN-60+0.6_gesamt-03.06.2022-Filtered.pgn Turnier-NN-60+0.6_gesamt-03.06.2022.pgn". The input file (Turnier-NN-60+0.6_gesamt-03.06.2022.pgn) was 1.14 GB, but the output file (Turnier-NN-60+0.6_gesamt-03.06.2022-Filtered.pgn) was 1.40 GB???
deeds
Posts: 1493
Joined: Wed Oct 20, 2021 9:24 pm
Location: France

Re: Ultimate Database

Post by deeds »

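This often happens because pgn-extract re-saves the games in its own output format, which can use more characters than the input. The same growth shows up with a plain copy, no filtering at all: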

pgn-extract.exe -ooutput.pgn input.pgn
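Since file size is not a reliable check once pgn-extract has rewritten the games, a rough alternative is to compare game counts before and after. A minimal sketch, assuming every game carries an [Event "..."] tag at the very start of a line (Event is part of the PGN Seven Tag Roster) and that the file has no byte-order mark. The trailing space in the match keeps [EventDate ...] tags from being counted. count_games is a hypothetical helper.

```python
import sys


def count_games(path: str) -> int:
    """Count games by counting lines that start with the [Event tag."""
    n = 0
    with open(path, "rb") as f:  # binary mode: no decoding, any text encoding works
        for line in f:
            if line.startswith(b"[Event "):  # the space excludes [EventDate
                n += 1
    return n


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sys.argv[1:]:
        print(f"{count_games(p):>12,}  {p}")
```

Run on both files, e.g. `python count_games.py Turnier-NN-60+0.6_gesamt-03.06.2022.pgn Turnier-NN-60+0.6_gesamt-03.06.2022-Filtered.pgn`. With -D the second count should be lower, whatever the file sizes say.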

EmptikBest
Posts: 75
Joined: Tue Jun 20, 2023 2:38 pm

Re: Ultimate Database

Post by EmptikBest »

ANNOUNCEMENT:

FICS 2000-2012 will be added in the next update, thanks to deeds! These are 116 GiB unfiltered (no duplicates removed); after filtering they will probably be smaller.
ICCF 2015-2022 will also be added in the next update (323 MB unfiltered).

ALL comments will be removed to save space, sorry :(
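One way the comment removal could be done, as a minimal sketch: PGN has two comment forms, {brace comments} and ";" comments that run to the end of the line. The scanner below drops both from movetext only (tag lines are not handled). Because it removes everything inside braces, it also drops embedded annotations such as Lichess clock times ({[%clk ...]}). It ignores the rarer "%" escape lines. strip_comments is a hypothetical helper.

```python
def strip_comments(movetext: str) -> str:
    """Remove {brace} and ';' rest-of-line comments from PGN movetext.
    Brace comments do not nest in PGN, so a single flag is enough."""
    out = []
    in_brace = False
    in_semicolon = False
    for ch in movetext:
        if in_brace:
            if ch == "}":
                in_brace = False
            continue
        if in_semicolon:
            if ch == "\n":
                in_semicolon = False
                out.append(ch)  # keep the line break that ends the comment
            continue
        if ch == "{":
            in_brace = True
        elif ch == ";":
            in_semicolon = True
        else:
            out.append(ch)
    # collapse the runs of spaces left where comments were cut out
    return "\n".join(" ".join(line.split()) for line in "".join(out).splitlines())
```

A ";" inside braces is treated as ordinary comment text, and a "{" inside a ";" comment is ignored, which matches how the two comment forms are defined.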

If I have time, maybe I'll make a separate Chess960 archive...

P.S.: If anyone has Chess960 games/DBs to share, please send them; I will probably make a separate archive for 960.