Creating a Gauntlet for testing

halestorm
Posts: 19
Joined: Wed Jun 16, 2010 2:48 pm

Post by halestorm » Wed Jul 07, 2010 4:59 am

I want to create a gauntlet of top engines. Inspired by this site, I want to build a murderer's row against which I can pit a new engine to see how it stacks up, after testing for a couple of days at 1+1, of course.

Questions:

Does it make sense to use the 'older' engines on the website? Yes: we know their Elo ratings, and they are tested. No: most newer engines have improved and will beat these older engines, skewing the results.

If the gauntlet should include newer ones, which eight or nine newer ones would you use as the 'gold standard?'

It seems from the author's approach that knowing the Elo of each gauntlet engine is essential. If you suggest a modern list of top engines, how do we establish their Elo ratings? I've seen ratings for today's engines that vary wildly... over 200 points apart for the same engine, depending on the list.

Are there any other adjustments you'd make from the author's approach?

I'd follow it verbatim, but I am concerned that Rybka 4 et al will just annihilate all in the gauntlet, making the results meaningless.

Thank you for any guidance you can provide. I want my testing to benefit you and the rest of the chess community. I have the machines to dedicate and want to do my part.

Thanks again.

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: Creating a Gauntlet for testing

Post by hyatt » Wed Jul 07, 2010 8:06 pm

I think the concept of "Elo" should be ignored here. If you pick 8 strong opponents, BayesElo will tell you how the new program stacks up against them in a relative way... The exact "Elo" is impossible to calculate anyway unless you play _all_ programs together, which is too compute-intensive.
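
To make "relative" concrete, here is a rough sketch of how a score against each gauntlet opponent maps to a rating difference. This is not what BayesElo does internally (it fits a Bayesian model over all games at once); it is only the plain logistic Elo formula applied per opponent. The opponent names and win/draw/loss counts are made up for illustration.

```python
import math

def elo_diff(score: float) -> float:
    """Rating difference implied by a score fraction (plain logistic Elo model)."""
    if not 0.0 < score < 1.0:
        # A clean sweep (0% or 100%) has no finite rating difference.
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

def gauntlet_summary(results):
    """Map each opponent to (score fraction, implied Elo difference).

    results: dict of opponent name -> (wins, draws, losses) for the new engine.
    """
    summary = {}
    for name, (wins, draws, losses) in results.items():
        games = wins + draws + losses
        score = (wins + 0.5 * draws) / games
        summary[name] = (score, elo_diff(score))
    return summary

# Hypothetical gauntlet results for the engine under test.
results = {
    "Opponent A": (30, 40, 30),
    "Opponent B": (20, 40, 40),
}

for name, (score, diff) in gauntlet_summary(results).items():
    print(f"{name}: score {score:.3f}, relative Elo {diff:+.0f}")
```

Note that a 0% or 100% score gives no finite difference at all, which is why a gauntlet that gets swept (or sweeps) tells you very little: the opponents need to be close enough in strength that the new engine scores somewhere in the middle.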
