The Evidence against Rybka
Posted: Mon Jul 04, 2011 2:17 pm
I will be expanding this thread to state specifically what evidence against Rybka was considered, and which parts of it merited inclusion in the Report (which I had no direct part in writing). For now, I'd like to highlight this comment:
Regarding 54% for Crafty/Fruit overlap (in D.2.3 of RYBKA_FRUIT.pdf): this was in a preliminary (and cruder) version of the EVAL_COMP analysis, with the final Crafty/Fruit number being 34% once the whole process of comparison had been decided upon. Furthermore, this 54% is unidirectional (that is, it measures the Fruit/Crafty overlap, but not the Crafty/Fruit overlap, and as mentioned therein, Crafty contains about 10-15 features that Fruit lacks). The Rybka/Fruit number in that comparison was 78.3%, not 73%. Indeed, the whole EVAL_COMP document was produced precisely to quantify a relationship between raw percentages and expected "random" occurrences. As there didn't seem to be any great need to distinguish between a 1 in 10^6 and a 1 in 10^8 (or more) occurrence, rather crude statistical measures were used (and at any rate, whatever method was chosen would be subject to question, while Rajlich was free to re-analyse the data if he thought he got a raw deal, etc.).
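To illustrate why a unidirectional overlap figure can differ depending on which engine is taken as the baseline, here is a minimal sketch. It assumes the overlap is a plain set ratio, which is a simplification and not the EVAL_COMP procedure; the feature labels, the two engine sets, and the function name `directional_overlap` are all hypothetical and carry no real data.

```python
def directional_overlap(source, target):
    """Fraction of the features in `source` that also appear in `target`.

    Simplified illustration only: NOT the EVAL_COMP method.
    """
    if not source:
        raise ValueError("source feature set must be non-empty")
    return len(source & target) / len(source)


# Hypothetical feature labels, invented purely for illustration.
engine_a = {"f1", "f2", "f3", "f4"}                          # 4 features
engine_b = {"f1", "f2", "f3", "f5", "f6", "f7", "f8", "f9"}  # 8 features

a_in_b = directional_overlap(engine_a, engine_b)  # 3 shared out of 4
b_in_a = directional_overlap(engine_b, engine_a)  # 3 shared out of 8

print(f"A/B overlap: {a_in_b:.1%}")
print(f"B/A overlap: {b_in_a:.1%}")
```

The same three shared features give a much higher figure when measured from the engine with the smaller feature set, which is why the direction of a quoted percentage matters.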
In any event, it's nice to see some discussion of the evidence, and not the Process.
EDIT: On a different note,
The EVAL_COMP work aimed not at the most rigorous statistical methods, but rather at something readily understandable to the Panel members (else the final number of opinions might have been even lower than it was). One thought was that additional layers of statistical abstraction wouldn't change the conclusion in the end, but would lead to a general glazing-over of eyes [and those who are more stats-conversant could use the data given in the PDF to run whatever tests they prefer]. Most computer chess programmers will have heard of Gaussian distributions, but perhaps not of more abstruse statistical measures.
This "paper" uses an embarrassingly amateurish methodology for looking for correlation between feature sets.
The inner quotation is from Zach, not from me. As for the question of Rybka/Fruit evaluation feature functionality (and its relation to copyright): I'll probably address that stuff a little bit later.
I was going to add some proof that Mark was looking at functionality rather than code copying or transliteration, but it was so trivial that I'll just reference one item, where he calls the similarity between Rybka's evaluation function and Fruit's evaluation function his most damning piece of evidence:
Rybka's evaluation has been the subject of much speculation ever since its appearance. Various theories have been put forth about the inner workings of the evaluation, but with the publication of Strelka, it was shown just how wrong everyone was. It is perhaps ironic that Rybka's evaluation is its most similar part to Fruit; it contains, in my opinion, the most damning evidence of all.