Re: Questions for BB about Rybka PST = Fruit PST
Posted: Tue Aug 16, 2011 2:14 am
Here are a few other "certainties" from CW in that thread:
BB+ wrote: Note the lack of any doubt in his conclusion (though it was later rebutted by Adam Hair, who evidently did care), as could have been engendered by a word like "possibly".
Chris Whittington wrote: BB wasn't measuring "overlap" he was measuring chess programming skill. If anybody cares to put the program ELO's against BB's "scores" he will find high correlation.
Once the "obvious ELO correlation" is demoted to a lesser certainty standard, I think little is left but the rudeness.
Chris Whittington wrote: our side would appeal to the arbiter/court that given the obvious ELO correlation, that the BB results are both questionable and unreliable on the basis that we can't be sure what is being measured and the document "eval function - BB" should be omitted from consideration, or junked to put it more rudely.
Chris Whittington wrote: When BB presented that paper, somebody should have said "er, but your results map to ELO, do they not? go away, do your homework again and come back with something better that isn't going to make us look stupid". [...] Time to get grovelling, apologise to Vas and get ready with the donation to a charity of his choice.
That there could be a "what are we measuring" problem with ELO was also "obvious" to the Panel (even with its "unreasonable" members like Ken Thompson), but they managed to form a consensus that EVAL_COMP was indeed measuring what it said. Again there seems to be little comprehension of (and gross speculation regarding) the machinations of the Panel.
Chris Whittington wrote: Meanwhile, the scientific way, and I assume you are seeking after truth, with the BB eval function paper would be to go away and redo it using programs of comparable ELO from 2005-6, ie top programs, then you could eliminate the obvious "what exactly are we measuring" ELO problem from the process. Until then the paper is not acceptable to any reasonable person due to ambiguity in what is being measured.