I might note that (from the internal evidence) the paper appears to date from 2008.
mballicora wrote:(though they do not describe the improvement part)
In fact, in practice I would guess that the "improvement part" (step 2 of Algorithm 3) could be more relevant than the rest. Perhaps the speed of feedback allows one to tune many parameters more easily than with game testing, so the usual pitfalls of hill-climbing methodology are less prevalent?
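For concreteness, here is a minimal sketch of what such an improvement step might look like, written as a plain coordinate-wise hill climb over evaluation parameters. Since the paper does not describe step 2, everything here (the objective, the step size, the search itself) is my assumption rather than Ban's method; the point is only that a position-based objective is cheap enough to call thousands of times, which game testing is not.

```python
# Hypothetical hill-climbing tuner. NOT Ban's (undescribed) step 2 --
# just an illustration of tuning against a fast position-based objective.
# `objective(params)` is assumed to score a parameter vector on a fixed
# set of positions (lower is better) in seconds rather than days.

def hill_climb(params, objective, step=1, max_passes=10):
    best = objective(params)
    for _ in range(max_passes):
        improved = False
        for i in range(len(params)):        # one parameter at a time
            for delta in (+step, -step):    # nudge it up, then down
                trial = list(params)
                trial[i] += delta
                score = objective(trial)
                if score < best:            # keep the first improvement
                    params, best = trial, score
                    improved = True
                    break
        if not improved:                    # local optimum: stop early
            break
    return params, best
```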
I made comments about it later, but almost nobody was interested in talking about it, with very few exceptions. This "tuning" topic always involved some sort of secrecy, or nobody considered it seriously. Of course, people will look at it differently now.
I too have typically found the silence about "tuning" in general rather odd, the main chatter being various personae (notably VD, and VR to some extent) claiming that they [or the NSA] have super-secret methods vastly superior to whatever everyone else is using.
Certainly the mainstream has moved toward game-testing rather than position-testing in recent years, though this could be herd mentality more than anything else. [Incidentally, I would tend to guess (following DD's comment) that temporal-difference learning or something like it would be somewhat superior to win/draw/loss accounting].
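To make the bracketed aside concrete, here is a minimal sketch of the temporal-difference idea in the KnightCap style: weights are nudged toward later (better-informed) evaluations along the game, rather than toward the final win/draw/loss result alone. Every name and constant below is illustrative and not from Ban's paper.

```python
import math

# Minimal TD(lambda) sketch for a linear evaluation squashed by tanh to
# a score in [-1, 1]. `positions` is one game's positions (mover's view),
# `phi(s)` the feature vector of position s, `w` the mutable weight list.

def td_lambda_update(w, positions, phi, alpha=0.01, lam=0.7):
    feats = [phi(s) for s in positions]
    vals = [math.tanh(sum(wi * xi for wi, xi in zip(w, f))) for f in feats]
    diffs = [vals[t + 1] - vals[t] for t in range(len(vals) - 1)]
    for t in range(len(diffs)):
        # lambda-discounted sum of future temporal differences
        err = sum((lam ** (k - t)) * diffs[k] for k in range(t, len(diffs)))
        grad = 1.0 - vals[t] ** 2           # derivative of tanh at step t
        for i, x in enumerate(feats[t]):
            w[i] += alpha * err * grad * x  # pull eval toward later values
    return w
```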
It seems that the method tends to give a dynamic style?
I had mused that this might be due to the drawishness measurement in Junior, but now you report the same result with no such parameter. MvK says:
I'm speculating that this style arises because the tuning now favors positions where humans tend to lose their games -- but using comp-comp games should reduce that? It remains somewhat of a mystery...
Here is another quotable from Ban (emphasis in original):
Amir Ban wrote:[...] Chess programmers still code expert chess knowledge into their evaluation functions, to the best of their understanding and capability, subject to its observed success in improving the results achieved by their programs (itself a time-consuming and statistically error-prone determination).
The field lacks a theory of evaluation which is able to suggest answers to any of the following questions: What is the meaning of the numerical value of the evaluation? What makes an evaluation right? In what sense can it be wrong? Between two evaluation functions, how to judge which is better?
It is the purpose of this article to suggest a theory of evaluation within which these questions and related ones can be answered. Furthermore I will show that once such a theory is formulated, it may immediately and economically be tested vis-a-vis the entire wealth of recorded chess game repertoire. As a result I will derive a novel automatic learning procedure that works by fitting its evaluation to recorded game results. While new to computer games, this method bears similarity to techniques of maximum-likelihood optimization and logistic regression widely used in medical and social sciences.
I tend to be skeptical and think some of this is hyperbole, but I would enjoy being proven incorrect.
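Hyperbole or not, the fitting procedure Ban gestures at is at least easy to sketch. Assuming a logistic link from a centipawn evaluation to an expected score (my choice of link and scale, not necessarily his), fitting the evaluation to recorded game results amounts to minimizing a negative log-likelihood:

```python
import math

# Sketch of maximum-likelihood fitting to recorded results, in the spirit
# of Ban's proposal. `evaluate(pos, params)` (centipawns, assumed) and
# `data`, a list of (position, result) pairs with result in {1, 0.5, 0}
# for the side to move, are placeholders; the logistic link and the
# 400-centipawn scale are my assumptions.

def expected_score(cp):
    return 1.0 / (1.0 + 10.0 ** (-cp / 400.0))  # 400 cp ~ 10:1 odds

def neg_log_likelihood(params, data, evaluate):
    nll = 0.0
    for pos, result in data:
        p = expected_score(evaluate(pos, params))
        p = min(max(p, 1e-9), 1.0 - 1e-9)        # guard against log(0)
        # a draw (0.5) counts as half a win plus half a loss
        nll -= result * math.log(p) + (1.0 - result) * math.log(1.0 - p)
    return nll
```

Minimizing this over the parameters (with the hill climb sketched above, or any gradient method) is precisely "fitting its evaluation to recorded game results"; whether that amounts to a theory of evaluation is another question.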