Rating intransitivity in round robins
Posted: Mon Jun 02, 2014 7:56 pm
Suppose that 30 engines play a round robin in which each engine plays 100 games against every other engine (2900 games each).
There are two main engines, A and B.
A beats everyone but B by 60-40, and loses to B by 30-70, for 1710 points.
B beats everyone but A by 58-42, and beats A by 70-30, for 1694 points.
Who should be rated higher by a rating list, A or B? Explain your answer, discussing contempt if you wish.
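To see how sharply the two kinds of evidence disagree, one can convert each score into the Elo difference it implies under the usual logistic Elo model. A minimal Python sketch for the first situation, assuming the logistic curve and treating the 28 other engines as interchangeable; `elo_gap` is just an illustrative name:

```python
import math

def elo_gap(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic model:
    E = 1 / (1 + 10**(-D / 400))  =>  D = 400 * log10(E / (1 - E))."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# First situation; each fraction is the named engine's own score.
a_vs_field = elo_gap(0.60)  # A scores 60% against each of the 28 others
b_vs_field = elo_gap(0.58)  # B scores 58% against each of the 28 others
a_vs_b = elo_gap(0.30)      # A scores 30% against B

indirect = a_vs_field - b_vs_field  # A minus B, inferred through the common opponents
direct = a_vs_b                     # A minus B, from the 100 head-to-head games

print(f"via the field: A - B = {indirect:+.1f} Elo")
print(f"head-to-head:  A - B = {direct:+.1f} Elo")
```

This gives roughly +14 Elo for A via the field and roughly -147 Elo head-to-head, so the two estimates point in opposite directions.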
Similarly for the following situations (and any others you deem relevant):
A beats everyone but B by 75-25, and loses to B by 45-55, for 2145 points.
B beats everyone but A by 74-26, and beats A by 55-45, for 2127 points.

A beats everyone but B by 74-26, and beats B by 65-35, for 2137 points.
B beats everyone but A by 75-25, and loses to A by 35-65, for 2135 points.
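Each point total is 28 identical matches against the field plus the head-to-head match. A quick tally for the three situations in the order listed (the names are illustrative):

```python
# 30 engines, 100 games per pairing: each engine plays 29 matches (2900 games),
# 28 of them against engines other than its rival.
NUM_ENGINES = 30
GAMES_PER_PAIR = 100
OTHERS = NUM_ENGINES - 2

def total(points_vs_each_other: int, points_vs_rival: int) -> int:
    """Points from 28 identical 100-game matches plus the rival match."""
    return OTHERS * points_vs_each_other + points_vs_rival

scenarios = {
    "1": {"A": total(60, 30), "B": total(58, 70)},
    "2": {"A": total(75, 45), "B": total(74, 55)},
    "3": {"A": total(74, 65), "B": total(75, 35)},
}

games_each = (NUM_ENGINES - 1) * GAMES_PER_PAIR
for name, pts in scenarios.items():
    print(f"situation {name}: A {pts['A']}, B {pts['B']} (out of {games_each})")
```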
Note that you are essentially determining the A-B relation from 2800 games each against the common opponent(s) C, plus 100 games head-to-head. Whether this is a sensible weighting is perhaps a philosophical question (and could depend on how much stronger A and B are than the C's).
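For comparison, here is what a maximum-likelihood logistic (Bradley-Terry) fit makes of the first situation. Whether any particular rating list uses exactly this model is an assumption. The sketch pools the 28 other engines into one opponent "C" (assuming they are interchangeable, so their games among themselves say nothing about A versus B) and uses the classic MM/Zermelo iteration; `fit_ratings` is a hypothetical helper, not any real tool's API.

```python
import math

def fit_ratings(players, matches, max_iter=100_000, tol=1e-12):
    """Maximum-likelihood Bradley-Terry fit via the classic MM (Zermelo) iteration.

    matches: list of (i, j, games, points_for_i) tuples; draws count as half points.
    Returns Elo-scale ratings (400 * log10 of strength), with players[-1] fixed at 0.
    """
    # Total points scored by each player over all its matches.
    points = {p: 0.0 for p in players}
    for i, j, n, s in matches:
        points[i] += s
        points[j] += n - s

    strength = {p: 1.0 for p in players}
    for _ in range(max_iter):
        new = {}
        for p in players:
            # MM update: points / sum over p's matches of games / (g_i + g_j).
            denom = sum(n / (strength[i] + strength[j])
                        for i, j, n, _pts in matches if p in (i, j))
            new[p] = points[p] / denom
        # The likelihood depends only on strength ratios, so rescale to the anchor.
        anchor = new[players[-1]]
        new = {p: g / anchor for p, g in new.items()}
        change = max(abs(math.log(new[p] / strength[p])) for p in players)
        strength = new
        if change < tol:
            break
    return {p: 400.0 * math.log10(g) for p, g in strength.items()}

# First situation, with the 28 other engines pooled into "C" (assumption).
matches = [
    ("A", "C", 2800, 1680),  # A scores 60% of 2800 games
    ("B", "C", 2800, 1624),  # B scores 58% of 2800 games
    ("A", "B", 100, 30),     # A scores 30 of 100 against B
]
ratings = fit_ratings(["A", "B", "C"], matches)
print({p: round(r, 1) for p, r in ratings.items()})
```

At the maximum-likelihood point each engine's expected score equals its actual score. Since every pair plays the same number of games, a lower-rated engine would have a lower expected total, so in a balanced round robin a fit of this kind always orders engines by total points; the 100 head-to-head games count only through the points they contribute.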
For extra credit, in all cases compute the relevant error margins.
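A rough route to the error margins is the delta method: a score fraction p over n games has standard error about sqrt(p(1-p)/n), and D = 400 log10(p/(1-p)) has derivative 400/(ln(10) p(1-p)), so SE(D) is about 400/(ln(10) sqrt(n p(1-p))). The sketch below assumes independent games with no draws (draws only shrink the variance, so the margins err on the wide side), pools the 28 other engines into one opponent, and adds a z-score comparing the two A-B estimates as a crude transitivity check. All names are illustrative.

```python
import math

Z95 = 1.96  # two-sided 95% normal quantile

def elo_gap(p: float) -> float:
    """Elo difference implied by score fraction p under the logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_se(p: float, n: int) -> float:
    """Delta-method standard error of elo_gap(p) estimated from n games.

    Assumes independent games with no draws (per-game variance p * (1 - p));
    draws reduce the variance, so this margin errs on the wide side.
    """
    return 400.0 / (math.log(10) * math.sqrt(n * p * (1.0 - p)))

def compare(a_vs_field, b_vs_field, a_vs_b, n_field=2800, n_head=100):
    """Return (field estimate, its 95% margin, head-to-head estimate, its 95% margin, z)."""
    field = elo_gap(a_vs_field) - elo_gap(b_vs_field)
    se_field = math.hypot(elo_se(a_vs_field, n_field), elo_se(b_vs_field, n_field))
    head = elo_gap(a_vs_b)
    se_head = elo_se(a_vs_b, n_head)
    # z-score of the disagreement, if both estimated the same A-B gap.
    z = (head - field) / math.hypot(se_field, se_head)
    return field, Z95 * se_field, head, Z95 * se_head, z

# A's score vs the field, B's score vs the field, A's score vs B.
scenarios = [(0.60, 0.58, 0.30), (0.75, 0.74, 0.45), (0.74, 0.75, 0.65)]
results = [compare(*s) for s in scenarios]
for k, (field, m_field, head, m_head, z) in enumerate(results, start=1):
    print(f"situation {k}: field {field:+.1f} +/- {m_field:.1f} Elo, "
          f"head-to-head {head:+.1f} +/- {m_head:.1f} Elo, z = {z:+.2f}")
```

Under these assumptions the two estimates in the first situation differ by roughly four combined standard errors and in the third by roughly three, while in the second they are within about one.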