Chris Whittington wrote: we have unclear sacrifices left, which are the type I concentrate on.
For sacrifices, I'd consider them "sound unless proven unsound": only if a sacrifice like Qxh7 is obviously unsound would it be classified as unsound.
Then we can reward the engine for any sacrifice it makes that doesn't evidently lose the game (it may turn out to lose after further investigation, but we don't do that investigation). This gives us a way to attempt to measure style:
First, you need both opponents to be about the same strength. As I said previously, an engine with the most style that loses most of its games is undesirable. I think a difference of around 50 Elo between the two is about right. The idea is to take strength out of the equation so that the engine's true style can show (for instance, it would be disruptive if an engine with great style couldn't show it because a much stronger opponent kept it busy defending instead of showing off). But one doesn't need to be obsessive about getting the strength exactly right.
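The post doesn't say how to check that two engines are close enough. One common way, assumed here, is to convert a match result into an Elo difference with the standard logistic Elo formula. This is a minimal Python sketch; `elo_diff`, `roughly_calibrated` and the `max_gap=50.0` default are hypothetical names chosen for illustration, not part of any existing tool.

```python
import math


def elo_diff(wins: int, draws: int, losses: int) -> float:
    """Estimate the Elo difference from a match result.

    Positive means the first engine scored better. Uses the standard
    logistic Elo model, expected score = 1 / (1 + 10 ** (-diff / 400)),
    solved for diff.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score in (0.0, 1.0):
        # A 0% or 100% score has no finite Elo estimate.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def roughly_calibrated(wins: int, draws: int, losses: int,
                       max_gap: float = 50.0) -> bool:
    """Hypothetical helper: True if the measured gap is at most max_gap Elo."""
    return abs(elo_diff(wins, draws, losses)) <= max_gap
```

For example, a 40/30/30 (win/draw/loss) result works out to about +35 Elo and passes, while 60/20/20 is about +147 Elo, which would suggest giving the stronger engine a bigger time handicap. With only a few dozen games the estimate has wide error bars, so a borderline result is worth a longer match.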
After both engines are calibrated (I suggest a time handicap for this), play a few games between them. In those games, look at the sacrifices made by both sides and award points for each kind, for example: add 2 points each time a knight captures a pawn and is recaptured on the next move; add 1 point each time a knight is left hanging and a move that doesn't save it is played; add 1.5 each time a rook captures a knight; add 3 when the opponent hangs its queen and the engine does not capture it; and so on. The only thing to watch out for is long capture sequences in which no sacrifice is actually happening; otherwise it should work fine.
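A minimal Python sketch of the point system, assuming some separate game analyser has already turned each game into a list of event labels. The labels and the `is_real_sacrifice` filter are hypothetical; the point values are the ones from the examples above. The filter is one possible way to handle the capture-sequence caveat: it only compares the material balance before the sequence with the balance once it is over, so an even trade (which ends where it started) is not scored, while a real sacrifice ends down material.

```python
from collections import Counter

# Point values from the examples in the post. The event names are
# hypothetical labels that an upstream game analyser would emit.
WEIGHTS = {
    "knight_takes_pawn_recaptured": 2.0,
    "knight_left_hanging": 1.0,
    "rook_takes_knight": 1.5,
    "declined_hanging_queen": 3.0,
}


def is_real_sacrifice(balances: list[float]) -> bool:
    """Return True if a capture sequence really gave material away.

    `balances` is the material balance in pawns, from the engine's point
    of view, before the sequence and after each ply of it. Only the first
    and last values matter: an even trade returns to the starting balance.
    """
    return balances[-1] < balances[0]


def style_points(events: list[str], weights: dict[str, float] = WEIGHTS) -> float:
    """Sum the reward for every detected event; unknown labels score 0."""
    counts = Counter(events)
    return sum(weights.get(label, 0.0) * n for label, n in counts.items())
```

For instance, `style_points(["knight_takes_pawn_recaptured", "rook_takes_knight", "declined_hanging_queen"])` gives 6.5. `is_real_sacrifice([0, 1, -2])` (a knight takes a pawn and is then recaptured) is True, while `is_real_sacrifice([0, 3, 0])` (a plain knight trade) is False.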
At the end, each engine will have some score (we ignore game results; those should only matter for calibrating the engines to the same strength and nothing else). Dividing that score by the total number of games gives a measure of the engine's style. For a more accurate measure, I'd suggest repeating this process against a different engine.
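The normalization step is simple division. The post doesn't say how to combine results from several opponents, so this sketch assumes an unweighted mean of the per-game scores, one per opponent. The function names and opponent names are hypothetical.

```python
def style_score(total_points: float, games: int) -> float:
    """Per-game style score: total points divided by games played."""
    if games <= 0:
        raise ValueError("need at least one game")
    return total_points / games


def combined_style(results: dict[str, tuple[float, int]]) -> float:
    """Average the per-game score over several opponents.

    `results` maps an opponent name to (total_points, games). Each
    opponent counts equally, however many games were played against it
    (an assumption, not something the post specifies).
    """
    if not results:
        raise ValueError("need at least one opponent")
    scores = [style_score(points, games) for points, games in results.values()]
    return sum(scores) / len(scores)
```

For example, 150 points over 100 games against one opponent (1.5 per game) and 60 points over 30 games against another (2.0 per game) combine to 1.75. Weighting by game count instead would favor whichever match happened to be longer.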
The subjective part is deciding how many points to give each situation. (For example, what should be rewarded more: a rook-for-pawn sacrifice far away from the action, or a knight sacrifice that opens up the king's pawn shield for a killer attack?) And which scenarios should we look for? We could also add negative values for things like useless piece shuffling, so that too much of it eventually cancels out the sacrifices.
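One possible way to detect "useless piece shuffling", assumed here rather than taken from the post, is to count the times a side immediately undoes its own previous move (g1-f3 followed by f3-g1). The `-0.5` penalty is an arbitrary placeholder to be tuned like the other rewards, and this simple check misses slower shuffles with other moves in between.

```python
SHUFFLE_PENALTY = -0.5  # hypothetical weight, to be tuned like the rewards


def shuffle_count(moves: list[tuple[str, str]]) -> int:
    """Count immediate back-and-forth moves by one side.

    `moves` holds one side's moves in order as (from_square, to_square),
    e.g. ("g1", "f3"). A move that exactly reverses that side's previous
    move is counted as shuffling.
    """
    count = 0
    for prev, cur in zip(moves, moves[1:]):
        if cur == (prev[1], prev[0]):
            count += 1
    return count


def adjusted_points(sacrifice_points: float, moves: list[tuple[str, str]]) -> float:
    """Sacrifice points plus the (negative) shuffling penalty."""
    return sacrifice_points + SHUFFLE_PENALTY * shuffle_count(moves)
```

With the moves g1-f3, f3-g1, g1-f3, e2-e4, two moves undo the one before them, so 3.0 sacrifice points drop to 2.0.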
This is just a thought experiment, so the rewards can be tweaked based on the output (for example, an engine with a clearly better style that scores worse would readily signal which rewards need tweaking or which need to be added).
The good thing is that this can already be applied to engines that are close in strength, since thousands of their games have already been played and could be examined to measure style.