I think that the depth at which candidate moves emerge should lead the discussion of using engines to calculate brilliancy.
Example: you run SF with MultiPV=3 and you get 3 candidate moves. Let's use letters rather than actual chess moves.
So you get at depth 1:
- A +2
- B +1.5
- C -2
but at depth 2:
- A +1
- D +0.8
- C 0
and when you get to depth 30:
- C +2
- A +1
- Z -1
Clearly that means that move A was considered good for a while (and still is), B was considered good and then discarded, D was briefly considered, C switched from a losing move to the best move, and Z only surfaces at depth 30.
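If it helps, here is a minimal sketch of how these per-depth candidate evaluations could be collected, assuming a local Stockfish binary and the python-chess library; `ENGINE_PATH` and the helper name are my own placeholders, not anything from an existing tool.

```python
import chess
import chess.engine

ENGINE_PATH = "stockfish"  # hypothetical path to a local Stockfish binary

def candidate_evals_by_depth(fen, max_depth=30, pv=3):
    """Collect the top-`pv` candidate moves and their evals at each depth."""
    history = {}  # depth -> list of (uci_move, eval_in_pawns)
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    try:
        with engine.analysis(board, chess.engine.Limit(depth=max_depth),
                             multipv=pv) as analysis:
            for info in analysis:
                # The engine also emits bookkeeping lines; keep only those
                # carrying a principal variation and a score.
                if "pv" not in info or "score" not in info:
                    continue
                depth = info.get("depth", 0)
                move = info["pv"][0].uci()
                # Relative score, in pawns, from the side to move.
                pawns = info["score"].relative.score(mate_score=10000) / 100
                history.setdefault(depth, []).append((move, pawns))
    finally:
        engine.quit()
    return history
```

Each entry of the returned dictionary corresponds to one depth snapshot as in the example above, e.g. `history[30]` would hold the C/A/Z triple.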
I believe that, in a position created by a previous move you are analysing, top candidate moves that only become effective at greater depth make that previous move interesting.
Based on your definition of "only move", to which I do not particularly subscribe, this move would have to be considered losing at lower depth in order to be interesting.
However, if multiple top engine moves share this characteristic, either the position is objectively losing no matter what you do because of a fundamental positional flaw, or you have a "super brilliant" move on your hands, where anything the opponent tries has a deep refutation - like those hard chess puzzles.
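As a rough illustration of that split, one could run a heuristic like the following over the history collected above; the thresholds and the "late bloomer" notion are assumptions of mine, not an established method.

```python
def classify_deep_resources(history, good=0.5, bad=-0.5):
    """Rough heuristic over the per-depth history collected above.

    `good`/`bad` are arbitrary thresholds in pawns. A "late bloomer" is a
    move that looked losing at the shallowest depth (or was not even a
    candidate there) but is winning at the deepest one, like move C above.
    """
    shallow = dict(history[min(history)])  # shallowest snapshot
    deep = dict(history[max(history)])     # deepest snapshot
    late_bloomers = [m for m, s in deep.items()
                     if s >= good and shallow.get(m, bad) <= bad]
    if late_bloomers:
        return f"possible deep resource(s): {late_bloomers}"
    if all(s <= bad for s in deep.values()):
        return "likely a fundamental positional flaw"
    return "nothing unusual in this depth range"
```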
Yet I've briefly tested this approach and things are not that clear cut. For example, a move might appear winning at low depth, then be refuted by a deeper response, then turn winning again thanks to an even deeper counter-response. In fact, once you think about it, this kind of move might feel even more "brilliant", as it plays with one's emotions.
I was thinking at one point of trying to catalogue this as "shapes", where an underscore means a low eval, a dash a medium eval, and a "superscore" (my made-up term for an overline, a top horizontal line) a high eval, at a fixed length of 3 to 5 characters. That gives a maximum of 3^5 = 243 possible options, but is it representative? Especially for the last case...
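To make that concrete, here is a minimal sketch of such a shape encoding, with arbitrary eval thresholds of my own choosing; the last case (winning, refuted, winning again) comes out as an overline-underscore-overline pattern.

```python
def eval_shape(evals, length=5, low=-0.5, high=0.5):
    """Encode an eval trajectory (in pawns, shallow to deep) as a "shape".

    '_' = low eval, '-' = medium, '‾' (overline) = high. The trajectory is
    downsampled to `length` symbols; thresholds are arbitrary assumptions.
    """
    step = max(1, len(evals) // length)
    sampled = evals[::step][:length]

    def symbol(e):
        if e <= low:
            return "_"
        if e >= high:
            return "‾"
        return "-"

    return "".join(symbol(e) for e in sampled)

# The "plays with your emotions" case: winning, refuted, winning again.
print(eval_shape([1.8, -0.7, -0.6, 1.2, 2.0]))  # -> '‾__‾‾'
```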
So I don't have a clear solution, but I do feel that the evolution of candidate-move evaluations with engine depth should play a role. I don't know what that role is, because I don't want to use active engine eval in LT, but I feel it might be important.
Therefore, if you ever plan a data-driven statistical model approach, take this metric into consideration, perhaps.