
What are ratings?

Chess Personalities
Should ratings measure performance and/or skill?

A wise TD once taught me that rating lists measure, and should measure only, player achievement. We have no tests for a player's opening skill, endgame skill, time-pressure skill, and so on; all we have are game outcomes, and a player's performance is the sum of their achievements.
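Elo-style ratings make this concrete: the only input to a rating update is the game result, never the quality of the moves. Below is a minimal sketch of the classic logistic Elo update, assuming a fixed K-factor of 20 chosen purely for illustration (FIDE's actual regulations differ in details such as K-factors and rounding). The function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 20.0) -> float:
    """Return the new rating after one game.

    score is 1 (win), 0.5 (draw) or 0 (loss); k = 20 is an illustrative
    assumption, not an official FIDE value for every player.
    """
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # Equal ratings: the expected score is exactly 0.5, so a win gains k/2.
    print(elo_update(2000, 2000, 1.0))  # 2010.0
```

Nothing about how well the moves were played enters the formula: a lucky win and a brilliant win against the same opponent move the rating by exactly the same amount.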

Dr. Regan was interviewed on the Perpetual Chess Podcast ("Chess Cheat Detection Expert, IM Kenneth Regan Shares his Findings on the Carlsen/Niemann Scandal"), where he informally explains his policies, including discarding opening moves and games identical to earlier games in his personal database, a committee secretly deciding on player ratings, and Chess.com's and FIDE's policy preferences. In the same interview he conjectures about ideas that cannot be cross-validated for lack of data, for example adding "move time" to his model under the popular assumption that all players play poorly in time pressure.

Dr. Regan is correct that we need more scientists: "It's easy for anyone to spend 10-15 minutes generating pseudoscience which currently takes 10 hours of rigorous analysis to refute." I suggest that the 10-hour figure could come down as FOSS tooling, and the availability and quality of free databases, improve, assuming we the public can gradually improve such tooling without helping cheaters too much.

Dr. Regan shares "Pandemic Lag" (Gödel's Lost Letter and P=NP), which observes that players' skill, as measured by his IPR model, improved even while FIDE ratings were frozen (emphasis mine):

... I have shown that the ratings administered by the International Chess Federation (FIDE) have stayed stable in absolute regard to the objective quality of moves played as measured by my own predictive model, via my Intrinsic Performance Ratings (IPRs) geared to the FIDE rating scale. Having stable numbers is vital not only to my cheating tests but to the public understanding of the system on the whole. This goes for FIDE, for Internet gaming federations, and even for the use of Elo by Tinder.

I find that this case study (and the subsequent FIDE rating increases of those players, Niemann included) lends some credibility to Regan's IPR system.

Public rating lists indicate player achievement, not player skill. Only a tool would, against the wisdom of their own expert quoted above, attempt to use FIDE rating lists to suggest that a player's previous OTB games warrant further review, let alone games from 6 tournaments where the public evidence amounted to "bupkis; negative z-score". For the sake of not bringing chess into disrepute, such a tool could at least pay a statistician instead of asking FIDE to do "10 hours" (or however much) of work. Sure, if you're attempting business dealings with a high-profile tweeter, incentives could tempt you to forget how you love chess and the hard work FIDE arbiters and USCF TDs put into maintaining the integrity of our sport, but please try to remember that this game isn't all about you and that your credibility has a price.

Separately... most moves played by competitive tournament players match the engine's top move, and not just in the opening and endgame. Most legal moves are terrible! Who knew? Perhaps it's time to try some illegal moves...
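Because matching the engine's top move is normal, a raw match rate says little on its own; what matters is how far the observed matches sit from what a model expects of that player, which is what a z-score expresses. The sketch below assumes each move is an independent Bernoulli trial whose match probability comes from some model; the function name `match_z_score` and its inputs are hypothetical, and this is not Regan's IPR model, which is considerably more involved.

```python
import math
from typing import Sequence


def match_z_score(match_probs: Sequence[float], matched: Sequence[bool]) -> float:
    """z-score of observed engine-move matches against a model's expectation.

    match_probs[i] is a (hypothetical) model probability that the player
    picks the engine's top move on move i; matched[i] records whether they
    did. Moves are treated as independent Bernoulli trials (an assumption).
    """
    if len(match_probs) != len(matched):
        raise ValueError("match_probs and matched must have the same length")
    expected = sum(match_probs)  # expected number of matches
    variance = sum(p * (1.0 - p) for p in match_probs)  # sum of Bernoulli variances
    if variance == 0.0:
        raise ValueError("variance is zero; the z-score is undefined")
    observed = sum(1 for m in matched if m)
    return (observed - expected) / math.sqrt(variance)


if __name__ == "__main__":
    # Ten moves, model expects a 60% match rate, player matched 5 of 10.
    z = match_z_score([0.6] * 10, [True] * 5 + [False] * 5)
    print(round(z, 2))  # -0.65
```

A 50% match rate sounds high, yet against a model expecting 60% it yields a negative z-score: the player matched the engine less often than expected, not more.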

These opinions are mine and those of any sane person; they are not attributed to Lichess.org.


Image credit: Luke Chesser