Improved Chess Rating Comparisons Using Nonparametric Statistics

Many chess players wonder how different rating systems map to each other. A common objection is that the ratings can't be mapped at all, since the systems cover separate pools of players and different time controls. You can also argue that playing OTB is very different from playing online. These are valid points, but we can still take a look at how the ratings compare to give players a guide to where they stand in different rating pools. This post explains how we build our Rating Comparisons.

Gather Data Sources

First, we gather all of the ratings for players in our database. This includes USCF, FIDE, and Lichess ratings. Jesse is able to pull API data from Lichess, and I download the latest rating supplements for USCF and FIDE.

We will use the blitz vs USCF ratings as an example for the remaining steps.

Subset The Data

Players In Both Rating Pools

Next, we find all players from our sources that have both a username and a USCF ID. This is our widest net of all possible players eligible for the comparison.
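As a sketch, finding the intersection of the two pools can be done with an inner join. The table and column names below (`username`, `uscf_id`) are illustrative assumptions, not the actual schema:

```python
import pandas as pd

# Hypothetical player tables; the column names are assumptions for illustration.
lichess = pd.DataFrame({"username": ["alice", "bob", "carol"],
                        "blitz": [1000, 1100, 1250]})
uscf = pd.DataFrame({"username": ["alice", "bob", "dave"],
                     "uscf_id": [12345678, 23456789, 34567890],
                     "uscf": [900, 1150, 1600]})

# An inner join keeps only players present in both rating pools.
both = lichess.merge(uscf, on="username", how="inner")
print(both)
```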

Venn Diagram Of Comparison Set

Current Non-Provisional Players

We don't want to include players that have only played a handful of games. We also want to exclude anyone who last played a long time ago. To handle this for online ratings, we subset to only players who have RD < 150. For more information on RD values and how ratings work, see the Chess Ratings post.
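The RD cutoff from the paragraph above can be sketched as a simple filter (the example values are made up):

```python
import pandas as pd

# Hypothetical players with Glicko RD values; the RD < 150 cutoff is from the post.
players = pd.DataFrame({"username": ["alice", "bob", "carol", "dave"],
                        "blitz": [1000, 1100, 1250, 1400],
                        "rd": [60, 145, 210, 90]})

# Keep only non-provisional, recently active players.
active = players[players["rd"] < 150]
print(active["username"].tolist())
```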

No Outliers

There are going to be some errors and abnormalities in the data that we need to check for. After hand-verifying some of the egregious values, we are left with a pretty clean set of players to analyze.
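The outlier check described above is done by hand, but a first automated pass might flag unusually large blitz-minus-USCF gaps for review. The IQR rule and the 2.5 multiplier below are my assumptions, not the authors' method:

```python
import numpy as np

# Toy data: the last row has an implausible gap and is likely a data error.
blitz = np.array([1000, 1100, 1250, 1300, 2900])
uscf  = np.array([ 900, 1150, 1100, 1250,  800])

diff = blitz - uscf
q1, q3 = np.percentile(diff, [25, 75])
iqr = q3 - q1

# Flag differences far outside the interquartile range for manual verification.
outlier = (diff < q1 - 2.5 * iqr) | (diff > q3 + 2.5 * iqr)
print(outlier)
```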

Rank The Data

Now that we have a fairly clean set of data based on players in both rating pools, we rank each rating list individually to help remove the noise. Here's an example of the input data and the ranked data. Notice how the 1100 and 1150 USCF values swap places so both columns are sorted low to high.

Input Data
Blitz  USCF
1000   900
1100   1150
1250   1100

Ranked Data
Blitz  USCF
1000   900
1100   1100
1250   1150
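The ranking step above can be sketched by sorting each column independently and re-pairing by rank:

```python
import numpy as np

# Input pairs from the example table.
blitz = np.array([1000, 1100, 1250])
uscf  = np.array([ 900, 1150, 1100])

# Sort each rating list independently, then pair values by rank.
ranked_blitz = np.sort(blitz)
ranked_uscf = np.sort(uscf)
print(list(zip(ranked_blitz, ranked_uscf)))
```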

This gives us a very smooth line mapping the two rating systems. We also fit a 2nd-order polynomial regression to this line, which will be used for the +/- values later.
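A minimal sketch of the polynomial fit using numpy, on toy ranked pairs (the real fit uses the full player set):

```python
import numpy as np

# Toy ranked pairs standing in for the real ranked data.
ranked_blitz = np.array([1000, 1100, 1250, 1400, 1600])
ranked_uscf = np.array([900, 1100, 1150, 1350, 1620])

# 2nd-order polynomial least-squares fit mapping blitz -> USCF.
coeffs = np.polyfit(ranked_blitz, ranked_uscf, deg=2)
predict = np.poly1d(coeffs)
print(predict(1300))
```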

Example Of Both Rating Systems Ranked

In the comparison table, we list values every 50-100 points for blitz and look up the corresponding USCF rating in the ranked data.
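One way to sketch the table lookup is linear interpolation into the ranked data; `np.interp` and the 50-point grid are assumptions about implementation detail, not the authors' exact code:

```python
import numpy as np

# Toy ranked pairs standing in for the real ranked data.
ranked_blitz = np.array([1000, 1100, 1250, 1400, 1600])
ranked_uscf = np.array([900, 1100, 1150, 1350, 1620])

# For each blitz grid point, look up the USCF value in the ranked data.
grid = np.arange(1000, 1601, 50)
table = np.interp(grid, ranked_blitz, ranked_uscf)
for b, u in zip(grid, table):
    print(f"blitz {b} -> USCF {u:.0f}")
```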

Standard Deviation

The final step is to figure out how certain we are in these predictions. We take the 2nd order polynomial regression equation from the ranked data to predict what each player’s USCF rating will be based on their blitz ratings.

Next, we take the difference between the predicted USCF rating and the actual USCF rating. Here’s an example of how the distribution can look between predicted and actual. This histogram happens to be for blitz and bullet, but we can see most values are centered at the predicted value, and an equal but decreasing number of players fall into the bins as we move left and right.
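The predicted-minus-actual step can be sketched with toy numbers (the residual values here are illustrative, not real data):

```python
import numpy as np

# Actual USCF ratings vs. the polynomial's predictions (made-up values).
actual_uscf = np.array([900, 1100, 1150, 1350, 1620])
predicted_uscf = np.array([930, 1060, 1180, 1340, 1610])

# Residuals center near zero; their spread drives the +/- values.
residuals = predicted_uscf - actual_uscf
print(residuals)
print(residuals.std())
```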

By taking the standard deviation of the predicted minus actual values, we are able to get a sense of how each comparison distribution looks. In the tables, I add a +/- value that corresponds to one standard deviation. Here’s an example:

If your blitz rating is 1550 and you’re wondering what an equivalent USCF rating would be, the best guess is 1540. Out of the players in our database that have both ratings, we’d expect 68% of players to be between 1540-260 (1280) and 1540+260 (1800). That’s a very big range so that should be kept in mind when looking at these comparisons. Most of the rating comparisons have a standard deviation between 150 and 200.
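Plugging in the numbers from the example above:

```python
# Worked example from the post: best guess 1540, one standard deviation = 260.
best_guess = 1540
sigma = 260

# About 68% of players with that blitz rating fall within one sigma.
low, high = best_guess - sigma, best_guess + sigma
print(f"~68% of players between {low} and {high}")
```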
