I'll start with an example. Let's say Djoković and Nadal face each other on a fast grass court and once again on a slow clay court. Odds are 1.90-1.90 for both matches. Who are you gonna pick? Obviously Djoković on grass, as his playstyle is much more suitable for fast grass courts. It's just as obvious you'll pick Nadal on clay, as his playstyle is much better suited to slow clay courts.
I got this idea from Mr. Jeff Sackmann (https://www.jeffsackmann.com/), who has done great things in the field of tennis statistics. He has also calculated CPIs previously; however, my approach differs from his in some ways.
How do you calculate CPI - oversimplified answer:
First you need data from all matches played in a certain period of time. I recommend the last 52 weeks (= the last edition of each tournament, as surface speed changes from one edition to another). For my model I use aces; I'll explain why soon. Once you have all the data, meaning aces from both players in every match in every tournament, you can calculate how many aces are served in an average match (ATP average). You can also calculate the aces average for each player and the aces average for each tournament. Now you can calculate indexes by dividing the aces average for a tournament by the ATP average. So for example, if the ATP average is 10 aces and the New York average is 10, the index of New York is 1.00. If the ATP average is 10 aces and the Wimbledon average is 12.5 aces, the index of Wimbledon is 1.25 (25% faster than the average).
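Here is a minimal Python sketch of the basic index described above, before any of the adjustments. The function name `court_pace_index`, the tuple layout and the sample numbers are made up for illustration (the "Slow Event" tournament is hypothetical); the numbers are chosen so the ATP average comes out at 10 aces per match, like in the example.

```python
from collections import defaultdict

def court_pace_index(matches):
    """matches: iterable of (tournament, aces_player_a, aces_player_b)."""
    totals = defaultdict(lambda: [0, 0])  # tournament -> [total aces, matches]
    for tournament, aces_a, aces_b in matches:
        totals[tournament][0] += aces_a + aces_b
        totals[tournament][1] += 1

    # Average aces per match over every tournament in the sample ("ATP average").
    all_aces = sum(aces for aces, _ in totals.values())
    all_matches = sum(n for _, n in totals.values())
    tour_avg = all_aces / all_matches

    # Index = tournament average divided by the overall average.
    return {t: (aces / n) / tour_avg for t, (aces, n) in totals.items()}

# Illustrative numbers only, not real match data.
sample_matches = [
    ("Wimbledon", 7, 5), ("Wimbledon", 8, 5),    # 12.5 aces per match
    ("New York", 6, 4), ("New York", 5, 5),      # 10 aces per match
    ("Slow Event", 4, 3), ("Slow Event", 5, 3),  # 7.5 aces per match
]
print(court_pace_index(sample_matches))
```

With these numbers Wimbledon comes out at 1.25 and New York at 1.00, the same as in the example above.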
Why only aces? Well, I tried this with serve points won, but the differences between tournaments are minimal: the fastest court is below 1.10 and the slowest is above 0.90. Aces give you a better representation of which tournaments are in fact "fast" and which are "slow", as the fastest courts go above 1.30 and the slowest below 0.70. Mr. Sackmann follows a similar approach, as he also uses aces only.
Another thing to consider is aces against, which is how many aces a certain player "allows". For example, it's much more impressive to serve 20 aces against Djoković than against Isner. By using aces against, CPIs are more accurate as they consider who you served against. You also need to consider who is playing at each tournament. If you don't, CPIs will be inflated or deflated, because players choose tournaments that suit their playstyle if they have a choice. This means if you have one slow and one fast tournament, guys like Isner, Fritz, Opelka, ... will probably choose the fast tournament. As the surface is already fast and all of them serve lots of aces, the CPI of this tournament will be inflated.
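One possible way to account for both things at once is to compare the aces actually served at a tournament with the aces you would expect from the same players on an average court. The sketch below is an assumption for illustration, not the exact formula behind the published CPIs: the expected value (server's average times returner's aces-against average, divided by the tour average) and all names in it are hypothetical, and the averages are per player per match.

```python
def adjusted_index(entries, served_avg, allowed_avg, tour_avg):
    """entries: (server, returner, aces) rows from one tournament.

    served_avg[p]  -- aces p serves per match, season average
    allowed_avg[p] -- aces p allows per match, season average
    tour_avg       -- aces per player per match across the whole tour
    """
    actual = 0.0
    expected = 0.0
    for server, returner, aces in entries:
        actual += aces
        # Assumed expectation on a neutral court: scale the server's average
        # by how ace-prone the returner is relative to the tour average.
        expected += served_avg[server] * allowed_avg[returner] / tour_avg
    return actual / expected

# Hypothetical players and numbers, one match = two entries.
served_avg = {"Big Server": 12.0, "Baseliner": 4.0}
allowed_avg = {"Big Server": 6.0, "Baseliner": 4.0}
entries = [
    ("Big Server", "Baseliner", 12),
    ("Baseliner", "Big Server", 6),
]
print(adjusted_index(entries, served_avg, allowed_avg, tour_avg=5.0))  # about 1.25
```

In this toy case the unadjusted ratio would be 9 aces per player against a tour average of 5, so 1.80. Once you account for who was serving and who was returning, the court looks about 25% faster than average instead of 80%.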
Reliability of this model grows with the number of matches, so you need to have as many matches as you can. But the same goes for each player. If a player plays only one match on clay in the whole season, this one match will be his season average on clay. You can already see where the problem is, can't you? I've decided to use only stats of those players who have played at least 3 matches on hardcourts, 3 matches on clay and 3 matches on grass OR indoors (as there are few matches on grass and indoors). So a minimum of 9 matches. Not ideal, but you also want as many players as you can, because you need as many matches as possible in each tournament for it to be valid. I've decided a tournament is valid if at least 20 matches have been played by valid players. This means you get at least 40 data entries. It keeps most tournaments in the model, but also excludes some of them, especially at challenger level.
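The two validity rules can be sketched in Python like this. Two things here are my reading rather than something spelled out above: grass and indoor matches are counted together in one bucket, and a match only counts towards a tournament if both players are valid (which is what gives two data entries per match). The surface labels and function names are hypothetical.

```python
from collections import Counter, defaultdict

MIN_PER_SURFACE = 3
BUCKETS = ("hard", "clay", "grass_or_indoor")

def valid_players(matches):
    """matches: list of (tournament, surface, player_a, player_b)."""
    counts = defaultdict(Counter)
    for _, surface, a, b in matches:
        # Assumption: grass and indoor matches share one bucket.
        bucket = "grass_or_indoor" if surface in ("grass", "indoor") else surface
        counts[a][bucket] += 1
        counts[b][bucket] += 1
    return {p for p, c in counts.items()
            if all(c[k] >= MIN_PER_SURFACE for k in BUCKETS)}

def valid_tournaments(matches, players, min_matches=20):
    """Keep tournaments with at least min_matches between two valid players."""
    per_tournament = Counter()
    for tournament, _, a, b in matches:
        # Assumption: both players must be valid for the match to count.
        if a in players and b in players:
            per_tournament[tournament] += 1
    return {t for t, n in per_tournament.items() if n >= min_matches}
```

You would run `valid_players` on the whole 52-week sample first, then pass its result to `valid_tournaments`, and only calculate CPIs for the tournaments that survive.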
How much should this influence your betting? I don't recommend that you pick players based only on this tool; however, in some cases it's very useful. It gives you information about which players should do well in certain tournaments.
I'll post CPIs for each week on my blog. For this week, I've already posted them.
If you have any questions about CPIs, let me know in the comments.
Any progress with your model?
Interesting approach! Using aces as a metric for court speed indexing (CPI) seems to provide a practical way to assess court conditions and tailor predictions for player performance accordingly. The example with Djoković and Nadal on different court surfaces makes it clear how playstyle suitability can significantly impact match outcomes. Your consideration of aces against, factoring in the quality of opponents, adds depth to the CPI calculation. It's logical that serving 20 aces against a tougher opponent should be more impressive and indicative of a player's ability. Also, acknowledging player preferences for certain tournaments based on their playstyle is a thoughtful addition, preventing potential inflation or deflation of CPIs. The decision to include only players with a minimum of 9 matches on different surfaces helps ensure a more robust dataset, even though it may limit the number of players and tournaments included. The threshold of 20 matches per tournament for validity strikes a balance between inclusivity and data reliability. While CPI may not be the sole determinant for betting decisions, it provides valuable insights into which players may perform well in specific tournaments. Integrating this tool with other factors in your analysis could enhance the overall accuracy of predictions. Thanks for sharing your methodology!