Tennis And Data: What Is Actually Available For The Public? From Raw Numbers To Hawk-Eye Metrics - UBITENNIS

Here is the second episode of our ongoing series on the advent of advanced analytics in the game. Let’s draw a few lines – what are the types of data, and who are they available to? Only those who are willing to spend a lot of money (like Federer) will get the entire benefit.

By Staff

Official statistics for individual matches have only been recorded since 1991. To give you an idea, if you visit the ATP site and try to retrieve the head-to-head tally between Edberg and Becker, you will find the details of their matches only from that year onwards. The first thing we can say without any doubt, then, is that before 1991 players, coaches and journalists could only do one thing: try to guess what had happened.

Systematic data collection in tennis thus began in the early 1990s… and unfortunately has not changed since then, except for some charts generously provided during the Grand Slam tournaments and the Masters 1000 events. In a later episode of our series, we will focus on the ownership of that information and on who is involved in collecting the data. Today, we focus on how to describe these evanescent data nuggets.

Tennis is an ideal sport for data analysis. It is built from elementary units (individual points) arranged in a hierarchical framework (points, games and sets), and each unit has a binary outcome: it is either won or lost. Although a large amount of data is available, only a small part is shared with the public: aggregates of elementary units (points, games and sets won), key points won (break points saved and converted) and aggregate performance on serve, the only shot tracked. Going back to the ATP website, serve stats and not much else are to be found.

So, what are the data that tennis players, coaches, journalists and fans would like to see? Let’s try to summarise the different types:

FIRST CATEGORY: RAW DATA

Summary data about points are the overall figures: how many points a player has won, for example, or how many break points have been played. The perspective of this analysis is the point, the elementary unit in the hierarchy of the tennis scoring system. The point is therefore the basic unit of the available statistics, and only one piece of information about the shots played is attached to it: the serve. In practice the point, despite being the elementary unit of the score, is a black box. Its composition can vary widely, but its only known attribute is “point played on the first serve” or “point played on the second serve”. End of story. Anyone trying to build a statistical representation of a match will find little to work with, at least as far as the official data freely published by the ATP are concerned.
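To make the “black box” idea concrete, here is a minimal Python sketch of what a point record looks like when the serve is the only tracked shot, and of how the familiar aggregates are derived from it. The field names, the player labels "A" and "B" and the sample points are all assumptions made up for illustration; this is not the ATP's actual data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    server: str           # hypothetical label of the player serving
    winner: str           # hypothetical label of the player who won the point
    on_first_serve: bool  # True: played on the first serve; False: on the second
    break_point: bool     # True if the server was facing a break point


def pct(won: int, total: int) -> Optional[float]:
    """Percentage rounded to one decimal, or None when there is nothing to count."""
    return round(100 * won / total, 1) if total else None


def serve_summary(points: list, player: str) -> dict:
    """Aggregate one player's service points into public-style serve stats."""
    served = [p for p in points if p.server == player]
    first = [p for p in served if p.on_first_serve]
    second = [p for p in served if not p.on_first_serve]
    faced = [p for p in served if p.break_point]

    def won(group):
        return sum(p.winner == player for p in group)

    return {
        "first_serve_points_won_pct": pct(won(first), len(first)),
        "second_serve_points_won_pct": pct(won(second), len(second)),
        "break_points_faced": len(faced),
        "break_points_saved_pct": pct(won(faced), len(faced)),
    }


# Made-up service points for player "A", for illustration only.
points = [
    Point("A", "A", True, False),
    Point("A", "B", False, False),
    Point("A", "A", True, True),
    Point("A", "B", False, True),
    Point("A", "A", True, False),
]

summary = serve_summary(points, "A")
print(summary)
```

Nothing in the record says how long the rally was or which shot ended it, which is exactly the limitation described above.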

That the serve is the most important shot in tennis (perhaps along with the return) is well known – on average, 60-70 percent of all points fall into the under-five-shots rally category. Still, relying only on this information is quite limiting when you are trying to build an analytical framework with solid empirical foundations for what happens in a match, or to identify general trends. The most interesting results we can get from these data are the correlations between match outcomes (won or lost) and performance on serve and on important points. It is not surprising that in the Stats section of the ATP site – the ATP Leaderboards – only data relating to serving, returning and under-pressure performance are to be found. We have already talked about the robustness of these indicators here at UbiTennis, and Stephanie Kovalchik, one of the most influential academics in the field of data analytics, who also collaborates with Tennis Australia, has written about this topic too. It is possible to carry out some historical analysis starting from these data; for example, the table below displays a statistic that compares the percentage difference – positive or negative – between the percentage of points won on serve and the percentage of break points saved by Federer in hard-fought matches:
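For readers who want to compute this kind of statistic themselves, here is a minimal Python sketch. It assumes the comparison is a simple difference in percentage points (break points saved minus service points won); the article's exact formula may differ. The function name, the match labels and all the numbers are hypothetical and are not Federer's actual figures.

```python
def pct(won, total):
    """Plain percentage; the caller must make sure total is not zero."""
    return 100 * won / total


def break_point_gap(serve_won, serve_played, bp_saved, bp_faced):
    """Break points saved (%) minus service points won (%), in percentage points.

    A positive value means the player served better under pressure than on
    average; a negative value means the opposite. Returns None when no break
    point was faced, since the comparison is undefined in that case.
    """
    if bp_faced == 0 or serve_played == 0:
        return None
    return round(pct(bp_saved, bp_faced) - pct(serve_won, serve_played), 1)


# Made-up match lines, for illustration only.
matches = [
    {"label": "Match 1", "serve_won": 70, "serve_played": 100, "bp_saved": 6, "bp_faced": 8},
    {"label": "Match 2", "serve_won": 66, "serve_played": 100, "bp_saved": 3, "bp_faced": 6},
]

gaps = {
    m["label"]: break_point_gap(
        m["serve_won"], m["serve_played"], m["bp_saved"], m["bp_faced"]
    )
    for m in matches
}

for label, gap in gaps.items():
    print(f"{label}: {gap:+.1f} percentage points")
```

In the first made-up match the player saves break points at a higher rate than they win service points overall, so the gap is positive; in the second the gap is negative.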
