SECOND CATEGORY: SUMMARY DATA
The second category covers summary data about the shots played. For the serve, we have only the binary information of whether a first or second serve was played. For every other shot, on the other hand, there is an entire world to be discovered. Fortunately, Grand Slam and ATP Tour Masters 1000 tournaments collect summary data on the shots played in a structured format, although access to this information is difficult. During these tournaments, several types of information are available, such as rally length, winners, forced errors, ball speed and spin, the height at which the ball clears the net, shots played at the net, as well as, obviously, forehands and backhands played. In these cases the black box, namely the characteristics of the individual points, becomes a little more understandable. However, the data are aggregated, so the tactics and patterns of a match can only be inferred rather than observed directly. Nonetheless, this dataset offers a wealth of insights compared with the figures generally available.
As an example, the table below reports some figures based on summary Hawk-Eye data on Shapovalov, collected between 2017 and 2019:
This small table highlights the difference between the matches Shapovalov won and those he lost. While many fans are dazzled by the Canadian’s excellent backhand, it is worth remembering that a good indicator of whether Denis will win or lose a match is how his forehand performs. In the matches he won, his forehand winners were 2.7 times those of his opponent, while in the matches he lost the ratio was only 1.8. Similarly, in the matches he won, his forehand unforced errors were 1.6 times those of his opponent, while they exploded to 2.8 times in the matches he lost. If we have a sufficiently large sample of data, we will try to see how Denis fared in 2020, in order to measure the Canadian’s improvements… but that’s a subject for a future article.
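For readers who want to reproduce this kind of comparison, here is a minimal sketch in Python of how such player-to-opponent ratios could be computed from aggregated counts. The counts in `sample`, the key names and the function names are all hypothetical, chosen only so that the output matches the ratios quoted above; they are not the actual Hawk-Eye figures.

```python
def ratio_to_opponent(player_count, opponent_count):
    """Return the player's count divided by the opponent's count.

    Returns None when the opponent's count is zero, since the ratio
    is undefined in that case.
    """
    if opponent_count == 0:
        return None
    return player_count / opponent_count


def summarize(data):
    """Turn (player, opponent) count pairs into ratios, per match outcome."""
    return {
        outcome: {
            stat: ratio_to_opponent(player, opponent)
            for stat, (player, opponent) in stats.items()
        }
        for outcome, stats in data.items()
    }


# Hypothetical aggregated counts as (player, opponent) pairs.
# These are placeholders for illustration, NOT real match data.
sample = {
    "won": {"forehand_winners": (27, 10), "forehand_unforced_errors": (16, 10)},
    "lost": {"forehand_winners": (18, 10), "forehand_unforced_errors": (28, 10)},
}

ratios = summarize(sample)

for outcome, stats in ratios.items():
    for stat, value in stats.items():
        print(f"{outcome:>4} | {stat:<26} | {value:.1f}")
```

Comparing the ratio in won matches against the ratio in lost matches, stat by stat, is what gives the difference discussed above.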

