In the 2017 MIT Sloan Sports Analytics Hackathon presented by ESPN, I was tasked with “Measuring the Immeasurable” in the NBA using SportVU data for five random games, an admittedly very small sample. I decided to explore players’ vision. I thought players who had good vision would be unpredictable, random, and creative in their passing. To explore these phenomenon, I created two metrics: entropy and clustering.
My formulation of entropy measures the randomness of whom a player passes to. Entropy is a widely-used metric in information theory that measures the information content of something. Things with more entropy are more random and give more information when evaluated. Similarly, in my formulation passers who pass to a wider range of individuals have higher entropy because it is harder to predict who they will pass to. This may make them harder to guard on defense because one can’t anticipate where the ball will go to next.
Armed with this statistic, it is natural to see how it rates different players. This is an important way of validating that we are measuring something that matches up with our intuition of what a good passer is. We can see that James Harden has a much higher entropy and passes to a much wider range of people than Marvin Williams who appears to be keyed in on Kemba Walker. Harden is known as an excellent passer, so it is good that entropy confirms that observation.
Clustering measured the extent to which where players passed from was spatially clustered. It was the ratio between the distance from each pass to the closest pass and the expected distance if pass locations were chosen uniformly at random from the frontcourt. Players whose passes were from less clustered locations may be harder to guard because you don’t know when they are going to pass. Additionally, their vision may be higher since it means they are looking to pass from all over the court.
Again, we want to rate players on this metric. We can see that Patrick Beverly passes from a seemingly random, spread out assortment of points while Spencer Hawes almost always passes from around the three-point line, above the break. Thus, Patrick Beverley has a much lower clustering score than Spencer Hawes. While these trends are visible in the passing charts, clustering is useful single statistic that explains these trends on a high level.
To ensure that these statistics were measuring passing ability and not just arbitrary measures I had come up with, I tested the ability of these statistics to explain adjusted assists (assists plus secondary assists plus FT assists). Because the two statistics are largely collinear with each other and I have very few samples, I chose to add use entropy plus the inverse of clustering to predict adjusted assists per pass. A linear model with this predictor had a statistically significant coefficient and an r^2 of 0.186. This validates the intuition that these statistics are related to and give us useful new metrics of to measure passing ability.
Thank you to the ESPN Stats and Info Group for hosting the hackathon and a dinner for the finalists and to the MIT Sloan Sports Analytics Conference for a great weekend and bringing the sports analytics field together.