By: Evan Green
I was excited to see the announcement of Expected Completion Percentage (xComp), but I had some concerns about how it was defined. Instead of calculating the probability of a completion at the time of the throw, it calculates it at the time the ball reaches the receiver. This difference greatly limits xComp’s ability to tell how accurate a quarterback is. Completion Percentage Above Expectation (xComp +/-) is even more diminished by this definition; it is more a measure of a receiver’s ability to make catches than of a quarterback’s accuracy.
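For reference, xComp +/- as it is commonly computed is the gap between a quarterback's actual completion percentage and the average xComp of his attempts. Here is a minimal sketch, assuming per-attempt completion flags and model probabilities are available. The function name is hypothetical, not part of Next Gen Stats.

```python
def completion_pct_above_expectation(completed, xcomp):
    """Return actual completion % minus mean expected completion %, in percentage points.

    completed: sequence of 0/1 (or bool) outcomes, one per attempt.
    xcomp: sequence of model completion probabilities in [0, 1], same length.
    """
    if len(completed) != len(xcomp) or len(completed) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(completed)
    actual = sum(completed) / n
    expected = sum(xcomp) / n
    return 100 * (actual - expected)


# Four attempts, three completions (75%), average xComp of 0.6: roughly +15 points.
print(completion_pct_above_expectation([1, 1, 0, 1], [0.8, 0.5, 0.4, 0.7]))
```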
Several variables either shouldn’t be included in the model or should be included differently. For instance, using separation from the nearest defender at the time of the catch ignores the fact that good, accurate quarterbacks lead receivers away from defenders, which increases separation. As a result, more accurate quarterbacks appear to be attempting easier passes than their less accurate peers.
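The bias can be seen in a toy, deterministic example. Every number below is invented for illustration: two quarterbacks face identical separation at the throw, the accurate one adds a "lead" that increases separation by the catch, and the hypothetical true completion probability depends only on separation at the catch. A catch-time model sees the extra separation and rates the accurate passer's throws as easier, so his expected xComp +/- is zero; a throw-time model credits him for it.

```python
import math


def p_complete(separation_at_catch):
    """Hypothetical true completion probability: logistic in yards of separation."""
    return 1 / (1 + math.exp(-(separation_at_catch - 1.5)))


# Separation (yards) at the moment of the throw; identical for both passers.
SEP_AT_THROW = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
# Extra separation each passer's ball placement creates by the catch (invented).
LEAD = {"accurate": 1.0, "inaccurate": 0.0}


def expected_cpoe(qb, model):
    """Expected completion % above expectation (points) under a given xComp model."""
    diffs = []
    for s0 in SEP_AT_THROW:
        s_catch = s0 + LEAD[qb]
        actual = p_complete(s_catch)  # expected completion rate on this throw
        if model == "catch_time":
            # Conditions on separation at the catch, which already includes the lead.
            xcomp = p_complete(s_catch)
        elif model == "throw_time":
            # Conditions only on separation at the throw: league average,
            # assuming both passers throw equally often.
            xcomp = sum(p_complete(s0 + lead) for lead in LEAD.values()) / len(LEAD)
        else:
            raise ValueError(f"unknown model: {model}")
        diffs.append(actual - xcomp)
    return 100 * sum(diffs) / len(diffs)


for model in ("catch_time", "throw_time"):
    for qb in LEAD:
        print(model, qb, round(expected_cpoe(qb, model), 1))
```

Under the catch-time model both passers come out at zero; under the throw-time model the accurate passer is positive and the inaccurate one negative by the same amount.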
Distance to the sideline suffers from the same problem. A good throw lets the receiver catch the ball inbounds, while a bad one leads him into a more difficult position. The quarterback doesn’t get credit for giving his receivers more catchable balls. Instead, because he places the ball away from the sideline, xComp classifies his attempts as easier throws, and xComp +/- never credits him for accurately completing a tough pass.
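One way to make a throw-time formulation concrete is to tag each candidate feature with when it is observed and keep only those available at the release. The feature names below are hypothetical stand-ins, not the actual Next Gen Stats inputs.

```python
from enum import Enum


class Timing(Enum):
    AT_OR_BEFORE_THROW = "pre"
    AFTER_THROW = "post"


# Hypothetical candidate features and when each is observed.
FEATURES = {
    "separation_at_throw": Timing.AT_OR_BEFORE_THROW,
    "sideline_dist_at_throw": Timing.AT_OR_BEFORE_THROW,
    "qb_speed_at_throw": Timing.AT_OR_BEFORE_THROW,
    "time_to_throw": Timing.AT_OR_BEFORE_THROW,
    "separation_at_catch": Timing.AFTER_THROW,
    "sideline_dist_at_catch": Timing.AFTER_THROW,
}


def throw_time_features(features):
    """Keep only features a model could know when the ball leaves the quarterback's hand."""
    return sorted(name for name, t in features.items() if t is Timing.AT_OR_BEFORE_THROW)


print(throw_time_features(FEATURES))
```

The catch-time versions of separation and sideline distance are dropped, so the quarterback's ball placement shows up in the outcome rather than in the difficulty estimate.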
Modeling completion probability at the time of the throw would also enable some other interesting applications. At any moment during the play, you could calculate each receiver’s xComp and expected yards gained. This would let you tell whether a quarterback is making the right decisions about whom to target or is missing open receivers.
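A sketch of that decision check, assuming a throw-time model that outputs, for each eligible receiver, a completion probability and the expected yards if the pass is caught (both hypothetical inputs here). Ranking receivers by expected yards per attempt, xComp times yards if caught, is one simple choice; it ignores interceptions and other outcomes.

```python
from dataclasses import dataclass


@dataclass
class ReceiverOption:
    name: str
    xcomp: float            # hypothetical throw-time completion probability
    yards_if_caught: float  # hypothetical expected yards gained on a completion

    @property
    def expected_yards(self) -> float:
        # Incompletions gain zero yards, so E[yards] = P(complete) * E[yards | complete].
        return self.xcomp * self.yards_if_caught


def evaluate_decision(options, targeted):
    """Return (best option's name, expected yards given up by the actual target)."""
    best = max(options, key=lambda o: o.expected_yards)
    chosen = next((o for o in options if o.name == targeted), None)
    if chosen is None:
        raise ValueError(f"no option named {targeted!r}")
    return best.name, best.expected_yards - chosen.expected_yards


options = [
    ReceiverOption("WR1", xcomp=0.45, yards_if_caught=18.0),
    ReceiverOption("TE", xcomp=0.80, yards_if_caught=7.0),
    ReceiverOption("RB", xcomp=0.90, yards_if_caught=4.0),
]
print(evaluate_decision(options, targeted="RB"))
```

Here the checkdown to the running back is the safest throw, but the deep receiver offers about 4.5 more expected yards, which is the kind of missed opportunity a throw-time model could flag.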
The current model could also be extended to interceptions. Predicting interceptions could help determine whether a quarterback has been unlucky with the number of interceptions he has thrown so far and might see his interception rate decline in subsequent games. For this application, modeling at the time the ball reaches the receiver is actually more informative than modeling at the time of the throw.
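Given per-attempt interception probabilities from such a model, expected interceptions are simply their sum, and the gap between actual and expected interceptions is one rough luck indicator. A sketch with hypothetical inputs; the z-score treats attempts as independent Bernoulli trials, which is an assumption.

```python
import math


def interception_luck(intercepted, p_int):
    """Return (expected INTs, actual minus expected, z-score), assuming independent attempts."""
    if len(intercepted) != len(p_int):
        raise ValueError("need one probability per attempt")
    expected = sum(p_int)
    variance = sum(p * (1 - p) for p in p_int)  # variance of a sum of independent Bernoullis
    excess = sum(intercepted) - expected
    z = excess / math.sqrt(variance) if variance > 0 else 0.0
    return expected, excess, z


# 100 attempts at a 2% modeled interception rate, 5 actual interceptions.
expected, excess, z = interception_luck([1] * 5 + [0] * 95, [0.02] * 100)
print(round(expected, 2), round(excess, 2), round(z, 2))
```

A large positive excess suggests more interceptions than the throws warranted, so his interception rate may drift down in later games.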
Additionally, because there are relatively few quarterbacks in a given season, individual factors such as the quarterback’s speed at the throw, his time to throw, and others may leak information about specific quarterbacks from the training data to the validation data. The data appears to have been split randomly rather than across players, which increases the risk of overfitting to player-specific trends and reduces the model’s ability to generalize to quarterbacks it has not seen.
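A player-grouped split keeps every attempt by a given quarterback on one side of the train/validation boundary. scikit-learn's `GroupKFold` implements this for cross-validation; the standard-library sketch below shows the idea with a hypothetical record layout in which each attempt is a dict with a `qb` key.

```python
import random


def split_by_quarterback(attempts, val_fraction=0.2, seed=0):
    """Split attempts so that no quarterback appears in both train and validation."""
    qbs = sorted({a["qb"] for a in attempts})
    rng = random.Random(seed)
    rng.shuffle(qbs)
    n_val = max(1, round(len(qbs) * val_fraction))
    val_qbs = set(qbs[:n_val])
    train = [a for a in attempts if a["qb"] not in val_qbs]
    val = [a for a in attempts if a["qb"] in val_qbs]
    return train, val


# Ten hypothetical quarterbacks with 30 attempts each.
attempts = [{"qb": qb, "completed": i % 2} for qb in "ABCDEFGHIJ" for i in range(30)]
train, val = split_by_quarterback(attempts)
print(len(train), len(val), sorted({a["qb"] for a in val}))
```

Validation scores from a split like this estimate how well the model extrapolates to unseen quarterbacks, rather than how well it memorizes the ones it was trained on.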
I appreciate the NFL releasing the models’ inputs as well as some out-of-sample validation. Despite these limitations, I think the current formulation is interesting, and I’m looking forward to more research powered by Next Gen Stats.