NFL Kicker Evaluation



Beyond Make Percentage: Evaluating NFL Kickers

Introduction

Many teams experienced either glee or distress at the hands of their kicker in the 2018-2019 NFL season. As a Bears fan, I know I’m going to be seeing that Cody Parkey miss over and over again in my head for years to come. On the flip side, Greg Zuerlein sending his team to the Super Bowl was a moment of the year for Rams fans, especially as good memories were hard to come by for them two weeks later. While these late-game hardships and heroics often come to define the public perception of kickers, it’s worth asking how best to evaluate kickers outside the bubble of emotion that follows a big game. In this project, I sought to find a universal metric that could define the reliability and relative value of NFL kickers, in the hopes of contributing information to the conversation beyond the recounting of last-second kicks we’ll never forget.

General overview of methodology

In the style of most modern sports value metrics, I’m seeking to define the contribution of a kicker to their team relative to an average NFL kicker. The obvious first idea was to fit a model over the kicks attempted throughout the season and estimate how many of them the average kicker would have made. This fails to account for confidence, though. For example, a team with great faith in its kicker might attempt a 60-yard field goal. If he misses, he is penalized by such a metric, while a kicker whose team would never have sent him out is not affected. My metric seeks to move beyond this by taking “team confidence” into account.

To start out, I looked at which variables were useful in distinguishing the difficulty of a given field goal. Using play-by-play data from 2009-2017 obtained with the nflscrapR package, I fit a logistic regression model over all field goals attempted to see which variables were statistically significant.

mod <- glm(kick_good ~ TimeSecs + FieldGoalDistance + ScoreDiff + home_kicker + season,
           data = fgs, family = "binomial")
summary(mod)
## 
## Call:
## glm(formula = kick_good ~ TimeSecs + FieldGoalDistance + ScoreDiff + 
##     home_kicker + season, family = "binomial", data = fgs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7619   0.2448   0.3958   0.6350   1.5256  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        5.422e+00  1.878e-01  28.870  < 2e-16 ***
## TimeSecs           2.862e-05  3.067e-05   0.933 0.350679    
## FieldGoalDistance -1.024e-01  3.557e-03 -28.777  < 2e-16 ***
## ScoreDiff         -2.903e-03  3.373e-03  -0.861 0.389498    
## home_kickerTRUE    6.539e-02  6.169e-02   1.060 0.289147    
## season2010         1.101e-01  1.302e-01   0.846 0.397830    
## season2011         2.161e-01  1.304e-01   1.657 0.097527 .  
## season2012         3.352e-01  1.283e-01   2.612 0.009003 ** 
## season2013         5.595e-01  1.355e-01   4.130 3.63e-05 ***
## season2014         3.628e-01  1.319e-01   2.750 0.005966 ** 
## season2015         4.495e-01  1.351e-01   3.327 0.000879 ***
## season2016         4.537e-01  1.321e-01   3.433 0.000597 ***
## season2017         3.772e-01  1.286e-01   2.932 0.003364 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7903.9  on 8923  degrees of freedom
## Residual deviance: 6834.3  on 8911  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 6860.3
## 
## Number of Fisher Scoring iterations: 5

While we might talk about the advantage of a kicker being clutch late in the game, of having the home crowd on his side, or of folding under the pressure of a tight game, the results of our logistic regression show that the only statistically significant variables are season and field goal distance. My thought in separating by season is that kickers get better collectively over time across the NFL, so it’s worth accounting for this to even the playing field of evaluation. Our model for the expected make probability of a given field goal will be based solely upon these two variables.
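To make the fitted coefficients concrete, here is a minimal sketch that turns them into a make probability by hand. It borrows the intercept, distance, and a few season estimates from the summary above and sets the non-significant terms to zero, so the numbers are illustrative only; the reduced model fit below will have slightly different coefficients. The helper name make_prob is hypothetical.

```r
# Illustrative only: estimates copied from the full-model summary above.
intercept  <- 5.422
b_distance <- -0.1024
# Season offsets on the log-odds scale, relative to the 2009 baseline.
season_offset <- c("2009" = 0, "2013" = 0.5595, "2017" = 0.3772)

# Hypothetical helper: inverse logit of the linear predictor.
make_prob <- function(distance, season = "2009") {
  eta <- intercept + b_distance * distance + season_offset[[season]]
  plogis(eta)
}

# Make probability at four distances in the baseline season.
probs <- sapply(c(30, 40, 50, 60), make_prob)
round(probs, 3)

# Same 50-yard kick, but with the 2013 season offset applied.
make_prob(50, "2013")
```

Under these assumptions a 40-yard attempt in the baseline season comes out at about 0.79, and the same distance gets more forgiving in later seasons because their offsets are positive.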

mod1 <- glm(kick_good ~ FieldGoalDistance + season, data = fgs, family = "binomial")
fgs$predicted_make <- predict(mod1, newdata = fgs, type = "response")



ggplot(fgs[fgs$FieldGoalDistance <= 65 & !is.na(fgs$FieldGoalDistance), ],
       aes(x = FieldGoalDistance, y = predicted_make, col = season)) +
  geom_point(size = 1.5) +
  geom_smooth(method = "loess") +
  theme_bw() +
  labs(x = "Field Goal Distance", y = "Predicted Make Percentage",
       col = "NFL Season", title = "Fitted Field Goal Make Percentage")

As we can see, the three worst seasons for kickers were the first three years in our data set, in line with the hypothesis that kickers have generally improved over time. Now that we have this model, we can match it up with our play-by-play data.

Fitting Our Model

To fit our model for evaluating kickers, we will look at every fourth down faced by a team. The reason we do this is to take advantage of the confidence factor that makes our model unique. Looking at every fourth down gives us a binary variable answering the question “Did this team feel that their kicker attempting a field goal in this situation was the best option?” Unfortunately, we miss out on the late-game field goals that were taken on one of the first three downs. Although these are often the field goals we remember, our earlier model showed that the time of game is not a good predictor of make percentage. Therefore, losing this small fraction of the data is not a deal breaker. We have compiled the data into a frame called “fourths.” The next thing we will do is add in some field-goal-specific variables.

fourths$isfg <- fourths$PlayType == "Field Goal"

isfglm <- glm(isfg ~ TimeSecs + ScoreDiff + distance_to_end*ydstogo, data = fourths, family = "binomial")
fourths$fgprob <- predict(isfglm, newdata = fourths, type = "response")
summary(isfglm)
## 
## Call:
## glm(formula = isfg ~ TimeSecs + ScoreDiff + distance_to_end * 
##     ydstogo, family = "binomial", data = fourths)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.1371  -0.1965  -0.0255   0.0000   2.9457  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -3.542e-01  8.534e-02  -4.151 3.31e-05 ***
## TimeSecs                 4.249e-04  2.259e-05  18.805  < 2e-16 ***
## ScoreDiff                5.164e-02  2.291e-03  22.542  < 2e-16 ***
## distance_to_end         -5.397e-02  2.335e-03 -23.115  < 2e-16 ***
## ydstogo                  8.734e-01  2.192e-02  39.854  < 2e-16 ***
## distance_to_end:ydstogo -2.169e-02  6.126e-04 -35.414  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 37770  on 34334  degrees of freedom
## Residual deviance: 12136  on 34329  degrees of freedom
## AIC: 12148
## 
## Number of Fisher Scoring iterations: 9

Here, we have created a glm that takes the time of game, score differential, place on the field, and yards to go for every fourth down and outputs the probability that a field goal would be attempted. We will use this to unravel the confidence factor that goes into team decision making.
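As a rough illustration of what this model says, the sketch below plugs the printed estimates into the logistic formula for two fourth-and-5 situations. The helper name attempt_prob and the example inputs are hypothetical, and it assumes distance_to_end is measured in yards from the end zone being attacked.

```r
# Estimates copied from the isfglm summary above (log-odds scale).
b <- c(intercept = -0.3542, time = 4.249e-04, score = 5.164e-02,
       dist = -5.397e-02, togo = 8.734e-01, dist_togo = -2.169e-02)

# Hypothetical helper: probability that a fourth down becomes a field goal attempt.
attempt_prob <- function(time_secs, score_diff, distance_to_end, ydstogo) {
  eta <- b[["intercept"]] +
    b[["time"]] * time_secs +
    b[["score"]] * score_diff +
    b[["dist"]] * distance_to_end +
    b[["togo"]] * ydstogo +
    b[["dist_togo"]] * distance_to_end * ydstogo
  plogis(eta)
}

# Fourth and 5, tie game, TimeSecs of 1800, from 20 and from 60 yards out.
near <- attempt_prob(1800, 0, distance_to_end = 20, ydstogo = 5)
far  <- attempt_prob(1800, 0, distance_to_end = 60, ydstogo = 5)
round(c(near = near, far = far), 3)
```

The negative interaction term is what drives the gap: a long distance to go pushes teams toward a kick when they are close, but that effect fades and reverses as the kick itself gets longer.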

The next step is matching up each game with the kicker who was on the field for the team. We have compiled this into a frame called kicker_data. The first few rows are shown below.

head(kicker_data)
##    kicker_name fgs_attempted fgs_attempted_exp fgs_made fgs_made_expexted
## 1 S.Gostkowski           274         288.20176      245         241.46773
## 2     C.Santos            99         100.74169       84          82.81871
## 3  C.Catanzaro           109         106.99742       93          87.96339
## 4   S.Hauschka           238         239.57874      209         198.08244
## 5    D.Hopkins            80          74.88821       69          63.09009
## 6    C.Sturgis           127         119.66647      102         100.91555
##   total_points_expected no_confidence_expected_makes attempts_added
## 1              724.4032                    235.53963     -14.201760
## 2              248.4561                     83.92704      -1.741692
## 3              263.8902                     91.84796       2.002577
## 4              594.2473                    201.49786      -1.578737
## 5              189.2703                     68.38280       5.111789
## 6              302.7466                    106.28725       7.333532
##   kicks_added kicks_added_no_confidence team
## 1    3.532273                9.46036736   NE
## 2    1.181291                0.07296434   KC
## 3    5.036605                1.15204029  ARI
## 4   10.917563                7.50213715  SEA
## 5    5.909915                0.61719865  WAS
## 6    1.084455               -4.28724793  MIA

The added columns are as follows.

  1. fgs_attempted_exp is the total number of field goals expected to be attempted, found by summing our field goal attempt probability over every fourth down faced by a team.

  2. fgs_made_expexted (spelled this way in the data frame) is built like fgs_attempted_exp, but each attempt probability is first multiplied by the probability that the field goal would be made.

  3. total_points_expected is just fgs_made_expexted multiplied by three.

  4. no_confidence_expected_makes predicts how many of the field goals a kicker actually attempted he should have been expected to make.

  5. attempts_added shows how many more field goals were attempted than the average kicker would have been given (fgs_attempted minus fgs_attempted_exp).

  6. kicks_added shows how many more field goals were made than the average kicker would have made, accounting for all fourth-down field goal opportunities (fgs_made minus fgs_made_expexted).

  7. kicks_added_no_confidence shows how many more field goals were made than the average kicker would have made, looking only at the field goals that were actually attempted (fgs_made minus no_confidence_expected_makes).
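The column definitions above can be sketched as a small aggregation. This is not the original pipeline, only one way the arithmetic could look: the toy data, the make_prob column (the average kicker's make probability on each fourth down), and the function name summarise_kicker are all assumptions, and the expected-makes column is spelled fgs_made_expected here for readability.

```r
# Toy fourth-down table; every value here is made up for illustration.
fourths_toy <- data.frame(
  kicker_name = c("A.Kicker", "A.Kicker", "A.Kicker", "B.Kicker", "B.Kicker"),
  fgprob      = c(0.90, 0.40, 0.05, 0.80, 0.30),  # P(team attempts a field goal)
  make_prob   = c(0.95, 0.70, 0.40, 0.90, 0.60),  # P(average kicker makes it)
  isfg        = c(TRUE, TRUE, FALSE, TRUE, FALSE), # was a field goal attempted?
  kick_good   = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one kicker's fourth downs into a summary row.
summarise_kicker <- function(d) {
  attempted     <- sum(d$isfg)
  made          <- sum(d$kick_good)
  attempted_exp <- sum(d$fgprob)                # expected attempts
  made_exp      <- sum(d$fgprob * d$make_prob)  # expected makes over all fourth downs
  no_conf_exp   <- sum(d$make_prob[d$isfg])     # expected makes over actual attempts
  data.frame(
    kicker_name                  = d$kicker_name[1],
    fgs_attempted                = attempted,
    fgs_attempted_exp            = attempted_exp,
    fgs_made                     = made,
    fgs_made_expected            = made_exp,
    total_points_expected        = 3 * made_exp,
    no_confidence_expected_makes = no_conf_exp,
    attempts_added               = attempted - attempted_exp,
    kicks_added                  = made - made_exp,
    kicks_added_no_confidence    = made - no_conf_exp,
    stringsAsFactors = FALSE
  )
}

kicker_toy <- do.call(
  rbind,
  lapply(split(fourths_toy, fourths_toy$kicker_name), summarise_kicker)
)
kicker_toy
```

The subtraction columns in this sketch match the relationships visible in the head(kicker_data) output; for example, S.Gostkowski's 245 makes minus 241.46773 expected makes gives the 3.532273 shown under kicks_added.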

Making Plots With Our New Data

Now that we have our cleaned kicker data for 2009-2017, we can look at some plots to analyze kicker performance over this span. First, we can make a plot in ggplot that shows total field goals added over this range. This is our master stat, which I believe is much more effective for evaluation than either make percentage or field goals made above replacement (which wouldn’t account for the added attempts). We can see some familiar names at the top of the list.

For a better look, here is the plot zoomed in on the players who have performed most and least admirably in this time frame.

It reads like a who’s who of the best-known kickers from the last decade. Sometimes the eye test is confirmed by our analysis. Let’s take a look at the list no one wants to see their name on.

How does this plot differ from one that shows kicks added without taking into account the value of the decision to take a field goal? The plot below shows the results of a more traditional kicks above replacement model, with kicks added fitted only over kicks attempted.

As we can see, although the rough order of kickers is fairly similar, there is some small variation, and the values for “kicks added” are lower near the extremes. This makes sense: the average kicker attempts fewer field goals than an elite kicker, so in reality the average kicker would make fewer field goals than a hypothetical average kicker who attempts all of Justin Tucker’s kicks. This discrepancy is captured when confidence is factored in.

We can look at a scatter plot of these two metrics to see who is benefiting the most from the more complex metric.
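For a numeric view of the same comparison, here is a minimal sketch using only the six rows printed by head(kicker_data) earlier. The confidence_gain column is a hypothetical name for the gap between the two metrics; a positive value means the kicker looks better once team confidence is included.

```r
# Six rows copied from the head(kicker_data) output above.
sample_kickers <- data.frame(
  kicker_name = c("S.Gostkowski", "C.Santos", "C.Catanzaro",
                  "S.Hauschka", "D.Hopkins", "C.Sturgis"),
  kicks_added = c(3.532273, 1.181291, 5.036605,
                  10.917563, 5.909915, 1.084455),
  kicks_added_no_confidence = c(9.46036736, 0.07296434, 1.15204029,
                                7.50213715, 0.61719865, -4.28724793),
  stringsAsFactors = FALSE
)

# Hypothetical column: how much the confidence-aware metric adds for each kicker.
sample_kickers$confidence_gain <-
  sample_kickers$kicks_added - sample_kickers$kicks_added_no_confidence

# Sort from largest gain to largest loss.
ranked <- sample_kickers[order(-sample_kickers$confidence_gain), ]
ranked[, c("kicker_name", "confidence_gain")]
```

Among these six, C.Sturgis and D.Hopkins gain the most from the confidence adjustment, while S.Gostkowski is the only one who loses ground. Six rows are far too few to generalize from, so this only shows how the comparison works.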