NCAA Basketball Prediction Model Methodology

By: Luke Benz

June 30th, 2017

Using game results from the 2016-17 and 2017-18 seasons, I've built a weighted multiple linear regression model that predicts score differential from team, opponent, and game location (home, away, neutral). From the predicted score differential, I use a simple logistic regression to convert the point spread into a win probability. The model coefficients, which I call "YUSAG" coefficients, essentially measure how much of a team's strength can be explained by its own results as compared to those of its opponents. A team's YUSAG coefficient is the number of points by which it would be expected to beat the average college basketball team on a neutral court. Taking the difference between two teams' coefficients gives the margin by which team A would be expected to beat team B on a neutral court.
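To make the setup above concrete, here is a minimal Python sketch of a weighted regression on team, opponent, and location, followed by a logistic conversion to win probability. It is an illustration under stated assumptions, not the model's actual code: the five-game log, the +1/-1 team encoding, the helper names, and the logistic slope `scale=0.15` are all made up for the example (in practice that slope would itself be fit by logistic regression on past results).

```python
import numpy as np

# Hypothetical toy game log: (team, opponent, location, score_diff, weight).
# location is +1 if `team` is at home, -1 if away, 0 if neutral.
games = [
    ("A", "B", 1, 10.0, 1.0),
    ("B", "C", 1, 6.0, 1.0),
    ("C", "A", 1, -4.0, 0.8),
    ("A", "C", 0, 9.0, 0.6),
    ("B", "A", 0, -5.0, 0.9),
]
teams = sorted({g[0] for g in games} | {g[1] for g in games})
idx = {t: i for i, t in enumerate(teams)}

# Design matrix: +1 in the team's column, -1 in the opponent's column,
# and a final column holding the location indicator.
X = np.zeros((len(games), len(teams) + 1))
y = np.zeros(len(games))
w = np.zeros(len(games))
for row, (team, opp, loc, diff, weight) in enumerate(games):
    X[row, idx[team]] = 1.0
    X[row, idx[opp]] = -1.0
    X[row, -1] = loc
    y[row] = diff
    w[row] = weight

# Weighted least squares: scale each row by sqrt(weight), then solve
# an ordinary least-squares problem on the scaled system.
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Ratings are identified only up to a constant, so center them:
# a rating of 0 then corresponds to the average team.
ratings = beta[:-1] - beta[:-1].mean()
home_edge = beta[-1]


def predicted_spread(team, opp, loc=0):
    """Expected margin for `team` over `opp` (loc: +1 home, 0 neutral, -1 away)."""
    return ratings[idx[team]] - ratings[idx[opp]] + home_edge * loc


def win_probability(spread, scale=0.15):
    """Logistic map from spread to win probability; `scale` is an assumed slope."""
    return 1.0 / (1.0 + np.exp(-scale * spread))


print(predicted_spread("A", "B"), win_probability(predicted_spread("A", "B")))
```

On a neutral court the predicted spread is just the difference of the two ratings, which is the interpretation of the coefficients described above.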

Each game fed into the model is given a different weight. Games that took place in the 2016-17 season are weighted according to the following formula:

These weights capture how informative a given game from the 2016-17 season is when predicting results in the 2017-18 season. The idea is that the percentage of returning minutes gives an estimate of how similar a 2017-18 team is to the previous year's version of itself. Also, I believe that each team builds its own identity (separate from the identity of previous teams), so I decrease the weight any given 2016-17 game is worth the further into the 2017-18 season we are. Once a team has completed half of its 2017-18 games, its 2016-17 weights are set to zero, as the model now has sufficient information from the current season to predict games. (This belief is based on the fact that I began predicting last season about halfway through the season, and the model performed reasonably well.) Note that weights for the 2016-17 season are capped at 0.5, which happens to be the minimum game weight for the 2017-18 season, as I believe that no game played last season is a better predictor of games in the current season than other games in the current season.
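The exact weighting formula is not reproduced in this text, so the following Python sketch shows only one hypothetical form that satisfies the properties just described: it scales with the fraction of returning minutes, tops out at 0.5, shrinks as the 2017-18 season progresses, and reaches zero once a team has played half of its games. The linear decay and the function and argument names are assumptions for illustration.

```python
def prior_season_weight(returning_minutes_frac, frac_games_played):
    """Hypothetical weight for a 2016-17 game (not the model's actual formula).

    returning_minutes_frac: share of last season's minutes that return (0 to 1).
    frac_games_played: share of the team's 2017-18 games already played (0 to 1).
    """
    if not 0.0 <= returning_minutes_frac <= 1.0:
        raise ValueError("returning_minutes_frac must be between 0 and 1")
    if not 0.0 <= frac_games_played <= 1.0:
        raise ValueError("frac_games_played must be between 0 and 1")
    # Assumed linear decay: 1 before the season starts, 0 at the halfway point.
    decay = max(0.0, 1.0 - 2.0 * frac_games_played)
    return 0.5 * returning_minutes_frac * decay


# A team returning 60% of its minutes, a quarter of the way through 2017-18.
print(prior_season_weight(0.6, 0.25))
```

Under these assumptions, a team returning every minute gets the maximum weight of 0.5 in the preseason, and every team's 2016-17 games drop out of the model at the halfway point.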

Weights for the 2017-18 season range from 0.5 to 1.0 depending on the "relative recency" of a game. Essentially, the relative recency of a game is the fraction of all completed games that had taken place through the game in question. Games that are more recent are given higher weight, while games that took place further back in the season receive less weight. The idea behind this is that how a team plays at the beginning of the season is less predictive of its success in February and March than how it plays in January. 2017-18 game weights are assigned according to the following formula, a variation on the sigmoid function:

All games beyond the 0.4-0.5 relative recency threshold are given roughly the same weight, as seen below.
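As a rough illustration of this kind of weighting, the Python sketch below uses a plain logistic curve. The model's actual formula is a variation on the sigmoid that is not reproduced in this text, so the steepness `k=10` and both function names are assumptions, chosen only so that the weight starts at 0.5 and flattens out past a relative recency of roughly 0.4 to 0.5.

```python
import math


def relative_recency(game_number, games_completed):
    """Fraction of completed games that had been played through this game."""
    return game_number / games_completed


def current_season_weight(r, k=10.0):
    """Hypothetical sigmoid-style weight; r is relative recency in [0, 1].

    k is an assumed steepness, not a value taken from the model.
    """
    return 1.0 / (1.0 + math.exp(-k * r))


# Weights for the 1st, 4th, 5th, and 10th games of 10 completed games.
for game in (1, 4, 5, 10):
    r = relative_recency(game, 10)
    print(game, round(current_season_weight(r), 3))
```

With this choice of `k`, the earliest games sit just above 0.5 while everything from the middle of the season onward is weighted close to 1.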

An adjustment is made to account for the potential contributions of the incoming freshman class. Data from 247Sports' recruiting rankings was used as the basis for these adjustments. Any school that didn't register a score for its incoming freshman class was given a score of 0. Like the 2016-17 game weights, this adjustment decreases the deeper a team plays into its 2017-18 slate of games. Moreover, this adjustment vanishes once a team has played half of its games in 2017-18, as by that point in the season it's reasonable to believe the impact of the incoming freshman class has been baked into the existing game results. In the formula below, 17.58 is the standard deviation of 247Sports recruiting scores greater than zero. I chose this formula rather than some form of Z-score, as the underlying distribution was not normal. Additionally, I wanted to keep the YUSAG coefficient of the model's baseline team at zero.
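The formula itself is not reproduced in this text. The Python sketch below shows one hypothetical form consistent with the description: the recruiting score is divided by the stated standard deviation of 17.58 without subtracting a mean, so a school with a score of 0 gets no adjustment, and the result fades to zero by the halfway point of the season. The linear decay, the scaling to rating points, and the function name are assumptions.

```python
RECRUITING_SD = 17.58  # standard deviation of nonzero recruiting scores, from the text


def freshman_adjustment(recruiting_score, frac_games_played):
    """Hypothetical rating bump for an incoming class (not the model's actual formula).

    recruiting_score: class score, with 0 for schools that registered no score.
    frac_games_played: share of the team's 2017-18 games already played (0 to 1).
    """
    # Assumed linear decay: full effect in the preseason, none after half the games.
    decay = max(0.0, 1.0 - 2.0 * frac_games_played)
    # Scale by the SD only. Centering on the mean, as a Z-score would,
    # would push every zero-score school below zero.
    return (recruiting_score / RECRUITING_SD) * decay


print(freshman_adjustment(35.16, 0.0))   # class two SDs above zero, preseason
print(freshman_adjustment(35.16, 0.25))  # same class, a quarter of the way in
print(freshman_adjustment(0.0, 0.0))     # school with no registered score
```

Because nothing is subtracted before scaling, a team with no recruiting score is left untouched, which is how a zero-rated baseline team stays at zero.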

Finally, an adjustment is made to teams' YUSAG coefficients to account for incoming transfers who are eligible to play in the 2017-18 season. I begin by taking data on incoming transfers from Bart Torvik's T-rank site. Specifically, I took his projected offensive contributions for these players for the 2017-18 season. From there, I created a metric that captures the impact of a team's incoming transfers and can simply be added to a team's existing YUSAG coefficient. As before, this transfer adjustment decreases as the 2017-18 season gets into full swing, as I assume that the impact of the transfers in question is eventually baked into the game results.

While this is version 2.0 of my NCAA basketball model, it's the first time that I will be using it to make predictions at the beginning of the season, before any 2017-18 games have taken place, and to be honest, I have no idea how well it will perform starting out the season. Even with adjustments for incoming and outgoing players, the core of the model is based on game results. Thus, preseason ratings and early 2017-18 predictions may be somewhat skewed by 2016-17 results. As more games are played come November and December, the model will learn the college basketball landscape for the 2017-18 season, and its accuracy should improve.

If you enjoy college basketball even half as much as I do, then I hope you'll enjoy following my predictions this season. NCAA Power Rankings will be updated regularly throughout the season, and win probabilities for important games will be released a few times per week on our Twitter page. The majority of the basketball content produced will cover the Ivy League (#MathToThePalestra), with predictions, playoff probabilities, and relative game importance released weekly on Twitter. Additionally, several longer articles will be featured in the Yale Daily News sports section, or in the YDN's sports blog, Down The Field. Ken Massey's site will be tracking how our model (and dozens of other models) fares this season, so feel free to check out our progress there. Happy hoopin'!

Author's Note: I would like to thank Bart Torvik for helping me think about preseason/early season modeling. If you haven't already seen his site, it's very impressive and you should check it out.