By Luke Benz
Updated June 16th, 2017
Recently, we built a model to predict the outcome of NCAA Baseball games, and we have been using it to analyze the NCAA College World Series. Our linear model uses team, opponent, and game location to predict run differential. From there, we use a simple logistic regression to translate run differential into win probability. Each team’s model coefficient is the number of runs by which it would be expected to beat the model-baseline team (A&M Corpus Christi, first alphabetically) on a neutral field. Taking the difference between two teams’ coefficients gives the margin by which Team A would be expected to beat Team B on a neutral field. The model also has coefficients for relative game location (Home, Away, Neutral). Home field advantage is worth roughly 1/3 of a run, according to our model.
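To make the mechanics concrete, here is a minimal sketch of this kind of model in Python with NumPy. It is not our actual code, and it simplifies in several ways that are assumptions for illustration only: the teams and results are made up, each team gets a single strength coefficient (used with a plus sign as the team and a minus sign as the opponent), location is coded as +1 for home, 0 for neutral, and -1 for away, and the logistic slope `SLOPE` is a placeholder value rather than one fit by logistic regression on real outcomes.

```python
import numpy as np

# Hypothetical toy results: (team, opponent, location, run differential),
# each seen from the first team's side. These are made-up numbers.
games = [
    ("A", "B", "home", 2.5),
    ("B", "C", "home", 1.5),
    ("C", "A", "neutral", -3.0),
    ("A", "C", "away", 2.5),
    ("B", "A", "neutral", -2.0),
    ("C", "B", "away", -1.5),
]

teams = sorted({g[0] for g in games} | {g[1] for g in games})
baseline, others = teams[0], teams[1:]  # first team alphabetically is the baseline
col = {t: i for i, t in enumerate(others)}
loc_code = {"home": 1.0, "neutral": 0.0, "away": -1.0}  # assumed coding

# Design matrix: one column per non-baseline team, plus one location column.
X = np.zeros((len(games), len(others) + 1))
y = np.zeros(len(games))
for row, (team, opp, loc, diff) in enumerate(games):
    if team != baseline:
        X[row, col[team]] += 1.0
    if opp != baseline:
        X[row, col[opp]] -= 1.0
    X[row, -1] = loc_code[loc]
    y[row] = diff

# Ordinary least squares fit of run differential.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
strength = {baseline: 0.0, **{t: float(coef[col[t]]) for t in others}}
home_adv = float(coef[-1])


def predicted_run_diff(team, opp, loc="neutral"):
    """Expected margin for `team` over `opp` at the given location."""
    return strength[team] - strength[opp] + home_adv * loc_code[loc]


SLOPE = 0.4  # hypothetical; in practice this would be fit by logistic regression


def win_probability(run_diff, slope=SLOPE):
    """Logistic curve mapping a predicted run differential to a win probability."""
    return 1.0 / (1.0 + np.exp(-slope * run_diff))


for loc in ("home", "neutral", "away"):
    d = predicted_run_diff("A", "C", loc)
    print(f"A vs C ({loc}): margin {d:+.2f}, win prob {win_probability(d):.3f}")
```

Because the toy results were constructed without noise, the fit simply recovers the strengths and home-field value used to build them. With real data the coefficients are least-squares estimates, and a predicted margin of zero maps to a win probability of one half.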
In essence, this method determines how much of a team’s results can be explained by its own strength compared to the strength of its opponent. This method works well because there are so many games played over the course of a season (on the order of 10,000) and enough cross-conference games during the non-conference schedule for the model to connect the conferences and estimate their relative strengths.