March Madness: The Statistical Model Approach To NCAA Tournament Predictions
Every person employs a slightly different strategy when making March Madness predictions. Some people choose based on their biases, the school mascots, or on historical reputations. Others research statistics and favor teams that, say, have a higher free-throw shooting percentage than their opponents. And there are certainly many people who take schedule and conference strength into consideration when determining whether a seed is properly ranked.
And, of course, there are numerous people who pick top seeds almost exclusively, along with just a couple of upset predictions to shake things up.
So what’s the best bracket prediction approach? This question is obviously an impossible one to answer, as the stunning success in recent years of George Mason and VCU only begins to illustrate. Ultimately, when the tournament comes to a close, the higher-seeded teams will usually be the ones that advance furthest and come out on top. Along the way, though, a good number of upsets will take place. That’s all we really know for sure.
In an attempt to counter this uncertainty, a growing number of people are turning to advanced prediction methods, i.e., statistical models, in an effort to break down the tournament as accurately as possible. One of the most prominent advocates of this approach is Nate Silver, a statistician who founded FiveThirtyEight.com and is known for correctly predicting numerous political races. Silver has built a prediction model for the tournament that takes the following factors into account:
-Predictor ratings from experts such as Jeff Sagarin and Sonny Moore. This allows Silver to average these other statistical models and incorporate the result into his own program.
-USA Today and AP preseason polls. Silver uses these polls as an indicator of a team’s true talent that does not include performance this year. This helps the model correct for teams that have underperformed or overperformed in their regular season schedule.
-The NCAA committee’s seeding curve. The way a team is seeded can, naturally, impact its tournament play as well as reflect its actual overall quality. This factor, then, allows for the NCAA’s rating (from 1 to 64) to be programmed directly into the model.
-Geographical location. Silver notes that few commentators take geography and fan support into account when making tournament predictions, even though such factors have been shown to have an impact on play. Consequently, his model gives a boost to teams that can expect larger fan turnouts due to geography.
-Injuries. A team that loses its star player on the eve of the tournament is likely to underperform once the postseason starts. Conversely, a team whose star player has been sitting on the sideline in knee braces for most of the regular season can be expected to play better if that player returns in time for the tournament. Silver’s model, therefore, adds tweaks in both directions to account for major injuries.
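To make the idea of combining these factors concrete, here is a minimal sketch in Python. It is not Silver's actual model: the weights, the `scale` parameter, the function names, and the use of a logistic curve to turn a rating gap into a win probability are all assumptions chosen purely for illustration. It assumes every input has already been converted to a common points-style rating.

```python
import math

# Hypothetical weights, for illustration only (not Silver's actual values).
WEIGHTS = {"computer": 0.6, "preseason": 0.2, "seed": 0.2}


def composite_rating(computer_ratings, preseason_rating, seed_rating,
                     injury_adj=0.0, geography_adj=0.0):
    """Blend several rating sources into one number.

    computer_ratings: ratings from outside systems, all on one scale.
    preseason_rating: rating implied by preseason polls.
    seed_rating: rating implied by the committee's seeding.
    injury_adj, geography_adj: point adjustments, positive or negative.
    """
    computer_avg = sum(computer_ratings) / len(computer_ratings)
    base = (WEIGHTS["computer"] * computer_avg
            + WEIGHTS["preseason"] * preseason_rating
            + WEIGHTS["seed"] * seed_rating)
    return base + injury_adj + geography_adj


def win_probability(rating_a, rating_b, scale=11.0):
    """Chance that team A beats team B, from a logistic curve on the gap.

    scale is an assumed parameter controlling how quickly a rating
    advantage turns into a near-certain win.
    """
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


# Made-up inputs for two imaginary teams.
team_a = composite_rating([90.0, 88.0], 85.0, 87.0, geography_adj=1.0)
team_b = composite_rating([86.0, 87.0], 88.0, 86.0, injury_adj=-2.0)
print(round(win_probability(team_a, team_b), 3))
```

The point of the sketch is only the structure: several independent estimates of team quality are averaged, situational adjustments are added on top, and the resulting gap between two teams is mapped to a probability rather than a flat pick.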
A chart of Silver’s final predictions can be found on the New York Times’ website. While his bracket generally favors higher-seeded teams, it often favors the lower-seeded team when two teams are closely seeded. For example, it sees number 2 seeds Missouri and Ohio State as the favorites in their respective regions.
How accurate is this model? The data from last year indicates that the model outperformed most other forecasts, including those of the Las Vegas oddsmakers, but still made many mistakes, especially in the later rounds. As for what this year will hold, well, we’re just going to have to wait and see.
