Every March, college basketball fans and analysts try to predict which teams will make deep runs in the NCAA tournament. While there’s always an element of unpredictability (hence, “March Madness”), certain factors consistently correlate with tournament success. In this post, we’ll explore these factors using interactive visualizations of historical tournament data.

MIDS 207

Our team from the machine learning class, MIDS 207 at UC Berkeley, attempted to take on this problem.

We started with the datasets provided for the annual Kaggle competition, which got us to about 60-67% accuracy.

Uninspired by the models' performance, we looked at popular models to see what type of data they use to perform better.

If you watch college basketball, you have definitely heard of Ken Pom. He is a statistician known for his popular college basketball rankings, which are very accurate year after year and are based on synthetic features he derives himself.

Let's dive into his features together:

Ken Pom Data

Below is an interactive visualization that allows you to explore relationships between these factors and tournament success. Use the controls to filter by year, adjust metrics, and highlight specific conferences or teams.


Tournament Performance Over Time

Watch how different teams' performance metrics have evolved over the years. Use the controls below to play through the years or select specific metrics to analyze.