Methodology

show your work

Model

v2, fitted

All but two constants in this model are fitted to history and validated on seasons the model never saw; the two hand-set exceptions are listed under What we get wrong. Here is the whole machine, including the parts that miss.

Ratings

In season, team strength comes from CollegeFootballData Elo, refreshed weekly. Before week 1 we build our own preseason prior: last season's final rating regressed 37.2% toward the team's conference mean (a MAC team and an SEC team regress to their own tiers, and realignment moves a team's anchor with it), then adjusted by three roster signals, each weighted by a coefficient fit on 2015-2025: returning production (46.2 Elo per standard deviation, the strongest of the three), 247 talent composite (38.5), and recruiting class points (39.8). Net transfer-portal rating was fitted too and came back at roughly zero once the other three are known. We left it in at its fitted weight rather than pretend otherwise.
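As a rough sketch of the arithmetic above: the function below regresses last season's final rating toward the conference mean and then adds the three weighted roster signals. The additive form, the function name, and the idea that each signal arrives as a z-score are assumptions made for illustration. The transfer-portal term is omitted because its fitted weight is only described as roughly zero.

```python
# Fitted values quoted in the text; everything else here is illustrative.
REGRESSION = 0.372  # share of the gap to the conference mean that is closed
ELO_PER_SD = {
    "returning_production": 46.2,
    "talent_composite": 38.5,
    "recruiting_points": 39.8,
}


def preseason_prior(last_final, conference_mean, z_scores):
    """Assumed additive form: regressed carry-over plus weighted roster z-scores.

    z_scores maps a signal name to the team's standardized value (hypothetical
    input format); missing signals count as league-average (0.0).
    """
    carried = last_final + REGRESSION * (conference_mean - last_final)
    adjustment = sum(
        weight * z_scores.get(name, 0.0) for name, weight in ELO_PER_SD.items()
    )
    return carried + adjustment


# A 1700-rated team in a conference averaging 1500, with one standard deviation
# of extra returning production and half a deviation less recruiting.
example = preseason_prior(
    1700.0,
    1500.0,
    {"returning_production": 1.0, "talent_composite": 0.0, "recruiting_points": -0.5},
)
print(round(example, 1))  # 1651.9
```

In this made-up case the regression step removes 74.4 points and the roster signals give back 26.3.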

Validated leave-one-season-out on 3,083 early-season games, this prior beats both a flat regression-to-the-mean carry and CFBD's own week-1 ratings in 10 of 11 seasons (pooled log-loss 0.405 vs 0.422 and 0.423).
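The validation loop can be sketched as follows. This is a generic leave-one-season-out harness, not the project's code: `fit` and `predict` are hypothetical callables standing in for whatever produces the prior, and the data layout is assumed.

```python
import math


def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood of binary outcomes (1 = win, 0 = loss)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def leave_one_season_out(games_by_season, fit, predict):
    """Pooled log-loss where each season is scored by a model that never saw it.

    games_by_season: {season: [(features, outcome), ...]} (assumed layout).
    """
    probs, outcomes = [], []
    for held_out, games in games_by_season.items():
        train = [
            game
            for season, season_games in games_by_season.items()
            if season != held_out
            for game in season_games
        ]
        model = fit(train)
        for features, outcome in games:
            probs.append(predict(model, features))
            outcomes.append(outcome)
    return log_loss(probs, outcomes)
```

A coin-flip forecaster scores ln 2, about 0.693, on this metric, which is the scale against which the pooled figures above should be read.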

Game probabilities

Win probability is the standard Elo logistic with a fitted home-field advantage of 82.8 Elo points. Two fitted findings shape the uncertainty. First, ratings are persistently wrong by about 58.7 Elo points per team, and the error is season-long, not game-to-game: a misrated team stays misrated in the same direction. Second, once you account for that, early-season ratings are no worse-calibrated than November ratings. So every simulated season draws each team's true strength once and keeps it, rather than adding noise game by game.
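A minimal sketch of both ideas, under stated assumptions: the 400-point scale is the conventional Elo logistic, the 58.7 figure is treated as the standard deviation of a Gaussian draw, and the function names are invented here.

```python
import random

HOME_FIELD = 82.8       # fitted home-field advantage, in Elo points
RATING_ERROR_SD = 58.7  # assumed to be a standard deviation


def win_probability(rating_a, rating_b, a_is_home=False, b_is_home=False):
    """Standard Elo logistic (400-point scale) with home field added to the host."""
    diff = rating_a - rating_b
    if a_is_home:
        diff += HOME_FIELD
    if b_is_home:
        diff -= HOME_FIELD
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def draw_true_strengths(ratings, rng):
    """Draw each team's true strength once; the result is reused for every game."""
    return {team: r + rng.gauss(0.0, RATING_ERROR_SD) for team, r in ratings.items()}


rng = random.Random(2025)  # seeded so the draw is reproducible
truth = draw_true_strengths({"Home U": 1500.0, "Visitor State": 1500.0}, rng)
print(win_probability(1500.0, 1500.0, a_is_home=True))
```

With equal ratings, home field alone is worth roughly a 0.617 win probability. Because the error is drawn once per season, a team that comes out 60 points better than its rating stays 60 points better in every game of that simulated season.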

The simulation

50,000 Monte Carlo seasons per week, reproducible by seed. Each one runs two tracks: truth (the drawn strengths decide every game) and perception (an Elo filter, K fitted at 109.3, tracking what the world would believe as results come in). The committee model ranks on perception, never truth, because a committee cannot act on information that does not exist. Committee score is perception plus a poll anchor that grows toward championship week, plus a 30-point conference-champion bump. Five auto-bids, seven at-large, top four champions get byes.
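The two tracks can be illustrated with a toy season. In this sketch the drawn strengths decide each result while a plain Elo update with the fitted K moves the perceived ratings. The schedule format and function names are assumptions, and neutral sites, the poll anchor, and the champion bump are left out.

```python
import random

K = 109.3          # fitted Elo K for the perception filter
HOME_FIELD = 82.8  # fitted home-field advantage


def expected(r_a, r_b):
    """Elo expectation for side A on the conventional 400-point scale."""
    return 1.0 / (1.0 + 10 ** (-(r_a - r_b) / 400.0))


def simulate_season(schedule, ratings, truth, rng):
    """Play one season; schedule is a list of (home, away) pairs (assumed format).

    truth decides who wins. perceived only ever sees the results.
    """
    perceived = dict(ratings)  # copy so the caller's ratings are untouched
    wins = {team: 0 for team in ratings}
    for home, away in schedule:
        # Truth track: the drawn strengths set the real win probability.
        home_won = rng.random() < expected(truth[home] + HOME_FIELD, truth[away])
        # Perception track: update belief from the result alone.
        p_seen = expected(perceived[home] + HOME_FIELD, perceived[away])
        delta = K * ((1.0 if home_won else 0.0) - p_seen)
        perceived[home] += delta
        perceived[away] -= delta
        wins[home if home_won else away] += 1
    return perceived, wins


ratings = {"A": 1600.0, "B": 1500.0, "C": 1400.0}
truth = {"A": 1650.0, "B": 1480.0, "C": 1390.0}
perceived, wins = simulate_season(
    [("A", "B"), ("B", "C"), ("C", "A")], ratings, truth, random.Random(1)
)
```

Passing a seeded `random.Random` is one way to get the reproducibility the text describes: the same seed replays the same season.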

Conference championship pairings come from simulated standings run through each conference's actual tiebreaker policy: head-to-head (with the round-robin condition for 3+ team ties), common opponents, conference strength of schedule, CFP rank, and computer metrics, in each conference's published order, restarting the cascade when a tie splits. FBS-vs-FCS games use a single fitted FCS rating of 816 (nobody publishes FCS Elo; the constant matches a decade of actual FBS win rates).
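The restart behavior is the subtle part, so here is a simplified sketch of it. Each tiebreaker is a hypothetical function that scores a team within the currently tied group. When a rule separates the group, every resulting subgroup goes back to the first rule. Real policies differ in detail (the round-robin condition is not modeled), and the alphabetical fallback is only a placeholder.

```python
def break_tie(tied, tiebreakers):
    """Order tied teams by an ordered cascade of rules, restarting on each split.

    tiebreakers: list of functions (team, group) -> score, higher is better.
    """
    if len(tied) <= 1:
        return list(tied)
    for rule in tiebreakers:
        scores = {team: rule(team, tied) for team in tied}
        best = max(scores.values())
        leaders = [t for t in tied if scores[t] == best]
        if len(leaders) < len(tied):
            rest = [t for t in tied if scores[t] != best]
            # The tie split: restart the cascade from the top for each subgroup.
            return break_tie(leaders, tiebreakers) + break_tie(rest, tiebreakers)
    return sorted(tied)  # placeholder when no rule separates the group


# Toy data: a three-way circle of death, then strength of schedule.
beat = {("A", "B"), ("B", "C"), ("C", "A")}
sos = {"A": 0.5, "B": 0.6, "C": 0.5}


def head_to_head(team, group):
    return sum((team, other) in beat for other in group)


def strength_of_schedule(team, group):
    return sos[team]


order = break_tie(["A", "B", "C"], [head_to_head, strength_of_schedule])
print(order)  # ['B', 'C', 'A']
```

Head-to-head cannot separate the three-way tie, so strength of schedule lifts B out. The cascade then restarts for A and C, where head-to-head now applies and C wins it.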

The committee itself is also noisy: real selections include judgment calls (ask 2024 Alabama), so each simulated season adds fitted noise of 60 Elo points to every team's committee score. We tested the obvious alternative, fat tails on the strength draw, and it produced no measurable improvement, so the uncertainty lives where the evidence says it belongs: in the committee room, not the rosters.
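Putting the committee pieces together might look like the sketch below. It assumes Gaussian noise, takes the poll anchor as an input because its schedule is not given here, and assumes auto-bids go to the five highest-scored conference champions. Function names and data shapes are invented for illustration.

```python
import random

CHAMPION_BUMP = 30.0       # conference-champion bump, in Elo points
COMMITTEE_NOISE_SD = 60.0  # fitted committee noise, assumed Gaussian


def committee_scores(perceived, poll_anchor, champions, rng):
    """Perception plus anchor plus champion bump plus one noise draw per team."""
    return {
        team: rating
        + poll_anchor.get(team, 0.0)
        + (CHAMPION_BUMP if team in champions else 0.0)
        + rng.gauss(0.0, COMMITTEE_NOISE_SD)
        for team, rating in perceived.items()
    }


def select_field(scores, champions, auto_bids=5, field_size=12, byes=4):
    """Five auto-bids, seven at-large, byes to the top four champions."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    auto = [t for t in ranked if t in champions][:auto_bids]
    at_large = [t for t in ranked if t not in auto][: field_size - len(auto)]
    field = sorted(auto + at_large, key=scores.get, reverse=True)
    return field, auto[:byes]


# Deterministic toy scores so the selection logic is easy to follow.
toy_scores = {f"T{i}": 2000.0 - 10.0 * i for i in range(14)}
toy_champions = {"T0", "T3", "T5", "T9", "T12", "T13"}
field, bye_teams = select_field(toy_scores, toy_champions)
```

In the toy example T12 makes the field as the fifth-best champion even though T11 outranks it, which is how an auto-bid displaces a higher-scored at-large candidate.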

Backtests

The engine is re-run “as of” each week of 2024 and 2025 using only what was knowable then, and scored against the real playoff fields. Predicted championship matchups at title week: 18 of 18 conferences correct. Field selection with all results in: 24 of 24 teams. Weekly playoff-odds Brier score improves from ~0.06 in September to ~0.01 by championship week; pooled Brier skill score vs base rates is 0.53. Calibration holds for forecasts up to about 90%: teams we called 26% made it 25% of the time, teams we called 86% made it 84%.
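For readers unfamiliar with the metrics, the Brier score and skill score can be computed as below. The reference forecast here is the sample base rate of the outcomes being scored, which is an assumption. The base rate behind the 0.53 figure may be defined differently.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, where the reference always forecasts the base rate.

    Undefined (division by zero) if every outcome is the same.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference
```

A skill score of 0 means no better than always quoting the base rate, and 1 means perfect forecasts.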

What we get wrong

The very top is still a bit overconfident. Teams given 90-100% made the field roughly 84-92% of the time in the backtest, even after the committee-noise fit. The sample is small and correlated (one late snub echoes across many weeks), and injuries are what the model does not see. We publish the table instead of smoothing it.
Coaching changes, injuries, and suspensions are not modeled.
Margin of victory is not used; ratings are win/loss Elo with fitted corrections.
The champion bump and anchor schedule are structural assumptions validated by backtest, not independently fitted. They are the last two hand-set constants in the system.
Two seasons of the 12-team format exist; season-level claims carry that asterisk.

More on the project itself, including how it makes money, is on the about page.