Methods

How this works, in plain terms.

Calibrated predicts CS2 matches and scores itself in public. This page explains what our numbers mean and how we keep ourselves honest — without the secret recipe. If a term on the site ever loses you, it's probably explained here.

What Calibrated is

Calibrated is an information service. For each upcoming professional CS2 match we publish a win probability and the factors behind it. That's it — we never take, place, or broker a bet, and we never sell picks as "locks."

The model is trained without ever seeing betting odds, so it's forecasting the game, not chasing a market.

Calibration: the one idea

Most prediction sites brag about accuracy — how often the favourite won. We lead with something stricter and more useful: calibration.

A model is calibrated when its probabilities mean what they say: of all the times it says 60%, the team really should win about 60% of the time — no more, no less.

Why this matters more than accuracy: a pick can be "wrong" while the model is still perfectly honest. If we say a team has a 56% chance and they lose, that's not a broken model: at 56% we expect to be wrong about 44% of the time, or roughly 4 times in 10. A calibrated 56% is worth far more than a confident-sounding number you can't trust.

The calibration check on the homepage and predictions page shows this directly: each dot is a group of past predictions — when we said X%, teams actually won Y%. The closer the dots sit to the diagonal line, the better calibrated we are.
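For readers who want to see the idea behind the dots, here is a minimal sketch of how such a check could be built. It is not our production code: the function name, the ten equal-width groups, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width probability bins (an assumed scheme).

    Returns one (mean_predicted, observed_win_rate, count) tuple per
    non-empty bin, ordered from lowest to highest probability.
    """
    bins = defaultdict(list)
    for p, won in zip(probs, outcomes):
        # Clamp the index so that p == 1.0 lands in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, won))

    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_p = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(won for _, won in rows) / len(rows)
        table.append((mean_p, win_rate, len(rows)))
    return table


# Made-up data: ten predictions at 60%, six of which won.
probs = [0.6] * 10
outcomes = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
table = calibration_table(probs, outcomes)
```

Each tuple in the table corresponds to one dot: the first value is the X position (what was said), the second is the Y position (what happened). In the made-up data the single dot sits on the diagonal.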

The calibration score

Our headline number is a single calibration score, technically the Brier score. Read it like golf, where lower is better:

0.250 — a coin flip (no skill).
0.000 — perfect (impossible in reality).
0.2233 — where we are today, across 4,069 scored matches.
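The reference points above follow from how a Brier score is computed: the average squared gap between the probability given and what happened (1 for a win, 0 for a loss). The sketch below uses the standard two-outcome definition; the function name and sample values are illustrative, and we are assuming the site's score follows this textbook form, which is consistent with a coin flip scoring 0.250.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and result.

    probs:    predicted win probability for one team in each match
    outcomes: 1 if that team won, 0 if it lost
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Saying 50/50 every time scores 0.25 no matter who wins.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])

# Being certain and always right scores 0.
perfect = brier_score([1.0, 0.0], [1, 0])
```

Because the gap is squared, a confident miss costs far more than a cautious one, which is why the score rewards honest probabilities rather than bold ones.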

To show that score is real and not graded on a curve, we line it up against simple baselines anyone could build:

Coin flip (50/50 every match) · 0.250
Always pick the favourite · 0.238
Ratings-only baseline ("calibrated Glicko"), a strong standard rating model · 0.231
Calibrated (us) · 0.2233

Scored in the open, on unseen matches

It's easy to look good on matches you've already studied. So we never score ourselves on those. Every match in our track record was tested out-of-time — predicted as if it were in the future, using only what was knowable beforehand, on matches the model was never trained on. No peeking.
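As a rough illustration of what "out-of-time" means, the sketch below splits matches by date so that everything scored comes strictly after everything the model learned from. The field names, the cutoff date, and the single-cutoff design are assumptions for the example, not a description of our pipeline.

```python
from datetime import date


def out_of_time_split(matches, cutoff):
    """Split matches into (train, test) around a cutoff date.

    Matches before the cutoff may be used for training; matches on or
    after it are only ever predicted and scored, never learned from.
    """
    train = [m for m in matches if m["date"] < cutoff]
    test = [m for m in matches if m["date"] >= cutoff]
    return train, test


# Hypothetical match records; only the date matters for the split.
matches = [
    {"id": "m1", "date": date(2024, 1, 10)},
    {"id": "m2", "date": date(2024, 2, 20)},
    {"id": "m3", "date": date(2024, 3, 5)},
    {"id": "m4", "date": date(2024, 4, 15)},
]
train, test = out_of_time_split(matches, cutoff=date(2024, 3, 1))
```

Splitting by date rather than at random is the point: a random split would let the model learn from matches played after the ones it is being scored on.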

And we show wins and losses, not a highlight reel. The Past / Results section lists every graded pick — including the ones we got wrong — because that's the only way a calibration claim can be trusted.

Reading a prediction

The probability bar

Each card shows two percentages that add to 100% — our win probability for each team — and the team we favour.

What moved the prediction

Below that is a diverging bar chart of the factors behind the pick. Each bar is one factor — rating gap, map-pool edge, lineup, recent form, and so on. Bar length = how strongly that factor favours a team; side & colour = which team it favours (green left, cyan right). It's a window into the reasoning, not a black box.

When we hold back

Limited history — when a team is new or barely tracked, we still show a direction but mark it limited and keep it out of the headline score.
Withheld — when we have no honest basis at all, we publish no number. We'd rather say nothing than guess.

Locked in before the match

To prove we don't quietly edit a prediction after the fact, every pick is locked in before the match starts. We take a digital fingerprint (a SHA-256 hash) of the prediction and publish it. After the match, we reveal the full pick, and anyone can re-compute the fingerprint to confirm it matches.

What this does and doesn't prove (we won't overstate it). The fingerprint proves we didn't change the prediction after locking it. It does not yet prove the timing to an outside party — that we locked it in when we say we did. Independent, third-party timestamping is on the roadmap, not built yet.
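Here is a minimal sketch of how a lock-and-reveal check like this can work, using Python's standard hashlib. The record fields, the sorted-key JSON encoding, and the random "salt" value are assumptions for illustration; the exact format we hash is not specified on this page.

```python
import hashlib
import json


def fingerprint(prediction: dict) -> str:
    """Return the SHA-256 hex digest of a prediction record.

    The record is serialised with sorted keys and fixed separators
    (an assumed convention) so the same pick always yields the same bytes.
    """
    payload = json.dumps(prediction, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(revealed: dict, published: str) -> bool:
    """Re-compute the fingerprint of a revealed pick and compare it."""
    return fingerprint(revealed) == published


# Hypothetical pick. The salt is a random string that stops anyone from
# guessing the pick early by hashing every plausible percentage.
pick = {
    "match_id": "example-123",
    "team_a": "Team A",
    "team_b": "Team B",
    "p_team_a": 0.56,
    "salt": "b7f3c1e09a",
}
published = fingerprint(pick)      # published before the match
unchanged = verify(pick, published)  # checked by anyone after the reveal

# Changing even one digit of the pick produces a different fingerprint.
tampered = dict(pick, p_team_a=0.65)
still_matches = verify(tampered, published)
```

As the paragraph above says, a matching fingerprint shows the pick was not altered after locking; on its own it says nothing about when the fingerprint was first published.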

Plain-language glossary

The terms you might see, and what we call them on the site:

Calibration score ("Brier score")

How well our probabilities match reality. Lower is better; 0.25 is a coin flip.

Calibration

When our 60%s really win about 60% of the time. What we're judged on — more than raw accuracy.

Tested on unseen matches ("out-of-time")

Scored only on matches outside the model's training window — predicted as if they were still in the future.

Ratings-only baseline ("calibrated Glicko")

A strong standard model that uses team ratings alone (Glicko is an Elo-style system). One of the bars we beat.

What moved the prediction

The factor bars on each card — bar length is the strength of a factor, side & colour is the team it favours.

Limited history ("thin")

Not enough track record for a confident call — we show which team we lean toward, not a confident percentage, and keep it out of the headline score.

Digital fingerprint ("SHA-256 hash")

A short code computed from the exact pick, published before the match so the prediction can't be changed afterwards.