worldcup-oracle: AI vs Polymarket — Can Time Series Models Beat a $480M Prediction Market?

An AI system that forecasts 2026 FIFA World Cup match outcomes and championship probabilities using time series foundation models (Chronos-2, TimesFM 2.5, FlowState), then compares its predictions against Polymarket odds ($525M+ volume) to find mispriced markets.

Last updated: April 8, 2026 | Tournament: June 11 - July 19, 2026 | 48 teams, 104 matches

Current AI Predictions (TSFM Ensemble)

Rank	Team	AI Win %	Polymarket	Edge	Signal
1	Spain	32.2%	16.0%	+16.2%	STRONG BUY
2	France	12.9%	13.5%	-0.6%	—
3	Argentina	11.9%	9.0%	+2.8%	BUY
4	England	6.0%	11.3%	-5.4%	STRONG SELL
5	Ecuador	5.4%	0.9%	+4.6%	BUY
6	Brazil	3.0%	8.6%	-5.6%	STRONG SELL
7	Mexico	2.7%	1.1%	+1.6%	—
8	Norway	2.6%	2.8%	-0.2%	—
9	Colombia	2.6%	1.7%	+0.9%	—
10	Morocco	2.5%	1.7%	+0.9%	—
11	Netherlands	2.0%	3.4%	-1.3%	—
12	Japan	1.9%	2.4%	-0.5%	—
13	Turkey	1.9%	0.8%	+1.1%	—
14	Portugal	1.9%	7.0%	-5.2%	STRONG SELL
15	Croatia	1.6%	1.1%	+0.6%	—
16	Germany	1.5%	5.5%	-4.0%	SELL
17	Canada	1.3%	0.5%	+0.8%	—
18	Switzerland	1.0%	0.9%	+0.1%	—
19	Uruguay	1.0%	1.1%	-0.2%	—
20	Paraguay	1.0%	0.4%	+0.5%	—
21	Senegal	0.6%	0.8%	-0.1%	—
22	Belgium	0.5%	1.9%	-1.4%	—
23-48	Others	<0.5% each	—	—	—

Biggest Edges: Where AI Disagrees Most with the Market

Team	AI	Polymarket	Edge	Direction	Kelly	Models Agree	Signal
Spain	32.2%	16.0%	+16.2%	BUY	9.7%	4/4	STRONG EDGE
Brazil	3.0%	8.6%	-5.6%	SELL	—	4/4	STRONG EDGE
England	6.0%	11.3%	-5.4%	SELL	—	4/4	STRONG EDGE
Portugal	1.9%	7.0%	-5.2%	SELL	—	4/4	STRONG EDGE
Ecuador	5.4%	0.9%	+4.6%	BUY	2.3%	4/4	edge
Germany	1.5%	5.5%	-4.0%	SELL	—	4/4	edge
Argentina	11.9%	9.0%	+2.8%	BUY	1.6%	4/4	edge

STRONG EDGE = absolute edge > 5 percentage points AND all 4 models agree on direction.

Per-Model Breakdown (Top 10)

Team	Ensemble	Polymarket	Chronos-2	TimesFM-2.5	FlowState	Elo Baseline
Spain	32.2%	16.0%	32.5%	33.5%	30.6%	33.6%
France	12.9%	13.5%	12.2%	12.7%	13.6%	12.0%
Argentina	11.9%	9.0%	11.7%	11.2%	12.8%	10.9%
England	6.0%	11.3%	5.5%	6.9%	5.5%	6.9%
Ecuador	5.4%	0.9%	5.4%	5.2%	5.7%	5.2%
Brazil	3.0%	8.6%	3.5%	2.7%	2.9%	2.4%
Mexico	2.7%	1.1%	2.5%	2.7%	2.9%	2.8%
Norway	2.6%	2.8%	2.6%	2.7%	2.4%	2.8%
Colombia	2.6%	1.7%	3.1%	2.4%	2.3%	2.7%
Morocco	2.5%	1.7%	2.6%	2.4%	2.5%	2.4%

Visualizations

AI vs Polymarket Scatter

Top Edges

Team Elo Trajectories + TSFM Forecasts

Top 8 Teams — Elo Trajectories

Backtest Validation: 3 Past World Cups

Before trusting the model, we backtested on the 2014, 2018, and 2022 World Cups using only data available before each tournament. All backtests use the correct 32-team format with the official FIFA bracket structure (1A vs 2B, etc.).

Cross-Tournament Comparison

Model	2014 Brazil	2018 Russia	2022 Qatar	Avg Brier	Avg BSS
Chronos-2	0.0250 (#3)	0.0347 (>5)	0.0192 (#2)	0.0263	+0.131
TimesFM-2.5	0.0250 (#3)	0.0352 (>5)	0.0195 (#2)	0.0266	+0.122
FlowState	0.0252 (#3)	0.0351 (>5)	0.0196 (#2)	0.0266	+0.120
Elo Baseline	0.0249 (#3)	0.0351 (>5)	0.0201 (#2)	0.0267	+0.118
Uniform (random)	0.0303	0.0303	0.0303	0.0303	0.000

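Brier Score: lower is better. BSS (Brier Skill Score): higher is better; 0 means no better than the uniform baseline, and negative means worse. (#N) = the actual champion's rank in the model's predictions; (>5) = the champion was outside the model's top 5.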
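
As a rough illustration of the two metrics, the sketch below assumes the Brier score is the mean squared error over all teams' champion probabilities against a one-hot outcome. That definition is an assumption, but it reproduces the 0.0303 uniform baseline in the table for a 32-team field. The function names are hypothetical and are not taken from evaluation/metrics.py.

```python
def brier_score(probs, champion):
    """Mean squared error between champion probabilities and the one-hot outcome."""
    return sum(
        (p - (1.0 if team == champion else 0.0)) ** 2
        for team, p in probs.items()
    ) / len(probs)


def brier_skill_score(model_brier, reference_brier):
    """1 - model/reference: positive beats the reference, 0 matches it."""
    return 1.0 - model_brier / reference_brier


# Uniform baseline for a 32-team tournament: every team gets 1/32.
teams = [f"team_{i}" for i in range(32)]
uniform = {t: 1 / 32 for t in teams}
uniform_brier = brier_score(uniform, champion="team_0")
print(round(uniform_brier, 4))  # 0.0303

# Chronos-2's 2022 Brier score (0.0192) taken from the table above.
bss_2022 = brier_skill_score(0.0192, uniform_brier)
print(round(bss_2022, 3))  # 0.366
```

Averaging the per-tournament skill scores computed this way for Chronos-2 (0.0250, 0.0347, 0.0192) gives roughly +0.131, in line with the Avg BSS column.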

Did the Model Identify the Champion?

Model	2014 Germany	2018 France	2022 Argentina	Score
Chronos-2	#3	>5	#2	2/3
TimesFM-2.5	#3	>5	#2	2/3
FlowState	#3	>5	#2	2/3
Elo Baseline	#3	>5	#2	2/3

Key findings:

All backtests use the correct 32-team format with the official FIFA bracket (1A vs 2B, etc.)
All models correctly identified the champion in their top 3 for 2/3 tournaments
2018 was the hardest: France was Elo-ranked outside the top 5 pre-tournament; all models have negative BSS
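Chronos-2 has the best average score of the three TSFMs across the 3 tournaments (avg BSS +0.131)
On average, the TSFMs provide a modest improvement over pure Elo (avg BSS +0.120 to +0.131 vs +0.118), though not in every tournament: the Elo baseline scored marginally better in 2014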

Methodology

Architecture

Historical Matches (49K+ since 1872)
         |
    [Elo Engine] --> Per-team Elo time series (weekly, 260 weeks)
         |
         |--- [Chronos-2 (120M params)]  --> Elo forecast (20 weeks) --+
         |--- [TimesFM 2.5 (200M params)] --> Elo forecast (20 weeks) --+
         |--- [FlowState (9.1M params)]   --> Elo forecast (20 weeks) --+
         |                                                               |
         |    +----------------------------------------------------------+
         |    v
         |  [Bradley-Terry Bridge] --> P(win/draw/loss) per match
         |    |
         |    v
         |  [Equal-Weight Ensemble] --> Final match probabilities
         |
         +--- [XGBoost (match-level)] --> Direct P(win/draw/loss) ----> [Ensemble]
                                                                           |
                                                                           v
                                                               [Monte Carlo Simulator]
                                                                50,000 tournament runs
                                                                           |
                                                                           v
                                                               P(champion) per team
                                                                           |
                                                                           v
                                                               [Edge Detector]
                                                               vs Polymarket odds
                                                               + Kelly bet sizing
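
To make the Monte Carlo stage concrete, here is a deliberately simplified sketch: a single-elimination bracket with no group stage and no draws, where each match is decided by the standard Elo logistic expectation. The ratings, the four-team bracket, and all function names are made up for illustration; the real simulator handles the 48-team format and uses the ensemble's match probabilities.

```python
import random
from collections import Counter


def win_prob(elo_a, elo_b):
    """Standard Elo logistic expectation that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


def simulate_knockout(bracket, ratings, rng):
    """Play one single-elimination bracket; adjacent entries meet each round.

    The bracket length must be a power of two.
    """
    alive = list(bracket)
    while len(alive) > 1:
        alive = [
            a if rng.random() < win_prob(ratings[a], ratings[b]) else b
            for a, b in zip(alive[0::2], alive[1::2])
        ]
    return alive[0]


def champion_probs(bracket, ratings, n_sims=50_000, seed=0):
    """Estimate P(champion) per team as the share of simulated titles."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(bracket, ratings, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}


ratings = {"A": 2100, "B": 1900, "C": 2000, "D": 1850}  # made-up ratings
probs = champion_probs(["A", "B", "C", "D"], ratings, n_sims=10_000)
for team, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.1%}")
```

Fixing the seed keeps runs reproducible, which matters when comparing week-over-week changes that should come from new data rather than from sampling noise.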

Data Sources

Historical matches: martj42/international_results — 49,287 matches since 1872, current through March 2026
Polymarket: Gamma API — real-time odds from $525M+ prediction market

Models (reused from fin-forecast-arena)

Model	Params	Type	Source
Chronos-2	120M	Probabilistic (21 sample paths)	Amazon
TimesFM 2.5	200M	Point + 10 quantiles	Google
FlowState	9.1M	Point + 9 quantiles	IBM
XGBoost	~50K	Match-level classifier	Baseline

All models run on CPU only (32GB RAM, no GPU). Memory is managed by loading one model at a time and calling gc.collect() before the next one is loaded.
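
The load-one-at-a-time pattern might look roughly like the sketch below. The `loaders` mapping and the stand-in "models" are hypothetical placeholders (plain functions rather than the real Chronos-2, TimesFM, or FlowState wrappers); only the load, run, delete, collect sequence is the point.

```python
import gc


def run_models_sequentially(loaders, series):
    """Load, run, and free each model in turn so only one is in memory at once."""
    forecasts = {}
    for name, load in loaders.items():
        model = load()                  # load weights only when needed
        forecasts[name] = model(series)
        del model                       # drop the only reference to the model
        gc.collect()                    # reclaim memory before the next load
    return forecasts


# Stand-in "models": simple callables instead of real TSFM wrappers.
loaders = {
    "last_value": lambda: (lambda s: s[-1]),
    "mean": lambda: (lambda s: sum(s) / len(s)),
}
result = run_models_sequentially(loaders, [1500.0, 1510.0, 1520.0])
print(result)  # {'last_value': 1520.0, 'mean': 1510.0}
```

Only the forecasts are kept between iterations, so peak memory is bounded by the largest single model rather than by the sum of all of them.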

Key Design Decisions

Elo as TSFM input: Elo ratings form genuine continuous time series with trends, mean-reversion, and noise — similar to financial data the TSFMs were trained on
Bradley-Terry bridge: Converts continuous Elo forecasts into discrete match outcome probabilities using the Davidson (1970) draw model
Uncertainty propagation: TSFM quantile forecasts (q10/q90) are sampled and propagated through match predictions
Home advantage: +80 Elo for USA/Canada/Mexico in their host country matches
Half-Kelly sizing: Conservative bet sizing to manage risk
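
The Bradley-Terry bridge and home advantage described above can be sketched as follows. This assumes team strength is derived from Elo as 10^(Elo/400) and uses the Davidson form, where the draw probability is proportional to the geometric mean of the two strengths. The draw parameter `nu=0.8` is an arbitrary illustrative value, and the function name is hypothetical; only the +80 home bonus comes from the list above.

```python
import math


def davidson_probs(elo_home, elo_away, nu=0.8, home_adv=0.0):
    """Return (P(home win), P(draw), P(away win)) under Davidson's draw model."""
    pi_h = 10 ** ((elo_home + home_adv) / 400)   # Bradley-Terry strength, home side
    pi_a = 10 ** (elo_away / 400)                # Bradley-Terry strength, away side
    draw_term = nu * math.sqrt(pi_h * pi_a)      # Davidson's tie term
    total = pi_h + pi_a + draw_term
    return pi_h / total, draw_term / total, pi_a / total


# Neutral venue vs. a host nation receiving the +80 Elo bonus.
neutral = davidson_probs(1850, 1850)
hosted = davidson_probs(1850, 1850, home_adv=80)
print("neutral:", [round(p, 3) for p in neutral])
print("hosted: ", [round(p, 3) for p in hosted])
```

With `nu=0` the draw term vanishes and the home-win probability reduces to the usual Elo expectation, which is a convenient sanity check on any implementation.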

Edge Detection

An edge is the difference between the AI's probability and Polymarket's implied probability. A STRONG EDGE requires:

Absolute edge > 5 percentage points
At least 3 of 4 models agree on the direction

Bet sizing uses the Kelly criterion, scaled down to half-Kelly for safety.
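
A minimal sketch of the edge check and stake sizing, under stated assumptions: the stake formula is the standard Kelly fraction for buying a binary share priced at `q` when the believed probability is `p`, halved. That formula is an assumption about the implementation, although it lands close to the Kelly column in the edges table (about 9.6% for Spain against the reported 9.7%, with the gap plausibly due to rounding of the displayed inputs). The function names and the `min_agree` parameter are hypothetical.

```python
def half_kelly(p, q):
    """Half-Kelly stake fraction for a YES share priced at q, believed probability p."""
    if p <= q:
        return 0.0  # no positive edge on the YES side, so no stake
    return 0.5 * (p - q) / (1.0 - q)


def classify_edge(ensemble_p, market_p, model_ps, threshold=0.05, min_agree=3):
    """Return (edge, models agreeing with the ensemble's direction, strong flag)."""
    edge = ensemble_p - market_p
    agree = sum((p - market_p) * edge > 0 for p in model_ps)
    strong = abs(edge) > threshold and agree >= min_agree
    return edge, agree, strong


# Spain, using the per-model figures from the breakdown table above.
spain_models = [0.325, 0.335, 0.306, 0.336]
edge, agree, strong = classify_edge(0.322, 0.160, spain_models)
stake = half_kelly(0.322, 0.160)
print(f"edge={edge:+.1%} agree={agree}/4 strong={strong} stake={stake:.1%}")
```

Teams priced above the model's probability (SELL signals such as Brazil) get a zero stake from `half_kelly`, which matches the empty Kelly cells for those rows.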

Live Pipeline

Phase A: Pre-Tournament (now through June 10)

Daily: Fetch Polymarket odds, log movements
Weekly (Mondays): Re-run TSFM models, update predictions, regenerate plots

Phase B: During Tournament (June 11 - July 19) — not yet implemented

Planned: After each match day, update Elo with actual results, re-simulate remaining bracket, score previous predictions
Current status: Falls back to the Phase A pipeline. See pipeline/matchday_run.py for the TODO list.

Project Structure

worldcup-oracle/
├── config.py                 # All constants: 48 teams, 12 groups, parameters
├── data/
│   ├── fetcher_matches.py    # Download international match results
│   ├── fetcher_polymarket.py # Polymarket Gamma API client
│   ├── elo.py                # Elo rating engine
│   └── feature_engineering.py # Per-team time series features
├── models/
│   ├── chronos2_sports.py    # Chronos-2 wrapper
│   ├── timesfm_sports.py     # TimesFM 2.5 wrapper
│   ├── flowstate_sports.py   # FlowState wrapper
│   └── xgboost_sports.py     # XGBoost match-level classifier
├── prediction/
│   ├── strength_forecaster.py # Level 1: TSFM Elo forecasting
│   ├── match_predictor.py     # Level 2: Bradley-Terry bridge
│   ├── tournament_simulator.py # Level 3: Monte Carlo (50K sims)
│   └── ensemble.py            # Multi-model ensemble
├── markets/
│   ├── edge_detector.py       # AI vs market comparison + Kelly
│   ├── polymarket_tracker.py  # Daily odds snapshots
│   └── odds_converter.py      # Probability/odds math
├── evaluation/
│   ├── backtester.py          # Backtest on 2014/2018/2022 WCs
│   └── metrics.py             # Brier score, log loss, calibration
├── visualization/             # All chart generators
├── pipeline/
│   ├── daily_run.py           # Phase A: pre-tournament pipeline
│   └── matchday_run.py        # Phase B: during-tournament (stub)
├── tests/                     # 53 tests, all passing
├── requirements.txt           # Python dependencies
└── results/
    ├── predictions/           # Current AI predictions
    ├── odds_history/          # Polymarket odds snapshots
    ├── edges/                 # Edge detection reports
    ├── evaluations/           # Backtest results
    └── plots/                 # Generated charts

Sister Project

This project reuses model infrastructure from fin-forecast-arena — a benchmarking arena that pits the same 3 TSFM models against each other on financial time series (semiconductor stocks, mega-cap tech, ETFs).

Disclaimer

This is a research and educational project. It is not financial advice, gambling advice, or an invitation to wager.

All predictions are generated by statistical models and carry no guarantee of accuracy. Past backtest performance does not guarantee future results.
Prediction markets and sports betting are regulated differently across jurisdictions. It is your responsibility to verify that any wagering activity complies with the laws of your jurisdiction before placing any bets.
The authors of this project are not licensed financial advisors, bookmakers, or gambling operators. No fiduciary relationship is created by using this software.
Polymarket data is fetched via their public API for research purposes only. This project is not affiliated with, endorsed by, or sponsored by Polymarket, FIFA, or any of the model providers (Amazon, Google, IBM).
The Kelly criterion calculations and "BUY/SELL" signals are illustrative of the model's output. They are not recommendations to place any specific wager.
If you choose to bet real money based on any output from this system, you do so entirely at your own risk.

License

This project is licensed under the MIT License.

Acknowledgments

Match data: martj42/international_results
Models: Amazon Chronos-2, Google TimesFM, IBM Granite FlowState
Market data: Polymarket Gamma API

Built with Chronos-2, TimesFM 2.5, FlowState, and 50,000 Monte Carlo simulations per prediction run.
