An AI system that predicts every aspect of the 2026 FIFA World Cup using time series foundation models (Chronos-2, TimesFM 2.5, FlowState), then compares predictions against Polymarket odds ($525M+ volume) to find mispriced markets.
Last updated: April 8, 2026 | Tournament: June 11 - July 19, 2026 | 48 teams, 104 matches
| Rank | Team | AI Win % | Polymarket | Edge | Signal |
|---|---|---|---|---|---|
| 1 | Spain | 32.2% | 16.0% | +16.2% | STRONG BUY |
| 2 | France | 12.9% | 13.5% | -0.6% | — |
| 3 | Argentina | 11.9% | 9.0% | +2.8% | BUY |
| 4 | England | 6.0% | 11.3% | -5.4% | STRONG SELL |
| 5 | Ecuador | 5.4% | 0.9% | +4.6% | BUY |
| 6 | Brazil | 3.0% | 8.6% | -5.6% | STRONG SELL |
| 7 | Mexico | 2.7% | 1.1% | +1.6% | — |
| 8 | Norway | 2.6% | 2.8% | -0.2% | — |
| 9 | Colombia | 2.6% | 1.7% | +0.9% | — |
| 10 | Morocco | 2.5% | 1.7% | +0.9% | — |
| 11 | Netherlands | 2.0% | 3.4% | -1.3% | — |
| 12 | Japan | 1.9% | 2.4% | -0.5% | — |
| 13 | Turkey | 1.9% | 0.8% | +1.1% | — |
| 14 | Portugal | 1.9% | 7.0% | -5.2% | STRONG SELL |
| 15 | Croatia | 1.6% | 1.1% | +0.6% | — |
| 16 | Germany | 1.5% | 5.5% | -4.0% | SELL |
| 17 | Canada | 1.3% | 0.5% | +0.8% | — |
| 18 | Switzerland | 1.0% | 0.9% | +0.1% | — |
| 19 | Uruguay | 1.0% | 1.1% | -0.2% | — |
| 20 | Paraguay | 1.0% | 0.4% | +0.5% | — |
| 21 | Senegal | 0.6% | 0.8% | -0.1% | — |
| 22 | Belgium | 0.5% | 1.9% | -1.4% | — |
| 23-48 | Others | <0.5% each | — | — | — |
| Team | AI | Polymarket | Edge | Direction | Kelly | Models Agree | Signal |
|---|---|---|---|---|---|---|---|
| Spain | 32.2% | 16.0% | +16.2% | BUY | 9.7% | 4/4 | STRONG EDGE |
| Brazil | 3.0% | 8.6% | -5.6% | SELL | — | 4/4 | STRONG EDGE |
| England | 6.0% | 11.3% | -5.4% | SELL | — | 4/4 | STRONG EDGE |
| Portugal | 1.9% | 7.0% | -5.2% | SELL | — | 4/4 | STRONG EDGE |
| Ecuador | 5.4% | 0.9% | +4.6% | BUY | 2.3% | 4/4 | edge |
| Germany | 1.5% | 5.5% | -4.0% | SELL | — | 4/4 | edge |
| Argentina | 11.9% | 9.0% | +2.8% | BUY | 1.6% | 4/4 | edge |
STRONG EDGE = absolute edge > 5 percentage points AND all 4 models agree on direction.
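The STRONG EDGE rule can be sketched in a few lines. This is a minimal illustration, not the project's `edge_detector.py`: the `EdgeReport` type, the `detect_edge` function, and the `min_agree` parameter are hypothetical names, and "a model agrees" is assumed to mean its own probability sits on the same side of the market price as the ensemble. The agreement threshold is kept as a parameter so it can be set to 3 or 4.

```python
from dataclasses import dataclass


@dataclass
class EdgeReport:
    team: str
    edge_pp: float      # AI probability minus market probability, in percentage points
    direction: str      # "BUY" if the AI is above the market, else "SELL"
    models_agree: int   # number of models on the same side of the market
    strong: bool


def detect_edge(team, ensemble_pct, market_pct, model_pcts,
                strong_threshold_pp=5.0, min_agree=4):
    """Compare an ensemble probability with the market price (both in percent)."""
    edge = ensemble_pct - market_pct
    direction = "BUY" if edge > 0 else "SELL"
    # Assumption: a model agrees when (model - market) has the same sign as the edge.
    agree = sum(1 for p in model_pcts if (p - market_pct) * edge > 0)
    strong = abs(edge) > strong_threshold_pp and agree >= min_agree
    return EdgeReport(team, round(edge, 1), direction, agree, strong)


# Per-model values taken from the model comparison table in this README.
spain = detect_edge("Spain", 32.2, 16.0, [32.5, 33.5, 30.6, 33.6])
france = detect_edge("France", 12.9, 13.5, [12.2, 12.7, 13.6, 12.0])
print(spain)
print(france)
```

Spain clears both conditions, while France fails on edge size and has one model (FlowState) on the other side of the market.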
| Team | Ensemble | Polymarket | Chronos-2 | TimesFM-2.5 | FlowState | Elo Baseline |
|---|---|---|---|---|---|---|
| Spain | 32.2% | 16.0% | 32.5% | 33.5% | 30.6% | 33.6% |
| France | 12.9% | 13.5% | 12.2% | 12.7% | 13.6% | 12.0% |
| Argentina | 11.9% | 9.0% | 11.7% | 11.2% | 12.8% | 10.9% |
| England | 6.0% | 11.3% | 5.5% | 6.9% | 5.5% | 6.9% |
| Ecuador | 5.4% | 0.9% | 5.4% | 5.2% | 5.7% | 5.2% |
| Brazil | 3.0% | 8.6% | 3.5% | 2.7% | 2.9% | 2.4% |
| Mexico | 2.7% | 1.1% | 2.5% | 2.7% | 2.9% | 2.8% |
| Norway | 2.6% | 2.8% | 2.6% | 2.7% | 2.4% | 2.8% |
| Colombia | 2.6% | 1.7% | 3.1% | 2.4% | 2.3% | 2.7% |
| Morocco | 2.5% | 1.7% | 2.6% | 2.4% | 2.5% | 2.4% |
Before trusting the model, we backtested on the 2014, 2018, and 2022 World Cups using only data available before each tournament. All backtests use the correct 32-team format with the official FIFA bracket structure (1A vs 2B, etc.).
| Model | 2014 Brazil | 2018 Russia | 2022 Qatar | Avg Brier | Avg BSS |
|---|---|---|---|---|---|
| Chronos-2 | 0.0250 (#3) | 0.0347 (>5) | 0.0192 (#2) | 0.0263 | +0.131 |
| TimesFM-2.5 | 0.0250 (#3) | 0.0352 (>5) | 0.0195 (#2) | 0.0266 | +0.122 |
| FlowState | 0.0252 (#3) | 0.0351 (>5) | 0.0196 (#2) | 0.0266 | +0.120 |
| Elo Baseline | 0.0249 (#3) | 0.0351 (>5) | 0.0201 (#2) | 0.0267 | +0.118 |
| Uniform (random) | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.000 |
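Brier Score: lower is better. BSS (Brier Skill Score): higher is better; 0 matches the uniform baseline and negative values are worse than it. (#N) = the actual champion's rank within the model's top-5 predictions; >5 = the champion was ranked outside the top 5.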
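As a rough check on these numbers, the sketch below assumes the Brier score is the mean over teams of the squared difference between the predicted title probability and the 0/1 outcome, and that BSS is `1 - BS / BS_uniform`. Those definitions are assumptions rather than a copy of `evaluation/metrics.py`, but they reproduce the Uniform row (0.0303 for 32 teams) and Chronos-2's average BSS. The function names are illustrative.

```python
def brier_score(probs, champion):
    """Mean squared error between title probabilities and the 0/1 outcome."""
    return sum((p - (1.0 if team == champion else 0.0)) ** 2
               for team, p in probs.items()) / len(probs)


def brier_skill_score(model_brier, reference_brier):
    """1 - BS/BS_ref: positive beats the reference, 0 matches it."""
    return 1.0 - model_brier / reference_brier


# Uniform reference for a 32-team field: every team gets 1/32.
teams = [f"team_{i}" for i in range(32)]
uniform = {t: 1 / 32 for t in teams}
uniform_brier = brier_score(uniform, champion="team_0")
print(round(uniform_brier, 4))  # 0.0303, matching the Uniform row

# Chronos-2's per-tournament Brier scores from the table above.
chronos = [0.0250, 0.0347, 0.0192]
avg_bss = sum(brier_skill_score(b, uniform_brier) for b in chronos) / len(chronos)
print(round(avg_bss, 3))  # about 0.131
```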
| Model | 2014 Germany | 2018 France | 2022 Argentina | Champion in top 3 |
|---|---|---|---|---|
| Chronos-2 | #3 | >5 | #2 | 2/3 |
| TimesFM-2.5 | #3 | >5 | #2 | 2/3 |
| FlowState | #3 | >5 | #2 | 2/3 |
| Elo Baseline | #3 | >5 | #2 | 2/3 |
Key findings:
- All backtests use the correct 32-team format with the official FIFA bracket (1A vs 2B, etc.)
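- All models ranked the eventual champion in their top 3 in 2 of 3 tournaments (2014 and 2022)
- 2018 was the hardest: France was Elo-ranked outside the top 5 pre-tournament, and every model scored worse than the uniform baseline that year (Brier 0.0347 to 0.0352 vs 0.0303, i.e. negative BSS)
- Chronos-2 has the best average score of the three TSFMs (avg BSS +0.131), although the per-tournament margins are small
- On average the TSFMs improve modestly on pure Elo (avg BSS +0.120 to +0.131 vs +0.118), but not in every tournament: the Elo baseline had the lowest Brier score in 2014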
    Historical Matches (49K+ since 1872)
            |
      [Elo Engine] --> Per-team Elo time series (weekly, 260 weeks)
            |
            |--- [Chronos-2 (120M params)]   --> Elo forecast (20 weeks) --+
            |--- [TimesFM 2.5 (200M params)] --> Elo forecast (20 weeks) --+
            |--- [FlowState (9.1M params)]   --> Elo forecast (20 weeks) --+
            |                                                              |
            |    +---------------------------------------------------------+
            |    v
            |  [Bradley-Terry Bridge] --> P(win/draw/loss) per match
            |    |
            |    v
            |  [Equal-Weight Ensemble] --> Final match probabilities
            |
            +--- [XGBoost (match-level)] --> Direct P(win/draw/loss) ----> [Ensemble]
                                                                               |
                                                                               v
                                                                    [Monte Carlo Simulator]
                                                                    50,000 tournament runs
                                                                               |
                                                                               v
                                                                     P(champion) per team
                                                                               |
                                                                               v
                                                                        [Edge Detector]
                                                                      vs Polymarket odds
                                                                      + Kelly bet sizing
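The Monte Carlo stage of the pipeline above can be illustrated with a deliberately small sketch. It assumes a plain single-elimination bracket whose size is a power of two, uses the standard Elo expected score as the knockout win probability, and skips the group stage, draws, and forecast uncertainty that the real simulator handles. All function names and the four ratings are hypothetical.

```python
import random
from collections import Counter


def win_prob(elo_a, elo_b):
    """Standard Elo expected score, used here as P(a beats b) in a knockout tie."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


def simulate_knockout(bracket, ratings, rng):
    """Play one single-elimination bracket; adjacent entries meet each round."""
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[::2], alive[1::2]):
            p = win_prob(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p else b)
        alive = next_round
    return alive[0]


def champion_probabilities(bracket, ratings, n_sims=50_000, seed=0):
    """Estimate P(champion) per team as the share of simulated titles."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(bracket, ratings, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}


# Hypothetical ratings for a four-team bracket (not the project's values).
ratings = {"A": 2100, "B": 1950, "C": 2000, "D": 1900}
probs = champion_probabilities(["A", "B", "C", "D"], ratings, n_sims=20_000)
print(probs)
```

Fixing the seed keeps runs reproducible, which makes it easier to tell a real change in the inputs from simulation noise.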
- Historical matches: martj42/international_results — 49,287 matches since 1872, current through March 2026
- Polymarket: Gamma API — real-time odds from $525M+ prediction market
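A minimal loading sketch for the match data follows. It assumes the dataset's `results.csv` layout with `date`, `home_team`, `away_team`, `home_score`, `away_score`, `tournament`, `city`, `country`, and `neutral` columns; the two sample rows are illustrative, and `load_matches` with its `before` cutoff is a hypothetical helper (useful for backtests that may only see pre-tournament data), not the project's `fetcher_matches.py`.

```python
import csv
import io
from datetime import date

# Illustrative rows in the assumed column layout.
SAMPLE = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,FALSE
2022-12-18,Argentina,France,3,3,FIFA World Cup,Lusail,Qatar,TRUE
"""


def load_matches(csv_text, before=None):
    """Parse match rows, optionally keeping only matches played before `before`."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        played = date.fromisoformat(row["date"])
        if before is not None and played >= before:
            continue  # drop anything on or after the cutoff to avoid leakage
        matches.append({
            "date": played,
            "home": row["home_team"],
            "away": row["away_team"],
            "home_score": int(row["home_score"]),
            "away_score": int(row["away_score"]),
            "neutral": row["neutral"].strip().upper() == "TRUE",
        })
    return matches


all_matches = load_matches(SAMPLE)
pre_cutoff = load_matches(SAMPLE, before=date(2022, 11, 20))
print(len(all_matches), len(pre_cutoff))
```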
| Model | Params | Type | Source |
|---|---|---|---|
| Chronos-2 | 120M | Probabilistic (21 sample paths) | Amazon |
| TimesFM 2.5 | 200M | Point + 10 quantiles | Google |
| FlowState | 9.1M | Point + 9 quantiles | IBM |
| XGBoost | ~50K | Match-level classifier | Baseline |
All models run on CPU only (32GB RAM, no GPU). Memory is managed by loading one model at a time and calling gc.collect() before the next one is loaded.
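The load-one-at-a-time pattern might look roughly like the sketch below. The loader functions are hypothetical stand-ins (plain Python callables instead of the real Chronos-2, TimesFM, and FlowState wrappers), so only the load, run, release, collect sequence is meant to carry over.

```python
import gc


def run_models_sequentially(loaders, series):
    """Load, run, and release each model in turn so only one is resident."""
    forecasts = {}
    for name, load in loaders.items():
        model = load()                 # the heavy object is created here
        forecasts[name] = model(series)
        del model                      # drop the only reference ...
        gc.collect()                   # ... and reclaim memory before the next load
    return forecasts


# Hypothetical stand-ins for the real wrappers: each "model" is a plain function.
def load_last_value():
    return lambda s: [s[-1]] * 3       # naive "repeat the last value" forecast


def load_mean():
    return lambda s: [sum(s) / len(s)] * 3


loaders = {"last_value": load_last_value, "mean": load_mean}
out = run_models_sequentially(loaders, [1500.0, 1510.0, 1520.0])
print(out)
```

Only the forecasts are kept between iterations, so peak memory is bounded by the largest single model rather than the sum of all three.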
- Elo as TSFM input: Elo ratings form genuine continuous time series with trends, mean-reversion, and noise — similar to financial data the TSFMs were trained on
- Bradley-Terry bridge: Converts continuous Elo forecasts into discrete match outcome probabilities using the Davidson (1970) draw model
- Uncertainty propagation: TSFM quantile forecasts (q10/q90) are sampled and propagated through match predictions
- Home advantage: +80 Elo for USA/Canada/Mexico in their host country matches
- Half-Kelly sizing: Conservative bet sizing to manage risk
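To make the Bradley-Terry bridge and the home-advantage adjustment concrete, here is a minimal sketch of a Davidson-style draw model on top of Elo ratings. It assumes team strengths of the form `10 ** (elo / 400)`; the draw parameter `nu = 0.7` is a hypothetical value chosen for illustration, and `davidson_probs` is not the project's `match_predictor.py`. Only the +80 host bonus comes from the list above.

```python
HOST_BONUS = 80.0  # Elo points for a host nation playing at home (from the list above)


def davidson_probs(elo_a, elo_b, nu=0.7, bonus_a=0.0):
    """Return (P(A wins), P(draw), P(B wins)) under a Davidson-style draw model.

    With strengths pi = 10 ** (elo / 400):
        P(A wins) = pi_a / (pi_a + pi_b + nu * sqrt(pi_a * pi_b))
        P(draw)   = nu * sqrt(pi_a * pi_b) / (same denominator)
    Dividing through by sqrt(pi_a * pi_b) leaves only r = sqrt(pi_a / pi_b).
    """
    diff = (elo_a + bonus_a) - elo_b
    r = 10 ** (diff / 800)            # sqrt(pi_a / pi_b)
    denom = r + 1.0 / r + nu          # nu is an assumed draw parameter
    return r / denom, nu / denom, (1.0 / r) / denom


neutral = davidson_probs(1900, 1900)
hosted = davidson_probs(1900, 1900, bonus_a=HOST_BONUS)
print([round(p, 3) for p in neutral])
print([round(p, 3) for p in hosted])
```

With equal ratings the win and loss probabilities are symmetric; adding the host bonus shifts probability from the away side and, slightly, from the draw toward the host.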
An edge is the difference between the AI's probability and Polymarket's implied probability. A STRONG EDGE requires:
- Absolute edge > 5 percentage points
- At least 3 of 4 models agree on the direction
Bet sizing uses the Kelly criterion with a half-Kelly cap for safety.
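One way to read the sizing rule is sketched below. It assumes each market is a binary contract bought at price `q` that pays 1 if the team wins, for which the full-Kelly fraction is `(p - q) / (1 - q)`, and it interprets "half-Kelly" as staking half of that. Both are assumptions rather than a transcription of `edge_detector.py`, and the function names are illustrative. With the rounded table inputs this gives about 9.6% for Spain, 2.3% for Ecuador, and 1.6% for Argentina, close to the Kelly column above.

```python
def kelly_fraction(p_model, price):
    """Full-Kelly bankroll fraction for buying a binary contract at `price`."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    # Net odds are b = (1 - price) / price, so f* = (b*p - (1 - p)) / b
    # simplifies to (p - price) / (1 - price). Negative edges get no stake.
    return max(0.0, (p_model - price) / (1.0 - price))


def half_kelly(p_model, price):
    """Assumed reading of 'half-Kelly': stake half of the full-Kelly fraction."""
    return 0.5 * kelly_fraction(p_model, price)


# (team, AI probability, Polymarket price), rounded values from the tables above.
rows = [("Spain", 0.322, 0.160), ("Ecuador", 0.054, 0.009),
        ("Argentina", 0.119, 0.090), ("Brazil", 0.030, 0.086)]
for team, p, q in rows:
    print(f"{team}: {half_kelly(p, q):.1%}")
```

Brazil comes out at zero because the model's probability is below the market price; this sketch does not size the SELL side.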
- Daily: Fetch Polymarket odds, log movements
- Weekly (Mondays): Re-run TSFM models, update predictions, regenerate plots
- Planned: After each match day, update Elo with actual results, re-simulate remaining bracket, score previous predictions
- Current status: Falls back to the Phase A pipeline. See pipeline/matchday_run.py for the TODO list.
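The planned match-day step of updating Elo with actual results could follow the standard Elo update rule, sketched here. The K-factor of 40 is a hypothetical value, and the sketch ignores margin of victory and home advantage, which the project's `elo.py` may treat differently.

```python
def expected_score(elo_a, elo_b):
    """Expected score of team A against team B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


def update_elo(elo_a, elo_b, score_a, k=40.0):
    """Return both updated ratings.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k is an assumed K-factor, not the project's setting.
    """
    delta = k * (score_a - expected_score(elo_a, elo_b))
    return elo_a + delta, elo_b - delta


# Two equally rated teams, team A wins: A gains what B loses.
new_a, new_b = update_elo(2000.0, 2000.0, score_a=1.0)
print(new_a, new_b)  # 2020.0 1980.0
```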
worldcup-oracle/
├── config.py # All constants: 48 teams, 12 groups, parameters
├── data/
│ ├── fetcher_matches.py # Download international match results
│ ├── fetcher_polymarket.py # Polymarket Gamma API client
│ ├── elo.py # Elo rating engine
│ └── feature_engineering.py # Per-team time series features
├── models/
│ ├── chronos2_sports.py # Chronos-2 wrapper
│ ├── timesfm_sports.py # TimesFM 2.5 wrapper
│ ├── flowstate_sports.py # FlowState wrapper
│ └── xgboost_sports.py # XGBoost match-level classifier
├── prediction/
│ ├── strength_forecaster.py # Level 1: TSFM Elo forecasting
│ ├── match_predictor.py # Level 2: Bradley-Terry bridge
│ ├── tournament_simulator.py # Level 3: Monte Carlo (50K sims)
│ └── ensemble.py # Multi-model ensemble
├── markets/
│ ├── edge_detector.py # AI vs market comparison + Kelly
│ ├── polymarket_tracker.py # Daily odds snapshots
│ └── odds_converter.py # Probability/odds math
├── evaluation/
│ ├── backtester.py # Backtest on 2014/2018/2022 WCs
│ └── metrics.py # Brier score, log loss, calibration
├── visualization/ # All chart generators
├── pipeline/
│ ├── daily_run.py # Phase A: pre-tournament pipeline
│ └── matchday_run.py # Phase B: during-tournament (stub)
├── tests/ # 53 tests, all passing
├── requirements.txt # Python dependencies
└── results/
    ├── predictions/ # Current AI predictions
    ├── odds_history/ # Polymarket odds snapshots
    ├── edges/ # Edge detection reports
    ├── evaluations/ # Backtest results
    └── plots/ # Generated charts
This project reuses model infrastructure from fin-forecast-arena — a benchmarking arena that pits the same 3 TSFM models against each other on financial time series (semiconductor stocks, mega-cap tech, ETFs).
This is a research and educational project. It is not financial advice, gambling advice, or an invitation to wager.
- All predictions are generated by statistical models and carry no guarantee of accuracy. Past backtest performance does not guarantee future results.
- Prediction markets and sports betting are regulated differently across jurisdictions. It is your responsibility to verify that any wagering activity complies with the laws of your jurisdiction before placing any bets.
- The authors of this project are not licensed financial advisors, bookmakers, or gambling operators. No fiduciary relationship is created by using this software.
- Polymarket data is fetched via their public API for research purposes only. This project is not affiliated with, endorsed by, or sponsored by Polymarket, FIFA, or any of the model providers (Amazon, Google, IBM).
- The Kelly criterion calculations and "BUY/SELL" signals are illustrative of the model's output. They are not recommendations to place any specific wager.
- If you choose to bet real money based on any output from this system, you do so entirely at your own risk.
This project is licensed under the MIT License.
- Match data: martj42/international_results
- Models: Amazon Chronos-2, Google TimesFM, IBM Granite FlowState
- Market data: Polymarket Gamma API
Built with Chronos-2, TimesFM 2.5, FlowState, and 50,000 Monte Carlo simulations per prediction run.






