Most prediction platforms give you one model's opinion and call it analysis. Eroteme runs 4 of the most capable AI models on Earth against every Polymarket market — Claude, GPT, Gemini, and Grok — and distils their outputs into a single consensus prediction. One number. One bet slip. Published on-chain before the outcome is known.
This post explains exactly how that works. No hand-waving. No black boxes.
The Pipeline: Signal to Bet Slip
Every prediction follows the same 4-stage pipeline:
1. Signal Ingestion — We pull structured data from the market and its surrounding context.
2. 4-Model Ensemble — Each AI model independently analyses the signals and produces a probability.
3. Consensus Score — The 4 outputs converge into one prediction with a confidence percentage.
4. Bet Slip Published — The final prediction goes on-chain. Timestamped. Immutable. Auditable.
No human edits the output between stages 2 and 4. The system produces the prediction, and the prediction ships.
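The 4 stages can be sketched as a single function chain. Everything below is a hand-rolled illustration, not Eroteme's code — every function name is a stand-in, and the stage bodies are stubs that only show the shape of data each stage hands to the next.

```python
# Illustrative sketch of the 4-stage pipeline. All names and
# return values here are invented placeholders.

def collect_signals(market_id):
    # Stage 1: in the real system this pulls odds movement, form,
    # base rates, news sentiment, and the Polymarket price.
    return {"crowd_price": 0.61, "form_index": 0.85}

def query_models(signals):
    # Stage 2: four independent probability estimates, one per model.
    return [0.76, 0.79, 0.74, 0.72]

def score_consensus(estimates):
    # Stage 3: placeholder only — the real convergence score is not
    # a simple mean; it also weighs signal alignment and dissent.
    prediction = sum(estimates) / len(estimates)
    spread = max(estimates) - min(estimates)
    return {"prediction": prediction, "spread": spread}

def run_pipeline(market_id):
    signals = collect_signals(market_id)
    estimates = query_models(signals)
    consensus = score_consensus(estimates)
    # Stage 4 (publishing the slip on-chain) is omitted from this sketch.
    return consensus

result = run_pipeline("arsenal-vs-tottenham-2026-03-15")
```

No human touches `result` between `query_models` and publication — the point of the sketch is that the pipeline is a straight function chain with no editorial step.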
Stage 1: Signal Ingestion
Before any model runs, the system collects 5 signal types for every market.
Sharp Odds Movement — Sudden shifts in Polymarket pricing that indicate informed money entering a position. A market that moves from 45% to 62% in 90 minutes is telling you something. We capture the velocity, magnitude, and timing of every move.
Form Index — Historical performance data for the entities involved. In a political market, this means polling trends, incumbency rates, and prior election margins. In sports, recent results, head-to-head records, and home/away splits. Numbers, not narratives.
Base Rates — How often does this type of event actually happen? Incumbents win US presidential elections roughly 66% of the time. Home teams in the Premier League win about 46% of matches. Base rates anchor the prediction before any market-specific signal enters the model.
News Sentiment — Real-time scanning of news feeds, press conferences, and official statements. The system scores sentiment as positive, negative, or neutral — and weights recent coverage more heavily than older stories. A single breaking story can shift the sentiment signal within minutes.
Crowd Wisdom — The Polymarket price itself encodes the aggregate belief of thousands of traders. We treat this as a signal, not as ground truth. The crowd is often right. But when the crowd diverges from the other 4 signals, that divergence is information.
All 5 signal types feed into every model simultaneously.
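The 5 signal types above can be bundled into one structured input per market. The container below is a hypothetical schema for illustration — the field names and encodings are assumptions, not Eroteme's actual data model.

```python
from dataclasses import dataclass

# Hypothetical container for the 5 signal types; field names
# and value ranges are illustrative only.
@dataclass
class MarketSignals:
    odds_moves: list        # (implied probability, minutes ago) samples
    form_index: float       # 0-1 summary of recent entity performance
    base_rate: float        # historical frequency of this outcome type
    news_sentiment: float   # -1 (negative) to +1 (positive), recency-weighted
    crowd_price: float      # current Polymarket implied probability

# The 45% -> 62% move from the example above, captured as two samples.
signals = MarketSignals(
    odds_moves=[(0.45, 90), (0.62, 0)],
    form_index=0.85,
    base_rate=0.52,
    news_sentiment=-0.3,
    crowd_price=0.61,
)
```

One instance of this structure is what every model receives — the same fields, the same values, in parallel.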
Stage 2: The 4-Model Ensemble
Here is what happens under the hood. Each model receives the same structured prompt containing the same 5 signals. Each model produces an independent probability estimate for the market outcome.
The 4 models are:
- Claude (Anthropic) — Strong on nuanced reasoning, calibration, and structured analysis
- GPT (OpenAI) — Broad knowledge base, strong pattern matching across domains
- Gemini (Google) — Multimodal reasoning, strong on data-heavy markets
- Grok (xAI) — Real-time information access, strong on fast-moving narratives
We chose these 4 because they fail differently. When one model has a blind spot, the others tend to catch it. That is the entire point of an ensemble — you want diverse failure modes, not correlated errors.
Each model returns a probability (e.g., "Arsenal 74%") and a reasoning chain explaining which signals drove the estimate. The models do not see each other's outputs. There is no negotiation. 4 independent assessments, produced in parallel.
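The "independent, in parallel, no negotiation" property can be sketched directly: dispatch the same prompt to 4 callables that never see each other's results. The lambdas below stand in for real provider API calls and return the worked example's numbers.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real API calls to each provider; each returns a
# fixed probability here purely for illustration.
MODELS = {
    "claude": lambda prompt: 0.76,
    "gpt":    lambda prompt: 0.79,
    "gemini": lambda prompt: 0.74,
    "grok":   lambda prompt: 0.72,
}

def run_ensemble(prompt):
    # Every model receives the identical prompt, runs concurrently,
    # and never observes another model's output.
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in MODELS.items()}
        return {name: f.result() for name, f in futures.items()}

estimates = run_ensemble("Arsenal to beat Tottenham — signals: ...")
```

Because the fan-out is structural, correlated errors can only come from the models themselves, not from one model anchoring on another's answer.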
Stage 3: From 4 Numbers to 1 Consensus
This is the part most people get wrong. The consensus is not a simple average. It is not a majority vote. It is a convergence score.
Here is the difference. If 3 models say 72%, 74%, 71% and 1 model says 38%, a simple average gives you 63.75%. That number is misleading — it buries the dissent and overstates certainty. Our system treats this as a split signal: strong 3-model agreement with 1 material dissent.
The convergence score accounts for 3 factors:
Agreement Spread — How tightly do the 4 probabilities cluster? A spread of 71-74% is tight alignment. A spread of 38-81% is wide divergence. Tighter clusters produce higher confidence.
Signal Alignment — Do the underlying signals point in the same direction across models? If all 4 models cite sharp odds movement and form index as their primary drivers, signal alignment is high. If one model leans heavily on news sentiment while others dismiss it, alignment drops.
Dissent Weight — When a model disagrees, we measure how far it diverges and which signals drive the disagreement. A 5-point difference on a noisy signal is low-weight dissent. A 30-point difference driven by a concrete data point (a confirmed injury, a policy announcement) is high-weight dissent that compresses the confidence score.
The output is one prediction with a confidence percentage. Together they tell you two things: the predicted outcome and how strongly the ensemble agrees on it. A prediction at 82% confidence means the models converged tightly around that outcome with strong signal alignment. A prediction at 58% confidence means the models lean one direction but disagree on key signals.
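The three factors can be combined in code. The formula below is invented for illustration — the production scoring is not published — but it captures the stated behaviour: wide spreads, weak alignment, and heavy dissent all compress confidence.

```python
# Illustrative convergence score. The compression formula and its
# weights are assumptions, not Eroteme's actual scoring.

def convergence(estimates, aligned_signals, total_signals=5):
    mean = sum(estimates) / len(estimates)
    spread = max(estimates) - min(estimates)       # agreement spread
    alignment = aligned_signals / total_signals    # signal alignment
    confidence = mean * (1 - spread) * alignment ** 0.5
    return {"prediction": mean, "confidence": confidence}

# Tight 4-model cluster vs. the split signal from the averaging example.
tight = convergence([0.76, 0.79, 0.74, 0.72], aligned_signals=4)
split = convergence([0.72, 0.74, 0.71, 0.38], aligned_signals=3)
# The tight cluster earns markedly higher confidence than the split,
# even though both sets lean toward the same outcome.
```

A plain average of the split set (63.75%) would hide exactly the information this scoring preserves: the 38% dissenter drags confidence down far more than it drags the prediction.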
Worked Example: Arsenal vs. Tottenham, March 2026
Here is a real bet slip walkthrough.
Market: Arsenal to beat Tottenham — Premier League, March 15, 2026.
Signals collected:
- Sharp odds movement: Arsenal price shortened from 1.85 to 1.55 in the 4 hours before kick-off
- Form index: Arsenal W4-D1-L0 in last 5; Tottenham W2-D1-L2
- Base rate: Home team wins in North London derbies — 52% over last 20 years
- News sentiment: Negative — late injury doubt on Saka flagged across 3 sources
- Crowd wisdom: Polymarket priced Arsenal at 61%
Model outputs:
- Claude: Arsenal 76%
- GPT: Arsenal 79%
- Gemini: Arsenal 74%
- Grok: Arsenal 72%
Convergence analysis:
- Agreement spread: 72-79% — tight cluster, 7-point range
- Signal alignment: 4/4 models cited form index and sharp odds as primary drivers
- Dissent: No material dissent. All 4 models acknowledged the Saka injury doubt but weighted it as low-impact (bench player available, Saka listed as probable starter)
Published bet slip:
- Prediction: Arsenal
- Confidence: 78%
- Signals aligned: 4/5 (sharp odds, form, base rate, crowd)
- Signals dissenting: 1/5 (news sentiment — Saka injury doubt)
Arsenal won 2-1. Saka started and played 83 minutes. The bet slip was published on-chain 47 minutes before kick-off. You can verify the timestamp. The prediction, the confidence score, and the outcome — all recorded. Nothing edited after the fact.
Why Public Accuracy Tracking Matters
Anyone can claim 74% accuracy. Claims are cheap. Eroteme publishes every prediction with a timestamp before the outcome resolves. Every outcome — correct or incorrect — feeds into a public accuracy record.
We measure accuracy with the Brier score, the standard metric for probabilistic prediction quality. Here is how it works:
For a single prediction, the Brier score = (predicted probability - actual outcome)^2. If you predict Arsenal at 78% confidence and Arsenal wins (outcome = 1), your Brier score for that prediction is (0.78 - 1)^2 = 0.048. Lower is better. A perfect prediction scores 0. A coin flip scores 0.25.
Brier scores matter because they punish overconfidence and underconfidence equally. If you predict 95% on everything, you look great on the wins and terrible on the losses. The Brier score captures both. It rewards calibration — the ability to say 70% and be right roughly 70% of the time.
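The calculation is short enough to write out. This is the standard Brier score formula, applied to the Arsenal example:

```python
# Mean Brier score over a batch of resolved predictions.
def brier(predictions):
    # Each item is (predicted probability, outcome), outcome 1 or 0.
    return sum((p - outcome) ** 2 for p, outcome in predictions) / len(predictions)

# The Arsenal slip: 78% prediction, outcome happened -> (0.78 - 1)^2
single = brier([(0.78, 1)])
# A 50% prediction scores 0.25 whichever way the market resolves.
coin_flip = brier([(0.5, 1)])
```

Over a full record, the same function averages across every slip, so a forecaster cannot cherry-pick wins: each miss contributes its full squared error.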
Our full accuracy record, including Brier scores broken down by market category and time period, is available at /blog/eroteme-ai-accuracy-record. Every number is verifiable against the on-chain bet slips.
What Happens When the AI Gets It Wrong
The system gets predictions wrong. That is guaranteed in probabilistic forecasting. A 78% prediction means roughly 1 in 5 times, the other outcome happens. That is not a failure of the model — that is the model working correctly.
What matters is the pattern over hundreds of predictions. Does the system beat the market price? Does a 70% prediction resolve correctly about 70% of the time? Are the Brier scores improving over time?
We publish the misses alongside the hits. You can see exactly which signals led the ensemble astray and which model, if any, dissented correctly. If you want to trade against the AI's predictions, we built a guide for that: How to Bet Against AI Predictions.
No Black Boxes
Every prediction Eroteme publishes includes the confidence percentage, the signal breakdown, the model agreement spread, and any dissenting signals. Before the outcome. On-chain. Timestamped.
We do not ask you to trust the AI. We ask you to verify the record.
Ready to Bet With — or Against — the AI?
4 AI models analyse every market. One consensus prediction. Back the AI or fade it — P2P betting in USDC with no house edge.