Agent90_WC 2026

How it works

One agent. Nine stages. No black box.

Agent Ninety works in stages. Each stage has one job, and together they read the match, lock the call, review the result, and carry the lesson forward. Publishing the stages is the same idea as locking the snapshot: you should be able to see how the agent thinks before you decide whether to trust it.

Every stage is small and named, with a defined input and a defined output. Each one is illustrated with a hand-authored sample record: the format is concrete, and each figure is labelled at source.

Sample record

  1. Data Pack
    Builds a normalized brief for one fixture: form, lineups, rest, venue, recent context.
    Status: On the roadmap
  2. Baseline Prediction
    Anchors probabilities so downstream stages can't drift into vibes.
    Status: On the roadmap
  3. Analyst
    Drafts the thesis, hypotheses, and what-to-watch from the data pack.
    Status: On the roadmap
  4. Skeptic
    Attacks the draft. Surfaces over-confidence, missing factors, weak hypotheses.
    Status: On the roadmap
  5. Editor
    Reconciles analyst and skeptic into one voice. Enforces tone, removes hedging slop.
    Status: On the roadmap
  6. Snapshot Locker
    Hashes, timestamps, and persists the final pre-match report. Immutable from this point.
    Status: Live now
  7. Post-Match Reviewer
    Grades each hypothesis against actual events. Writes what was right, wrong, and learned.
    Status: On the roadmap
  8. Memory Curator
    Decides what is worth remembering, in what wording, and what to retire. Curated, not naive.
    Status: On the roadmap
  9. Scorecard Update
    Recomputes calibration, hypothesis hit-rate, match-shape accuracy, and the public scorecard delta.
    Status: On the roadmap

What “learning” means here

The agent learns at the system level, not at the model-weights level. Memory gets updated. Hypotheses get graded. Calibration gets tracked. Mistakes become future context. Bad memories can be retired.

No fine-tuning. No retraining. The next match's data pack is just better-informed than the last one.

What we do not do

  • Edit a snapshot after kickoff. Ever.
  • Skip a review when the agent was wrong.
  • Promise a result. Confidence is calibrated, not sold.
  • Frame any of this as betting advice, value picks, or edge.

Frequently asked

  • Is Agent Ninety a tipster or a betting product?
    No. Agent Ninety is an editorial experiment in football prediction. Every pre-match brief carries the agent's probabilities and reasoning, then locks before kickoff and reviews itself after full time. Nothing on the site is investment advice or a betting recommendation.
  • What does 'locked before kickoff' actually mean?
    Every brief snapshots its text, probability split, and the agent's read at lock time (typically an hour before kickoff or at the announced teamsheet drop). The snapshot is hashed (SHA-256 over the canonical JSON) and the hash is shown on the brief. After the match the original lock stays unchanged — the post-match review publishes next to it, not on top of it.
  • How is the agent honest about what it doesn't know?
    Every section of every brief carries a provenance tag — grounded, agent-authored, live feed, user-provided, or placeholder. The tags appear visibly in the UI. If a referee hasn't been appointed yet, the referee card says so instead of inventing one. If predicted XIs are not yet confirmed, they're labelled predicted, not confirmed.
  • Where does the data come from?
    Every figure is labelled at source. Where a verified feed is connected, the brief reads from it; where it is not, the section is marked agent-authored or needs-verification and never dressed as confirmed fact. As official feeds connect — event logs, aggregates, referee assignments — the provenance label upgrades from agent-authored to the named source, without changing the format.
  • What happens to a brief after the match?
    The locked brief stays untouched. A separate post-match review publishes alongside it: which calls landed, which missed, what the agent failed to read, what gets carried into the next brief as memory. The hash on the brief stays valid forever; the review is dated.
  • What is the Brier score on the scorecard?
    The Brier score grades probability forecasts honestly. Lower is better. A confident pick that lands scores near zero; an uninformed forecast that splits 1/3-1/3-1/3 across home/draw/away scores 0.667 regardless of result. The site publishes a rolling 20-match Brier average so the scorecard separates skill from luck: anyone can call one result right, but staying calibrated across a season is the real test. Mean Brier sits alongside outcome accuracy as a primary metric because it rewards measured confidence and penalises overconfidence, whereas outcome accuracy treats a lucky 90%-confident call the same as a careful 55% one.
  • What is calibration and why does the scorecard track it?
    When the agent files a pick as Medium-high confidence, it should land about 55–65% of the time across many matches. When it files High confidence, the hit rate should be 65% or above. The calibration table on the scorecard groups every Brier-scored prediction by its locked confidence band and shows what the agent said (the average top-pick probability) against what actually happened (the empirical hit rate). A well-calibrated agent has the two columns close together within every band. A divergence reveals systematic over- or under-confidence in a specific kind of call, which is exactly the kind of pattern the memory store is designed to capture and correct.
  • How does Agent Ninety prepare for World Cup 2026?
    There's a dedicated /world-cup-2026 page that tracks the tournament. The group draw and all 48 teams are wired in. The group-stage match schedule, knockout bracket, and per-match pre-briefs will land in follow-up work as the tournament approaches.
  • Why is the site styled like an old newspaper?
    The editorial format predates both the LLM era and betting podcasts. Locked predictions, public reviews, named writers, source notes: these were, and still are, how editorial credibility is built. The visual language is intentional: it signals the kind of accountability the product is going for.
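
The lock described in the FAQ above (SHA-256 over the canonical JSON) can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation. The field names and sample values are hypothetical, and the canonical form used here (sorted keys, compact separators, UTF-8) is an assumption, since the page does not say how the JSON is canonicalised.

```python
import hashlib
import json


def canonical_json(snapshot: dict) -> bytes:
    # Assumed canonical form: keys sorted at every level, no extra
    # whitespace, UTF-8 encoded. The real rule is not stated in the source.
    text = json.dumps(snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def snapshot_hash(snapshot: dict) -> str:
    # SHA-256 over the canonical bytes, shown as a 64-character hex digest.
    return hashlib.sha256(canonical_json(snapshot)).hexdigest()


# Hypothetical snapshot; every field name and value is illustrative only.
snapshot = {
    "fixture": "Team A vs Team B",
    "locked_at": "2026-06-11T18:00:00Z",
    "probabilities": {"home": 0.45, "draw": 0.30, "away": 0.25},
    "read": "Illustrative placeholder text.",
}
locked_hash = snapshot_hash(snapshot)

# Key order does not change the hash, because keys are sorted first.
reordered = dict(reversed(list(snapshot.items())))
print(snapshot_hash(reordered) == locked_hash)

# Any edit to the content produces a different hash.
edited = {**snapshot, "read": "Edited after kickoff."}
print(snapshot_hash(edited) == locked_hash)
```

Because the digest depends only on the canonical bytes, anyone holding the published snapshot can recompute it and compare it with the hash shown on the brief.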
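
The provenance tags listed in the FAQ above could be modelled along these lines. This is a sketch under assumptions: the five tag values come from the page, but the `Section` type, the function names, the wording of the fallback text, and the choice of tag for an appointed referee are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    # The five tags named in the FAQ.
    GROUNDED = "grounded"
    AGENT_AUTHORED = "agent-authored"
    LIVE_FEED = "live feed"
    USER_PROVIDED = "user-provided"
    PLACEHOLDER = "placeholder"


@dataclass(frozen=True)
class Section:
    # Hypothetical shape for one section of a brief.
    title: str
    body: str
    provenance: Provenance


def referee_section(referee: Optional[str]) -> Section:
    # When no referee is appointed, say so instead of inventing a name.
    if referee is None:
        return Section("Referee", "Not yet appointed.", Provenance.PLACEHOLDER)
    # Assumption: a known appointment is treated as grounded.
    return Section("Referee", referee, Provenance.GROUNDED)


def render(section: Section) -> str:
    # The tag is always visible next to the content it describes.
    return f"[{section.provenance.value}] {section.title}: {section.body}"


print(render(referee_section(None)))
```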
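
The Brier figures quoted in the FAQ above can be checked with a short sketch. The 0.667 value for a 1/3-1/3-1/3 split implies the multi-class form of the score (squared errors summed over home, draw, and away, ranging from 0 to 2), so that is what is used here. The function names and the sample probabilities are hypothetical, and the rolling average is one plausible reading of "rolling 20-match Brier average".

```python
from collections import deque

OUTCOMES = ("home", "draw", "away")


def brier_score(probs: dict, result: str) -> float:
    # Sum of squared differences between each forecast probability and
    # the actual outcome (1 for the result that happened, 0 otherwise).
    return sum((probs[o] - (1.0 if o == result else 0.0)) ** 2 for o in OUTCOMES)


def rolling_brier(scores, window=20):
    # Mean of the most recent `window` scores after each match.
    recent = deque(maxlen=window)
    means = []
    for score in scores:
        recent.append(score)
        means.append(sum(recent) / len(recent))
    return means


# An uninformed split scores the same whatever happens.
uniform = {"home": 1 / 3, "draw": 1 / 3, "away": 1 / 3}
print(round(brier_score(uniform, "draw"), 3))  # 0.667

# Hypothetical confident pick: near zero when it lands, heavily
# penalised when it misses.
confident = {"home": 0.90, "draw": 0.07, "away": 0.03}
print(round(brier_score(confident, "home"), 4))  # 0.0158
print(round(brier_score(confident, "away"), 4))  # 1.7558
```

The gap between 0.0158 and 1.7558 is the penalty for overconfidence that outcome accuracy alone would not show.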
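
The calibration table described in the FAQ above compares, per confidence band, the average top-pick probability with the empirical hit rate. A minimal sketch of that grouping is shown here. The record layout, the function name, and the sample predictions are hypothetical; only the band names Medium-high and High come from the page.

```python
from collections import defaultdict


def calibration_table(predictions):
    # Each prediction is (band, top_pick_probability, top_pick_landed).
    # Returns {band: (count, mean stated probability, empirical hit rate)}.
    grouped = defaultdict(list)
    for band, prob, landed in predictions:
        grouped[band].append((prob, landed))

    table = {}
    for band, rows in grouped.items():
        n = len(rows)
        said = sum(prob for prob, _ in rows) / n
        happened = sum(1 for _, landed in rows if landed) / n
        table[band] = (n, said, happened)
    return table


# Hypothetical locked predictions, for illustration only.
predictions = [
    ("Medium-high", 0.58, True),
    ("Medium-high", 0.62, False),
    ("Medium-high", 0.60, True),
    ("Medium-high", 0.60, True),
    ("High", 0.70, True),
    ("High", 0.72, False),
]

for band, (n, said, happened) in calibration_table(predictions).items():
    print(f"{band:12s} n={n} said={said:.2f} happened={happened:.2f}")
```

With only a handful of matches per band the two columns will differ by chance, which is why the comparison is meaningful only across many matches.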