NBA data intake & prop modeling — project report
This report summarizes our current build state, what we have proven in the truth layer, and the remaining decision point (odds/props markets) before we can compute actionable “edges” for NBA betting.
Latest release
- Truth layer is now season-to-date capable: we backfilled 387 games across 55 dates with 0 missing box scores and 0 missing play-by-play in our coverage reports.
- Idempotent ingest + resume verified: re-running the same date ranges does not duplicate rows, and a --resume run can safely pick up where a prior run left off.
- We are now ready for market ingestion: the next gating decision is selecting a provider for odds + player props (and, ideally, historical line movement / closing lines).
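The resume behavior described above can be illustrated with a minimal sketch. It assumes a run-log table keyed by date; the names run_log, run_backfill, and ingest_date are hypothetical and are not taken from truth_poc.py, whose internals this report does not show.

```python
import sqlite3
from datetime import date, timedelta


def date_range(start: date, end: date):
    """Yield each date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def run_backfill(conn, start, end, ingest_date, resume=False):
    """Ingest each date in range; with resume=True, skip dates already logged as done."""
    # Hypothetical run-log table: one row per fully ingested date.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_log ("
        "run_date TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    done = set()
    if resume:
        rows = conn.execute("SELECT run_date FROM run_log WHERE status = 'done'")
        done = {row[0] for row in rows}

    processed = []
    for day in date_range(start, end):
        key = day.isoformat()
        if key in done:
            continue
        ingest_date(conn, day)  # hypothetical: fetch and write one date's games
        # The date is marked done only after its ingest returns without error.
        conn.execute(
            "INSERT OR REPLACE INTO run_log (run_date, status) VALUES (?, 'done')",
            (key,),
        )
        conn.commit()
        processed.append(key)
    return processed
```

Because a date is logged only after its ingest completes, a run that crashes mid-date would redo that date on resume, which is safe only if the per-date writes are themselves idempotent.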
Where we are in the process
- Current phase: Phase 6 (Market + availability ingestion)
- Last gate passed: Gate 4C (Observability + coverage reporting)
- Next gate: Gate 4A (Idempotent snapshots)
- Active ritualset reference: sportsbetting_ritualset_v2_activated_v1
In plain terms: schedule/box/PBP data capture is stable at scale. The next work is adding market “truth” (lines, prices, books, timestamps) so we can measure EV/CLV and move from “projections” to “bet decisions.”
Fast map of what we built
- truth_poc.py pulls (a) schedule, (b) box score, and (c) play-by-play, and writes to a local SQLite DB.
- coverage_report.py reads the DB and outputs coverage metrics (games seen, missing box, missing PBP), per date and overall.
- Run logs + primary keys enforce re-run safety (idempotency) and support long backfills via --resume.
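To make the coverage metrics concrete, here is a rough sketch of the kind of query a coverage report could run. The table and column names (games, box_player_lines, pbp_events) are assumptions for illustration; the actual schema used by coverage_report.py is not shown in this report.

```python
import sqlite3

# Hypothetical minimal schema; the real tables may be named and shaped differently.
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    game_id TEXT PRIMARY KEY, game_date TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS box_player_lines (
    game_id TEXT NOT NULL, player_id TEXT NOT NULL,
    PRIMARY KEY (game_id, player_id));
CREATE TABLE IF NOT EXISTS pbp_events (
    game_id TEXT NOT NULL, event_num INTEGER NOT NULL,
    PRIMARY KEY (game_id, event_num));
"""

# A game counts as "missing box" if it has no box rows at all, and likewise for PBP.
COVERAGE_SQL = """
SELECT game_date,
       COUNT(*)         AS games_seen,
       SUM(has_box = 0) AS missing_box,
       SUM(has_pbp = 0) AS missing_pbp
FROM (
    SELECT g.game_date,
           EXISTS (SELECT 1 FROM box_player_lines b
                   WHERE b.game_id = g.game_id) AS has_box,
           EXISTS (SELECT 1 FROM pbp_events p
                   WHERE p.game_id = g.game_id) AS has_pbp
    FROM games g
)
GROUP BY game_date
ORDER BY game_date
"""


def coverage_by_date(conn):
    """Return (game_date, games_seen, missing_box, missing_pbp) rows."""
    return conn.execute(COVERAGE_SQL).fetchall()
```

Summing the per-date rows gives the overall totals. This check only detects games with zero rows; partially written box scores or event logs would need a stricter rule.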
Pending decision point
To compute edges for FanDuel-style markets, we still need stable access to:
- Game lines: moneyline, spread, total (and ideally alternates)
- Player props: points, assists, rebounds, threes, combos (PRA, etc.)
- Historical snapshots: open/close, line movement, book-by-book pricing (for CLV + backtesting)
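One possible shape for storing these market snapshots is sketched below. Everything here is an assumption made for illustration (the odds_snapshots table, its columns, and the open_and_close helper); no provider has been selected yet, and a real feed may dictate different keys.

```python
import sqlite3

# Hypothetical snapshot table: one row per book, market, side, and capture time.
# player_id is '' for game lines; captured_at is an ISO-8601 UTC string so that
# text ordering matches time ordering.
SNAPSHOT_SCHEMA = """
CREATE TABLE IF NOT EXISTS odds_snapshots (
    book           TEXT NOT NULL,
    game_id        TEXT NOT NULL,
    market         TEXT NOT NULL,
    player_id      TEXT NOT NULL DEFAULT '',
    side           TEXT NOT NULL,
    captured_at    TEXT NOT NULL,
    line           REAL,
    price_american INTEGER NOT NULL,
    PRIMARY KEY (book, game_id, market, player_id, side, captured_at)
)
"""


def open_and_close(conn, book, game_id, market, player_id, side):
    """Return (first, last) snapshots as (captured_at, line, price) tuples, or None."""
    rows = conn.execute(
        "SELECT captured_at, line, price_american FROM odds_snapshots "
        "WHERE book = ? AND game_id = ? AND market = ? "
        "AND player_id = ? AND side = ? "
        "ORDER BY captured_at",
        (book, game_id, market, player_id, side),
    ).fetchall()
    if not rows:
        return None
    return rows[0], rows[-1]
```

Treating the last captured snapshot as the close is a simplification; a stricter version would take the last snapshot before tip-off. Including captured_at in the primary key keeps repeated captures of the same moment from duplicating rows.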
Past releases
Release 2025-12-16 (bundle v2)
A readable surface for our NBA sports betting build: what we’re collecting, what’s working today, and how we’ll turn it into a prop-evaluation engine.
What this report is
This is a lightweight, developer-friendly project surface for our NBA sports betting work. It’s written for two audiences at once:
- Beginners: what these betting terms mean, what data we’re collecting, and what we’re trying to achieve.
- Developers (Client): how the pipeline is shaped, what tables exist, and how we’ll turn data into repeatable prop evaluation.
We’re intentionally starting with a small “truth-layer” fetcher that proves we can collect stable game/player/event data and store it in a canonical format. Once that foundation is stable, we scale collection and add market/odds snapshots.
Beginner primer: what is a “player prop”?
A player prop is a bet on a player’s stat line rather than the final score. Examples:
- Points: “Player X over 24.5 points”
- Assists: “Player Y at least 8 assists”
- Rebounds: “Player Z under 10.5 rebounds”
Sportsbooks publish a line (e.g., 24.5 points) and an odds price (how much you win relative to your stake). Our job is to estimate the player's distribution of outcomes and decide whether the probability implied by the offered line and price is lower than our estimate of the true probability. When it is, the bet has positive expected value under our model.
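As a worked illustration of how an odds price maps to a probability, the standard American-odds conversion can be written in a few lines. The function name implied_probability is ours, not part of the project code.

```python
def implied_probability(american_odds: int) -> float:
    """Convert American odds to the implied win probability (margin included)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be 0")
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        risk = -american_odds
        return risk / (risk + 100)
    # Underdog: risk 100 to win odds.
    return 100 / (american_odds + 100)
```

For example, -110 works out to 110 / 210, about 52.4%, and +120 works out to 100 / 220, about 45.5%.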
Where we are today
Already working
- Fetches a date’s NBA games (schedule/game IDs)
- Fetches traditional box score player lines per game
- Optionally fetches play-by-play (PBP) event logs
- Writes results into a local SQLite database with canonical tables
Actively hardening
- PBP idempotency (safe re-runs without UNIQUE constraint errors)
- Stable identifiers across multiple upstream sources
- Versioned “contract manifests” so core functions don’t drift silently
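For the PBP idempotency item, one common SQLite approach is a composite primary key combined with INSERT OR IGNORE, so a re-run skips rows that already exist instead of raising a UNIQUE constraint error. The sketch below assumes a hypothetical pbp_events table keyed by (game_id, event_num); the project's actual keys may differ.

```python
import sqlite3


def ensure_pbp_table(conn):
    """Create the hypothetical PBP table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pbp_events ("
        "game_id TEXT NOT NULL, "
        "event_num INTEGER NOT NULL, "
        "period INTEGER, "
        "description TEXT, "
        "PRIMARY KEY (game_id, event_num))"
    )


def write_pbp(conn, game_id, events):
    """Insert events, skipping existing keys; return the number of new rows."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO pbp_events "
        "(game_id, event_num, period, description) VALUES (?, ?, ?, ?)",
        [
            (game_id, e["event_num"], e.get("period"), e.get("description"))
            for e in events
        ],
    )
    conn.commit()
    # total_changes counts only rows actually written, so ignored rows are excluded.
    return conn.total_changes - before
```

The trade-off is that INSERT OR IGNORE keeps whichever version of an event was written first. If the upstream source can correct events after the fact, an ON CONFLICT ... DO UPDATE upsert may be a better fit.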
How this becomes betting advice
We are building toward a repeatable loop:
- Collect truth data: game context, player minutes/production, and event detail (optional).
- Add market context: sportsbook lines/odds for each prop market, captured over time.
- Model outcomes: project minutes + per-minute rates + adjustments (pace, role, opponent, rest) to produce an outcome distribution.
- Compare to the line: convert odds to implied probability; compute expected value (EV).
- Track quality: measure closing line value (CLV) and ROI over a meaningful sample.
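The model-and-compare steps of this loop can be sketched end to end. This is a toy under stated assumptions, not the project's model: it treats the stat as Poisson-distributed around a projected mean, which is a deliberate simplification (real stat lines are often more dispersed), and all function names are hypothetical.

```python
import math


def project_mean(minutes: float, per_minute_rate: float, adjustment: float = 1.0) -> float:
    """Projected stat total: minutes x per-minute rate x a context multiplier."""
    return minutes * per_minute_rate * adjustment


def poisson_prob_over(mean: float, line: float) -> float:
    """P(X > line) for X ~ Poisson(mean); X > 24.5 is the same as X >= 25."""
    k_max = math.floor(line)
    cdf = sum(
        math.exp(-mean) * mean**k / math.factorial(k) for k in range(k_max + 1)
    )
    return 1.0 - cdf


def profit_per_unit(american_odds: int) -> float:
    """Profit on a winning 1-unit stake at the given American odds."""
    if american_odds < 0:
        return 100 / -american_odds
    return american_odds / 100


def expected_value(p_win: float, american_odds: int) -> float:
    """Expected profit per 1-unit stake, assuming p_win is correct and no pushes."""
    return p_win * profit_per_unit(american_odds) - (1.0 - p_win)
```

A usage sketch: compute mean = project_mean(34.0, 0.75), then p = poisson_prob_over(mean, 24.5), then expected_value(p, -110). With half-point lines there is no push; whole-number lines would need a push term that this sketch omits.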
Early on, we’ll use simple, interpretable models and gradually add complexity only when it improves performance and stability.
Glossary (quick)
- Line: The threshold (e.g., 24.5 points) the bet is evaluated against.
- Odds price: The payout format (often American odds like -110 / +120). Converts to implied probability.
- Implied probability: The win probability implied by the odds price. It includes the sportsbook's margin, so the implied probabilities of all sides of a market sum to more than 100%.
- Edge: Our estimated probability minus implied probability.
- EV (expected value): Average expected profit per bet if our probability estimate is correct.
- CLV (closing line value): Whether the line/price we bet was better than the closing line/price (often a better long-run signal than short-run wins).
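The arithmetic behind a few of these terms is illustrated below. The sketch takes implied probabilities as inputs. Proportional normalization is only one of several ways to remove the margin, and expressing CLV as a difference in implied probability is one convention among several; both are assumptions here, as are the function names.

```python
def no_vig_probabilities(p_over_implied: float, p_under_implied: float):
    """Rescale a two-sided market's implied probabilities so they sum to 1."""
    total = p_over_implied + p_under_implied  # above 1 when the book holds a margin
    return p_over_implied / total, p_under_implied / total


def edge(model_prob: float, implied_prob: float) -> float:
    """Our estimated probability minus the implied probability."""
    return model_prob - implied_prob


def clv_in_probability(bet_implied: float, closing_implied: float) -> float:
    """Positive when the closing price implies a higher probability than our bet price."""
    return closing_implied - bet_implied
```

For a market priced -110 on both sides, each side implies 110 / 210, about 52.4%, so the pair sums to about 104.8%; normalizing gives 50% each. A model probability of 55% against the 52.4% implied figure is an edge of roughly 2.6 percentage points.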
Important note
This project is about building a rigorous, testable approach to prop evaluation. Nothing here is a guarantee of profit; variance is real, and any strategy must be tested over enough volume to matter.