Sports Betting — Cloudflare Report

Fetcher deep dive — truth_poc.py

The script that captures NBA schedule, box scores, and play-by-play into a canonical SQLite “truth” database.

Current state

Release: 2025-12-18 (CT)
Record: sportsbetting_Full_Record_v38 → v39
Phase: Phase 6 (Market + availability ingestion)
  • PBP source: nba_api PlayByPlayV3 (tested import + sample frame).
  • Box score: BoxScoreTraditionalV3 (V2 may still exist for fallback, but V3 is the forward path).
  • Backfill support: --start-date, --end-date, and --resume for long runs.
  • Idempotency: primary keys ensure safe re-run; inserts use “ignore on conflict” semantics.

Key CLI args

  • --date YYYY-MM-DD (single day) or --start-date/--end-date (range)
  • --max-games N limits games per day (useful while testing)
  • --pbp-mode nba_api selects the play-by-play source (nba_api is the current stable mode)
  • --sleep SECONDS pauses that many seconds between calls
  • --db path.sqlite output database file
  • --resume skips games already captured in earlier runs for the same DB
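As a rough illustration of how these flags could fit together, here is a minimal argparse sketch. The flag names come from the list above; the defaults, help strings, and the resolve_dates helper are assumptions made for this example and may not match truth_poc.py.

```python
import argparse
from datetime import date, timedelta


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the list above; every default here is an assumption.
    p = argparse.ArgumentParser(prog="truth_poc.py")
    p.add_argument("--date", type=date.fromisoformat, help="single day, YYYY-MM-DD")
    p.add_argument("--start-date", type=date.fromisoformat, help="first day of a range")
    p.add_argument("--end-date", type=date.fromisoformat, help="last day of a range")
    p.add_argument("--max-games", type=int, default=None, help="limit games per day")
    p.add_argument("--pbp-mode", default="nba_api", help="play-by-play source")
    p.add_argument("--sleep", type=float, default=1.0, help="seconds between calls")
    p.add_argument("--db", default="truth.sqlite", help="output SQLite file")
    p.add_argument("--resume", action="store_true", help="skip games already captured")
    return p


def resolve_dates(args: argparse.Namespace) -> list[date]:
    """Expand --date or --start-date/--end-date into an inclusive list of days."""
    if args.date is not None:
        if args.start_date or args.end_date:
            raise ValueError("use --date or --start-date/--end-date, not both")
        return [args.date]
    if args.start_date is None or args.end_date is None:
        raise ValueError("a range needs both --start-date and --end-date")
    if args.end_date < args.start_date:
        raise ValueError("--end-date is before --start-date")
    span = (args.end_date - args.start_date).days
    return [args.start_date + timedelta(days=i) for i in range(span + 1)]
```

Parsing dates at the argparse layer means a malformed value fails before any network call is made.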

What “idempotent” means here

If you run the same date range twice, the second run should not add duplicates. We enforce this with table primary keys:

  • canonical_schedule_rest: (captured_time_utc, game_id, source)
  • canonical_box_score: (captured_time_utc, game_id, team_id, player_id, source)
  • canonical_pbp: (captured_time_utc, game_id, event_num, source)

Practical impact: you can interrupt, resume, or rerun after patching without corrupting the DB.
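The key-plus-ignore pattern described above can be sketched with Python's built-in sqlite3 module. Only the table name and key columns come from the text; the raw_json column placement, the insert_events helper, and the sample game ID are assumptions for illustration.

```python
import sqlite3

# Key columns match the canonical_pbp key listed above; the rest is illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS canonical_pbp (
    captured_time_utc TEXT    NOT NULL,
    game_id           TEXT    NOT NULL,
    event_num         INTEGER NOT NULL,
    source            TEXT    NOT NULL,
    raw_json          TEXT,
    PRIMARY KEY (captured_time_utc, game_id, event_num, source)
)
"""


def insert_events(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert rows, skipping any whose primary key already exists.

    Returns the number of rows actually written.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO canonical_pbp "
        "(captured_time_utc, game_id, event_num, source, raw_json) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Hypothetical capture timestamp and game ID, for demonstration only.
    rows = [("2025-12-18T00:00:00Z", "0022500001", n, "nba_api", "{}") for n in range(3)]
    print(insert_events(conn, rows))  # first run writes all three rows
    print(insert_events(conn, rows))  # identical rerun writes none
```

Note that in this sketch a row with a different captured_time_utc does not conflict, because the timestamp is part of the key; it is stored as a separate snapshot.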

Past releases

Release 2025-12-16 (bundle v2)

Exactly what the current script does, what it writes, and what’s coming next to connect data → props → expected value (EV).

Fetcher overview (truth_poc.py)

This script is our proof-of-collection. It makes small, polite requests, normalizes fields, and writes a durable SQLite snapshot.

Design goals:

  • Auditable: small surface area, clear outputs.
  • Joinable: stable IDs so we can attach odds/props later.
  • Versionable: core functions are contract-hashed so changes are explicit.

Data points: what we can compute immediately

From schedule + box score alone we can produce:

  • Minutes trend (season vs last 5)
  • Per-minute production rates (PTS/min, AST/min, REB/min)
  • Basic volatility estimates (game-to-game variance)
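The three box-score features above could be computed along these lines. This is a sketch under stated assumptions: the function names are hypothetical, inputs are plain per-game lists in chronological order, and "volatility" is taken to mean sample variance.

```python
from statistics import mean, variance


def minutes_trend(minutes: list[float], window: int = 5) -> float:
    """Average of the last `window` games minus the season average.

    A positive value suggests a growing role; negative, a shrinking one.
    """
    if not minutes:
        raise ValueError("need at least one game")
    return mean(minutes[-window:]) - mean(minutes)


def per_minute_rate(stat_by_game: list[float], minutes: list[float]) -> float:
    """Total stat divided by total minutes (e.g. PTS/min)."""
    total_minutes = sum(minutes)
    return sum(stat_by_game) / total_minutes if total_minutes > 0 else 0.0


def game_variance(stat_by_game: list[float]) -> float:
    """Sample variance across games; needs at least two games."""
    return variance(stat_by_game)


if __name__ == "__main__":
    # Made-up numbers for one player, oldest game first.
    minutes = [30, 32, 28, 34, 36, 38]
    points = [15, 16, 14, 17, 18, 19]
    print(minutes_trend(minutes))            # last 5 vs season, in minutes
    print(per_minute_rate(points, minutes))  # PTS/min
    print(game_variance(points))             # game-to-game variance
```

Dividing totals rather than averaging per-game rates keeps short-minute games from dominating the estimate.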

With PBP (optional), we can later derive:

  • Event-based pace proxies and possession timing
  • Foul trouble and substitution timing features
  • Shot profile context (if action subtypes are present)

Known issues and hardening

  • PBP uniqueness / reruns: if you rerun the same game into the same DB with the same capture timestamp, you can hit UNIQUE constraint conflicts. Our hardening patches focus on idempotency and safer event IDs.
  • Endpoint churn: upstream NBA endpoints deprecate versions over time; we’re moving to v3 endpoints when possible and keeping raw_json for schema resilience.
  • Scaling: for multi-season backfills we’ll add chunking, retries with jitter, and a resume ledger so interrupted runs continue safely.
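One way the planned scaling pieces might look, as a sketch only: exponential backoff with full jitter, a small SQLite-backed resume ledger, and a chunking helper. The resume_ledger table and all function names are hypothetical and are not taken from truth_poc.py.

```python
import random
import sqlite3
import time


def call_with_retries(fn, *, attempts=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on failure, wait a jittered, exponentially growing delay and retry.

    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())  # full jitter: uniform in [0, cap)


def open_ledger(conn: sqlite3.Connection) -> None:
    # Hypothetical ledger table: one row per fully captured game.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resume_ledger ("
        "game_id TEXT PRIMARY KEY, finished_utc TEXT NOT NULL)"
    )


def mark_done(conn: sqlite3.Connection, game_id: str) -> None:
    conn.execute(
        "INSERT OR IGNORE INTO resume_ledger VALUES (?, datetime('now'))",
        (game_id,),
    )
    conn.commit()


def pending_games(conn: sqlite3.Connection, game_ids: list[str]) -> list[str]:
    """Return the games not yet recorded in the ledger, preserving order."""
    done = {row[0] for row in conn.execute("SELECT game_id FROM resume_ledger")}
    return [g for g in game_ids if g not in done]


def chunked(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Marking a game done only after all of its tables are written means an interrupted run repeats at most one game, which the primary keys then deduplicate.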

Next: odds & props ingestion

To advise bets, we must ingest the sportsbook side too:

  • Prop markets (points, assists, rebounds, 3PM, combos, etc.)
  • Line (e.g., 24.5 points)
  • Odds price (e.g., -110)
  • Timestamped snapshots so we can measure movement and closing line value (CLV)
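To make the line, price, and CLV terms concrete, here is a small sketch of the arithmetic on American odds. The PropSnapshot fields and function names are assumptions for illustration, not a planned schema, and the CLV measure shown compares prices only, so it assumes the line itself did not move.

```python
from dataclasses import dataclass


def implied_probability(american_odds: int) -> float:
    """Break-even win probability implied by an American price (vig included)."""
    if abs(american_odds) < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def profit_per_unit(american_odds: int) -> float:
    """Profit on a winning 1-unit stake."""
    if abs(american_odds) < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    return 100 / -american_odds if american_odds < 0 else american_odds / 100


def expected_value(win_prob: float, american_odds: int) -> float:
    """EV per unit staked: win_prob * profit minus (1 - win_prob) * stake."""
    return win_prob * profit_per_unit(american_odds) - (1 - win_prob)


@dataclass(frozen=True)
class PropSnapshot:
    # Hypothetical shape of one timestamped market observation.
    captured_time_utc: str
    player_id: str
    market: str       # e.g. "points"
    line: float       # e.g. 24.5
    over_price: int   # e.g. -110


def clv_probability_points(taken: PropSnapshot, closing: PropSnapshot) -> float:
    """Closing implied probability minus the implied probability at the price taken.

    Positive means the bet was placed at a better price than the close.
    Only meaningful when both snapshots share the same line.
    """
    return implied_probability(closing.over_price) - implied_probability(taken.over_price)
```

For example, a price of -110 implies a break-even probability of 110/210, about 52.4%, so a model estimate of 55% gives an EV of 0.55 × (100/110) − 0.45 = 0.05 units per unit staked.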

We prefer licensed/normalized market data feeds when possible; scraping is a fallback and requires extra stability work.