Data Retention — Implementation

The prune_old_data scheduler job runs once every 24 hours and enforces these retention windows:

Table                              Retention   Reason
news_articles + sentiment_scores   90 days     Sentiment features look back at most 30 days
stock_prices                       180 days    RF training requires 120 days minimum; 180 gives buffer
predictions                        90 days     Historical predictions beyond ~3 months have no dashboard use
trending_topics                    30 days     Trend data is only meaningful at short time horizons
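
As a rough illustration of how these windows translate into deletion cutoffs, the sketch below maps each table to its retention period and computes the timestamp before which rows become eligible for pruning. The constant names are hypothetical (they only follow the RETENTION_*_DAYS pattern), and retention_cutoffs is an illustrative helper, not a function from the project.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant names following the RETENTION_*_DAYS pattern;
# the values are the retention windows listed in the table.
RETENTION_NEWS_DAYS = 90
RETENTION_PRICES_DAYS = 180
RETENTION_PREDICTIONS_DAYS = 90
RETENTION_TRENDS_DAYS = 30

# news_articles and sentiment_scores share one window so they age out together.
RETENTION_BY_TABLE = {
    "news_articles": RETENTION_NEWS_DAYS,
    "sentiment_scores": RETENTION_NEWS_DAYS,
    "stock_prices": RETENTION_PRICES_DAYS,
    "predictions": RETENTION_PREDICTIONS_DAYS,
    "trending_topics": RETENTION_TRENDS_DAYS,
}


def retention_cutoffs(now):
    """Return {table: cutoff}; rows older than the cutoff are eligible for pruning."""
    return {
        table: now - timedelta(days=days)
        for table, days in RETENTION_BY_TABLE.items()
    }


if __name__ == "__main__":
    for table, cutoff in retention_cutoffs(datetime.now(timezone.utc)).items():
        print(f"{table}: delete rows older than {cutoff.isoformat()}")
```

Keeping the mapping in one place means a change to a window only touches the constants, never the pruning logic.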

Gotchas

  • sentiment_scores must be deleted before news_articles: the FK has no ondelete=CASCADE, so deleting articles first fails with a foreign-key violation (or, if the database does not enforce the constraint, leaves orphaned sentiment rows)
  • Retention windows are defined in config.py as RETENTION_*_DAYS constants — adjust there, not in the job code
  • TrendingTopic accumulates one row per compute_trends run (every 30 min, so 48 runs per day); the 30-day retention is critical to prevent unbounded growth
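
The deletion-order gotcha can be reproduced in isolation. The following is a minimal sketch using Python's built-in sqlite3 module rather than the project's own database layer; the column names (published_at, article_id, score) and the prune_news helper are assumptions made for illustration. It shows that deleting articles first is rejected when foreign keys are enforced, and that deleting the child rows first succeeds.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_news(conn, cutoff_iso):
    """Delete old sentiment rows, then their articles, in one transaction."""
    with conn:  # commits on success, rolls back if either DELETE raises
        conn.execute(
            "DELETE FROM sentiment_scores WHERE article_id IN "
            "(SELECT id FROM news_articles WHERE published_at < ?)",
            (cutoff_iso,),
        )
        cur = conn.execute(
            "DELETE FROM news_articles WHERE published_at < ?", (cutoff_iso,)
        )
    return cur.rowcount  # number of articles removed


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    # Hypothetical schema: an FK with no ON DELETE CASCADE, as described above.
    conn.execute(
        "CREATE TABLE news_articles (id INTEGER PRIMARY KEY, published_at TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE sentiment_scores (id INTEGER PRIMARY KEY, "
        "article_id INTEGER NOT NULL REFERENCES news_articles(id), score REAL)"
    )
    now = datetime(2024, 7, 1, tzinfo=timezone.utc)
    old = (now - timedelta(days=120)).isoformat()
    recent = (now - timedelta(days=10)).isoformat()
    cutoff = (now - timedelta(days=90)).isoformat()
    conn.execute("INSERT INTO news_articles VALUES (1, ?), (2, ?)", (old, recent))
    conn.execute("INSERT INTO sentiment_scores VALUES (1, 1, 0.4), (2, 2, -0.1)")
    conn.commit()

    # Wrong order: the article still has a sentiment row pointing at it.
    try:
        conn.execute("DELETE FROM news_articles WHERE published_at < ?", (cutoff,))
        wrong_order_failed = False
    except sqlite3.IntegrityError:
        wrong_order_failed = True
        conn.rollback()

    deleted = prune_news(conn, cutoff)  # right order: children first
    articles = conn.execute("SELECT COUNT(*) FROM news_articles").fetchone()[0]
    scores = conn.execute("SELECT COUNT(*) FROM sentiment_scores").fetchone()[0]
    return wrong_order_failed, deleted, articles, scores


if __name__ == "__main__":
    print(demo())
```

Running both deletes inside a single transaction means a failure partway through leaves neither table partially pruned.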