## Data Retention: Implementation
The `prune_old_data` scheduler job runs once every 24 hours and enforces these retention windows:
| Table | Retention | Reason |
|---|---|---|
| `news_articles` + `sentiment_scores` | 90 days | Sentiment features look back at most 30 days |
| `stock_prices` | 180 days | RF training requires 120 days minimum; 180 leaves a 60-day buffer |
| `predictions` | 90 days | Historical predictions beyond ~3 months have no dashboard use |
| `trending_topics` | 30 days | Trend data is only meaningful at short time horizons |
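The following is a minimal sketch of a job that enforces these windows. It uses Python's built-in `sqlite3` as a stand-in database. The constant names (`RETENTION_NEWS_DAYS` and so on) only follow the `RETENTION_*_DAYS` pattern. The timestamp columns (`published_at`, `date`, `created_at`, `computed_at`), the `article_id` foreign key, and the storage of timestamps as ISO-8601 UTC strings are all hypothetical. Sentiment rows are pruned by their parent article's date so that the child delete always runs first and covers exactly the articles about to be removed.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical names following the RETENTION_*_DAYS pattern in config.py.
RETENTION_NEWS_DAYS = 90          # news_articles + sentiment_scores
RETENTION_PRICES_DAYS = 180       # stock_prices
RETENTION_PREDICTIONS_DAYS = 90   # predictions
RETENTION_TRENDS_DAYS = 30        # trending_topics

# (table, hypothetical timestamp column, retention days) for tables assumed to have no FK dependants.
SIMPLE_TABLES = (
    ("stock_prices", "date", RETENTION_PRICES_DAYS),
    ("predictions", "created_at", RETENTION_PREDICTIONS_DAYS),
    ("trending_topics", "computed_at", RETENTION_TRENDS_DAYS),
)


def prune_old_data(conn: sqlite3.Connection, now: Optional[datetime] = None) -> Dict[str, int]:
    """Delete rows older than each retention window; return rows deleted per table."""
    now = now or datetime.now(timezone.utc)

    def cutoff(days: int) -> str:
        # Assumes timestamps are stored as ISO-8601 UTC strings in this same format,
        # so string comparison matches chronological order.
        return (now - timedelta(days=days)).isoformat()

    news_cutoff = cutoff(RETENTION_NEWS_DAYS)
    deleted: Dict[str, int] = {}
    with conn:  # single transaction: commits on success, rolls back on any error
        # Child rows first, because the FK has no ON DELETE CASCADE.
        deleted["sentiment_scores"] = conn.execute(
            "DELETE FROM sentiment_scores WHERE article_id IN "
            "(SELECT id FROM news_articles WHERE published_at < ?)",
            (news_cutoff,),
        ).rowcount
        deleted["news_articles"] = conn.execute(
            "DELETE FROM news_articles WHERE published_at < ?", (news_cutoff,)
        ).rowcount
        for table, column, days in SIMPLE_TABLES:
            # Table and column names come from the constant tuple above, never from user input.
            deleted[table] = conn.execute(
                f"DELETE FROM {table} WHERE {column} < ?", (cutoff(days),)
            ).rowcount
    return deleted
```

Running every delete inside one transaction means a failure partway through rolls back the whole prune instead of leaving the tables partially pruned.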
### Gotchas
- `sentiment_scores` must be deleted before `news_articles`. There is no `ondelete=CASCADE` on the FK, so deleting articles first would violate the foreign-key constraint (or, if the database is not enforcing foreign keys, leave orphaned sentiment rows).
- Retention windows are defined in `config.py` as `RETENTION_*_DAYS` constants. Adjust them there, not in the job code.
- `TrendingTopic` accumulates one row per `compute_trends` run (every 30 min, so 48 runs a day). The 30-day retention is critical to prevent unbounded growth.
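The deletion-order gotcha can be reproduced with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for the real database, and the two-table schema is a hypothetical reduction of `news_articles` and `sentiment_scores` with a plain FK and no cascade. The `enforce_fks` flag shows both outcomes: with enforcement on, deleting articles first is rejected; with it off, the delete succeeds but leaves an orphaned sentiment row.

```python
import sqlite3


def make_db(enforce_fks: bool = True) -> sqlite3.Connection:
    """In-memory stand-in schema: sentiment_scores -> news_articles, no ON DELETE CASCADE."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fks else 'OFF'}")
    conn.execute("CREATE TABLE news_articles (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE sentiment_scores (id INTEGER PRIMARY KEY, "
        "article_id INTEGER NOT NULL REFERENCES news_articles(id))"
    )
    conn.execute("INSERT INTO news_articles (id) VALUES (1)")
    conn.execute("INSERT INTO sentiment_scores (article_id) VALUES (1)")
    return conn


def delete_articles_first(conn: sqlite3.Connection) -> bool:
    """Wrong order. Returns False if the database rejects the delete."""
    try:
        conn.execute("DELETE FROM news_articles")
        return True
    except sqlite3.IntegrityError:  # "FOREIGN KEY constraint failed"
        return False


def delete_sentiment_first(conn: sqlite3.Connection) -> None:
    """Correct order: remove the referencing rows, then the referenced ones."""
    conn.execute("DELETE FROM sentiment_scores")
    conn.execute("DELETE FROM news_articles")


def count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

With `make_db()`, `delete_articles_first` returns `False` and both rows survive, while `delete_sentiment_first` empties both tables. With `make_db(enforce_fks=False)`, `delete_articles_first` returns `True` and the sentiment row remains, pointing at an article that no longer exists.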