
ML Prediction — Implementation

Current State

Working

  • RF model training and inference across all 3 horizons (1d, 7d, 30d)
  • Trained models are pickled to model_cache/rf_{horizon}.pkl (one file per horizon)
  • Rule-based prediction fallback when no trained model exists
  • Volatility analyzer: realized volatility (5/20/60-day windows), GARCH(1,1) forecast, and K-means regime; supplies the 3 volatility features in the ML vector
  • Buffett scorecard with vol-adjusted safety margin
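A minimal pure-Python sketch of the realized-volatility and GARCH(1,1) pieces of the volatility analyzer. It assumes daily closing prices, log returns, and 252-day annualization; the GARCH parameters (omega, alpha, beta) are passed in rather than estimated. A fitting library such as `arch` would normally estimate them, and whether models/volatility_analyzer.py does so is not stated here. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence

TRADING_DAYS = 252  # assumption: annualization convention


def log_returns(closes: Sequence[float]) -> list[float]:
    """Daily log returns from a chronological list of closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def realized_vol(returns: Sequence[float], window: int) -> Optional[float]:
    """Annualized sample std dev of the last `window` returns, or None if too short."""
    if window < 2 or len(returns) < window:
        return None
    tail = list(returns)[-window:]
    mean = sum(tail) / window
    var = sum((r - mean) ** 2 for r in tail) / (window - 1)
    return math.sqrt(var * TRADING_DAYS)


def garch11_next_variance(
    returns: Sequence[float], omega: float, alpha: float, beta: float
) -> float:
    """One-step-ahead GARCH(1,1) variance forecast.

    sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t], treating returns
    as zero-mean. Parameters are supplied by the caller, not fitted here.
    """
    if not returns:
        raise ValueError("need at least one return")
    # Seed the recursion with the mean squared return.
    sigma2 = sum(r * r for r in returns) / len(returns)
    for r in returns:
        sigma2 = omega + alpha * r * r + beta * sigma2
    return sigma2
```

The three windows from the list above would then be `{w: realized_vol(rets, w) for w in (5, 20, 60)}`, with `None` marking any window longer than the available history.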

Working (scheduled jobs)

  • Accuracy-gated retraining: job_check_retrain() runs every 6 hours, computes the rolling 30-day accuracy of 1d predictions, and retrains only if accuracy is below 55% or there are too few evaluated samples to measure it
  • Prediction evaluation: job_evaluate_predictions() runs every 24 hours, scores predictions whose horizon has elapsed against realized returns, and writes the results to PredictionResult
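A sketch of the retraining gate described above, written as a pure function over evaluated rows. `EvaluatedPrediction` is a hypothetical stand-in for a PredictionResult row, `MIN_EVALUATED` is a hypothetical minimum (the real one is not stated here), and "accuracy" is assumed to mean directional accuracy (predicted and realized returns have the same sign).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable

ACCURACY_THRESHOLD = 0.55              # from this doc
ACCURACY_WINDOW = timedelta(days=30)   # rolling window from this doc
MIN_EVALUATED = 20                     # hypothetical minimum sample count


@dataclass
class EvaluatedPrediction:
    """Hypothetical stand-in for a PredictionResult row."""
    horizon: str
    evaluated_at: datetime
    predicted_return: float
    realized_return: float

    @property
    def correct(self) -> bool:
        # Assumption: a hit means the predicted direction matched the realized one.
        return (self.predicted_return > 0) == (self.realized_return > 0)


def should_retrain(rows: Iterable[EvaluatedPrediction], now: datetime) -> bool:
    """True if 1d accuracy over the last 30 days is below 55% or unmeasurable."""
    recent = [
        r for r in rows
        if r.horizon == "1d" and now - r.evaluated_at <= ACCURACY_WINDOW
    ]
    if len(recent) < MIN_EVALUATED:
        return True  # too few evaluated samples to trust the estimate
    accuracy = sum(r.correct for r in recent) / len(recent)
    return accuracy < ACCURACY_THRESHOLD
```

Treating "too few samples" as a reason to retrain mirrors the behaviour stated above; the alternative (skip retraining until enough samples exist) would leave a freshly deployed model untouched longer.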

Missing

  • No backtesting

Key Gotchas

  • FEATURE_ORDER (in predictor.py) is order-critical. The vector has 24 features: 4 price/volume indicators, 4 technical indicators, 9 sentiment statistics (avg/count/std × 3 windows), 3 volatility features, and 4 fundamental features (eps_growth, revenue_growth, debt_equity_ratio, has_fundamentals). Adding or removing a feature means updating three places in sync: build_features(), FEATURE_ORDER, and the features list in train_model(). After any feature change, delete stale .pkl files in model_cache/, because the loader does not validate the feature count.

  • build_features() returns None when fewer than 10 price rows exist. Every call site must handle None explicitly: it signals insufficient data, not an error.

  • get_latest_predictions() uses a subquery to select the most recent prediction per ticker. Always call this function instead of writing a raw ORDER BY query; it correctly returns one row per ticker.

  • Model cold start: new tickers need 120 days of price history before RF training is possible; the rule-based fallback runs until then.

  • TrendingTopic accumulates one row per scheduler run: the compute_trends job (every 30 min) appends rather than upserts. The API deduplicates on read, keeping the newest row per sector. A daily prune keeps the table bounded (30-day retention).
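The get_latest_predictions() note relies on the greatest-row-per-group pattern. A sketch of that pattern against an in-memory SQLite table follows; the `prediction` schema and column names are hypothetical, and the real query may differ. A bare `ORDER BY created_at DESC` returns every historical row, and adding `LIMIT 1` returns only one ticker, which is why the join against a grouped subquery is needed.

```python
import sqlite3

# Hypothetical schema: the real prediction model's columns are not given here.
SCHEMA = """
CREATE TABLE prediction (
    id INTEGER PRIMARY KEY,
    ticker TEXT NOT NULL,
    created_at TEXT NOT NULL,  -- ISO-8601 text, so string order matches time order
    direction TEXT NOT NULL
);
"""

# Join each row to its ticker's newest timestamp; only the newest rows survive.
LATEST_PER_TICKER = """
SELECT p.ticker, p.created_at, p.direction
FROM prediction AS p
JOIN (
    SELECT ticker, MAX(created_at) AS max_created
    FROM prediction
    GROUP BY ticker
) AS latest
  ON p.ticker = latest.ticker AND p.created_at = latest.max_created
ORDER BY p.ticker
"""


def latest_predictions(conn: sqlite3.Connection) -> list[tuple]:
    """Return the newest (ticker, created_at, direction) row for each ticker."""
    return conn.execute(LATEST_PER_TICKER).fetchall()
```

If two rows for the same ticker share the maximum timestamp, this join returns both; a tie-breaker (for example on `id`) would be needed to guarantee exactly one. The same shape, keyed on sector, fits the TrendingTopic read-side deduplication.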

Adding a New ML Feature

  1. Add computation to build_features() in predictor.py
  2. Add the key to FEATURE_ORDER at the correct position
  3. Add the value in the same position to the features list in train_model()
  4. Delete model_cache/rf_*.pkl: stale pickles were trained on the old feature layout, and the loader does not check the feature count
  5. Update the feature vector table in docs/spec/ml-prediction.md
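Steps 2 to 4 exist because the vector is positional. One option, sketched below under assumptions, is to build both inference and training rows from the same ordered key list and to check a loaded model's input width at load time. `to_vector()` and `check_model_width()` are hypothetical helpers; the width check assumes the pickled model is a scikit-learn estimator (which exposes `n_features_in_` after fitting) and skips models without that attribute.

```python
from typing import Mapping, Sequence

# Only the 4 fundamental keys are named in this doc; the full FEATURE_ORDER
# has 24 entries (4 + 4 + 9 + 3 + 4).
FEATURE_ORDER_EXAMPLE = [
    "eps_growth",
    "revenue_growth",
    "debt_equity_ratio",
    "has_fundamentals",
]


def to_vector(features: Mapping[str, float], order: Sequence[str]) -> list[float]:
    """Order a feature dict by `order`, failing loudly on any key mismatch."""
    missing = [k for k in order if k not in features]
    extra = sorted(set(features) - set(order))
    if missing or extra:
        raise KeyError(f"feature mismatch: missing={missing} extra={extra}")
    return [float(features[k]) for k in order]


def check_model_width(model: object, order: Sequence[str]) -> None:
    """Reject a loaded model whose input width differs from `order`.

    Assumes a scikit-learn-style `n_features_in_`; models without it pass unchecked.
    """
    n = getattr(model, "n_features_in_", None)
    if n is not None and n != len(order):
        raise ValueError(f"model expects {n} features, order has {len(order)}")
```

If train_model() also built its rows through `to_vector(..., FEATURE_ORDER)`, the three sync points would shrink to two, and a stale pickle would fail at load instead of predicting on misaligned inputs. A width check cannot catch a reorder that keeps the same count, so step 4 still applies.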

Volatility Regime Warning

Shown on the Stock Detail page, above the ML predictions section, and only when the K-means regime is 2 (high). On each page load it calls compute_volatility_profile() from models/volatility_analyzer.py over a 120-day window.
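Regime 2 means "high" only if cluster labels are ordered by their centres; raw K-means labels (scikit-learn's `KMeans`, for instance) are arbitrary. The sketch below runs a small 1-D Lloyd's K-means in pure Python, sorts the centres ascending so labels 0/1/2 read low/normal/high, and shows the warning when the latest observation lands in cluster 2. Clustering a daily volatility series, the "normal" name for label 1, and all function names are assumptions, not a description of compute_volatility_profile().

```python
from typing import Sequence

HIGH_REGIME = 2  # label for the high-volatility cluster, per this doc


def kmeans_1d(values: Sequence[float], k: int = 3, iters: int = 100) -> list[float]:
    """Lloyd's algorithm on 1-D data; returns cluster centres sorted ascending."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k observations")
    xs = sorted(values)
    # Deterministic init: evenly spaced order statistics from min to max.
    centres = [xs[(i * (len(xs) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters: list[list[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)


def regime_of(vol: float, centres: Sequence[float]) -> int:
    """Index of the nearest centre; 0/1/2 = low/normal/high since centres are sorted."""
    return min(range(len(centres)), key=lambda c: abs(vol - centres[c]))


def show_regime_warning(vol_series: Sequence[float]) -> bool:
    """True when the latest observation falls in the highest-volatility cluster."""
    centres = kmeans_1d(vol_series)
    return regime_of(vol_series[-1], centres) == HIGH_REGIME
```

Sorting the centres is the step that makes a fixed "regime 2" check meaningful across runs; without it, the same market state could be labelled 0 on one page load and 2 on the next.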

Buffett Analysis

Tab at the bottom of the Stock Detail page, sourced from models/buffett.py:

  • Safety score adjustment: −10 pts in the high-vol regime, +5 pts in the low-vol regime
  • Vol regime badge: color-coded (green/grey/orange); includes the GARCH forecast, plus an expanding/contracting vol indicator when vol_ratio falls outside the 0.85–1.15 range
  • Returns None when fewer than 20 price rows are available; the dashboard then shows an info message
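A sketch of the volatility-dependent parts of the Buffett tab as plain functions. The point adjustments, the 0.85–1.15 band, and the 20-row minimum come from the list above; the regime-to-color mapping (green = low, grey = normal, orange = high), the zero adjustment for the middle regime, reading vol_ratio above 1.15 as "expanding", and every name here are assumptions.

```python
from typing import Optional

MIN_PRICE_ROWS = 20                                  # from this doc
SAFETY_ADJUSTMENT = {0: +5, 1: 0, 2: -10}            # regime -> points; 1 assumed unadjusted
BADGE_COLOR = {0: "green", 1: "grey", 2: "orange"}   # assumed regime -> color mapping
VOL_RATIO_BAND = (0.85, 1.15)                        # no trend indicator inside this band


def adjusted_safety_score(base_score: float, regime: int) -> float:
    """Apply the regime-dependent adjustment to a base safety score."""
    return base_score + SAFETY_ADJUSTMENT[regime]


def vol_trend_indicator(vol_ratio: float) -> Optional[str]:
    """'expanding' above the band, 'contracting' below it, None inside it."""
    low, high = VOL_RATIO_BAND
    if vol_ratio > high:
        return "expanding"
    if vol_ratio < low:
        return "contracting"
    return None


def buffett_vol_summary(
    n_price_rows: int, base_score: float, regime: int, vol_ratio: float, garch_forecast: float
) -> Optional[dict]:
    """Badge and score data, or None so the dashboard can show its info message."""
    if n_price_rows < MIN_PRICE_ROWS:
        return None
    return {
        "safety_score": adjusted_safety_score(base_score, regime),
        "color": BADGE_COLOR[regime],
        "trend": vol_trend_indicator(vol_ratio),
        "garch_forecast": garch_forecast,
    }
```

Keeping the thresholds in module-level constants makes the band and point values easy to find if they are tuned later.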

Adding a New Sector Keyword

Add to config.py:SECTOR_KEYWORDS with both English and Korean terms.
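A sketch of what an entry might look like and how bilingual keywords could be matched against a headline. The dict-of-lists shape, the sector names, the specific terms, and `match_sectors()` are all hypothetical; config.py's real structure and matching logic are not shown in this doc.

```python
# Hypothetical shape: sector name -> English and Korean keywords.
SECTOR_KEYWORDS: dict[str, list[str]] = {
    "Semiconductors": ["semiconductor", "chip", "반도체"],
    "EV": ["electric vehicle", "battery", "전기차", "배터리"],
}


def match_sectors(text: str, keywords: dict[str, list[str]] = SECTOR_KEYWORDS) -> list[str]:
    """Sectors with at least one keyword appearing as a substring of `text`.

    English matching is case-insensitive via lower(); Korean has no case,
    so lower() leaves Hangul unchanged.
    """
    lowered = text.lower()
    return [
        sector
        for sector, terms in keywords.items()
        if any(term.lower() in lowered for term in terms)
    ]
```

Plain substring matching is simple but can over-match (for example "chip" inside "Chipotle"), and Korean text is written without a case distinction, so both language variants of a term need to be listed explicitly.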