ML Prediction: Implementation
Current State
Working
- RF model training and inference across all 3 horizons (1d, 7d, 30d)
- Models pickled to model_cache/rf_{horizon}.pkl
- Rule-based prediction fallback when no trained model exists
- Volatility analyzer: realized vol (5/20/60-day), GARCH(1,1) forecast, K-means regime; feeds 3 ML features
- Buffett scorecard with vol-adjusted safety margin
Working (continued)
- Accuracy-gated retraining: job_check_retrain() runs every 6 hours, computes the rolling 30-day accuracy of 1d predictions, and retrains only if that accuracy is below 55% (or there are too few evaluated samples to judge)
- Prediction evaluation: job_evaluate_predictions() runs every 24 hours, scores predictions whose horizon has elapsed against realized returns, and writes the results to PredictionResult
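The retraining gate described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the EvaluatedPrediction record, its field names, and the MIN_SAMPLES value are assumptions, since the source only states the 55% threshold, the 30-day window, and the "insufficient samples" rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence

ACCURACY_THRESHOLD = 0.55  # from the text: retrain when accuracy < 55%
WINDOW_DAYS = 30           # from the text: rolling 30-day window
MIN_SAMPLES = 20           # assumption: the real minimum is not stated


@dataclass
class EvaluatedPrediction:
    """Hypothetical stand-in for a scored row in PredictionResult."""
    evaluated_at: datetime
    horizon: str           # "1d", "7d" or "30d"
    was_correct: bool


def rolling_accuracy(rows: Sequence[EvaluatedPrediction], now: datetime,
                     horizon: str = "1d") -> Optional[float]:
    """Share of correct predictions in the window, or None if too few rows."""
    cutoff = now - timedelta(days=WINDOW_DAYS)
    recent = [r for r in rows if r.horizon == horizon and r.evaluated_at >= cutoff]
    if len(recent) < MIN_SAMPLES:
        return None
    return sum(r.was_correct for r in recent) / len(recent)


def should_retrain(rows: Sequence[EvaluatedPrediction], now: datetime) -> bool:
    """Retrain when accuracy is unknown (too few samples) or below threshold."""
    accuracy = rolling_accuracy(rows, now)
    return accuracy is None or accuracy < ACCURACY_THRESHOLD
```

Returning None for "not enough data" keeps that case distinct from a genuinely low accuracy, so the caller can log the two reasons separately if needed.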
Missing
- No backtesting
Key Gotchas
- FEATURE_ORDER is order-critical (predictor.py). The vector is 24 features: 4 price/volume indicators, 4 technical indicators, 9 sentiment stats (avg/count/std × 3 windows), 3 volatility features, and 4 fundamental features (eps_growth, revenue_growth, debt_equity_ratio, has_fundamentals). Adding or removing a feature requires updating three places in sync: build_features(), FEATURE_ORDER, and the features list in train_model(). Delete stale .pkl files in model_cache/ after any feature change, because the loader does not validate feature count.
- build_features() returns None when fewer than 10 price rows exist. Every call site must handle None explicitly: it is not an error, just insufficient data.
- get_latest_predictions() uses a subquery to get the most recent prediction per ticker. Always use this function rather than a raw ORDER BY, because it handles one-row-per-ticker grouping correctly.
- Model cold start: new tickers need 120 days of price history before RF training is possible; rule-based fallback runs until then.
- TrendingTopic accumulates one row per scheduler run: the compute_trends job (every 30 min) appends rather than upserts. The API deduplicates on read, keeping the newest row per sector. A daily prune keeps this table bounded (30-day retention).
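Both get_latest_predictions() and the TrendingTopic read path rely on the same "newest row per key" pattern. The sketch below shows one common way to write it, using the standard-library sqlite3 module and an in-memory table. The table name, columns, and sample rows are hypothetical and are not taken from the project schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    "id INTEGER PRIMARY KEY, ticker TEXT, created_at TEXT, direction TEXT)"
)
conn.executemany(
    "INSERT INTO predictions (ticker, created_at, direction) VALUES (?, ?, ?)",
    [
        ("AAPL", "2024-01-01T09:00", "up"),
        ("AAPL", "2024-01-02T09:00", "down"),  # newest AAPL row
        ("MSFT", "2024-01-01T09:00", "up"),
    ],
)

# The subquery finds the latest timestamp per ticker; the join then pulls
# the full row that carries that timestamp.
LATEST_PER_TICKER = """
SELECT p.ticker, p.created_at, p.direction
FROM predictions AS p
JOIN (
    SELECT ticker, MAX(created_at) AS max_created
    FROM predictions
    GROUP BY ticker
) AS latest
  ON latest.ticker = p.ticker AND latest.max_created = p.created_at
ORDER BY p.ticker
"""


def latest_predictions(connection: sqlite3.Connection) -> list:
    """Return one row per ticker: the most recent prediction."""
    return connection.execute(LATEST_PER_TICKER).fetchall()


rows = latest_predictions(conn)
```

A plain ORDER BY created_at DESC would return every historical row, leaving the caller to discard duplicates, which is the mistake the gotcha warns about.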
Adding a New ML Feature
- Add computation to build_features() in predictor.py
- Add the key to FEATURE_ORDER at the correct position
- Add the value in the same position to the features list in train_model()
- Delete model_cache/rf_*.pkl: stale pickles will silently use wrong feature counts
- Update docs/spec/ml-prediction.md feature vector table
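Because the loader does not validate feature count, a small guard can catch a missed step early. The following is only a sketch under assumptions: the three feature names are made up (the real FEATURE_ORDER has 24 entries), and check_model_width() assumes the RF model is a fitted scikit-learn estimator, which exposes n_features_in_.

```python
from typing import Mapping, Sequence

# Hypothetical, shortened stand-in for the real 24-entry FEATURE_ORDER.
FEATURE_ORDER = ["rsi_14", "sentiment_avg_7d", "garch_forecast"]


def to_vector(features: Mapping[str, float],
              order: Sequence[str] = FEATURE_ORDER) -> list:
    """Lay out a feature dict in FEATURE_ORDER, failing loudly on drift."""
    missing = [k for k in order if k not in features]
    extra = [k for k in features if k not in order]
    if missing or extra:
        raise ValueError(f"feature drift: missing={missing}, extra={extra}")
    return [float(features[k]) for k in order]


def check_model_width(model: object,
                      order: Sequence[str] = FEATURE_ORDER) -> None:
    """Reject a cached model trained on a different number of features."""
    expected = getattr(model, "n_features_in_", None)
    if expected is not None and expected != len(order):
        raise ValueError(
            f"cached model expects {expected} features, "
            f"FEATURE_ORDER has {len(order)}; delete the stale pickle"
        )
```

Calling check_model_width() right after unpickling would turn a silent mismatch into an immediate error. Note that it only compares counts, so a reordered feature list with the same length would still pass.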
Volatility Regime Warning
Shown on Stock Detail above the ML predictions section. Only displayed when K-means regime is 2 (high). Calls compute_volatility_profile() from models/volatility_analyzer.py on each page load (120-day window).
Buffett Analysis
Tab at the bottom of Stock Detail page, sourced from models/buffett.py:
- Safety score adjusted: −10 pts in high-vol regime, +5 pts in low-vol regime
- Vol regime badge: color-coded (green/grey/orange), includes GARCH forecast and expanding/contracting vol indicator when vol_ratio is outside the 0.85–1.15 range
- Returns None when fewer than 20 price rows available; dashboard shows info message
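The scoring and badge rules above might look like the sketch below. It is illustrative only. Regime 2 meaning "high" comes from the text, but the label for the low regime (0 here) and the reading of vol_ratio as "above 1 means volatility is rising" are assumptions.

```python
from typing import Optional

HIGH_VOL = 2  # from the text: K-means regime 2 means high volatility
LOW_VOL = 0   # assumption: the label of the low regime is not stated


def adjust_safety_score(base_score: float, regime: int) -> float:
    """Apply the regime adjustment: -10 in high vol, +5 in low vol."""
    if regime == HIGH_VOL:
        return base_score - 10
    if regime == LOW_VOL:
        return base_score + 5
    return base_score


def vol_trend(vol_ratio: float) -> Optional[str]:
    """Label the trend only when vol_ratio leaves the 0.85 to 1.15 band.

    Assumes a ratio above 1 means volatility is expanding.
    """
    if vol_ratio > 1.15:
        return "expanding"
    if vol_ratio < 0.85:
        return "contracting"
    return None  # inside the band: no indicator shown
```

For example, a base score of 70 becomes 60 in the high regime and 75 in the low regime, and a vol_ratio of 1.0 produces no trend indicator.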
Adding a New Sector Keyword
Add to config.py:SECTOR_KEYWORDS with both English and Korean terms.
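As a rough illustration, an entry could look like the following. The actual structure of SECTOR_KEYWORDS is not shown in this document, so the dict-of-lists shape, the sector names, and the match_sectors() helper are all hypothetical.

```python
# Hypothetical shape: sector name -> lowercase keywords in English and Korean.
SECTOR_KEYWORDS = {
    "semiconductors": ["semiconductor", "chip", "반도체"],
    "batteries": ["battery", "배터리", "이차전지"],
}


def match_sectors(text: str, keywords: dict = SECTOR_KEYWORDS) -> list:
    """Return the sorted names of sectors whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        sector
        for sector, terms in keywords.items()
        if any(term in lowered for term in terms)
    )
```

Listing both languages in one entry lets a single substring check cover headlines from either source.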