ML Prediction: Spec
Model
- Algorithm: Random Forest
- Config: 200 trees, max_depth=8, random_state=42, n_jobs=-1
- One model per horizon, pickled to `model_cache/rf_{horizon}.pkl`
- Shared across all tickers: individual ticker models would have insufficient training samples
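As a minimal sketch of the configuration above, the snippet below collects the listed hyperparameters in a dict and derives the per-horizon cache path. The names `RF_PARAMS`, `HORIZONS`, `MODEL_DIR`, and `model_path` are illustrative assumptions, not identifiers from the codebase; `n_estimators` is the scikit-learn keyword for the tree count.

```python
from pathlib import Path

# Hyperparameters from the spec, keyed by the keyword names that
# scikit-learn's RandomForestClassifier accepts.
RF_PARAMS = {
    "n_estimators": 200,  # "200 trees"
    "max_depth": 8,
    "random_state": 42,
    "n_jobs": -1,
}

HORIZONS = ("1d", "7d", "30d")    # the three prediction horizons in this spec
MODEL_DIR = Path("model_cache")   # directory named in the spec


def model_path(horizon: str) -> Path:
    """Return the pickle path for one horizon's shared (all-ticker) model."""
    if horizon not in HORIZONS:
        raise ValueError(f"unknown horizon: {horizon!r}")
    return MODEL_DIR / f"rf_{horizon}.pkl"
```

With scikit-learn installed, `RandomForestClassifier(**RF_PARAMS)` would build an estimator matching this configuration.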
Horizons
Three prediction horizons:
| Horizon | Forward return thresholds |
|---|---|
| `1d` | >+1% = up, <-1% = down, else neutral |
| `7d` | same thresholds |
| `30d` | same thresholds |
Classes
- `up`: forward return > +1%
- `down`: forward return < -1%
- `neutral`: otherwise
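The labeling rule can be sketched as follows. The function names are hypothetical, and the sketch assumes the horizon is counted in price rows; the spec does not say whether horizons are trading days or calendar days.

```python
from typing import Sequence

UP_THRESHOLD = 0.01     # +1% forward return
DOWN_THRESHOLD = -0.01  # -1% forward return


def forward_return(closes: Sequence[float], i: int, horizon_rows: int) -> float:
    """Fractional return from row i to row i + horizon_rows."""
    return closes[i + horizon_rows] / closes[i] - 1.0


def label_return(ret: float) -> str:
    """Map a forward return to one of the three classes."""
    if ret > UP_THRESHOLD:
        return "up"
    if ret < DOWN_THRESHOLD:
        return "down"
    return "neutral"  # includes returns exactly at either threshold
```

Because both comparisons are strict, a return of exactly +1% or -1% is labeled `neutral`.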
Minimum Data Requirements
- Training: 30 samples minimum, 120 days of price history
- Inference: 10 price rows minimum; `build_features()` returns `None` otherwise
- Fallback: `_rule_based_prediction()` in `predictor.py` when no trained model exists
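The two failure paths above (too little data, no trained model) might be handled along these lines. Everything here is a stand-in: `build_features_sketch` and `rule_based_sketch` are hypothetical placeholders, and the rule shown is not the actual logic of `_rule_based_prediction()`.

```python
from typing import Callable, Optional, Sequence

MIN_INFERENCE_ROWS = 10  # inference minimum from the spec


def build_features_sketch(closes: Sequence[float]) -> Optional[dict]:
    """Stand-in for build_features(): None when there are too few rows."""
    if len(closes) < MIN_INFERENCE_ROWS:
        return None
    # One illustrative feature only; the real vector has 24.
    return {"mom_5d": (closes[-1] / closes[-6] - 1.0) * 100.0}


def rule_based_sketch(features: dict) -> str:
    """Placeholder rule, NOT the logic of _rule_based_prediction()."""
    if features["mom_5d"] > 1.0:
        return "up"
    if features["mom_5d"] < -1.0:
        return "down"
    return "neutral"


def predict(closes: Sequence[float],
            model: Optional[Callable[[dict], str]] = None) -> Optional[str]:
    features = build_features_sketch(closes)
    if features is None:   # insufficient data: propagate None to the caller
        return None
    if model is None:      # no trained model: rule-based fallback
        return rule_based_sketch(features)
    return model(features)
```

Keeping the `None` check ahead of the model lookup means a caller sees `None` only for insufficient data, never for a missing model.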
Feature Vector
`FEATURE_ORDER` in `predictor.py` defines the 24-feature vector. Order is critical: a scikit-learn random forest fit on a plain array stores no column names, so features are matched by position alone.
| # | Feature | Description |
|---|---|---|
| 1 | `mom_5d` | 5-day price momentum (%) |
| 2 | `mom_20d` | 20-day price momentum (%) |
| 3 | `rsi` | 14-period RSI |
| 4 | `vol_ratio` | Latest volume / 10-day avg volume |
| 5 | `obv_trend` | On-Balance Volume trend over 10 bars (%) |
| 6 | `stoch_k` | Stochastic %K over 14 periods |
| 7 | `macd_hist` | MACD histogram (EMA12 − EMA26 − signal9) |
| 8 | `bb_pct` | Bollinger Band %B (20-period, 2σ) |
| 9–11 | `sent_avg/count/std_3d` | Sentiment stats over 3-day window |
| 12–14 | `sent_avg/count/std_7d` | Sentiment stats over 7-day window |
| 15–17 | `sent_avg/count/std_30d` | Sentiment stats over 30-day window |
| 18 | `garch_vol` | GARCH(1,1) 1-day-ahead annualized vol (or rv20 fallback) |
| 19 | `realized_vol_ratio` | rv5 / rv20: short-term vol expanding vs. contracting |
| 20 | `vol_regime` | K-means vol regime: 0=low, 1=normal, 2=high |
| 21 | `eps_growth` | YoY EPS growth from Fundamentals table (0.0 if unavailable) |
| 22 | `revenue_growth` | YoY revenue growth from Fundamentals table (0.0 if unavailable) |
| 23 | `debt_equity_ratio` | Total debt / total equity from Fundamentals table (0.0 if unavailable) |
| 24 | `has_fundamentals` | 1.0 if a Fundamentals row exists for this ticker, else 0.0 |
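To illustrate why a fixed order matters, the sketch below turns a feature dict into a positional vector. `FEATURE_ORDER_EXCERPT` holds only the first four names from the table and `to_vector` is a hypothetical helper; the real `FEATURE_ORDER` lists all 24 features.

```python
from typing import List, Mapping, Sequence

# Excerpt only: the first four names from the table above.
FEATURE_ORDER_EXCERPT = ["mom_5d", "mom_20d", "rsi", "vol_ratio"]


def to_vector(features: Mapping[str, float],
              order: Sequence[str] = FEATURE_ORDER_EXCERPT) -> List[float]:
    """Return feature values in the fixed column order, failing on gaps."""
    missing = [name for name in order if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(features[name]) for name in order]


# The dict's insertion order differs from the feature order, but the
# resulting vector always follows FEATURE_ORDER_EXCERPT.
row = to_vector({"rsi": 55.0, "vol_ratio": 1.3, "mom_5d": 2.1, "mom_20d": -0.4})
```

Raising on a missing key, rather than silently filling a default, keeps a renamed or dropped feature from shifting every later column.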
Volatility Analysis
A separate module, `models/volatility_analyzer.py`, computes:
- Realized volatility at 5/20/60-day windows
- GARCH(1,1) 1-day-ahead forecast
- K-means volatility regime (0=low, 1=normal, 2=high)
These feed ML features 18–20 (`garch_vol`, `realized_vol_ratio`, `vol_regime`) and power the high-volatility warning on Stock Detail.
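A rough sketch of the realized-volatility ratio (rv5 / rv20) follows. It assumes daily log returns, sample standard deviation, and a 252-day annualization factor; none of these choices is confirmed by the spec, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # assumed annualization factor


def realized_vol(closes: Sequence[float], window: int) -> float:
    """Annualized sample stdev of the last `window` daily log returns."""
    if len(closes) < window + 1:
        raise ValueError("need at least window + 1 closes")
    tail = closes[-(window + 1):]
    log_returns = [math.log(b / a) for a, b in zip(tail, tail[1:])]
    return statistics.stdev(log_returns) * math.sqrt(TRADING_DAYS)


def realized_vol_ratio(closes: Sequence[float]) -> float:
    """rv5 / rv20: above 1 suggests short-term volatility is expanding."""
    rv20 = realized_vol(closes, 20)
    if rv20 == 0.0:
        return 1.0  # assumed neutral value for a perfectly flat series
    return realized_vol(closes, 5) / rv20
```

A series that was quiet for 15 days and then swings for 5 yields a ratio above 1, which is the "expanding" case described for `realized_vol_ratio`.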
Buffett Scorecard
`models/buffett.py` computes a value-investing scorecard alongside predictions:
- Moat: stability + sentiment quality
- Growth: momentum + sentiment trend
- Safety: BB/RSI margin of safety, vol-adjusted
- Verdicts: STRONG BUY / BUY / HOLD / WATCH / AVOID, based on the overall score (weights: 45% moat, 25% growth, 30% safety)
- Returns `None` when fewer than 20 price rows are available
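The weighting can be sketched as below, assuming the three component scores share a common scale. The function and constant names are hypothetical, and verdict cutoffs are omitted because the spec does not state them.

```python
from typing import Optional

WEIGHTS = {"moat": 0.45, "growth": 0.25, "safety": 0.30}  # from the spec
MIN_PRICE_ROWS = 20                                       # from the spec


def overall_score(moat: float, growth: float, safety: float,
                  n_price_rows: int) -> Optional[float]:
    """Weighted overall score, or None when there is too little price data."""
    if n_price_rows < MIN_PRICE_ROWS:
        return None
    return (WEIGHTS["moat"] * moat
            + WEIGHTS["growth"] * growth
            + WEIGHTS["safety"] * safety)
```

For example, component scores of 80, 60, and 70 combine to 36 + 15 + 21 = 72 on the same scale.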
Constraints
- `FEATURE_ORDER` is order-critical: training and inference must use the same 24-feature list
- Stale `.pkl` files must be deleted after any feature change
- `build_features()` returns `None` (not partial data) on insufficient data; callers must handle `None`
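One possible guard against stale pickles, offered as an assumption rather than a description of the current code, is to store the feature list next to the model and compare it on load. `save_model` and `load_model` are hypothetical helpers, and a mismatch here simply yields `None` so the caller can retrain or fall back.

```python
import pickle
from pathlib import Path
from typing import Any, Optional, Sequence


def save_model(model: Any, path: Path, feature_order: Sequence[str]) -> None:
    """Pickle the model together with the feature list it was trained on."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump({"model": model, "features": list(feature_order)}, fh)


def load_model(path: Path, feature_order: Sequence[str]) -> Optional[Any]:
    """Return the model, or None if the file is missing or stale."""
    if not path.exists():
        return None
    with path.open("rb") as fh:
        bundle = pickle.load(fh)
    # A bare model (no bundle) or a different feature list counts as stale.
    if not isinstance(bundle, dict) or bundle.get("features") != list(feature_order):
        return None
    return bundle["model"]
```

This complements, rather than replaces, deleting old `.pkl` files: the check only detects a changed name list, not a changed feature definition under the same name.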