ML Prediction — Spec

Model

  • Algorithm: Random Forest
  • Config: 200 trees, max_depth=8, random_state=42, n_jobs=-1
  • One model per horizon, pickled to model_cache/rf_{horizon}.pkl
  • Shared across all tickers — individual ticker models would have insufficient training samples
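As a rough illustration, the configuration above could be captured as follows. This is a sketch, not the project's actual code: the names RF_PARAMS, HORIZONS, MODEL_DIR, model_path and make_model are hypothetical, and two things are assumed. First, that the {horizon} placeholder expands to the labels 1d, 7d and 30d. Second, that the model is a classifier, since the spec predicts the classes up, down and neutral.

```python
from pathlib import Path

# Hyperparameters as listed in the spec (200 trees = n_estimators).
RF_PARAMS = {
    "n_estimators": 200,
    "max_depth": 8,
    "random_state": 42,
    "n_jobs": -1,
}

# Assumption: horizon labels double as the {horizon} part of the file name.
HORIZONS = ("1d", "7d", "30d")
MODEL_DIR = Path("model_cache")


def model_path(horizon: str) -> Path:
    """Return the pickle path for one horizon, e.g. model_cache/rf_7d.pkl."""
    if horizon not in HORIZONS:
        raise ValueError(f"unknown horizon: {horizon!r}")
    return MODEL_DIR / f"rf_{horizon}.pkl"


def make_model():
    """Build an untrained forest; scikit-learn is imported only when called."""
    from sklearn.ensemble import RandomForestClassifier  # requires scikit-learn

    return RandomForestClassifier(**RF_PARAMS)
```

Keeping the hyperparameters in one dictionary means all three horizon models are built from the same settings.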

Horizons

Three prediction horizons:

Horizon | Forward return thresholds
1d      | > +1% = up, < -1% = down, else neutral
7d      | same thresholds
30d     | same thresholds

Classes

  • up: forward return > +1%
  • down: forward return < -1%
  • neutral: otherwise
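The labelling rule can be sketched in a few lines. The helper names forward_return and label are hypothetical, and the sketch assumes a horizon is counted in price rows. The spec does not say whether 7d and 30d mean trading days or calendar days.

```python
from typing import Optional, Sequence

THRESHOLD = 0.01  # the +/-1% band from the spec, as a fraction


def forward_return(closes: Sequence[float], i: int, horizon: int) -> Optional[float]:
    """Fractional return from row i to row i + horizon, or None if out of range."""
    j = i + horizon
    if i < 0 or j >= len(closes) or closes[i] == 0:
        return None
    return closes[j] / closes[i] - 1.0


def label(ret: float) -> str:
    """Map a forward return to one of the three classes (strict inequalities)."""
    if ret > THRESHOLD:
        return "up"
    if ret < -THRESHOLD:
        return "down"
    return "neutral"
```

Because both comparisons are strict, a return of exactly +1% or -1% falls into neutral, matching the definitions above.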

Minimum Data Requirements

  • Training: 30 samples minimum, 120 days of price history
  • Inference: 10 price rows minimum — build_features() returns None otherwise
  • Fallback: _rule_based_prediction() in predictor.py when no trained model exists
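A caller might combine these rules roughly as below. This is a sketch under stated assumptions. The build_features here is a stand-in, not the real function. predict, model and rule_based are hypothetical names. Returning None to the caller when features are unavailable is one possible policy, not something the spec prescribes.

```python
from typing import Callable, Dict, List, Optional

MIN_INFERENCE_ROWS = 10  # inference minimum from the spec


def build_features(price_rows: List[dict]) -> Optional[Dict[str, float]]:
    """Stand-in for the real function: None on short history, never partial data."""
    if len(price_rows) < MIN_INFERENCE_ROWS:
        return None
    # Hypothetical placeholder; the real function fills all 24 features.
    return {"mom_5d": 0.0}


def predict(
    price_rows: List[dict],
    model: Optional[Callable[[Dict[str, float]], str]] = None,
    rule_based: Callable[[List[dict]], str] = lambda rows: "neutral",
) -> Optional[str]:
    """Return a class label, or None when there is too little data."""
    features = build_features(price_rows)
    if features is None:
        return None  # assumed policy: surface "no prediction" to the caller
    if model is None:
        return rule_based(price_rows)  # mirrors the rule-based fallback idea
    return model(features)
```

Checking for None before touching the model keeps a short price history from turning into an exception deep inside inference.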

Feature Vector

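FEATURE_ORDER in predictor.py defines the 24-feature vector. Order is critical: a scikit-learn Random Forest fitted on a plain array matches features by position, not by name, so a reordered vector produces wrong predictions without raising an error.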

#     | Feature                | Description
1     | mom_5d                 | 5-day price momentum (%)
2     | mom_20d                | 20-day price momentum (%)
3     | rsi                    | 14-period RSI
4     | vol_ratio              | Latest volume / 10-day avg volume
5     | obv_trend              | On-Balance Volume trend over 10 bars (%)
6     | stoch_k                | Stochastic %K over 14 periods
7     | macd_hist              | MACD histogram: (EMA12 − EMA26) − 9-period signal EMA
8     | bb_pct                 | Bollinger Band %B (20-period, 2σ)
9–11  | sent_avg/count/std_3d  | Sentiment stats over 3-day window
12–14 | sent_avg/count/std_7d  | Sentiment stats over 7-day window
15–17 | sent_avg/count/std_30d | Sentiment stats over 30-day window
18    | garch_vol              | GARCH(1,1) 1-day-ahead annualized vol (or rv20 fallback)
19    | realized_vol_ratio     | rv5 / rv20 (short-term vol expanding vs. contracting)
20    | vol_regime             | K-means vol regime: 0=low, 1=normal, 2=high
21    | eps_growth             | YoY EPS growth from Fundamentals table (0.0 if unavailable)
22    | revenue_growth         | YoY revenue growth from Fundamentals table (0.0 if unavailable)
23    | debt_equity_ratio      | Total debt / total equity from Fundamentals table (0.0 if unavailable)
24    | has_fundamentals       | 1.0 if a Fundamentals row exists for this ticker, else 0.0
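One way to enforce the ordering is to build the vector from a name-keyed dictionary through a single list. The sketch below assumes that the grouped sentiment rows expand to names such as sent_avg_3d, sent_count_3d and sent_std_3d. The real identifiers in predictor.py may differ, and to_vector is a hypothetical helper.

```python
from typing import Dict, List

_WINDOWS = ("3d", "7d", "30d")
# Assumed expansion of "sent_avg/count/std_<window>" into three names each.
_SENTIMENT = [f"sent_{stat}_{w}" for w in _WINDOWS for stat in ("avg", "count", "std")]

FEATURE_ORDER: List[str] = [
    "mom_5d", "mom_20d", "rsi", "vol_ratio",
    "obv_trend", "stoch_k", "macd_hist", "bb_pct",
    *_SENTIMENT,
    "garch_vol", "realized_vol_ratio", "vol_regime",
    "eps_growth", "revenue_growth", "debt_equity_ratio", "has_fundamentals",
]


def to_vector(features: Dict[str, float]) -> List[float]:
    """Order a name-keyed feature dict by FEATURE_ORDER; fail on missing names."""
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(features[name]) for name in FEATURE_ORDER]
```

Using the same helper for training and inference removes any dependence on dictionary insertion order or column order in upstream data.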

Volatility Analysis

Separate models/volatility_analyzer.py computes:

  • Realized volatility at 5/20/60-day windows
  • GARCH(1,1) 1-day-ahead forecast
  • K-means volatility regime (0=low, 1=normal, 2=high)

These feed ML features 18–20 (garch_vol, realized_vol_ratio, vol_regime) and power the high-volatility warning on Stock Detail.
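The realized-volatility part is simple enough to sketch with the standard library. The definition used here is an assumption, not taken from volatility_analyzer.py: sample standard deviation of daily log returns, annualized with 252 trading days. The function names and the None return on short history are also illustrative.

```python
import math
from statistics import stdev
from typing import Optional, Sequence

TRADING_DAYS = 252  # assumed annualization factor


def realized_vol(closes: Sequence[float], window: int) -> Optional[float]:
    """Annualized std of the last `window` daily log returns; None if too short."""
    if len(closes) < window + 1:
        return None
    tail = closes[-(window + 1):]
    rets = [math.log(b / a) for a, b in zip(tail, tail[1:])]
    return stdev(rets) * math.sqrt(TRADING_DAYS)


def realized_vol_ratio(closes: Sequence[float]) -> Optional[float]:
    """rv5 / rv20: above 1 when short-term volatility is expanding."""
    rv5 = realized_vol(closes, 5)
    rv20 = realized_vol(closes, 20)
    if rv5 is None or rv20 is None or rv20 == 0:
        return None
    return rv5 / rv20
```

The annualization factor cancels in the ratio, so realized_vol_ratio depends only on the relative size of the two windows' return dispersion.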

Buffett Scorecard

models/buffett.py computes a value-investing scorecard alongside predictions:

  • Moat: stability + sentiment quality
  • Growth: momentum + sentiment trend
  • Safety: BB/RSI margin of safety, vol-adjusted
  • Verdicts: STRONG BUY / BUY / HOLD / WATCH / AVOID, based on the overall score (45% moat, 25% growth, 30% safety)
  • Returns None when fewer than 20 price rows are available
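The weighting can be illustrated with a small sketch. The function name overall_score and the idea that the three components share a common numeric scale are assumptions. The verdict cut-offs are not given in the spec, so they are left out rather than guessed.

```python
from typing import Optional

WEIGHTS = {"moat": 0.45, "growth": 0.25, "safety": 0.30}  # from the spec
MIN_PRICE_ROWS = 20  # below this the scorecard returns None


def overall_score(
    moat: float, growth: float, safety: float, n_price_rows: int
) -> Optional[float]:
    """Weighted overall score, or None when price history is too short."""
    if n_price_rows < MIN_PRICE_ROWS:
        return None
    return (
        WEIGHTS["moat"] * moat
        + WEIGHTS["growth"] * growth
        + WEIGHTS["safety"] * safety
    )
```

For example, component scores of 80, 60 and 70 combine to 0.45 × 80 + 0.25 × 60 + 0.30 × 70 = 72.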

Constraints

  • FEATURE_ORDER is order-critical — training and inference must use the same 24-feature list
  • Stale .pkl files must be deleted after any feature change
  • build_features() returns None (not partial data) on insufficient data — callers must handle None
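Clearing stale pickles after a feature change could be automated with a small helper like the one below. clear_model_cache is a hypothetical name, and the rf_*.pkl pattern is inferred from the model_cache/rf_{horizon}.pkl naming in the Model section.

```python
from pathlib import Path
from typing import List


def clear_model_cache(cache_dir: str = "model_cache") -> List[str]:
    """Delete rf_*.pkl files in cache_dir and return the removed file names."""
    removed = []
    for path in sorted(Path(cache_dir).glob("rf_*.pkl")):
        path.unlink()
        removed.append(path.name)
    return removed
```

Running this once after editing FEATURE_ORDER forces the next training run to write fresh models, so an old pickle is never paired with a new feature layout.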