ML Prediction: Spec

Model

Algorithm: Random Forest
One model trained per horizon (1d / 7d / 30d)
Shared across all tickers — individual ticker models would have insufficient training samples
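A minimal sketch of "one model per horizon, shared across tickers": rows from every ticker are pooled by horizon before fitting, so per-ticker sample counts never gate training. The row layout, `train_horizon_models`, and the `make_model` factory are hypothetical names for illustration. The factory is injected so the sketch runs without scikit-learn; in practice it would return a `RandomForestClassifier`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

HORIZONS = ("1d", "7d", "30d")
MIN_TRAINING_SAMPLES = 30  # minimum from "Minimum Data Requirements"

# (ticker, horizon, feature_vector, label): a hypothetical row layout
Sample = Tuple[str, str, List[float], str]


def train_horizon_models(
    samples: Iterable[Sample],
    make_model: Callable[[], Any],
) -> Dict[str, Any]:
    """Fit one model per horizon on rows pooled across all tickers."""
    X_by_h: Dict[str, List[List[float]]] = defaultdict(list)
    y_by_h: Dict[str, List[str]] = defaultdict(list)
    for _ticker, horizon, features, label in samples:
        # The ticker is deliberately ignored: every ticker contributes
        # to the same shared model for its horizon.
        X_by_h[horizon].append(features)
        y_by_h[horizon].append(label)

    models: Dict[str, Any] = {}
    for horizon in HORIZONS:
        y = y_by_h.get(horizon, [])
        if len(y) < MIN_TRAINING_SAMPLES:
            continue  # no model: inference falls back to the rule-based predictor
        model = make_model()
        model.fit(X_by_h[horizon], y)
        models[horizon] = model
    return models
```

Usage would look like `train_horizon_models(rows, lambda: RandomForestClassifier(n_estimators=200, random_state=0))`; the hyperparameters there are illustrative, not taken from this spec.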

Horizons

Three prediction horizons:

Horizon	Forward return thresholds
`1d`	>+1% = up, <-1% = down, else neutral
`7d`	same thresholds
`30d`	same thresholds

Classes

up: forward return > +1%
down: forward return < -1%
neutral: otherwise
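The horizon table and class rules above reduce to a forward-return computation followed by a threshold check. This sketch assumes horizons are counted in price rows (trading bars) rather than calendar days; the spec does not say which, and `forward_return`, `label_return`, and `HORIZON_BARS` are hypothetical names. Because both comparisons are strict, a return of exactly ±1% is labeled neutral.

```python
from typing import Optional, Sequence

UP_THRESHOLD = 0.01     # forward return > +1%  -> "up"
DOWN_THRESHOLD = -0.01  # forward return < -1%  -> "down"
# Assumption: horizons are counted in price rows (trading bars), not calendar days.
HORIZON_BARS = {"1d": 1, "7d": 7, "30d": 30}


def forward_return(closes: Sequence[float], i: int, horizon: str) -> Optional[float]:
    """Return closes[i + n] / closes[i] - 1, or None if that bar does not exist yet."""
    j = i + HORIZON_BARS[horizon]
    if j >= len(closes):
        return None  # label unknown: the row cannot be used for training
    return closes[j] / closes[i] - 1.0


def label_return(ret: float) -> str:
    """Map a forward return to up / down / neutral; exactly +/-1% is neutral."""
    if ret > UP_THRESHOLD:
        return "up"
    if ret < DOWN_THRESHOLD:
        return "down"
    return "neutral"
```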

Minimum Data Requirements

Training: 30 samples minimum, 120 days of price history
Inference: 10 price rows minimum — build_features() returns None otherwise
Fallback: _rule_based_prediction() in predictor.py when no trained model exists
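A sketch of how a caller might combine the all-or-nothing feature builder with the rule-based fallback. `build_features_sketch` is a hypothetical stand-in for `build_features()` that computes only `mom_5d`, and the `model` and `rule_based` callables stand in for the pickled RF and `_rule_based_prediction()`. The ordering here (insufficient data returns `None` before the fallback is tried) is an assumption, not something this spec states.

```python
from typing import Callable, Dict, Optional, Sequence

MIN_INFERENCE_ROWS = 10  # build_features() returns None below this


def build_features_sketch(closes: Sequence[float]) -> Optional[Dict[str, float]]:
    """Hypothetical stand-in for build_features(): all-or-nothing, never partial."""
    if len(closes) < MIN_INFERENCE_ROWS:
        return None
    # 5-day momentum in percent: latest close vs. the close 5 bars earlier.
    return {"mom_5d": (closes[-1] / closes[-6] - 1.0) * 100.0}


def predict_direction(
    closes: Sequence[float],
    model: Optional[Callable[[Dict[str, float]], str]],
    rule_based: Callable[[Sequence[float]], str],
) -> Optional[str]:
    features = build_features_sketch(closes)
    if features is None:
        return None  # insufficient data: the caller must handle None
    if model is None:
        return rule_based(closes)  # stands in for _rule_based_prediction()
    return model(features)
```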

Feature Vector

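FEATURE_ORDER in predictor.py defines the 29-feature vector. Order is critical: the scikit-learn RF stores no column names, so it matches features purely by position, and a reordered vector silently produces wrong predictions.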
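Because the model only sees positions, the feature dict has to be flattened in exactly FEATURE_ORDER sequence. A minimal sketch, using a hypothetical three-name excerpt in place of the real 29-name list; `to_vector` is an illustrative name. Failing loudly on a missing name avoids shifting every later column by one.

```python
from typing import Dict, List, Sequence

# Hypothetical excerpt: the first 3 of the 29 names in FEATURE_ORDER.
FEATURE_ORDER_EXCERPT = ["mom_5d", "mom_20d", "rsi"]


def to_vector(features: Dict[str, float], order: Sequence[str]) -> List[float]:
    """Flatten a feature dict into the column order the model was trained on."""
    missing = [name for name in order if name not in features]
    if missing:
        # A silent gap would shift every later column by one position.
        raise KeyError(f"missing features: {missing}")
    return [float(features[name]) for name in order]
```

The dict's own insertion order is irrelevant; only `order` decides column positions.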

#	Feature	Description
1	`mom_5d`	5-day price momentum (%)
2	`mom_20d`	20-day price momentum (%)
3	`rsi`	14-period RSI
4	`vol_ratio`	Latest volume / 10-day avg volume
5	`obv_trend`	On-Balance Volume trend over 10 bars (%)
6	`stoch_k`	Stochastic %K over 14 periods
7	`macd_hist`	MACD histogram: MACD line (EMA12 − EMA26) minus its 9-period EMA signal line
8	`bb_pct`	Bollinger Band %B (20-period, 2σ)
9–11	`sent_avg/count/std_3d`	Sentiment stats over 3-day window
12–14	`sent_avg/count/std_7d`	Sentiment stats over 7-day window
15–17	`sent_avg/count/std_30d`	Sentiment stats over 30-day window
18	`rv20`	20-day annualized realized volatility (computed identically in training and inference)
19	`realized_vol_ratio`	rv5 / rv20 — short-term vol expanding vs. contracting
20	`vol_regime`	K-means vol regime: 0=low, 1=normal, 2=high
21	`eps_growth`	YoY EPS growth from `Fundamentals` table (0.0 if unavailable)
22	`revenue_growth`	YoY revenue growth from `Fundamentals` table (0.0 if unavailable)
23	`debt_equity_ratio`	Total debt / total equity from `Fundamentals` table (0.0 if unavailable)
24	`has_fundamentals`	1.0 if a `Fundamentals` row exists for this ticker, else 0.0
25	`vix_level`	CBOE VIX / 50, clipped 0–1 (US only; KR defaults to 0.3 ≈ VIX 15)
26	`yield_spread_norm`	(10Y − 3M Treasury yield) / 3, clipped −1 to 1 (US only; KR defaults to 0.0)
27	`eps_surprise_pct`	Most recent earnings surprise: (eps_actual − eps_estimate) / |eps_estimate|, clipped −2 to 2. Sourced from the `EarningsEvent` table. 0.0 for KR tickers or when no reported earnings exist. Training uses bisect to enforce look-ahead safety (only events ≤ training bar date).
28	`days_to_earnings`	Calendar days until the next scheduled earnings release, clipped to [0, 90]. Earnings announcement dates are public information, so this feature raises no look-ahead concern. Defaults to 90 for KR tickers or when no future event is scheduled. Because the lookup uses `earnings_date >= now`, a release scheduled for the same day is included and yields 0.
29	`pre_earnings_flag`	1.0 if `days_to_earnings ≤ 14`, else 0.0. Captures the anticipation window before a release, when implied and realized volatility tend to rise ahead of the post-release IV crush.
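The macro and earnings features (25 to 29) are simple normalizations, sketched below. The function names, the parallel-list inputs, and the choice of `date` rather than datetime are assumptions. `bisect_right` finds the last report on or before the bar date, which is the look-ahead guard the table describes, and `bisect_left` finds the first release dated today or later, so a same-day release yields 0. Returning 0.0 when the estimate is zero is an added assumption to avoid division by zero.

```python
import bisect
from datetime import date
from typing import List, Optional, Tuple


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def vix_level(vix: Optional[float]) -> float:
    """VIX / 50 clipped to [0, 1]; None (e.g. KR tickers) -> 0.3, about VIX 15."""
    return 0.3 if vix is None else clip(vix / 50.0, 0.0, 1.0)


def yield_spread_norm(y10: Optional[float], y3m: Optional[float]) -> float:
    """(10Y - 3M) / 3 clipped to [-1, 1]; missing yields (KR) -> 0.0."""
    if y10 is None or y3m is None:
        return 0.0
    return clip((y10 - y3m) / 3.0, -1.0, 1.0)


def eps_surprise_pct(
    report_dates: List[date],        # sorted ascending
    eps: List[Tuple[float, float]],  # (eps_actual, eps_estimate), aligned with report_dates
    bar_date: date,
) -> float:
    """Surprise of the latest report on or before bar_date (no look-ahead)."""
    idx = bisect.bisect_right(report_dates, bar_date) - 1
    if idx < 0:
        return 0.0  # nothing reported yet as of this bar
    actual, estimate = eps[idx]
    if estimate == 0:
        return 0.0  # assumption: avoid dividing by zero
    return clip((actual - estimate) / abs(estimate), -2.0, 2.0)


def days_to_earnings(upcoming: List[date], today: date) -> int:
    """Days until the first release dated today or later, clipped to [0, 90]."""
    idx = bisect.bisect_left(upcoming, today)  # first date >= today, so same day -> 0
    if idx == len(upcoming):
        return 90  # no scheduled event (or KR ticker)
    return int(clip((upcoming[idx] - today).days, 0, 90))


def pre_earnings_flag(days: int) -> float:
    return 1.0 if days <= 14 else 0.0
```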

Volatility Analysis

A separate module, models/volatility_analyzer.py, computes:
- Realized volatility at 5/20/60-day windows
- GARCH(1,1) 1-day-ahead forecast
- K-means volatility regime (0=low, 1=normal, 2=high)

These outputs feed the three volatility features (18–20: `rv20`, `realized_vol_ratio`, `vol_regime`) and power the high-volatility warning on Stock Detail.
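The realized-volatility and GARCH quantities above can be sketched with the standard library alone. Assumptions: 252 trading days for annualization, sample standard deviation of daily log returns, zero-mean returns, and GARCH parameters (omega, alpha, beta) that have already been estimated. Real code would fit them by maximum likelihood, for example with the `arch` package. The K-means regime step is omitted; it would typically cluster volatility values into 3 groups (e.g. with scikit-learn's `KMeans(n_clusters=3)`) and relabel clusters by centroid order.

```python
import math
import statistics
from typing import List, Sequence

TRADING_DAYS = 252  # assumption: annualization factor


def log_returns(closes: Sequence[float]) -> List[float]:
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def realized_vol(closes: Sequence[float], window: int) -> float:
    """Annualized sample std dev of the last `window` daily log returns."""
    rets = log_returns(closes)[-window:]
    if len(rets) < 2:
        raise ValueError("need at least 3 closes")
    return statistics.stdev(rets) * math.sqrt(TRADING_DAYS)


def realized_vol_ratio(closes: Sequence[float]) -> float:
    """rv5 / rv20: > 1 means short-term volatility is expanding."""
    rv20 = realized_vol(closes, 20)
    return realized_vol(closes, 5) / rv20 if rv20 > 0 else 1.0


def garch11_next_variance(
    returns: Sequence[float], omega: float, alpha: float, beta: float
) -> float:
    """sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t], run over `returns`.

    Assumes zero-mean returns and already-estimated (omega, alpha, beta).
    """
    sigma2 = statistics.pvariance(returns)  # seed with the sample variance
    for r in returns:
        sigma2 = omega + alpha * r * r + beta * sigma2
    return sigma2
```

The value returned after the last observed return is the 1-day-ahead variance; its square root is the forecast volatility.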

SHAP Explainability

explain_prediction(ticker, market, horizon, db) in predictor.py returns per-feature SHAP contributions for the current prediction. Requires shap>=0.45.0 (in requirements.txt).

Uses shap.TreeExplainer — zero retraining, works directly against pickled RF models
Returns contributions for the predicted class only (not all classes)
Contributions sorted by |shap| descending; top 15 features returned
FEATURE_LABELS dict maps each FEATURE_ORDER key to a human-readable label
API: GET /predictions/{ticker}/explain?horizon=1d — requires auth; returns 503 if model unavailable
Frontend: "Why?" toggle button per horizon on the Stock Detail ML Prediction card; expands an inline horizontal bar chart (green = pushes toward predicted direction, red = opposing)
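A sketch of the post-processing described above: ranking one row of SHAP values by magnitude, keeping the top 15, attaching labels, and tagging direction for the green/red bars. It assumes the caller has already extracted the SHAP values for the predicted class from `shap.TreeExplainer`. The array layout for multi-class models varies across shap versions, so that step is deliberately not shown. `top_contributions` and the output dict keys are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Sequence


def top_contributions(
    shap_row: Sequence[float],     # SHAP values for the predicted class, one per feature
    feature_order: Sequence[str],  # FEATURE_ORDER
    labels: Mapping[str, str],     # FEATURE_LABELS
    k: int = 15,
) -> List[Dict[str, Any]]:
    if len(shap_row) != len(feature_order):
        raise ValueError("SHAP row and FEATURE_ORDER lengths differ")
    rows = [
        {
            "feature": name,
            "label": labels.get(name, name),
            "shap": float(value),
            # Positive SHAP raises the predicted class's score (green bar);
            # negative SHAP pushes away from it (red bar).
            "direction": "toward" if value > 0 else "against",
        }
        for name, value in zip(feature_order, shap_row)
    ]
    rows.sort(key=lambda r: abs(r["shap"]), reverse=True)
    return rows[:k]
```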

Buffett Scorecard

models/buffett.py computes a value-investing scorecard alongside predictions:
- Moat: stability + sentiment quality
- Growth: momentum + sentiment trend
- Safety: BB/RSI margin of safety, vol-adjusted
- Verdicts: STRONG BUY / BUY / HOLD / WATCH / AVOID, based on an overall score weighted 45% moat, 25% growth, 30% safety
- Returns None when fewer than 20 price rows are available
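The weighting and verdict mapping can be sketched as below. The 45/25/30 weights and the 20-row minimum come from this spec. The 0 to 100 score scale and the verdict cutoffs are hypothetical, since the spec does not state them, and `buffett_verdict` is an illustrative name.

```python
from typing import Any, Dict, Optional

WEIGHTS = {"moat": 0.45, "growth": 0.25, "safety": 0.30}
# Hypothetical cutoffs on an assumed 0-100 scale; this spec does not give the real ones.
VERDICT_CUTOFFS = [(80.0, "STRONG BUY"), (65.0, "BUY"), (50.0, "HOLD"), (35.0, "WATCH")]
MIN_PRICE_ROWS = 20


def buffett_verdict(
    n_price_rows: int, moat: float, growth: float, safety: float
) -> Optional[Dict[str, Any]]:
    if n_price_rows < MIN_PRICE_ROWS:
        return None  # not enough history to score
    overall = (
        WEIGHTS["moat"] * moat
        + WEIGHTS["growth"] * growth
        + WEIGHTS["safety"] * safety
    )
    # First cutoff the score clears wins; anything below the last is AVOID.
    verdict = next((name for cutoff, name in VERDICT_CUTOFFS if overall >= cutoff), "AVOID")
    return {"overall": overall, "verdict": verdict}
```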

Constraints

FEATURE_ORDER is order-critical — training and inference must use the same 29-feature list
Stale .pkl files must be deleted after any feature change
build_features() returns None (not partial data) on insufficient data — callers must handle None
shap is a required dependency for the explain endpoint; the endpoint returns 503 gracefully if not installed
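One way to catch a stale .pkl at load time is to compare the fitted model's `n_features_in_` (set by scikit-learn estimators during `fit`) against the current feature count. This is a sketch; `load_model_if_current` is a hypothetical helper. It only detects a change in feature count: a reordering or a same-count swap still requires deleting the file by hand, as the constraints above require. Unpickling executes code, so it should only be used on files the application wrote itself.

```python
import pickle
from pathlib import Path
from typing import Any, Optional

EXPECTED_FEATURES = 29  # len(FEATURE_ORDER)


def load_model_if_current(
    path: Path, expected_features: int = EXPECTED_FEATURES
) -> Optional[Any]:
    """Return the unpickled model, or None if it is missing or looks stale."""
    if not path.exists():
        return None
    with path.open("rb") as fh:
        model = pickle.load(fh)  # only load .pkl files you produced yourself
    # scikit-learn estimators record the training width in n_features_in_.
    if getattr(model, "n_features_in_", None) != expected_features:
        return None  # stale: delete the file and retrain
    return model
```

Returning None here lets the caller take the same path as "no trained model" and use the rule-based fallback.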