Sentiment Analysis: Implementation
How It Works
sentiment/analyzer.py provides:
- analyze_english_batch() — batched FinBERT inference (VADER fallback if model not loaded)
- analyze_korean_batch() — batched KR-FinBERT inference (keyword lexicon fallback)
- score_unscored_articles() — finds all NewsArticle rows without a SentimentScore, scores them in batches of 500
Confidence for both models is max(p_pos, p_neg, p_neu) from the softmax distribution.
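As a minimal sketch of that confidence rule, the snippet below computes a softmax over three raw class scores and takes the largest probability. The function names, the logit values, and the (positive, negative, neutral) class order are assumptions for illustration; they are not taken from analyzer.py.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Confidence is the probability of the most likely class."""
    return max(softmax(logits))


# Hypothetical logits, assumed to be in (positive, negative, neutral) order.
example_logits = [2.0, 0.5, 0.1]
probs = softmax(example_logits)
conf = confidence(example_logits)
```

With three classes the confidence can never drop below 1/3, which is the value produced when the model is completely undecided.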
KO keyword-lexicon fallback dampens by evidence count: abs(score) × (count / (count + 5)).
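The dampening expression can be written out directly. The sketch below implements the formula exactly as stated; the function name, the parameter k, and the reading of count as the number of lexicon keyword hits are assumptions. The source does not say whether the sign of the score is reapplied afterwards, so this returns the magnitude only.

```python
def dampened_magnitude(score: float, count: int, k: int = 5) -> float:
    """Return abs(score) * (count / (count + k)).

    `count` is assumed to be the number of lexicon keyword hits.
    k=5 matches the constant in the formula above.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return abs(score) * (count / (count + k))


# The factor grows with evidence: 1 hit scales by 1/6, 5 hits by 1/2,
# and 45 hits by 9/10. Zero hits yields 0.0.
one_hit = dampened_magnitude(-0.8, 1)
five_hits = dampened_magnitude(-0.8, 5)
many_hits = dampened_magnitude(-0.8, 45)
```

The factor count / (count + 5) approaches 1 as evidence accumulates, so a score backed by a single keyword is heavily discounted while one backed by many keywords is left nearly intact.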
Scoring Pipeline
score_unscored_articles() runs:
1. After fetch_us_news (every 15 min)
2. After fetch_kr_news (every 30 min)
3. Inside run_predictions before training (safety net)
Note: fetch_stocktwits was a fourth trigger but is currently disabled — see docs/spec/radar.md.
Articles are typically scored within 15–30 minutes of ingestion. There is no real-time path.
Gotchas
- model_used: "none" indicates empty/unscored text, not a failure. Articles with no text produce a neutral score with model_used="none".
- Batch size 500 in the while-loop in score_unscored_articles(): if the unscored backlog is large (e.g. after a restart), this runs multiple iterations until cleared.
- Missing sentiment defaults to 0.0 in build_features(): articles that haven't been scored yet don't block inference; they just contribute a neutral signal.
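The three behaviours above can be illustrated with an in-memory sketch. Everything here is a hypothetical stand-in: Article replaces a NewsArticle row, a dict replaces the SentimentScore table, and drain_backlog and sentiment_feature only mimic the described behaviour of score_unscored_articles() and build_features(); the real functions presumably query the database instead.

```python
from dataclasses import dataclass

BATCH_SIZE = 500  # batch size stated in the text


@dataclass
class Article:
    """Hypothetical stand-in for a NewsArticle row."""
    id: int
    text: str


def drain_backlog(articles, score_batch, batch_size=BATCH_SIZE):
    """Score all articles in batches until none remain unscored.

    Returns ({article_id: (score, model_used)}, number_of_iterations).
    """
    scores = {}
    iterations = 0
    while True:
        # Next batch of articles that still have no score.
        batch = [a for a in articles if a.id not in scores][:batch_size]
        if not batch:
            break
        with_text = [a for a in batch if a.text.strip()]
        # Empty text is not a failure: record a neutral score.
        for article in batch:
            if not article.text.strip():
                scores[article.id] = (0.0, "none")
        batch_scores = score_batch([a.text for a in with_text])
        for article, score in zip(with_text, batch_scores):
            scores[article.id] = (score, "stub-model")  # placeholder label
        iterations += 1
    return scores, iterations


def sentiment_feature(scores, article_id):
    """Missing sentiment contributes a neutral 0.0 instead of blocking."""
    entry = scores.get(article_id)
    return entry[0] if entry is not None else 0.0


# A backlog of 1200 articles, one of them with no text.
backlog = [Article(i, "" if i == 7 else f"headline {i}") for i in range(1200)]
scores, iterations = drain_backlog(backlog, lambda texts: [0.5] * len(texts))
```

A backlog of 1200 articles needs three passes here (500, 500, 200), and an article id that was never scored still yields a usable feature value of 0.0.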
How to Extend
Add a new sentiment model
- Load the model in analyzer.py (guard with try/except so the fallback still works if the model isn't installed)
- Add the scoring logic to the appropriate analyze_*_batch() function
- Update the model_used string constant
- Update docs/spec/sentiment-analysis.md
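One way the guarded load and fallback from the first two steps might look is sketched below. The package name my_new_sentiment_pkg, its load() factory, the label strings, and all function names are hypothetical; only the try/except pattern and the model_used label come from the steps above.

```python
import importlib


def load_optional_model(module_name: str, factory_name: str):
    """Build a model via module.factory(), or return None if unavailable."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, factory_name)()
    except (ImportError, AttributeError, OSError):
        # Package missing, factory missing, or weights not on disk:
        # stay importable so the fallback path keeps working.
        return None


def lexicon_fallback(texts):
    """Placeholder fallback scorer: neutral score for every text."""
    return [0.0 for _ in texts]


# Hypothetical package and factory names, for illustration only.
MODEL = load_optional_model("my_new_sentiment_pkg", "load")
MODEL_USED = "my-new-model" if MODEL is not None else "fallback"


def analyze_batch(texts):
    """Return (score, model_used) pairs, using the fallback if needed."""
    if MODEL is None:
        return [(score, "fallback") for score in lexicon_fallback(texts)]
    return [(score, MODEL_USED) for score in MODEL(texts)]


results = analyze_batch(["first headline", "second headline"])
```

Loading once at import time and recording the outcome in a module-level constant keeps every call to the batch function cheap, and the model_used label always reflects which path actually produced the score.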