
Sentiment Analysis — Implementation

How It Works

sentiment/analyzer.py provides:

  • analyze_english_batch(): batched FinBERT inference (falls back to VADER if the model is not loaded)
  • analyze_korean_batch(): batched KR-FinBERT inference (falls back to a keyword lexicon)
  • score_unscored_articles(): finds all NewsArticle rows without a SentimentScore and scores them in batches of 500
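The batch loop in score_unscored_articles() can be sketched as below. This is a minimal in-memory illustration, not the code in analyzer.py. The `Article` dataclass, `drain_unscored`, `toy_scorer` and the injected `score_batch` callable are hypothetical stand-ins for the NewsArticle/SentimentScore query and the analyze_*_batch() functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

BATCH_SIZE = 500  # batch size stated for score_unscored_articles()

# Hypothetical scorer signature: texts -> [(score, model_used), ...]
ScoreBatch = Callable[[List[str]], List[Tuple[float, str]]]


@dataclass
class Article:
    """Hypothetical stand-in for a NewsArticle row plus its SentimentScore."""
    id: int
    text: str
    score: Optional[float] = None  # None means "no SentimentScore yet"
    model_used: Optional[str] = None


def drain_unscored(articles: List[Article], score_batch: ScoreBatch,
                   batch_size: int = BATCH_SIZE) -> int:
    """Score unscored articles batch by batch until none remain; return the count."""
    scored = 0
    while True:
        # Stand-in for the DB query: rows without a SentimentScore, one batch at a time.
        batch = [a for a in articles if a.score is None][:batch_size]
        if not batch:
            break
        results = score_batch([a.text for a in batch])
        if len(results) != len(batch):
            # Guard: a short result list would leave rows unscored and loop forever.
            raise ValueError("scorer must return one result per text")
        for article, (score, model_used) in zip(batch, results):
            article.score = score
            article.model_used = model_used
        scored += len(batch)
    return scored


def toy_scorer(texts: List[str]) -> List[Tuple[float, str]]:
    # Hypothetical scorer: empty text gets a neutral 0.0 with model_used="none".
    return [(0.0, "none") if not t.strip() else (0.1, "toy") for t in texts]


backlog = [Article(i, "" if i % 10 == 0 else f"headline {i}") for i in range(1200)]
print(drain_unscored(backlog, toy_scorer))  # 1200, drained in 3 passes (500 + 500 + 200)
```

Re-running the "unscored" query on every pass, rather than paging with an offset, is one simple way to keep such a loop correct: rows scored in one pass drop out of the next query on their own.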

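Confidence for both models is max(p_pos, p_neg, p_neu), i.e. the top probability of the softmax distribution.
The KO keyword-lexicon fallback instead dampens by evidence count: abs(score) × (count / (count + 5)). A single keyword hit keeps only 1/6 of abs(score); 20 hits keep 80%.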
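Both confidence rules can be sketched in a few lines. This assumes the model emits three raw logits (their label order does not matter for the max) and reads `count` as the number of matched lexicon keywords. `softmax`, `model_confidence`, `lexicon_confidence` and the parameter `k` are hypothetical names, not functions from analyzer.py.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw model logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model_confidence(logits: Sequence[float]) -> float:
    """Confidence = max(p_pos, p_neg, p_neu), the top softmax probability."""
    return max(softmax(logits))


def lexicon_confidence(score: float, count: int, k: int = 5) -> float:
    """KO lexicon fallback: abs(score) * count / (count + k), with k = 5.

    Few matched keywords (small count) pull the result toward 0.
    """
    return abs(score) * (count / (count + k))


# Equal logits give a uniform distribution, so confidence is 1/3.
print(round(model_confidence([0.0, 0.0, 0.0]), 4))  # 0.3333
# Same raw score of -0.8, with one keyword hit vs. twenty.
print(round(lexicon_confidence(-0.8, 1), 4))   # 0.8 * 1/6 = 0.1333
print(round(lexicon_confidence(-0.8, 20), 4))  # 0.8 * 20/25 = 0.64
```

The count / (count + 5) factor never reaches 1, so even a long keyword list cannot report more confidence than abs(score).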

Scoring Pipeline

score_unscored_articles() runs:

  1. After fetch_us_news (every 15 min)
  2. After fetch_kr_news (every 30 min)
  3. Inside run_predictions, before training (safety net)

Note: fetch_stocktwits was a fourth trigger but is currently disabled — see docs/spec/radar.md.

Articles are typically scored within 15–30 minutes of ingestion. There is no real-time path.

Gotchas

  • model_used: "none" indicates empty/unscored text — not a failure. Articles with no text produce a neutral score with model_used="none".
  • Batch size 500: score_unscored_articles() loops over the unscored backlog 500 articles at a time until nothing is left, so a large backlog (e.g. after a restart) takes several iterations to clear.
  • Missing sentiment defaults to 0.0 in build_features() — articles that haven't been scored yet don't block inference; they just contribute a neutral signal.
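The neutral default can be sketched as follows, assuming (hypothetically) that build_features() averages per-article scores over a window. `sentiment_feature` is an illustrative name, and `None` stands in for "no SentimentScore yet".

```python
from typing import Optional, Sequence


def sentiment_feature(scores: Sequence[Optional[float]]) -> float:
    """Hypothetical aggregate: mean sentiment, with unscored articles (None) counted as 0.0."""
    if not scores:
        return 0.0  # no articles at all: neutral
    filled = [0.0 if s is None else s for s in scores]
    return sum(filled) / len(filled)


print(round(sentiment_feature([0.6, None, -0.2]), 4))  # 0.1333
print(sentiment_feature([]))                            # 0.0
```

Under this averaging assumption, an unscored article is indistinguishable from a genuinely neutral one and pulls the mean toward zero until it is scored.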

How to Extend

Add a new sentiment model

  1. Load the model in analyzer.py (guard with try/except so the fallback still works if the model isn't installed)
  2. Add the scoring logic to the appropriate analyze_*_batch() function
  3. Update the model_used string constant
  4. Update docs/spec/sentiment-analysis.md
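Steps 1 to 3 can be sketched as a guarded loader plus a dispatcher that falls back when the model is missing. This is an illustration of the pattern, not the existing analyzer.py code. `load_optional`, `analyze_batch`, `keyword_fallback`, the model name "hypothetical-finbert-v2" and the model_used strings "finbert-v2" and "lexicon" are all hypothetical. The loader is injected so the sketch runs without any ML library installed.

```python
import logging
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

# Hypothetical scorer signature: texts -> one sentiment score per text.
Scorer = Callable[[Sequence[str]], List[float]]


def load_optional(name: str, loader: Callable[[], Scorer]) -> Optional[Scorer]:
    """Step 1: try to load a model; on failure return None so the fallback still works."""
    try:
        return loader()
    except Exception as exc:  # e.g. ImportError, missing weights
        logger.warning("sentiment model %s unavailable, using fallback: %s", name, exc)
        return None


def analyze_batch(texts: Sequence[str], model: Optional[Scorer], model_name: str,
                  fallback: Scorer, fallback_name: str) -> List[Tuple[float, str]]:
    """Steps 2-3: score with the model if loaded, else the fallback; tag model_used."""
    scorer, used = (model, model_name) if model is not None else (fallback, fallback_name)
    # Empty texts are never sent to a scorer: neutral 0.0 with model_used="none".
    results: List[Tuple[float, str]] = [(0.0, "none")] * len(texts)
    idx = [i for i, t in enumerate(texts) if t and t.strip()]
    if idx:
        for i, score in zip(idx, scorer([texts[i] for i in idx])):
            results[i] = (score, used)
    return results


def _load_missing_model() -> Scorer:
    raise ImportError("hypothetical-finbert-v2 is not installed")


def keyword_fallback(texts: Sequence[str]) -> List[float]:
    # Toy lexicon: +1 for "beat", -1 for "miss", else neutral.
    return [1.0 if "beat" in t else -1.0 if "miss" in t else 0.0 for t in texts]


new_model = load_optional("hypothetical-finbert-v2", _load_missing_model)
print(analyze_batch(["Q3 earnings beat", "", "guidance miss"],
                    new_model, "finbert-v2", keyword_fallback, "lexicon"))
# [(1.0, 'lexicon'), (0.0, 'none'), (-1.0, 'lexicon')]
```

Catching the load failure once, at load time, keeps the per-batch path free of try/except and makes the model_used tag an honest record of which scorer actually ran.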