
Sentiment Analysis — Spec

Overview

Bilingual (English and Korean) sentiment scoring for news articles and social posts. Both languages share one score range and one set of label thresholds.

Score Range

  • Range: -1.0 (very negative) to +1.0 (very positive)
  • Labels:
      • positive: score ≥ 0.15
      • negative: score ≤ -0.15
      • neutral: -0.15 < score < 0.15
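Both label thresholds are inclusive at ±0.15. A minimal Python sketch of the mapping follows. The function name `label_for` is hypothetical and not taken from the codebase.

```python
# Label thresholds from the spec; both boundaries are inclusive.
POSITIVE_THRESHOLD = 0.15
NEGATIVE_THRESHOLD = -0.15


def label_for(score: float) -> str:
    """Map a score in [-1.0, 1.0] to positive / negative / neutral."""
    if score >= POSITIVE_THRESHOLD:
        return "positive"
    if score <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"
```

For example, `label_for(0.15)` returns "positive" and `label_for(0.1499)` returns "neutral".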

English Scoring

  • Primary model: ProsusAI/finbert (FinBERT)
  • Fallback: VADER + finance lexicon
  • model_used value: "en_finbert" (primary) or "vader_finance" (fallback)
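The primary/fallback arrangement can be sketched as a small dispatcher. This is a minimal sketch under several assumptions:

- The fallback fires whenever the primary scorer raises; the spec does not say what triggers it.
- FinBERT's three class probabilities are reduced to P(positive) - P(negative).
- Empty text is reported as "none", per the Constraints section.

`score_english` and `finbert_probs_to_score` are hypothetical names. Both scorers are injected as plain callables so that model loading stays out of the example.

```python
from typing import Callable, Mapping, Tuple

# Hypothetical scorer type: takes text, returns a score in [-1.0, 1.0].
Scorer = Callable[[str], float]


def finbert_probs_to_score(probs: Mapping[str, float]) -> float:
    """Collapse FinBERT class probabilities into one signed score.

    Assumption: score = P(positive) - P(negative). The spec does not
    state how the model output is reduced to a scalar.
    """
    return probs.get("positive", 0.0) - probs.get("negative", 0.0)


def score_english(text: str, primary: Scorer, fallback: Scorer) -> Tuple[float, str]:
    """Return (score, model_used) for an English text."""
    if not text.strip():
        # Empty text is not scored at all; "none" is not a failure.
        return 0.0, "none"
    try:
        return primary(text), "en_finbert"
    except Exception:
        # Assumption: any primary failure triggers the VADER + finance lexicon path.
        return fallback(text), "vader_finance"
```

Because the model_used value records which path produced each score, fallback-scored rows can be audited or re-scored later.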

Korean Scoring

  • Primary model: snunlp/KR-FinBert-SC
  • Fallback: keyword lexicon
  • model_used value: "kr_finbert" (primary) or "ko_lexicon" (fallback)
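  • Confidence is evidence-damped: abs(score) × (count / (count + 5)), where count is the number of matched keywords. With few matches, confidence stays low even when every match has the same polarity (e.g. 2 one-sided matches give at most 2/7 ≈ 0.29)
  • model_used value: "none" for empty or unscored text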
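The damping formula can be illustrated with a toy keyword scorer. Only the confidence formula comes from the spec; everything else is an assumption:

- The lexicon entries are hypothetical placeholders, not the real ko_lexicon contents.
- Plain substring counting stands in for whatever matching the real lexicon performs.
- The raw score is taken as the mean polarity of the matched keywords.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder lexicon: keyword -> polarity (+1 / -1).
KO_LEXICON: Dict[str, int] = {"상승": 1, "호재": 1, "하락": -1, "악재": -1}

DAMPING_K = 5  # the "+ 5" in count / (count + 5)


def damped_confidence(score: float, count: int) -> float:
    """Evidence-damped confidence: abs(score) * (count / (count + 5))."""
    return abs(score) * (count / (count + DAMPING_K))


def score_korean_lexicon(text: str) -> Tuple[float, float, str]:
    """Return (score, confidence, model_used) from keyword matches."""
    if not text.strip():
        return 0.0, 0.0, "none"
    # One entry per occurrence of each keyword (naive substring counting).
    hits: List[int] = [
        polarity
        for word, polarity in KO_LEXICON.items()
        for _ in range(text.count(word))
    ]
    if not hits:
        return 0.0, 0.0, "ko_lexicon"
    score = sum(hits) / len(hits)  # assumption: mean polarity, stays in [-1, 1]
    return score, damped_confidence(score, len(hits)), "ko_lexicon"
```

Consider `score_korean_lexicon("실적 호재로 주가 상승")`, which has two one-sided matches. It yields a score of 1.0 but a confidence of only 2/7 ≈ 0.29. Reaching 0.5 confidence at the same score takes five matches.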

Scoring Schedule

  • Batch scoring runs after every news fetch job (fetch_us_news, fetch_kr_news)
  • Also runs inside run_predictions before model training, as a safety net for anything the fetch-job pass missed
  • An article may remain unscored for up to 15–30 minutes after ingestion
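The Constraints section implies that scoring selects only articles lacking a SentimentScore row. Under that assumption the batch job is safe to call repeatedly, so it can run after each fetch job and again inside run_predictions without double-scoring.

The sketch below models this with a hypothetical in-memory `Store`. The real score_unscored_articles() takes no arguments and works against the database. This version passes the store and scorer explicitly so it runs standalone. `run_fetch_job` is likewise a hypothetical stand-in for fetch_us_news / fetch_kr_news.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

ScoreRow = Tuple[float, str]  # (score, model_used)


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the articles and SentimentScore tables."""
    articles: Dict[int, str] = field(default_factory=dict)
    scores: Dict[int, ScoreRow] = field(default_factory=dict)


def score_unscored_articles(store: Store, scorer: Callable[[str], ScoreRow]) -> int:
    """Score only articles with no score row; return how many were scored."""
    pending = [aid for aid in store.articles if aid not in store.scores]
    for aid in pending:
        store.scores[aid] = scorer(store.articles[aid])
    return len(pending)


def run_fetch_job(store: Store, fetch: Callable[[], Dict[int, str]],
                  scorer: Callable[[str], ScoreRow]) -> int:
    """Ingest newly fetched articles, then batch-score whatever is unscored."""
    store.articles.update(fetch())
    return score_unscored_articles(store, scorer)


def run_predictions(store: Store, scorer: Callable[[str], ScoreRow],
                    train: Callable[[Store], None]) -> None:
    """Safety-net scoring pass, then training."""
    score_unscored_articles(store, scorer)
    train(store)
```

A second call to score_unscored_articles() with nothing newly ingested scores zero articles.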

Constraints

  • Articles without a SentimentScore row are left unscored until score_unscored_articles() runs
  • Features with no sentiment data default to 0.0 (neutral) in build_features()
  • model_used: "none" indicates empty text, not a scoring failure
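The neutral default can be sketched as follows. This assumes features are built by averaging scores per key. The aggregation inside build_features() is not specified here, and `sentiment_feature` and `build_sentiment_features` are hypothetical helpers. The keys in the usage note are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Optional


def sentiment_feature(scores: Optional[List[float]]) -> float:
    """Mean sentiment for one key; 0.0 (neutral) when there is no data."""
    if not scores:
        return 0.0
    return mean(scores)


def build_sentiment_features(scores_by_key: Dict[str, List[float]],
                             keys: List[str]) -> Dict[str, float]:
    """One sentiment feature per requested key, defaulting missing keys to 0.0."""
    return {key: sentiment_feature(scores_by_key.get(key)) for key in keys}
```

Consider `build_sentiment_features({"AAPL": [0.5, -0.25], "005930": []}, ["AAPL", "005930", "MSFT"])`. It gives 0.125 for AAPL and 0.0 for the other two keys. Because missing data and genuinely neutral news both map to 0.0, this feature alone cannot distinguish "no news" from "neutral news".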