Sentiment Analysis: Spec
Overview
Bilingual (English and Korean) sentiment scoring for news articles and social posts. Both languages share a single score range.
Score Range
- Range: -1.0 (very negative) to +1.0 (very positive)
- Labels:
  - positive: score ≥ 0.15
  - negative: score ≤ -0.15
  - neutral: otherwise
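
The label thresholds can be expressed as a small mapping function. This is a minimal sketch; the function and constant names (`label_for_score`, `POSITIVE_THRESHOLD`, `NEGATIVE_THRESHOLD`) are illustrative and not taken from the codebase.

```python
# Thresholds from the spec; the names are hypothetical.
POSITIVE_THRESHOLD = 0.15
NEGATIVE_THRESHOLD = -0.15


def label_for_score(score: float) -> str:
    """Map a score in [-1.0, +1.0] to a label; both thresholds are inclusive."""
    if score >= POSITIVE_THRESHOLD:
        return "positive"
    if score <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"
```

Because both comparisons are inclusive, a score of exactly 0.15 is labeled positive and a score of exactly -0.15 is labeled negative.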
English Scoring
- Primary model: `ProsusAI/finbert` (FinBERT)
- Fallback: VADER + finance lexicon
- `model_used` value: `"en_finbert"` (primary) or `"vader_finance"` (fallback)
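
One way to wire a primary scorer to its fallback is sketched below. The spec does not say what triggers the fallback, so this sketch assumes it is used whenever the primary scorer raises an exception. `score_with_fallback` and the `Scorer` type are hypothetical; the real scorers would wrap FinBERT and VADER.

```python
from typing import Callable, Tuple

# A scorer takes text and returns a score in [-1.0, +1.0] (hypothetical type).
Scorer = Callable[[str], float]


def score_with_fallback(
    text: str,
    primary: Scorer,
    fallback: Scorer,
    primary_name: str = "en_finbert",
    fallback_name: str = "vader_finance",
) -> Tuple[float, str]:
    """Return (score, model_used), trying the primary scorer first."""
    try:
        return primary(text), primary_name
    except Exception:
        # Assumption: any primary failure (e.g. model unavailable) triggers the fallback.
        return fallback(text), fallback_name
```

Returning the `model_used` value together with the score keeps the stored record honest about which path produced it.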
Korean Scoring
- Primary model: `snunlp/KR-FinBert-SC`
- Fallback: keyword lexicon
- `model_used` value: `"kr_finbert"` (primary) or `"ko_lexicon"` (fallback)
- Confidence is evidence-damped: `abs(score) × (count / (count + 5))`, where `count` is the number of keyword matches. A small number of matches produces low confidence even if all matches are one-sided
- `model_used` value: `"none"` for empty/unscored text
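
The evidence-damping formula can be checked with a few values. This is a minimal sketch of the formula as written above; the names `damped_confidence` and `DAMPING_CONSTANT` are illustrative.

```python
DAMPING_CONSTANT = 5  # the "+ 5" in the formula; the name is hypothetical


def damped_confidence(score: float, count: int) -> float:
    """Compute abs(score) * (count / (count + 5)) for a given keyword match count."""
    return abs(score) * (count / (count + DAMPING_CONSTANT))


# A fully one-sided score (1.0) backed by different amounts of evidence.
for matches in (1, 5, 20):
    print(matches, round(damped_confidence(1.0, matches), 3))
```

With a score of 1.0, one match gives a confidence of about 0.167, five matches give 0.5, and twenty matches give 0.8, so confidence approaches `abs(score)` only as evidence accumulates.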
Scoring Schedule
- Batch scoring runs after every news fetch job (`fetch_us_news`, `fetch_kr_news`)
- Also runs inside `run_predictions` before training as a safety net
- Articles may sit unscored for up to 15–30 minutes after ingestion
Constraints
- Articles without a `SentimentScore` row are left unscored until `score_unscored_articles()` runs
- Features with no sentiment data default to `0.0` (neutral) in `build_features()`
- `model_used: "none"` indicates empty text, not a scoring failure
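
The neutral default for missing sentiment data might look like the sketch below. The spec only states that the default is `0.0`; averaging the available scores is an assumption made for illustration, and `sentiment_feature` is a hypothetical helper, not the actual `build_features()` code.

```python
from typing import Optional, Sequence

NEUTRAL_DEFAULT = 0.0  # default from the spec; the name is hypothetical


def sentiment_feature(scores: Sequence[Optional[float]]) -> float:
    """Average the available scores, or return the neutral default when none exist."""
    # None stands in for an article that has no score yet (assumption).
    available = [s for s in scores if s is not None]
    if not available:
        return NEUTRAL_DEFAULT
    return sum(available) / len(available)
```

Under this default, a feature value of 0.0 can mean either genuinely neutral sentiment or no sentiment data at all.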