
Architecture — Spec

Data Flow

Finnhub API ────────→ scraper/finnhub.py ───────────────┐
NAVER Finance ──────→ scraper/naver_finance.py ─────────┼──→ NewsArticle (DB)
YouTube Channels ───→ scraper/youtube.py ───────────────┤      │
StockTwits API ─────→ scraper/stocktwits.py [DISABLED] ─┘      │
                                                               │
yfinance ───────────→ data/stock_fetcher.py ──────────┐        ▼
FinanceDataReader ──→ data/stock_fetcher.py ──────────┼──→ StockPrice (DB)
                                                      │         │
                                                      │         ▼
                                   NewsArticle → sentiment/analyzer.py → SentimentScore (DB)
                                       ↓                                          │
                                Azure Translator ←───────────────────────────────┤
                                (optional: translate headlines)                  │
                                                  StockPrice + SentimentScore ──┤
                                                                                ▼
                                                               models/predictor.py → Prediction (DB)
                                                               models/trend_analyzer.py → TrendingTopic (DB)
                                                                                │
                                                                                ▼
                          api/main.py (FastAPI :8000) ← reads DB directly
                            │  all routes under /api prefix
                            │  serves React SPA from frontend/dist (production)
                                    ↓
                          React frontend (Vite :5173 dev / served from :8000 prod)
                          — TanStack Query, TanStack Table, Recharts, shadcn/ui, lucide-react, date-fns, Tailwind

M-STOCK screenshots → portfolio/parser.py (vision) → PortfolioSnapshot + PortfolioHolding (DB)
                                                                                ↑
                                                               api/routes/portfolio.py → React MyPortfolio page

Ticker (DB) ──────────────────────────────────────────────────────────────────────→ all pipeline consumers

All pipeline consumers (scrapers, price fetcher, predictor, scheduler) read the active ticker list from the Ticker table via db/tickers.py at runtime — not from config.py. Adding a ticker in the Manage Tickers UI takes effect on the next pipeline run with no restart.

Module Dependencies

| Module | Depends on | Notes |
|---|---|---|
| config.py | .env via dotenv | No internal deps; holds API keys, thresholds, schedules, and seed ticker data |
| db/database.py | config.py | SQLAlchemy engine + session factory; seeds Ticker table from config on init_db() |
| db/models.py | db/database.py | ORM models; utcnow() defined here |
| db/tickers.py | db/models.py | get_active_tickers(), get_ticker_map() — single source of truth for live ticker lists |
| data/stock_fetcher.py | db/ | US via yfinance; KR via FinanceDataReader; caller provides ticker list |
| scraper/finnhub.py | db/, config.py | Finnhub REST API; 60 req/min free tier; reads ticker list from DB |
| scraper/naver_finance.py | db/ | BeautifulSoup scrape; reads ticker list from DB |
| scraper/youtube.py | db/, config.py | YouTube transcript fetch; channel list from YOUTUBE_CHANNELS in config.py |
| scraper/stocktwits.py | db/ | StockTwits public stream — DISABLED (API restricted); code preserved for future rework |
| scraper/translator.py | db/, config.py | Azure Translator batch translation with DB caching |
| sentiment/analyzer.py | db/ | EN: ProsusAI/finbert (VADER fallback); KO: snunlp/KR-FinBert-SC (keyword lexicon fallback) |
| models/volatility_analyzer.py | db/ | Realized vol (5/20/60-day), GARCH(1,1) 1-day forecast, K-means volatility regime |
| models/predictor.py | db/, config.py, models/volatility_analyzer.py | RF training + inference + rule-based fallback |
| models/buffett.py | db/, models/volatility_analyzer.py | Value-investing scorecard |
| models/trend_analyzer.py | db/, config.py | Sector/keyword trend aggregation, ticker heat ranking, trending alerts |
| scheduler/jobs.py | all modules above | APScheduler BackgroundScheduler jobs (see Scheduler Job Interactions) |
| portfolio/parser.py | db/models.py, config.py, google-generativeai SDK | Gemini 2.5 Flash vision parser for Mirae Asset screenshots |
| db/auth.py | db/models.py | Auth helpers: bcrypt verification, UserSession token creation/validation |
| api/main.py | api/auth.py, api/routes/*, db/ | FastAPI app; all routers mounted under /api prefix; CORS origins configurable via CORS_ORIGINS env var; serves React SPA from frontend/dist when present |
| api/auth.py | db/auth.py, db/ | /auth/login, /auth/me, /auth/logout |
| api/routes/ | domain modules | prices, news, sentiment, predictions, trends, portfolio, tickers, jobs, admin |
| frontend/src/App.tsx | React Router, TanStack Query, api/auth.ts | SPA entry point; AuthGuard; routes all 9 pages |
| frontend/src/api/ | client.ts, domain modules | Thin fetch wrapper; domain modules call FastAPI endpoints |
| frontend/src/pages/ | React components | 9 pages |
| main.py | scheduler/, api/, db/ | Init DB → seed tickers → initial load → start scheduler → launch FastAPI |
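The main.py startup order can be sketched as a plain sequence; the function names below are illustrative stand-ins for the real entry points in db/, scheduler/, and api/:

```python
from typing import Callable

def startup(init_db: Callable[[], None],
            initial_load: Callable[[], None],
            start_scheduler: Callable[[], None],
            launch_api: Callable[[], None]) -> None:
    """Boot order from main.py: each step assumes the previous one finished."""
    init_db()          # create tables, seed Ticker from config.py
    initial_load()     # one-off fetch so the dashboard isn't empty on first run
    start_scheduler()  # APScheduler BackgroundScheduler (non-blocking)
    launch_api()       # FastAPI on :8000 (blocks the main thread)
```

The scheduler must start before the (blocking) API server, and the DB must exist before either touches it.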

Scheduler Job Interactions

Jobs share the same DB session factory and run independently at fixed intervals. Each job queries Ticker at runtime so newly added tickers are included automatically.

| Job ID | Interval | Writes | Reads |
|---|---|---|---|
| fetch_us_news | 15 min | NewsArticle, SentimentScore | Finnhub API |
| ~~fetch_stocktwits~~ | disabled | — | StockTwits public API restricted; job commented out |
| fetch_kr_news | 30 min | NewsArticle, SentimentScore | NAVER HTML |
| update_prices | 1 hour | StockPrice | yfinance / FDR |
| compute_trends | 30 min | TrendingTopic | NewsArticle |
| run_predictions | 6 hours | Prediction | StockPrice, SentimentScore (inference only) |
| check_retrain | 6 hours | rf_*.pkl | Prediction, PredictionResult (accuracy-gated) |
| fetch_youtube | 6 hours | NewsArticle, SentimentScore | YouTube transcript API |
| evaluate_predictions | 24 hours | PredictionResult | Prediction, StockPrice |
| fetch_fundamentals | 24 hours | FundamentalData | financialdatasets.ai API |
| prune_old_data | 24 hours | deletes from all tables | all tables |
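The wiring in scheduler/jobs.py can be sketched data-first. The intervals mirror the table above; the registration helper and the funcs mapping are hypothetical, but the add_job(func, "interval", minutes=..., id=...) shape matches APScheduler's real API:

```python
# Job id -> interval in minutes, taken from the table above.
JOBS = {
    "fetch_us_news": 15,
    "fetch_kr_news": 30,
    "update_prices": 60,
    "compute_trends": 30,
    "run_predictions": 360,
    "check_retrain": 360,
    "fetch_youtube": 360,
    "evaluate_predictions": 1440,
    "fetch_fundamentals": 1440,
    "prune_old_data": 1440,
}

def register_jobs(scheduler, funcs):
    """Attach each job callable at its fixed interval.

    `scheduler` is anything with APScheduler's add_job signature;
    `funcs` maps job id -> the real fetch/compute callable.
    """
    for job_id, minutes in JOBS.items():
        scheduler.add_job(funcs[job_id], "interval", minutes=minutes, id=job_id)
```

Keeping the schedule in one dict makes the interval table above trivially auditable against the code.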

News fetch jobs call score_unscored_articles() in a batched while-loop (batch_size=500) immediately after each ingestion cycle.
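The loop's shape, sketched with injected callables (fetch_batch and score are stand-ins; the real function queries NewsArticle rows lacking a SentimentScore and persists scores per batch):

```python
BATCH_SIZE = 500  # batch_size used by the news fetch jobs

def score_unscored_articles(fetch_batch, score, batch_size=BATCH_SIZE):
    """Drain all unscored articles in fixed-size batches.

    fetch_batch(n) returns up to n unscored articles (empty when drained);
    score(batch) writes one SentimentScore per article. Returns the total
    number of articles scored.
    """
    total = 0
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            break
        score(batch)
        total += len(batch)
    return total
```

Bounding each batch at 500 keeps peak memory flat regardless of how many articles an ingestion cycle produced.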

Key Design Decisions

  • DB-backed ticker registry — the Ticker table is the single source of truth; config.py holds seed data only; all runtime consumers read from DB via db/tickers.py
  • Soft delete on tickers — active=False excludes a ticker from the pipeline while preserving its historical data
  • DATABASE_URL env var selects the backend — production uses Supabase PostgreSQL; falls back to SQLite when unset (local dev only)
  • One RF model per horizon, shared across all tickers — individual ticker models would have insufficient training samples
  • Rule-based fallback — prediction pipeline works on day one before any trained model exists
  • URL as unique key on news_articles — prevents duplicates across repeated scrape cycles
  • Ticker + market discriminator pattern — always filter by both ticker and market in cross-market queries
  • Batch sentiment scoring — articles are scored in batches immediately after each ingestion cycle rather than inline per article
  • No .KS suffix — FinanceDataReader accepts bare KRX codes natively
  • Translation cache in DB — title_en and title_ko columns avoid repeated API calls
  • Token-based session auth — UserSession rows; FastAPI sets an httponly, samesite=lax cookie on login
  • Retention-based pruning — prune_old_data runs daily; retention windows are tuned so the ML models always have sufficient history
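The DATABASE_URL switch can be sketched as follows; make_engine and the default path are hypothetical names, but the env-var fallback and the SQLite connect_args concern are the real design point:

```python
import os

from sqlalchemy import create_engine

def make_engine(default_sqlite_path="app.db"):
    """Supabase PostgreSQL when DATABASE_URL is set; SQLite otherwise."""
    url = os.getenv("DATABASE_URL", f"sqlite:///{default_sqlite_path}")
    if url.startswith("sqlite"):
        # SQLite connections are shared across FastAPI workers and
        # scheduler threads, so disable the same-thread check.
        return create_engine(url, connect_args={"check_same_thread": False})
    # Postgres: pre-ping recycles connections dropped by the pooler.
    return create_engine(url, pool_pre_ping=True)
```

Because the selection happens at engine-creation time, local dev needs no configuration at all: unset DATABASE_URL and everything lands in a local SQLite file.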