TRIPLE — Consensus Scoring
Meta-model that aggregates votes from all 12 sub-models using empirical hit-rate priors, tier weights, and multiplicative regime factors. v2 (priors-based) beats the original W-formula by Brier −15% / log-loss −18%.
updated 2026-04-28
The setup
TRIPLE is not a trade pattern — it’s the meta-model that turns the 12 individual sub-model votes into a single bull/bear probability and a final HUD line. It is what you actually trade off when the chart paints TRIPLE BULL / TRIPLE BEAR.
The current version (v2, shipped 2026-04-28) replaced the original W-formula with an empirical priors-based scorer. The v1 formula was catastrophically under-confident in the low-probability bucket: it predicted 21% on a cohort that actually hit 48%. v2 uses calibrated priors per (symbol, direction), tier weights, and multiplicative regime factors, and it scores better on every calibration metric we measure: Brier score, log-loss, and low-bucket reliability.
Pipeline
- Collect votes: every sub-model active for the symbol contributes its `(direction, probability)` and a tier label (1 / 2 / 3)
- Apply empirical priors: fallback chain `by_sym_dir(n ≥ 30) → by_dir(n ≥ 50) → pooled` so each sub-model contributes its real historical hit rate, not its self-reported probability
- Tier weight: TIER1 = 3.0, TIER2 = 2.0, TIER3 = 1.0
- Regime factors: multiplicative adjustments (e.g. VWAP regime, post-11 break, regime extreme), floored at 0.1× to avoid flipping signs
- Combine: weighted log-odds aggregation across bull and bear candidate pools
- Output: `p_bull`, `p_bear`, plus a confidence band (HIGH ≥ 0.7, MEDIUM 0.55–0.7, LOW 0.4–0.55)
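The pipeline steps can be sketched roughly as follows. This is a minimal illustration, not the code in production: the tier weights, the 0.1× floor, the sample-size thresholds, and the band cut-offs come from the list above, while the shape of the priors dict, the function names, the choice to floor the product of regime factors (rather than each factor), and the exact combination rule (weight-averaged log-odds, bull pool minus bear pool) are assumptions made for the example.

```python
import math

# Tier weights and regime floor are taken from the pipeline description.
TIER_WEIGHTS = {1: 3.0, 2: 2.0, 3: 1.0}
REGIME_FLOOR = 0.1


def lookup_prior(priors, symbol, direction):
    """Fallback chain: by_sym_dir (n >= 30) -> by_dir (n >= 50) -> pooled.

    `priors` is a hypothetical per-model dict; each cell is {"n": int, "wr": float}.
    """
    cell = priors.get("by_sym_dir", {}).get((symbol, direction))
    if cell and cell["n"] >= 30:
        return cell["wr"]
    cell = priors.get("by_dir", {}).get(direction)
    if cell and cell["n"] >= 50:
        return cell["wr"]
    return priors["pooled"]["wr"]


def regime_multiplier(factors):
    """Multiply the regime factors, then floor the product at 0.1x (assumed)."""
    return max(math.prod(factors), REGIME_FLOOR)


def logit(p):
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp so log() stays finite
    return math.log(p / (1.0 - p))


def combine(votes):
    """votes: iterable of (direction, tier, hit_rate, regime_factors).

    Assumed rule: weight-averaged log-odds, bull pool minus bear pool.
    Returns (p_bull, p_bear).
    """
    bull = bear = total_w = 0.0
    for direction, tier, hit_rate, factors in votes:
        w = TIER_WEIGHTS[tier] * regime_multiplier(factors)
        evidence = w * logit(hit_rate)
        if direction == "bull":
            bull += evidence
        else:
            bear += evidence
        total_w += w
    if total_w == 0.0:
        return 0.5, 0.5  # no votes: no opinion
    p_bull = 1.0 / (1.0 + math.exp(-(bull - bear) / total_w))
    return p_bull, 1.0 - p_bull


def confidence_band(p):
    """Band cut-offs from the Output step; below 0.4 no band is assigned."""
    if p >= 0.7:
        return "HIGH"
    if p >= 0.55:
        return "MEDIUM"
    if p >= 0.4:
        return "LOW"
    return None
```

Under these assumptions, a Tier 1 bull vote with a 0.60 hit rate, a Tier 2 bull vote at 0.55, and a Tier 3 bear vote at 0.52 (all regime factors 1.0) combine to a `p_bull` a little above 0.56, which lands in the MEDIUM band. Because each vote enters through its empirical hit rate rather than its self-reported probability, a sub-model with a near-50% track record contributes almost nothing regardless of tier.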
v1 → v2 calibration
Measured on the 106 resolved TRIPLE W rows in the live database:
| Metric | v1 (W formula) | v2 (priors-based) | Δ |
|---|---|---|---|
| Brier score | 0.300 | 0.255 | −15% |
| Log-loss | 0.854 | 0.703 | −18% |
| Reliability (low-prob bucket) | predicted 21% / actual 48% | predicted 36% / actual 41% | gap 27 pts → 5 pts |
v2 is still under-confident at the low end, but its 5-point gap is now within sampling noise, unlike the 27-point chasm v1 had.
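For readers who want to reproduce the table, the two headline metrics use their standard definitions. The sketch below is a plain-Python version with hypothetical function names. It is not taken from backtest_consensus_v2.py, and it assumes outcomes are coded 1 for a hit and 0 for a miss.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood of the 0/1 outcomes."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def relative_change(old, new):
    """Signed relative change, e.g. -0.15 for a 15% reduction."""
    return (new - old) / old
```

Applied to the table, `relative_change(0.300, 0.255)` gives −15% and `relative_change(0.854, 0.703)` gives roughly −18%. As a reference point, a constant 0.5 forecast scores 0.25 on Brier and ln 2 ≈ 0.693 on log-loss; lower is better on both.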
Implementation
The scorer lives in smc_analysis.py as compute_v2_consensus(). Core inputs are cached at module load (_v2_load_priors()); the fallback chain is handled in _v2_model_wr(); regime adjustments are in _v2_regime_factors(). When v2 succeeds, its result overrides v1's W output. If anything in v2 fails, the scorer silently falls back to the v1 value, so rollback is automatic if the priors database goes missing.
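The silent-fallback behaviour can be illustrated with a small wrapper. Everything here is hypothetical: `load_priors`, `score_with_fallback`, the JSON storage format, and the returned version tag are assumptions for the sketch, not the actual contents of smc_analysis.py.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=1)
def load_priors(path):
    """Load the priors once and cache them (JSON format is an assumption).

    A missing file raises FileNotFoundError; lru_cache does not cache
    exceptions, so a later call retries the load.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def score_with_fallback(v2_scorer, v1_scorer, votes):
    """Try the v2 scorer; on any exception return the v1 result instead.

    Returns (result, version) so callers can tell which path produced it.
    """
    try:
        return v2_scorer(votes), "v2"
    except Exception:
        # Silent by design, matching the behaviour described above.
        return v1_scorer(votes), "v1"
```

Returning the version tag alongside the result is a design choice of this sketch: a fully silent fallback keeps the HUD line alive, but it also hides a missing priors file, so some signal of which scorer ran is useful when reading calibration numbers later.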
Why Tier 1
It’s the final line you trade. Every other model on the site is an input to this one.
History
- 2026-04-28: v2 shipped to production; v1 retained as silent fallback.
- 2026-04-28: 25-rule canonicalizer wired into `track_predictions.py:add_line()` to prevent future model-name drift; 4,250 historical rows migrated.
- Future: re-run `backtest_consensus_v2.py` weekly as new TRIPLE W rows resolve. Re-tune priors monthly.