model

TRIPLE — Consensus Scoring

Meta-model that aggregates votes from all 12 sub-models using empirical hit-rate priors, tier weights, and multiplicative regime factors. v2 (priors-based) beats the original W-formula by Brier −15% / log-loss −18%.

updated 2026-04-28

tier 1 both meta consensus meta triple scoring

Available on MNQ MES MYM MCL MGC SI MBT

The setup

TRIPLE is not a trade pattern — it’s the meta-model that turns the 12 individual sub-model votes into a single bull/bear probability and a final HUD line. It is what you actually trade off when the chart paints TRIPLE BULL / TRIPLE BEAR.

The current version (v2, shipped 2026-04-28) replaced the original W-formula with an empirical priors-based scorer. The v1 formula badly under-predicted in the low-probability bucket: it predicted 21% on a cohort that actually hit 48%. v2 uses calibrated priors per (symbol, direction), tier weights, and multiplicative regime factors, and it scores better on every calibration metric reported here (Brier score, log-loss, and low-bucket reliability).

Pipeline

Collect votes — every sub-model active for the symbol contributes its (direction, probability) and a tier label (1 / 2 / 3)
Apply empirical priors — fallback chain by_sym_dir(n ≥ 30) → by_dir(n ≥ 50) → pooled so each sub-model contributes its real historical hit rate, not its self-reported probability
Tier weight — TIER1 = 3.0, TIER2 = 2.0, TIER3 = 1.0
Regime factors — multiplicative adjustments (e.g. VWAP regime, post-11 break, regime extreme), floored at 0.1× to avoid flipping signs
Combine — weighted log-odds aggregation across bull and bear candidate pools
Output — p_bull, p_bear, plus a confidence band (HIGH ≥ 0.7, MEDIUM 0.55–0.7, LOW 0.4–0.55)
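
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production scorer: the Vote class, the layout of the priors dict, and every function name here are hypothetical. The text does not give the exact combination formula, so the sketch assumes a weighted mean of signed log-odds, a floor applied to the product of the regime factors (rather than to each factor), regime factors keyed by direction, and p_bear = 1 − p_bull.

```python
import math
from dataclasses import dataclass

TIER_WEIGHT = {1: 3.0, 2: 2.0, 3: 1.0}  # tier weights from the pipeline
REGIME_FLOOR = 0.1                       # assumed: floor on the combined multiplier


@dataclass
class Vote:                # hypothetical container for one sub-model vote
    model: str             # sub-model name
    direction: str         # "bull" or "bear"
    tier: int              # 1, 2 or 3


def prior_hit_rate(priors, model, symbol, direction):
    """Fallback chain: by_sym_dir (n >= 30) -> by_dir (n >= 50) -> pooled."""
    hit = priors["by_sym_dir"].get((model, symbol, direction))
    if hit and hit["n"] >= 30:
        return hit["wr"]
    hit = priors["by_dir"].get((model, direction))
    if hit and hit["n"] >= 50:
        return hit["wr"]
    return priors["pooled"][model]


def regime_multiplier(factors):
    """Multiply the regime factors together, then floor the product."""
    return max(math.prod(factors), REGIME_FLOOR)


def logit(p):
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the log finite
    return math.log(p / (1.0 - p))


def combine(votes, priors, symbol, regime=None):
    """Assumed form: weighted mean of signed log-odds, mapped back to [0, 1]."""
    regime = regime or {}  # e.g. {"bull": [1.2], "bear": [0.8]}
    num = den = 0.0
    for v in votes:
        wr = prior_hit_rate(priors, v.model, symbol, v.direction)
        w = TIER_WEIGHT[v.tier] * regime_multiplier(regime.get(v.direction, []))
        sign = 1.0 if v.direction == "bull" else -1.0
        num += w * sign * logit(wr)
        den += w
    z = num / den if den else 0.0
    p_bull = 1.0 / (1.0 + math.exp(-z))
    return p_bull, 1.0 - p_bull


def confidence_band(p):
    """Bands from the Output step; lower bounds are assumed inclusive."""
    if p >= 0.7:
        return "HIGH"
    if p >= 0.55:
        return "MEDIUM"
    if p >= 0.4:
        return "LOW"
    return None  # below 0.4: the text gives no band
```

Because the floor keeps every weight positive, a regime factor can shrink a vote's influence but can never turn a bull vote into a bear vote, which matches the stated reason for the 0.1× floor.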

v1 → v2 calibration

Measured on the 106 resolved TRIPLE W rows in the live database:

Metric	v1 (W formula)	v2 (priors-based)	Δ
Brier score	0.300	0.255	−15%
Log-loss	0.854	0.703	−18%
Reliability (low-prob bucket)	predicted 21% / actual 48%	predicted 36% / actual 41%	much closer

v2 still under-predicts at the low end (36% predicted vs. 41% actual), but the 5-point gap is now within sampling noise, not the 27-point chasm v1 had.
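
For readers who want to check the table, the two headline metrics and the Δ column can be computed as follows. This is a generic sketch using the standard definitions of Brier score and log-loss; the function names are illustrative and are not taken from backtest_consensus_v2.py.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def relative_change(old, new):
    """Signed change of new relative to old, as a fraction."""
    return (new - old) / old


# Reproduce the delta column from the reported v1 and v2 scores.
print(f"Brier:    {relative_change(0.300, 0.255):+.0%}")  # -15%
print(f"Log-loss: {relative_change(0.854, 0.703):+.0%}")  # -18%
```

Lower is better for both metrics. As a reference point, a constant 0.5 forecast scores 0.25 on Brier and ln 2 ≈ 0.693 on log-loss for any set of binary outcomes.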

Implementation

The scorer lives in smc_analysis.py as compute_v2_consensus(). Core inputs are cached at module load by _v2_load_priors(), the fallback chain is handled in _v2_model_wr(), and regime adjustments are applied in _v2_regime_factors(). v2 overrides v1’s W output; if anything in the v2 path fails, the scorer silently falls back to v1, so rollback is automatic if the priors database goes missing.
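
The silent-fallback behaviour can be pictured as a thin wrapper around the two scorers. The sketch below is an assumption about the shape of that logic, not a copy of smc_analysis.py: score_with_fallback, v2_scorer, and v1_scorer are hypothetical names, and the real signature of compute_v2_consensus() is not shown in this page.

```python
import logging

log = logging.getLogger("triple")  # hypothetical logger name


def score_with_fallback(votes, v2_scorer, v1_scorer):
    """Try the v2 scorer first; on any failure return the v1 result instead.

    Returns (result, version) so callers can tell which path produced the score.
    """
    try:
        return v2_scorer(votes), "v2"
    except Exception:
        # Silent toward the HUD, but still logged so a missing priors
        # database does not go unnoticed.
        log.exception("v2 consensus failed; falling back to v1")
        return v1_scorer(votes), "v1"
```

Returning the version tag alongside the score is a design choice made for this sketch: a fallback that is invisible to the trader is convenient, but one that is invisible to the operator makes it hard to know which formula produced a given HUD line.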

Why Tier 1

It’s the final line you trade. Every other model on the site is an input to this one.

History

2026-04-28 — v2 shipped to production; v1 retained as silent fallback.
2026-04-28 — 25-rule canonicalizer wired into track_predictions.py:add_line() to prevent future model-name drift; 4,250 historical rows migrated.
Future — re-run backtest_consensus_v2.py weekly as new TRIPLE W rows resolve. Re-tune priors monthly.