
TRIPLE — Consensus Scoring

Meta-model that aggregates votes from all 12 sub-models using empirical hit-rate priors, tier weights, and multiplicative regime factors. v2 (priors-based) beats the original W-formula by Brier −15% / log-loss −18%.

updated 2026-04-28
tier 1 · both · meta · tags: consensus, meta, triple, scoring
[Figure: TRIPLE Consensus pipeline overview]

The setup

TRIPLE is not a trade pattern. It is the meta-model that turns the 12 individual sub-model votes into a single bull/bear probability and a final HUD line, and it is what you actually trade from when the chart paints TRIPLE BULL / TRIPLE BEAR.

The current version (v2, shipped 2026-04-28) replaced the original W-formula with an empirical priors-based scorer. The v1 formula was catastrophically under-confident in the low-probability bucket: it predicted 21% on a cohort that actually hit 48%. v2 uses calibrated priors per (symbol, direction), tier weights, and multiplicative regime factors, and it scores materially better on every calibration metric we measure.

Pipeline

  1. Collect votes: every sub-model active for the symbol contributes its (direction, probability) and a tier label (1 / 2 / 3)
  2. Apply empirical priors: fallback chain by_sym_dir (n ≥ 30) → by_dir (n ≥ 50) → pooled, so each sub-model contributes its real historical hit rate, not its self-reported probability
  3. Tier weight: TIER1 = 3.0, TIER2 = 2.0, TIER3 = 1.0
  4. Regime factors: multiplicative adjustments (e.g. VWAP regime, post-11 break, regime extreme), floored at 0.1× to avoid flipping signs
  5. Combine: weighted log-odds aggregation across bull and bear candidate pools
  6. Output: p_bull, p_bear, plus a confidence band (HIGH ≥ 0.7, MEDIUM 0.55–0.7, LOW 0.4–0.55)
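
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in smc_analysis.py: the `Vote` class, the layout of the `priors` dict, the per-vote placement of regime factors, and the signed-average form of the log-odds aggregation are all hypothetical. Only the tier weights, the sample-size thresholds, the 0.1× floor, and the band cut-offs come from the list.

```python
import math
from dataclasses import dataclass

TIER_WEIGHTS = {1: 3.0, 2: 2.0, 3: 1.0}  # step 3
REGIME_FLOOR = 0.1                        # step 4: floor on the combined factor


@dataclass
class Vote:
    model: str
    direction: str      # "bull" or "bear"
    tier: int           # 1, 2 or 3
    regime: tuple = ()  # multiplicative regime factors (assumed to be per vote)


def prior_hit_rate(priors, model, symbol, direction):
    """Step 2 fallback chain: by_sym_dir (n >= 30) -> by_dir (n >= 50) -> pooled.

    `priors` is a hypothetical nested dict whose leaves are (hit_rate, n).
    """
    entry = priors[model]
    hit, n = entry["by_sym_dir"].get((symbol, direction), (None, 0))
    if n >= 30:
        return hit
    hit, n = entry["by_dir"].get(direction, (None, 0))
    if n >= 50:
        return hit
    return entry["pooled"][0]


def logit(p):
    p = min(max(p, 1e-6), 1 - 1e-6)  # clamp so the log stays finite
    return math.log(p / (1 - p))


def combine(votes, priors, symbol):
    """Steps 3-5: weight each vote, then average the signed log-odds."""
    num = den = 0.0
    for v in votes:
        weight = TIER_WEIGHTS[v.tier] * max(math.prod(v.regime), REGIME_FLOOR)
        sign = 1.0 if v.direction == "bull" else -1.0
        hit = prior_hit_rate(priors, v.model, symbol, v.direction)
        num += weight * sign * logit(hit)
        den += weight
    p_bull = 0.5 if den == 0 else 1.0 / (1.0 + math.exp(-num / den))
    return p_bull, 1.0 - p_bull


def band(p):
    """Step 6 bands; the list does not say what happens below 0.4."""
    if p >= 0.7:
        return "HIGH"
    if p >= 0.55:
        return "MEDIUM"
    if p >= 0.4:
        return "LOW"
    return None
```

In this sketch a sub-model's self-reported probability never enters the score; only its direction, tier, and historical hit rate do, which is the point of step 2. The real scorer may keep separate bull and bear pools rather than forcing p_bull + p_bear = 1.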

v1 → v2 calibration

Measured on the 106 resolved TRIPLE W rows in the live database:

Metric                        | v1 (W formula)             | v2 (priors-based)          | Δ
Brier score                   | 0.300                      | 0.255                      | −15%
Log-loss                      | 0.854                      | 0.703                      | −18%
Reliability (low-prob bucket) | predicted 21% / actual 48% | predicted 36% / actual 41% | much closer

v2 is still under-confident at the low end (predicted 36% vs. actual 41%), but that 5-point gap is now within sampling noise, not the 27-point chasm v1 had.
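
For readers who want to reproduce these numbers on their own resolved rows, the metrics can be computed roughly as follows. This sketch uses the standard definitions of Brier score and log-loss. The function names and the bucket boundaries passed to `bucket_reliability` are illustrative assumptions, since the page does not say how the low-probability bucket is delimited.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood; probabilities are clamped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def bucket_reliability(probs, outcomes, lo, hi):
    """(mean predicted, actual hit rate) for rows with lo <= p < hi, or None."""
    rows = [(p, y) for p, y in zip(probs, outcomes) if lo <= p < hi]
    if not rows:
        return None
    n = len(rows)
    return sum(p for p, _ in rows) / n, sum(y for _, y in rows) / n


def relative_change(old, new):
    """Relative change from old to new, e.g. -0.15 for a 15% reduction."""
    return (new - old) / old


# The Δ column follows from the table's own figures:
print(f"{relative_change(0.300, 0.255):+.0%}")
print(f"{relative_change(0.854, 0.703):+.0%}")
```

Lower is better for both metrics, so a negative Δ is an improvement. With only 106 resolved rows, a single bucket holds few samples, which is why a 5-point gap between predicted and actual is hard to distinguish from noise.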

Implementation

The scorer lives in smc_analysis.py as compute_v2_consensus(). Core inputs are cached at module load by _v2_load_priors(), the fallback chain is handled in _v2_model_wr(), and regime adjustments are applied in _v2_regime_factors(). When v2 succeeds, its result overrides v1’s W output; if anything fails, the v1 output is silently kept. That means rollback is automatic if the priors database goes missing.
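
Reading the paragraph above as "v2 replaces v1’s W output, and v1 is kept whenever v2 fails", the override pattern might look like the sketch below. The wrapper name, its arguments, and the returned version label are hypothetical and are not taken from smc_analysis.py.

```python
import logging

logger = logging.getLogger(__name__)


def consensus_with_fallback(v2_scorer, v1_output, *args, **kwargs):
    """Return (result, version): v2's result if it succeeds, else v1's output.

    `v2_scorer` stands in for a function such as compute_v2_consensus();
    its real signature is not documented here, so arguments are passed through.
    """
    try:
        return v2_scorer(*args, **kwargs), "v2"
    except Exception:
        # "Silent" here means no error reaches the caller; the failure is
        # still recorded at debug level so it can be investigated later.
        logger.debug("v2 consensus failed; keeping v1 output", exc_info=True)
        return v1_output, "v1"
```

Returning the version label alongside the result is a design choice of this sketch: a fully silent fallback makes it hard to tell which scorer produced a given HUD line, so some record of the active version is worth keeping.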

Why Tier 1

It’s the final line you trade. Every other model on the site is an input to this one.

History