Touch rate ≠ close direction — the audit that demoted three Tier 1s in 24 hours
Three flagship signals fired at 80%+ for years. An audit against the right metric — close-direction at RTH, not excursion-touch — demoted two to Tier 2 magnets and killed a third entirely. The methodology bug, the fixes, and the rule of thumb.
updated 2026-04-21
The question the old backtests were answering
Every flagship on this board used to ship with probabilities from a backtest that looked roughly like:
hit = (df.high >= target) | (df.max_favorable_excursion >= target)
That tells you: did price ever touch the target during the window?
That is a touch rate / magnet rate. It is not “does the session close in this direction.” When you draw a line on a chart labeled AMD BEAR 92%, a trader reading it assumes the 92% is a directional conviction. In every one of these three models, it wasn’t.
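To make the gap concrete, here is a minimal sketch that measures both questions on the same signals. It is plain Python rather than the pandas used in the real backtests so that it runs standalone. The `Session` fields and the four sample sessions are hypothetical, invented only to show how the two rates can diverge.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    entry: float   # price when the signal fired
    target: float  # level drawn on the chart
    high: float    # session high after entry
    low: float     # session low after entry
    close: float   # RTH close (16:00)
    side: str      # "BULL" or "BEAR"

def touched(s: Session) -> bool:
    """Touch / magnet metric: did price ever reach the target?"""
    return s.high >= s.target if s.side == "BULL" else s.low <= s.target

def closed_with_signal(s: Session) -> bool:
    """Close-direction metric: did the session close on the signal side of entry?"""
    return s.close > s.entry if s.side == "BULL" else s.close < s.entry

def rate(sessions, predicate) -> float:
    """Fraction of sessions for which the predicate holds."""
    return sum(predicate(s) for s in sessions) / len(sessions)

# Hypothetical BEAR signals (made-up numbers, not audit data).
sessions = [
    Session(100.0, 99.0, high=100.5, low=98.8, close=101.2, side="BEAR"),  # touched, closed up
    Session(100.0, 99.0, high=100.2, low=98.5, close=98.9, side="BEAR"),   # touched, closed down
    Session(100.0, 99.0, high=101.0, low=98.9, close=100.6, side="BEAR"),  # touched, closed up
    Session(100.0, 99.0, high=100.8, low=99.4, close=99.7, side="BEAR"),   # missed, closed down
]

touch_rate = rate(sessions, touched)             # 3 of 4 sessions reach the target
close_rate = rate(sessions, closed_with_signal)  # 2 of 4 close below entry
print(f"touch {touch_rate:.0%}, close-direction {close_rate:.0%}")
```

The same four signals score 75% on one question and 50% on the other, which is the shape of the problem the audit found at scale.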
Three models audited, three different failure modes
Over 24 hours we re-ran the audit against a proper close-direction metric (close at RTH 16:00 vs entry, baseline BULL 53.5% / BEAR 46.5%):
Model 14 — AMD
Touch rate 88–92% ✓. Close direction: 47–53%, pure noise. Demoted Tier 1 → Tier 2, but the touch rate is real so we kept it as a magnet. Re-audit later recovered a proper 65% touch-at-120-min number with the right conditioning — see AMD.
Model 9 — LON SWEEP H / L
Touch rate 77–88% ✓. Close direction:
| Cohort | Touch % | Close % (signal side) | Edge vs baseline |
|---|---|---|---|
| LON SWEEP H, bull_signals ≥ 3 | 90.5% | 54.9% BULL | +1.4pt |
| LON SWEEP L, bear_signals ≥ 3 | 86.4% | 50.0% BEAR | +3.5pt |
+1.4pt of edge dressed up as 90%. Demoted Tier 1 → Tier 2 magnet. Probabilities kept as valid touch rates.
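The edge figures are the cohort's close-direction rate minus the unconditional baseline for that side (BULL 53.5%, BEAR 46.5%). A small sketch of that arithmetic, where the function name `close_edge` is our own illustration and not taken from the audit scripts:

```python
# Unconditional close-direction baselines quoted in the audit, in percent.
BASELINE = {"BULL": 53.5, "BEAR": 46.5}

def close_edge(side: str, close_pct: float) -> float:
    """Edge in percentage points: cohort close rate minus the baseline for that side."""
    return round(close_pct - BASELINE[side], 1)

edge_sweep_h = close_edge("BULL", 54.9)  # LON SWEEP H cohort
edge_sweep_l = close_edge("BEAR", 50.0)  # LON SWEEP L cohort
print(edge_sweep_h, edge_sweep_l)
```

A negative result from the same function marks a signal as anti-predictive: it closes on its own side less often than an unconditioned day does.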
Model 9 — LON REV MID
This one was worse. The model fired a reversion line at 58–61% after a sweep, betting price would revert to session midpoint and the day would close opposite the sweep.
| Cohort | n | Touch mid | Close in signal direction |
|---|---|---|---|
| Swept H → LON REV MID BEAR @ 58% | 926 | 82.4% | 30.2% BEAR (−16pt anti-predictive) |
| Swept L → LON REV MID BULL @ 61% | 869 | 85.5% | 35.3% BULL (−18pt anti-predictive) |
The signal was actively betting the wrong way. Sweeps predict continuation, not reversion. Deleted from both .cs files.
Model 13 — PWH/PWL RETEST
The largest single bug found.
| Cohort | Current prob | Reality | Delta |
|---|---|---|---|
| BULL RETEST, RTH | 62% | 55.4% | −7pt |
| BEAR RETEST, RTH | 72% | 40.9% | −31pt, ANTI-predictive |
| BULL RETEST, overnight | 56% | 35% | no edge |
| BEAR RETEST, overnight | 64% | 28–41% | no edge |
BEAR RTH RETEST at 40.9% means 59% of signals reverse. Three of four cohorts were disabled. Only BULL RTH survives at 55%.
The BREAK state (price closes past PWH/L without retest) was verified clean at 99.5–100% touch and stayed Tier 1.
Running tally after 24 hours
| Model | Status | Prob change |
|---|---|---|
| 14 AMD | Tier 1 → 2 magnet | kept 88–92% as touch |
| 9 LON SWEEP H/L | Tier 1 → 2 magnet | kept 77–88% as touch |
| 9 LON REV MID | Deleted | 58/61% was anti-predictive |
| 13 PWH/L RETEST | 3 of 4 cohorts disabled | only BULL RTH 55% survives |
| 10 OB MID MAGNET | Recalibrated | 92 → 70 |
| 10 OB BULL/BEAR | Validated ✓ | kept 82–83% (real +18–21pt close-dir edge) |
| 10 OB STRETCH B | Validated ✓ | 64–66% touch |
The OB family was the positive surprise: its 10:30 positioning rule filters for days with genuine directional bias on top of the touch rate. 18–21pt of real close-direction edge above baseline.
Why this happened
Three reinforcing problems:
- `df.max_favorable_excursion >= target` is a deceptively intuitive primitive. It answers “did price get there,” not “did the session resolve there.”
- Live visualizations reward touch rates. A chart line that gets tagged 90% of the time feels predictive even when the close direction is a coin flip.
- Nobody re-audits a flagship. Once AMD shipped at 92%, nothing forced a second look until we built the close-direction script as part of a different investigation.
The rule of thumb
Any backtest claim above 70% must be re-measured as:
“What fraction of sessions closed on the signal side at a fixed horizon (RTH close, or +N minutes)?”
If the original code used any of these patterns, assume it’s a magnet/touch metric until proven otherwise:
- `.any()`
- `max_favorable_excursion`
- `ever_touched`
- `continued_after_retest`
- `(high >= target) | (low <= target)`
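One way to apply that check mechanically is to scan backtest source for the patterns listed above. The sketch below is illustrative only: the regexes are our guess at a reasonable match for each pattern, `find_touch_patterns` is a hypothetical helper, and a hit means "review this line," not "this is wrong."

```python
import re

# Pattern names come from the list above; the regexes are illustrative assumptions.
TOUCH_PATTERNS = {
    ".any()": re.compile(r"\.any\("),
    "max_favorable_excursion": re.compile(r"\bmax_favorable_excursion\b"),
    "ever_touched": re.compile(r"\bever_touched\b"),
    "continued_after_retest": re.compile(r"\bcontinued_after_retest\b"),
    # Matches either half of the high/low comparison, which is deliberately broad.
    "high/low vs target": re.compile(r"high\s*>=\s*target|low\s*<=\s*target"),
}

def find_touch_patterns(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious match in the source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in TOUCH_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits

sample = (
    "hit = (df.high >= target) | (df.max_favorable_excursion >= target)\n"
    "win = df.close > df.entry\n"
)
hits = find_touch_patterns(sample)
for lineno, name in hits:
    print(f"line {lineno}: {name} -> treat as touch/magnet metric until proven otherwise")
```

The first sample line trips two patterns; the second, a plain close-versus-entry comparison, trips none.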
A magnet signal is still useful: it tells you where price is likely to visit. But it is a different question from “which way does the day close” and should never be presented with the same probability label without explicit distinction. We now tag every model page as one of close-direction, liquidity-magnet, or liquidity-sweep (continuation) to keep the two questions visually separated.
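As a sketch of how such a tag can travel with the number, the snippet below refuses to render a probability without a category. The three category values are the ones named above; the `Category` enum, the `chart_label` helper, and the label format are hypothetical and not taken from `PredictionModel.cs`.

```python
from enum import Enum

class Category(Enum):
    CLOSE_DIRECTION = "close-direction"
    LIQUIDITY_MAGNET = "liquidity-magnet"
    LIQUIDITY_SWEEP = "liquidity-sweep (continuation)"

def chart_label(name: str, pct: int, category: Category) -> str:
    """Render a probability together with the question it answers."""
    if not isinstance(category, Category):
        # No untagged probabilities: the caller must say which question was measured.
        raise TypeError("every probability needs an explicit Category")
    return f"{name} {pct}% [{category.value}]"

label = chart_label("AMD BEAR", 92, Category.LIQUIDITY_MAGNET)
print(label)
```

Making the category a required argument, not an optional annotation, means a touch rate cannot reach a chart unlabeled.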
What shipped
- `audit_lon_sweep_close.py`, `audit_pwhl_retest_close.py`, `audit_ob_range_close.py` as permanent audit tooling
- Tier demotions + line deletion in `PredictionModel.cs` / `PredictionModelWeb.cs`
- Every new flagship (VWAP REGIME, SB CONT, 0809 HOLD, PRELON SWEEP) ships with close-direction numbers as the primary metric, touch rate only as a secondary
- Model-page `category` taxonomy so readers never confuse the two
The 24-hour audit was the single most valuable day of work on this project. The stack lost three “Tier 1” signals. What remains is honest.