Phase 11b: Pennant strategy backtest across 5 detection scenarios

Stock-only backtest of the production pennant strategy applied to each of the five detection-criteria scenarios from Phases 11a / 11a-2 / 11a-3. Detection events came from the cached parquet files under ab_test/; no detection runs were re-executed. No production code, config, or data was modified.

Figure: Equity curves (all five scenarios)

1. Setup

Five scenarios were each simulated independently: Baseline (current production: pennant 5–15, flagpole 1–10), V1 (pennant 10–20, flagpole 1–5), V2 (pennant 7–17, flagpole 1–5), V3 (pennant 6–17, flagpole 1–3), and V4 (pennant 6–17, flagpole 1–2). Each scenario started with $10,000 cash, bought $500 nominal per trade at the anchor close, sold half on a +15 % move, trailed the remainder by 3 % from the in-trade high, hard-stopped at −7 % from entry, and time-stopped after 30 trading days. The regime gate skipped entries on days when SPY was below its 200-day SMA and VIX was above 35 (240 such days in the calendar). Friction: $0.50 commission per fill and ±5 bp slippage. Uninvested cash was simply held, fractional shares were allowed, and concurrency was capped only by available cash. Detection events were processed in chronological order; if cash on the entry date was below $500, the candidate was skipped ("no_cash") rather than back-filled.
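
The per-position exit rules above can be sketched as one loop over the closes after entry. This is a minimal illustration, not the code in backtest.py: `simulate_exits` and `Exit` are hypothetical names, and the sketch assumes every rule is checked on daily closes only (no intraday highs or lows), that the 3 % trail applies only to the half left after the +15 % sale, and that commission and slippage are handled elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Exit:
    day: int         # trading days after entry (1 = first bar after the anchor close)
    fraction: float  # share of the original position sold on this exit
    price: float     # fill price before slippage (assumed: that day's close)


def simulate_exits(entry_price, closes, partial_gain=0.15, trail_pct=0.03,
                   hard_stop=-0.07, max_days=30):
    """Return the exits for one position given the closes after entry.

    Assumption: all rules are evaluated on daily closes only.
    """
    exits = []
    remaining = 1.0
    high = entry_price            # in-trade high, tracked on closes
    partial_done = False
    for day, close in enumerate(closes[:max_days], start=1):
        high = max(high, close)
        ret = close / entry_price - 1.0
        if not partial_done:
            if ret <= hard_stop:
                # -7 % hard stop closes the whole position. After the +15 % sale
                # the 3 % trail always sits above this level, so it is not rechecked.
                exits.append(Exit(day, remaining, close))
                return exits
            if ret >= partial_gain:
                exits.append(Exit(day, 0.5, close))  # sell half on a +15 % move
                remaining = 0.5
                partial_done = True
        elif close <= high * (1.0 - trail_pct):
            exits.append(Exit(day, remaining, close))  # 3 % trail from the high
            return exits
    # Time stop after max_days (or end of the price series): sell what is left.
    n = min(len(closes), max_days)
    if n > 0:
        exits.append(Exit(n, remaining, closes[n - 1]))
    return exits
```

Checking the hard stop only before the partial sale is a design shortcut: once half has been sold at +15 % or better, the trailing level is at least about +11.5 % from entry, so the −7 % stop can no longer be the first rule to fire.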

All five scenarios share the same strategy mechanics, prices, calendar, and regime gate — the only difference is the detection parameters that produced the event population.
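
The entry side (regime gate, the $500 cash check, and friction on the buy) might look like the following. `regime_blocked` and `try_enter` are hypothetical names. The sketch assumes the gate is checked before cash, that buys fill 5 bp above the anchor close, and that the $0.50 commission is paid out of the $500 notional so cash never goes negative. Feeding events in chronological order is what makes a skipped candidate stay skipped.

```python
def regime_blocked(spy_close, spy_sma200, vix):
    """Gate from the setup: block entries when SPY < its 200-day SMA and VIX > 35."""
    return spy_close < spy_sma200 and vix > 35.0


def try_enter(cash, anchor_close, spy_close, spy_sma200, vix,
              notional=500.0, commission=0.50, slippage_bp=5.0):
    """Classify one detection event and return (outcome, cash_after, shares).

    Assumptions (not from the source): buys fill 5 bp above the anchor close,
    and the $0.50 commission is paid out of the $500 notional.
    """
    if regime_blocked(spy_close, spy_sma200, vix):
        return "regime", cash, 0.0
    if cash < notional:
        return "no_cash", cash, 0.0               # skipped, never back-filled later
    fill_price = anchor_close * (1.0 + slippage_bp / 10_000)  # pay up 5 bp
    shares = (notional - commission) / fill_price  # fractional shares allowed
    return "filled", cash - notional, shares
```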

2. End-of-year equity

Year	Baseline	V1	V2	V3	V4
2007 start	$10,000	$10,000	$10,000	$10,000	$10,000
2007 EOY	$10,956	$9,988	$10,687	$10,306	$10,320
2008 EOY	$7,864	$8,831	$9,426	$9,106	$9,476
2009 EOY	$9,640	$9,915	$10,939	$8,249	$9,959
2010 EOY	$11,936	$12,042	$12,662	$10,584	$11,873
2011 EOY	$10,896	$12,362	$11,575	$10,278	$11,614
2012 EOY	$11,441	$14,533	$12,352	$11,129	$11,833
2013 EOY	$18,692	$20,071	$18,989	$17,391	$16,766
2014 EOY	$19,978	$20,398	$20,382	$18,291	$17,898
2015 EOY	$19,413	$20,280	$20,620	$17,612	$17,995
2016 EOY	$21,353	$21,036	$23,630	$19,702	$19,301
2017 EOY	$25,242	$25,010	$27,536	$22,403	$21,323
2018 EOY	$23,851	$23,711	$26,683	$21,732	$21,433
2019 EOY	$23,874	$23,927	$26,818	$20,664	$20,556
2020 EOY	$29,101	$25,870	$28,795	$23,098	$20,919
2021 EOY	$39,232	$31,005	$35,856	$29,278	$26,185
2022 EOY	$33,488	$28,418	$32,553	$26,605	$24,725
2023 EOY	$33,089	$28,206	$33,035	$26,840	$25,060
2024 EOY	$36,010	$27,667	$32,107	$28,545	$25,336
2025 EOY	$39,865	$31,565	$36,779	$31,334	$28,504
2026 YTD (2026-05-11)	$40,398	$30,858	$35,651	$31,240	$28,615
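
Each row of this table reads as the last equity mark of its calendar year, with the 2026 row being the most recent mark (2026-05-11). Assuming the daily equity series is available as (date, equity) pairs, the reduction can be sketched as below; `year_end_equity` is a hypothetical helper, not a function from backtest.py.

```python
from datetime import date


def year_end_equity(daily):
    """Map each calendar year to the last equity mark recorded in it.

    `daily` is an iterable of (datetime.date, equity) pairs in any order.
    """
    eoy = {}
    for d, equity in sorted(daily):
        eoy[d.year] = equity  # later dates in the same year overwrite earlier ones
    return eoy
```

For example, with toy values, `year_end_equity([(date(2007, 6, 1), 105.0), (date(2007, 12, 31), 110.0), (date(2008, 3, 3), 98.0)])` returns `{2007: 110.0, 2008: 98.0}`.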

3. Headline metrics

Scenario	Final equity	Total return	CAGR	Max DD	Sharpe	Trades	Skipped no-cash	Win rate
Baseline	$40,398	+304 %	7.50 %	−38.1 %	0.543	8,842	6,573	43.7 %
V1	$30,858	+209 %	6.01 %	−20.4 %	0.609	4,533	591	45.1 %
V2	$35,651	+257 %	6.80 %	−21.4 %	0.617	6,108	1,272	44.7 %
V3	$31,240	+212 %	6.08 %	−32.3 %	0.547	5,200	1,070	45.0 %
V4	$28,615	+186 %	5.59 %	−21.2 %	0.580	4,101	511	45.4 %

The regime gate blocked 24–106 entries per scenario (0.5–1.2 % of candidates), so it has only a minor effect over this period.
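
The headline columns can be recomputed from any scenario's equity series. The exact definitions used by the backtest are not stated here, so this sketch assumes common ones: CAGR over calendar years (days / 365.25), max drawdown as the worst fall from a running peak, and an annualised Sharpe from per-day returns with a zero risk-free rate and sample standard deviation. `headline_metrics` is a hypothetical name.

```python
import math
import statistics
from datetime import date


def headline_metrics(dates, equity, periods_per_year=252):
    """Total return, CAGR, max drawdown and Sharpe from an equity series.

    Assumptions: equity[0] is the starting capital, marks are one per trading
    day, Sharpe uses a zero risk-free rate and sample standard deviation.
    """
    total_return = equity[-1] / equity[0] - 1.0
    years = (dates[-1] - dates[0]).days / 365.25
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = min(max_dd, value / peak - 1.0)  # most negative peak-to-trough move
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    sharpe = statistics.mean(rets) / statistics.stdev(rets) * math.sqrt(periods_per_year)
    return {"total_return": total_return, "cagr": cagr,
            "max_drawdown": max_dd, "sharpe": sharpe}
```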

4. Interpretation

Absolute return: Baseline wins, then V2. Baseline finishes at $40,398 vs V2's $35,651, with V1 and V3 closely tied around $31k and V4 last at $28,615. Baseline simply takes more trades (8,842 filled vs V2's 6,108 and V4's 4,101), and with fixed $500 sizing that adds up to more raw return. Roughly $9.5k of Baseline's $11.8k lead over V4 ($40,398 vs $28,615) traces directly to taking 4,741 more entries.

Risk-adjusted: V2 wins, then V1. Baseline pays for its higher volume with the worst drawdown in the field (−38.1 %, vs −20.4 % for V1 and −21.4 % for V2) and the lowest Sharpe (0.543). V2 keeps most of Baseline's growth (CAGR 6.80 % vs 7.50 %) with a little over half its drawdown, and its Sharpe of 0.617 is the best of the five. V1 is close behind at 0.609 with the shallowest drawdown (−20.4 %). V3 roughly matches Baseline's Sharpe (0.547 vs 0.543), but its drawdown (−32.3 %) is markedly worse than that of V1, V2, or V4.

Cash utilization tells the story. Baseline's 6,573 no-cash skips (42 % of candidates) show that capital is frequently fully deployed, with every available $500 slot taken. V2 skips only 1,272 (17 %) and V4 just 511 (11 %). The variant detectors send fewer candidates, so cash sits idle more often; that idle cash is the drag against Baseline on absolute return, but it is also the buffer behind the smaller drawdowns.
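
The skip percentages are shares of the candidate pool. A sketch, assuming every candidate ends up in exactly one bucket (filled, no-cash, or regime-blocked); `outcome_shares` is a hypothetical name, and the per-scenario gate counts are not broken out in this report, only the 24–106 range.

```python
def outcome_shares(filled, no_cash, regime):
    """Fraction of detection candidates ending in each outcome.

    Assumes every candidate is exactly one of: filled, skipped for cash,
    or blocked by the regime gate.
    """
    total = filled + no_cash + regime
    return {"filled": filled / total,
            "no_cash": no_cash / total,
            "regime": regime / total}
```

Plugging in Baseline's 8,842 filled and 6,573 no-cash with any gate count in the 24–106 range gives a no-cash share of 42.3–42.6 %; the same check gives 17.0–17.2 % for V2 and 10.8–11.0 % for V4, in line with the rounded figures in the text.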

Year by year, the ranking flips repeatedly. At 2008 year-end the order was V4 > V2 > V3 > V1 > Baseline: the variants came through the crash with less capital deployed in falling stocks. By 2012 V1 led the field (+45 % cumulative vs Baseline's +14 %). V2 held the lead at every year-end from 2015 through 2019 (its stronger pole-quality screen helped through the choppy mid-2010s). Baseline retook the lead at 2020 year-end and has held it since, pulling well clear in 2021 as raw trade count mattered more than per-trade quality. V3 fell behind V1 in 2009, the only scenario to lose ground that year, and trailed it through 2023; since then the two have traded places, and V3 is marginally ahead at the latest mark ($31,240 vs $30,858). V4, after leading in 2008, dropped to last by 2013 and has been last at every year-end since except 2015, consistent with the Phase 11a-3 finding that flagpole.max = 2 trims too aggressively.
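
The year-by-year leadership claims can be checked mechanically against the end-of-year table. The sketch copies those rows into a dict and ranks the scenarios in each row; `EOY`, `ranking` and `leader_by_year` are illustrative names, not part of the ab_test/ artifacts (backtest_eoy.csv presumably holds the same numbers, but its column layout is not documented here).

```python
SCENARIOS = ("Baseline", "V1", "V2", "V3", "V4")

# Year-end equity in dollars, copied from the end-of-year table.
EOY = {
    "2007": (10956, 9988, 10687, 10306, 10320),
    "2008": (7864, 8831, 9426, 9106, 9476),
    "2009": (9640, 9915, 10939, 8249, 9959),
    "2010": (11936, 12042, 12662, 10584, 11873),
    "2011": (10896, 12362, 11575, 10278, 11614),
    "2012": (11441, 14533, 12352, 11129, 11833),
    "2013": (18692, 20071, 18989, 17391, 16766),
    "2014": (19978, 20398, 20382, 18291, 17898),
    "2015": (19413, 20280, 20620, 17612, 17995),
    "2016": (21353, 21036, 23630, 19702, 19301),
    "2017": (25242, 25010, 27536, 22403, 21323),
    "2018": (23851, 23711, 26683, 21732, 21433),
    "2019": (23874, 23927, 26818, 20664, 20556),
    "2020": (29101, 25870, 28795, 23098, 20919),
    "2021": (39232, 31005, 35856, 29278, 26185),
    "2022": (33488, 28418, 32553, 26605, 24725),
    "2023": (33089, 28206, 33035, 26840, 25060),
    "2024": (36010, 27667, 32107, 28545, 25336),
    "2025": (39865, 31565, 36779, 31334, 28504),
    "2026 YTD": (40398, 30858, 35651, 31240, 28615),
}


def ranking(row):
    """Scenario names ordered from highest to lowest equity."""
    pairs = sorted(zip(SCENARIOS, row), key=lambda p: p[1], reverse=True)
    return [name for name, _ in pairs]


def leader_by_year(eoy):
    """Scenario with the highest year-end equity for each row."""
    return {year: ranking(row)[0] for year, row in eoy.items()}
```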

Win rate alone is not the discriminator. All five scenarios sit between 43.7 % and 45.4 %, a 1.7 pp spread. V4 has the highest win rate but the lowest CAGR, because its trades pay less per win (consistent with §6 of the Phase 11a-3 report: V4's 30-day endpoint mean is the only one below baseline). The lever moving the equity curve here is trade count × average profit per trade, not win rate.
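
The "trade count × average profit" framing can be written as an expectancy. With fixed $500 sizing, P&L is roughly additive across trades, so a sketch is below. It assumes the average win and average loss per scenario are known, which the tables above do not report; the inputs in the example are hypothetical, chosen only to show a higher win rate losing to larger wins.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected P&L per trade; avg_loss is given as a positive dollar amount."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def total_pnl(n_trades, win_rate, avg_win, avg_loss):
    """Approximate total P&L under fixed sizing: trade count x expectancy."""
    return n_trades * expectancy(win_rate, avg_win, avg_loss)


# Hypothetical inputs: the higher win rate pays less per win.
high_win_rate = expectancy(0.454, avg_win=20.0, avg_loss=15.0)
low_win_rate = expectancy(0.437, avg_win=25.0, avg_loss=15.0)
print(f"{high_win_rate:.2f} vs {low_win_rate:.2f} dollars per trade")
# prints: 0.89 vs 2.48 dollars per trade
```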

The decision the numbers frame is whether to optimise for absolute compounding (Baseline) or for risk-adjusted compounding (V2). V2 gives up ~$4.7k in final equity over 19 years (~$250/year) in exchange for cutting peak-to-trough drawdown nearly in half. El Don decides.
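
The trade-off in that sentence is plain arithmetic on the headline table. A tiny sketch (the helper `tradeoff` is hypothetical, and it uses the 19-year horizon quoted above) makes the inputs explicit:

```python
def tradeoff(final_a, final_b, max_dd_a, max_dd_b, years):
    """Equity given up by choosing B over A, and how much B shrinks max drawdown."""
    gap = final_a - final_b
    return {"equity_gap": gap,
            "gap_per_year": gap / years,
            "drawdown_reduction": 1.0 - max_dd_b / max_dd_a}


# Baseline (A) vs V2 (B), final equity and max drawdown from the headline table.
t = tradeoff(40_398, 35_651, -0.381, -0.214, years=19)
print(f"${t['equity_gap']:,} gap, ${t['gap_per_year']:,.0f}/year, "
      f"{t['drawdown_reduction']:.0%} smaller drawdown")
# prints: $4,747 gap, $250/year, 44% smaller drawdown
```

A 44 % reduction in maximum drawdown is what "nearly in half" summarises.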

Artifacts under ab_test/: backtest.py, plot_backtest.py, backtest.log, backtest_summary.json, backtest_eoy.csv, plus backtest_trades_<scenario>.parquet and backtest_equity_<scenario>.parquet for all 5 scenarios.

Chart: charts/pennant_strategy_backtest_equity_curves_2026-05-11.png (300 dpi).