PEN-TEST-003 — Phase 11a-3 Pennant criteria A/B (V3 + V4)

Field	Value
Test ID	PEN-TEST-003
Date	2026-05-11
Strategy	(detection-only)
Cohorts produced	DET-V3-2026-05-11, DET-V4-2026-05-11
Cohort consumed	DET-BASELINE-2026-05-11 (for comparison)
Status	complete

Purpose

Continuation of PEN-TEST-001 / -002, pushing flagpole.max_duration_bars tighter (5 → 3 → 2). V3 keeps pennant 6–17 / flagpole 1–3; V4 tightens further to flagpole 1–2.
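
The two variants can be pictured as a pair of duration windows. The sketch below is illustrative only: `flagpole.max_duration_bars` is the key named above, while `min_duration_bars`, the `pennant` section, the inclusive bounds, and the idea that V4 leaves the pennant window at 6–17 are assumptions about the harness config, not its actual schema.

```python
# Hypothetical layout of the V3 / V4 criteria. Only flagpole.max_duration_bars
# is named in this README; every other key is an assumed name.
VARIANTS = {
    "V3": {
        "pennant": {"min_duration_bars": 6, "max_duration_bars": 17},
        "flagpole": {"min_duration_bars": 1, "max_duration_bars": 3},
    },
    "V4": {
        # Assumption: V4 changes only the flagpole ceiling.
        "pennant": {"min_duration_bars": 6, "max_duration_bars": 17},
        "flagpole": {"min_duration_bars": 1, "max_duration_bars": 2},
    },
}


def within(window, bars):
    """Return True if bars falls inside the window (bounds assumed inclusive)."""
    return window["min_duration_bars"] <= bars <= window["max_duration_bars"]


def passes(variant, pennant_bars, flagpole_bars):
    """Check a candidate's pennant and flagpole durations against one variant."""
    cfg = VARIANTS[variant]
    return within(cfg["pennant"], pennant_bars) and within(cfg["flagpole"], flagpole_bars)


# A 3-bar flagpole is accepted by V3 but rejected by V4.
print(passes("V3", pennant_bars=10, flagpole_bars=3))  # True
print(passes("V4", pennant_bars=10, flagpole_bars=3))  # False
```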

Method

run_v3_v4.py runs the same harness for V3 and then V4, each run producing its own events and outcomes parquet files. analyze_v3_v4.py computes the statistics for each cohort and, for diagnostic purposes, the V3∩V4 overlap (which events appear in both cohorts).
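
The V3∩V4 overlap amounts to a set intersection over event identities. A minimal sketch, assuming an event is identified by a (symbol, date) pair: the real key columns in events.parquet and the actual logic in analyze_v3_v4.py are not shown in this README, so the function name `overlap_stats` and the key choice are hypothetical.

```python
def overlap_stats(v3_keys, v4_keys):
    """Summarise how two cohorts overlap, given iterables of hashable event keys."""
    v3, v4 = set(v3_keys), set(v4_keys)
    both = v3 & v4
    return {
        "v3_events": len(v3),
        "v4_events": len(v4),
        "in_both": len(both),
        "v3_only": len(v3 - v4),
        "v4_only": len(v4 - v3),
        # Share of V4 events that also appear in V3 (None for an empty V4).
        "v4_share_in_v3": len(both) / len(v4) if v4 else None,
    }


# Toy keys for illustration only; these are not real cohort events.
v3_keys = [("AAA", "2024-01-02"), ("BBB", "2024-01-03"), ("CCC", "2024-01-05")]
v4_keys = [("AAA", "2024-01-02"), ("CCC", "2024-01-05"), ("DDD", "2024-01-08")]
stats = overlap_stats(v3_keys, v4_keys)
print(stats["in_both"], stats["v3_only"], stats["v4_only"])  # 2 1 1
```

A non-zero `v4_only` count would indicate that the tighter criteria surface events the looser cohort does not contain, which is the kind of diagnostic an overlap file can expose.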

Headline

V3 (flagpole 1–3): 5,200 events, a clean cut. V4 (flagpole 1–2): 4,101 events, too aggressive: its 30-day endpoint mean falls below baseline despite the tighter selection, so the extra trimming delivers no quality lift. The practical floor is V3.
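
For scale, the event counts above imply that lowering the flagpole ceiling from 3 to 2 bars removes roughly a fifth of the V3 event count. The arithmetic below uses only the two counts quoted in the headline; it compares cohort sizes and says nothing about which individual events the cohorts share.

```python
v3_events = 5200
v4_events = 4101

dropped = v3_events - v4_events
drop_pct = 100 * dropped / v3_events

print(f"{dropped} fewer events ({drop_pct:.1f}% of the V3 count)")
# 1099 fewer events (21.1% of the V3 count)
```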

Files in this directory

run_v3_v4.py — harness driver for both V3 and V4
analyze_v3_v4.py — statistics + V3∩V4 overlap analysis
run_v3_v4.log, run_v3_v4.stdout.log — run logs
summary_v3.json, summary_v4.json — headline JSONs
summary_v3_v4_overlap.json — overlap diagnostics
report.md → ../../reports/Pennant/pennant_criteria_ab_test_v3_v4_2026-05-11.md

Cohort outputs

Pennant/cohorts/DET-V3-2026-05-11/{events,outcomes}.parquet
Pennant/cohorts/DET-V4-2026-05-11/{events,outcomes}.parquet