PEN-TEST-003 — Phase 11a-3 Pennant criteria A/B (V3 + V4)

| Field | Value |
|---|---|
| Test ID | PEN-TEST-003 |
| Date | 2026-05-11 |
| Strategy | (detection-only) |
| Cohorts produced | DET-V3-2026-05-11, DET-V4-2026-05-11 |
| Cohort consumed | DET-BASELINE-2026-05-11 (for comparison) |
| Status | complete |

Purpose

Continuation of PEN-TEST-001 / -002, pushing flagpole.max_duration_bars tighter (5 → 3 → 2). V3 keeps pennant 6–17 / flagpole 1–3; V4 tightens further to flagpole 1–2.
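The two variants can be written down as a small configuration sketch. Only `flagpole.max_duration_bars` is named in this README; the `DurationRange` and `Variant` types, the `accepts` helper, and the assumption that V4 leaves the pennant range at 6–17 are illustrative, not taken from the harness.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DurationRange:
    """Inclusive duration window, in bars (hypothetical type)."""
    min_bars: int
    max_bars: int

    def contains(self, bars: int) -> bool:
        return self.min_bars <= bars <= self.max_bars


@dataclass(frozen=True)
class Variant:
    """One detection-criteria variant (hypothetical layout)."""
    name: str
    pennant: DurationRange
    flagpole: DurationRange


# V3: pennant 6-17, flagpole 1-3 (values from the Purpose section).
V3 = Variant("V3", pennant=DurationRange(6, 17), flagpole=DurationRange(1, 3))

# V4: flagpole 1-2. Assumption: the pennant range is unchanged from V3.
V4 = replace(V3, name="V4", flagpole=DurationRange(1, 2))


def accepts(variant: Variant, pennant_bars: int, flagpole_bars: int) -> bool:
    """True if both durations fall inside the variant's windows."""
    return (variant.pennant.contains(pennant_bars)
            and variant.flagpole.contains(flagpole_bars))
```

Under these assumptions, a candidate with a 3-bar flagpole passes V3 but is rejected by V4, which is the only difference between the two cohorts.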

Method

run_v3_v4.py runs the same harness for V3 and V4 in sequence, each producing its own events and outcomes parquet files. analyze_v3_v4.py computes the statistics and, for diagnostic purposes, the V3∩V4 overlap (which events appear in both cohorts).
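A minimal sketch of how a V3∩V4 overlap might be computed, assuming each event can be reduced to a hashable key. The key shape used below, (symbol, start date), and the `overlap_summary` function are hypothetical; the actual logic in analyze_v3_v4.py is not shown in this README.

```python
from typing import Hashable, Iterable, Optional


def overlap_summary(v3_keys: Iterable[Hashable],
                    v4_keys: Iterable[Hashable]) -> dict:
    """Count events unique to each cohort and events shared by both."""
    v3, v4 = set(v3_keys), set(v4_keys)
    both = v3 & v4
    # Share of V4 events that also appear in V3 (None if V4 is empty).
    v4_share: Optional[float] = len(both) / len(v4) if v4 else None
    return {
        "v3_only": len(v3 - v4),
        "v4_only": len(v4 - v3),
        "both": len(both),
        "v4_share_in_v3": v4_share,
    }


# Toy keys for illustration only; these are not real cohort events.
toy_v3 = [("AAA", "2020-01-02"), ("BBB", "2020-03-04"), ("CCC", "2020-05-06")]
toy_v4 = [("AAA", "2020-01-02"), ("DDD", "2020-07-08")]
summary = overlap_summary(toy_v3, toy_v4)
```

A non-zero `v4_only` count would indicate that tightening the flagpole window changes which events are detected, rather than only removing events from V3.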

Headline

V3 (flagpole 1–3): 5,200 events, a clean cut. V4 (flagpole 1–2): 4,101 events, too aggressive: its 30-day endpoint mean falls below baseline despite the tighter selection, so the extra trimming does not deliver a quality lift. V3 is the practical floor.
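For scale, the difference between the two event counts can be worked out directly. This is arithmetic on the counts quoted above only; it does not assume that V4 is a strict subset of V3.

```python
v3_events = 5200  # V3 cohort size, from the headline
v4_events = 4101  # V4 cohort size, from the headline

dropped = v3_events - v4_events      # difference in cohort size
relative_drop = dropped / v3_events  # as a fraction of the V3 count

print(f"{dropped} fewer events ({relative_drop:.1%} of V3)")
# prints: 1099 fewer events (21.1% of V3)
```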

Files in this directory

  • run_v3_v4.py — harness driver for both V3 and V4
  • analyze_v3_v4.py — statistics + V3∩V4 overlap analysis
  • run_v3_v4.log, run_v3_v4.stdout.log — run logs
  • summary_v3.json, summary_v4.json — headline JSONs
  • summary_v3_v4_overlap.json — overlap diagnostics
  • report.md → ../../reports/Pennant/pennant_criteria_ab_test_v3_v4_2026-05-11.md

Cohort outputs

  • Pennant/cohorts/DET-V3-2026-05-11/{events,outcomes}.parquet
  • Pennant/cohorts/DET-V4-2026-05-11/{events,outcomes}.parquet

Findings

  • F-003: V4 trims too aggressively; V3 is the practical floor for flagpole.max_duration_bars.