A/B Test Plan Template (Free, With Examples)

A good A/B test plan answers five questions before you launch: what you believe, how you’ll measure it, how much traffic you need, how long it runs, and what result would make you ship. Copy the template below, fill in the blanks, and you have a documented, reviewable experiment.

Copy & paste template
A/B TEST PLAN
=================================

1. EXPERIMENT NAME
   <short, descriptive name>

2. HYPOTHESIS (If / Then / Because)
   IF we <change>
   THEN <metric> will <increase/decrease>
   BECAUSE <user insight / reasoning>

3. PRIMARY METRIC
   <the one metric that decides the test>

4. GUARDRAIL METRICS
   <metrics that must NOT get worse, e.g. revenue, refunds, bounce>

5. TARGET AUDIENCE / SEGMENT
   <who sees this test>

6. VARIATIONS
   Control:  <current experience>
   Variant:  <proposed change>

7. SAMPLE SIZE & DURATION
   Baseline conversion rate: __%
   Minimum detectable effect (MDE): __% (relative lift)
   Significance: 95%   Power: 80%
   Required visitors per variation: ____
   Estimated duration: __ days (min 1-2 business cycles)

8. SUCCESS CRITERIA
   Ship the variant if <primary metric> improves by >= <MDE>
   at 95% significance with no guardrail regression.

9. RISKS & ROLLBACK
   <what could go wrong, how you'll roll back>

10. RESULT & DECISION (post-test)
   Outcome: <won / lost / inconclusive>
   Decision: <ship / iterate / kill>
   Learning: <what you now know>

What every A/B test plan needs

Skip any of these and the test gets hard to trust or impossible to act on:

Section                 Why it matters
Hypothesis              Forces a falsifiable prediction, not a vague “let’s try this.”
Primary metric          One metric decides the test, which prevents cherry-picking afterward.
Guardrail metrics       Catches a “win” that quietly hurts revenue or retention.
Sample size & duration  Decided up front so you don’t stop early and fool yourself.
Success criteria        The ship/kill rule, agreed before you see results.
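
One way to make the success-criteria rule concrete is to write it as a function before launch, so nobody can reinterpret it after seeing results. The sketch below is only an illustration: it assumes a pooled two-proportion z-test (a common choice, not something the template prescribes) and reads "improves by >= MDE" as "observed relative lift is at least the MDE". The function name `should_ship` and all counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def should_ship(control_conversions, control_visitors,
                variant_conversions, variant_visitors,
                relative_mde, guardrails_ok, alpha=0.05):
    """Return True only if every pre-agreed condition holds.

    Assumptions (not from the template): pooled two-proportion z-test,
    two-sided p-value, MDE expressed as a relative lift (0.10 = 10%).
    """
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors

    # Pooled rate under the null hypothesis of "no difference".
    pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors
    )
    std_err = sqrt(pooled * (1 - pooled)
                   * (1 / control_visitors + 1 / variant_visitors))

    z = (p_variant - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    relative_lift = (p_variant - p_control) / p_control

    return relative_lift >= relative_mde and p_value < alpha and guardrails_ok


# Hypothetical counts: 3.0% control vs 3.5% variant, guardrails intact.
print(should_ship(1530, 51000, 1785, 51000,
                  relative_mde=0.10, guardrails_ok=True))
```

Passing `guardrails_ok` in as a plain boolean keeps the guardrail check explicit: a statistically significant lift still returns `False` if refunds or revenue regressed.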

Example: a filled-in plan

Here is the template applied to a checkout test:

  • Name: Add trust badges to checkout
  • Hypothesis: If we add payment-security trust badges near the pay button, then checkout completion will increase, because exit-survey data shows payment-safety concerns.
  • Primary metric: Checkout completion rate
  • Guardrails: Average order value, refund rate
  • Baseline / MDE: 3% completion, detect a 10% relative lift (3.0% to 3.3%)
  • Sample size: ~51,000 visitors per variation (use the sample size calculator)
  • Success criteria: Ship if completion improves ≥10% at 95% significance with no guardrail regression.
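
The sample size in this example can be sanity-checked with a few lines of Python. This is a rough sketch, not the calculator the article links to: it assumes a two-sided test at 95% confidence and 80% power, and uses one common normal-approximation formula in which the null-hypothesis variance is based on the baseline rate alone. The function name `visitors_per_variation` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variation.

    Assumptions: two-sided test, normal approximation, and a null
    variance computed from the baseline rate only. Calculators that
    pool the two rates instead return a slightly larger number.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # rate we want to be able to detect

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80

    null_sd = sqrt(2 * p1 * (1 - p1))
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))

    n = ((z_alpha * null_sd + z_beta * alt_sd) / (p2 - p1)) ** 2
    return ceil(n)


# Inputs from the checkout example: 3% baseline, 10% relative lift.
print(visitors_per_variation(0.03, 0.10))
```

With these inputs the result lands close to the ~51,000 quoted above. Formulas that pool the two rates come out a few percent higher, so treat any single figure as an estimate rather than an exact requirement.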

How to fill each section well

Write the hypothesis in If/Then/Because form; the structure forces you to name a real user insight. Full walkthrough: how to write an A/B test hypothesis. For sample size and duration, don’t guess: the required traffic depends on your baseline rate and the minimum detectable effect you care about, and a test should run for at least one to two full business cycles (typically whole weeks) so that weekday and weekend behavior are both represented. To decide which planned test to run first, use ICE scoring.
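
The duration guidance can be turned into a quick estimate once you know the required sample size. The sketch below is a minimal illustration under stated assumptions: one business cycle is taken to be 7 days, traffic is split evenly across variations, and the 6,000 daily visitors figure is a made-up input. The function name `estimated_duration_days` is hypothetical.

```python
from math import ceil


def estimated_duration_days(visitors_per_variation, variations,
                            daily_visitors, cycle_days=7, min_cycles=1):
    """Days to run the test, rounded UP to whole business cycles.

    Assumptions: every eligible visitor enters the test, traffic is
    split evenly, and a business cycle is `cycle_days` long (7 here).
    """
    total_visitors = visitors_per_variation * variations
    raw_days = ceil(total_visitors / daily_visitors)

    # Never stop mid-cycle, and never run fewer than `min_cycles`.
    cycles = max(min_cycles, ceil(raw_days / cycle_days))
    return cycles * cycle_days


# ~51,000 per variation, control + one variant, 6,000 visitors/day (made up).
print(estimated_duration_days(51_000, 2, 6_000))  # 17 raw days -> 21 days
```

Rounding up to whole cycles means a test that reaches its sample size on day 17 still runs through day 21, so each weekday is represented the same number of times.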

Validate the plan before you run it

A plan tells you the test is well-designed — it doesn’t tell you the idea is good. Before committing weeks of traffic, run the variant through an AI prediction: synthetic personas evaluate control vs. variant and return a Run, Iterate, or Kill verdict in about a minute. Run the live test on the ideas that pass.

Build your plan automatically

AB Test Plan generates the hypothesis, metrics, and sample size for you — then predicts the outcome.
