NEW Browse AI tools across categories — updated daily. See what's new →
★ Featured Product Management

Experiment Designer

Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.

Version1.0.0
LicenseMIT
Token count~783
UpdatedJun 4, 2026

Install

Quick install

via npx skills · works with 57+ agents
npx skills add https://github.com/alirezarezvani/claude-skills/tree/main/product-team/skills/experiment-designer
Or pick agent:
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent claude-code
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent cursor
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent codex
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent opencode
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent github-copilot
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent windsurf
More install options

Shorthand — useful for multi-skill repos:

npx skills add alirezarezvani/claude-skills --skill experiment-designer

Manual — clone the repo and drop the folder into your agent's skills directory:

git clone https://github.com/alirezarezvani/claude-skills.git
cp -r claude-skills/product-team/skills/experiment-designer ~/.claude/skills/
How to use: Once installed, ask your agent to "use the experiment-designer skill" or describe what you want (e.g. "Use when planning product experiments, writing testable hypotheses, estimating s"). Requires Node.js 18+.

Experiment Designer

Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.

When To Use

Use this skill for:


  • A/B and multivariate experiment planning

  • Hypothesis writing and success criteria definition

  • Sample size and minimum detectable effect planning

  • Experiment prioritization with ICE scoring

  • Reading statistical output for product decisions

Core Workflow

  1. Write hypothesis in If/Then/Because format
  • If we change [intervention]
  • Then [metric] will change by [expected direction/magnitude]
  • Because [behavioral mechanism]
  1. Define metrics before running test
  • Primary metric: single decision metric
  • Guardrail metrics: quality/risk protection
  • Secondary metrics: diagnostics only
  1. Estimate sample size
  • Baseline conversion or baseline mean
  • Minimum detectable effect (MDE)
  • Significance level (alpha) and power

Use:

python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute

  1. Prioritize experiments with ICE
  • Impact: potential upside
  • Confidence: evidence quality
  • Ease: cost/speed/complexity

ICE Score = (Impact Confidence Ease) / 10

  1. Launch with stopping rules
  • Decide fixed sample size or fixed duration in advance
  • Avoid repeated peeking without proper method
  • Monitor guardrails continuously
  1. Interpret results
  • Statistical significance is not business significance
  • Compare point estimate + confidence interval to decision threshold
  • Investigate novelty effects and segment heterogeneity

Hypothesis Quality Checklist

  • [ ] Contains explicit intervention and audience
  • [ ] Specifies measurable metric change
  • [ ] States plausible causal reason
  • [ ] Includes expected minimum effect
  • [ ] Defines failure condition

Common Experiment Pitfalls

  • Underpowered tests leading to false negatives
  • Running too many simultaneous changes without isolation
  • Changing targeting or implementation mid-test
  • Stopping early on random spikes
  • Ignoring sample ratio mismatch and instrumentation drift
  • Declaring success from p-value without effect-size context

Statistical Interpretation Guardrails

  • p-value < alpha indicates evidence against null, not guaranteed truth.
  • Confidence interval crossing zero/no-effect means uncertain directional claim.
  • Wide intervals imply low precision even when significant.
  • Use practical significance thresholds tied to business impact.

See:


  • references/experiment-playbook.md

  • references/statistics-reference.md

Tooling

scripts/sample_size_calculator.py

Computes required sample size (per variant and total) from:


  • baseline rate

  • MDE (absolute or relative)

  • significance level (alpha)

  • statistical power

Example:

python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8

SKILL.md source

---
name: experiment-designer
description: Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.
---

# Experiment Designer

Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.

## When To Use

Use this skill for:
- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions

## Core Workflow

1. Write hypothesis in If/Then/Because format
- If we change `[intervention]`
- Then `[metric]` will change by `[expected direction/magnitude]`
- Because `[behavioral mechanism]`

2. Define metrics before running test
- Primary metric: single decision metric
- Guardrail metrics: quality/risk protection
- Secondary metrics: diagnostics only

3. Estimate sample size
- Baseline conversion or baseline mean
- Minimum detectable effect (MDE)
- Significance level (alpha) and power

Use:
```bash
python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
```

4. Prioritize experiments with ICE
- Impact: potential upside
- Confidence: evidence quality
- Ease: cost/speed/complexity

ICE Score = (Impact * Confidence * Ease) / 10

5. Launch with stopping rules
- Decide fixed sample size or fixed duration in advance
- Avoid repeated peeking without proper method
- Monitor guardrails continuously

6. Interpret results
- Statistical significance is not business significance
- Compare point estimate + confidence interval to decision threshold
- Investigate novelty effects and segment heterogeneity

## Hypothesis Quality Checklist

- [ ] Contains explicit intervention and audience
- [ ] Specifies measurable metric change
- [ ] States plausible causal reason
- [ ] Includes expected minimum effect
- [ ] Defines failure condition

## Common Experiment Pitfalls

- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolation
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from p-value without effect-size context

## Statistical Interpretation Guardrails

- p-value < alpha indicates evidence against null, not guaranteed truth.
- Confidence interval crossing zero/no-effect means uncertain directional claim.
- Wide intervals imply low precision even when significant.
- Use practical significance thresholds tied to business impact.

See:
- `references/experiment-playbook.md`
- `references/statistics-reference.md`

## Tooling

### `scripts/sample_size_calculator.py`

Computes required sample size (per variant and total) from:
- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power

Example:
```bash
python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
```

Related skills 6