Experiment Designer
Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.
Install
Quick install
npx skills add https://github.com/alirezarezvani/claude-skills/tree/main/product-team/skills/experiment-designer
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent claude-code
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent cursor
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent codex
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent opencode
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent github-copilot
npx skills add alirezarezvani/claude-skills --skill experiment-designer --agent windsurf
More install options
Shorthand, useful for multi-skill repos:
npx skills add alirezarezvani/claude-skills --skill experiment-designer
Manual (clone the repo and drop the folder into your agent's skills directory):
git clone https://github.com/alirezarezvani/claude-skills.git
cp -r claude-skills/product-team/skills/experiment-designer ~/.claude/skills/
Experiment Designer
Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.
When To Use
Use this skill for:
- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions
Core Workflow
1. Write hypothesis in If/Then/Because format
   - If we change [intervention]
   - Then [metric] will change by [expected direction/magnitude]
   - Because [behavioral mechanism]
2. Define metrics before running test
   - Primary metric: single decision metric
   - Guardrail metrics: quality/risk protection
   - Secondary metrics: diagnostics only
3. Estimate sample size
   - Baseline conversion or baseline mean
   - Minimum detectable effect (MDE)
   - Significance level (alpha) and power
   Use:
   python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
4. Prioritize experiments with ICE
   - Impact: potential upside
   - Confidence: evidence quality
   - Ease: cost/speed/complexity
   ICE Score = (Impact * Confidence * Ease) / 10
5. Launch with stopping rules
   - Decide fixed sample size or fixed duration in advance
   - Avoid repeated peeking without proper method
   - Monitor guardrails continuously
6. Interpret results
   - Statistical significance is not business significance
   - Compare point estimate + confidence interval to decision threshold
   - Investigate novelty effects and segment heterogeneity
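The ICE prioritization step in the workflow can be sketched in a few lines of Python. This is an illustrative example only: the `Experiment` class, the `prioritize` helper, the 1-10 scoring scale, and the backlog entries are all assumptions made for the sketch and are not part of the skill itself. Only the formula comes from the workflow.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical backlog entry; the 1-10 scale is an assumption."""
    name: str
    impact: int      # potential upside
    confidence: int  # evidence quality
    ease: int        # cost/speed/complexity (higher means easier)


def ice_score(exp: Experiment) -> float:
    # ICE Score = (Impact * Confidence * Ease) / 10, as stated in the workflow
    return (exp.impact * exp.confidence * exp.ease) / 10


def prioritize(backlog: list[Experiment]) -> list[Experiment]:
    # Highest ICE score first
    return sorted(backlog, key=ice_score, reverse=True)


# Made-up backlog, for illustration only
backlog = [
    Experiment("Shorter signup form", impact=7, confidence=6, ease=8),
    Experiment("New pricing page", impact=9, confidence=4, ease=3),
    Experiment("Checkout button copy", impact=3, confidence=7, ease=10),
]

ranked = prioritize(backlog)
for exp in ranked:
    print(f"{exp.name}: {ice_score(exp):.1f}")
```

Because the three factors are multiplied, a low score on any one of them drags the whole idea down, which is why a high-impact but hard and weakly supported test can rank below a modest, cheap one.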
Hypothesis Quality Checklist
- [ ] Contains explicit intervention and audience
- [ ] Specifies measurable metric change
- [ ] States plausible causal reason
- [ ] Includes expected minimum effect
- [ ] Defines failure condition
Common Experiment Pitfalls
- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolation
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from p-value without effect-size context
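One of the pitfalls above, sample ratio mismatch (SRM), can be checked mechanically. The sketch below is one common way to do it, a chi-square goodness-of-fit test on the observed split, and is not taken from the skill's own scripts. The function name `srm_p_value`, the example counts, and the 0.001 alert threshold are assumptions for illustration.

```python
import math


def srm_p_value(control_n: int, treatment_n: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for the observed traffic split."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


# Made-up counts for an intended 50/50 split
p = srm_p_value(control_n=50_000, treatment_n=51_500)
SRM_ALERT_THRESHOLD = 0.001  # assumed convention, tune to your own process
if p < SRM_ALERT_THRESHOLD:
    print(f"Possible SRM (p = {p:.2e}): check assignment and logging before reading results")
else:
    print(f"No SRM detected (p = {p:.3f})")
```

A very small p-value here says the split itself is unlikely under the intended allocation, so the experiment's metric comparison should not be trusted until the cause is found.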
Statistical Interpretation Guardrails
- p-value < alpha indicates evidence against the null hypothesis, not proof that the effect is real.
- A confidence interval that crosses zero (no effect) means the direction of the effect is uncertain.
- Wide intervals imply low precision even when the result is significant.
- Use practical significance thresholds tied to business impact.
See:
- references/experiment-playbook.md
- references/statistics-reference.md
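The guardrails above can be made concrete by comparing a confidence interval to a practical threshold. The following is a minimal sketch, assuming a conversion-rate metric and a normal-approximation (Wald) interval for the difference in proportions. The function names, the three decision labels, and the example counts are hypothetical and not part of the skill.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_c: int, n_c: int, conv_t: int, n_t: int,
                             alpha: float = 0.05) -> tuple[float, float, float]:
    """Point estimate and Wald interval for treatment rate minus control rate."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    diff = p_t - p_c
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


def decide(low: float, high: float, practical_threshold: float) -> str:
    """Compare the interval to a business-defined minimum worthwhile lift."""
    if low >= practical_threshold:
        return "ship"          # even the pessimistic bound clears the threshold
    if high < practical_threshold:
        return "do not ship"   # even the optimistic bound falls short
    return "inconclusive"      # interval straddles the threshold


# Made-up counts: 10.0% control vs 11.5% treatment conversion
diff, low, high = diff_confidence_interval(1_000, 10_000, 1_150, 10_000)
print(f"lift = {diff:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
print(decide(low, high, practical_threshold=0.01))
```

In this made-up example the interval excludes zero, so the result is statistically significant, yet it still straddles an assumed 1-point practical threshold. That is the gap between statistical and business significance the guardrails describe.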
Tooling
scripts/sample_size_calculator.py
Computes required sample size (per variant and total) from:
- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power
Example:
python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
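For readers who want to see the kind of calculation such a tool performs, here is a rough sketch using the standard two-sided, two-proportion z-test formula. It is not the source of `scripts/sample_size_calculator.py`, whose internals are not shown here and may differ. The function name and its parameters are assumptions that simply mirror the command-line flags.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, mde: float,
                            mde_type: str = "absolute",
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    # A relative MDE scales the baseline; an absolute MDE is added to it
    p2 = p1 * (1 + mde) if mde_type == "relative" else p1 + mde
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


n = sample_size_per_variant(0.10, 0.015, mde_type="absolute", alpha=0.05, power=0.8)
print(f"per variant: {n}, total for two variants: {2 * n}")
```

Required sample size grows with the inverse square of the MDE, so halving the detectable effect roughly quadruples the traffic needed. This is why the MDE should be fixed before launch rather than adjusted after seeing data.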