★ Featured Development

Run

Run a single experiment iteration. Edit the target file, evaluate, keep or discard.

Author: alirezarezvani

Version: 1.0.0

License: MIT

Token count: ~609

Updated: Jun 4, 2026

Install

Quick install

via npx skills · works with 57+ agents

npx skills add https://github.com/alirezarezvani/claude-skills/tree/main/engineering/autoresearch-agent/skills/run

Or pick agent:

npx skills add alirezarezvani/claude-skills --skill run --agent claude-code

npx skills add alirezarezvani/claude-skills --skill run --agent cursor

npx skills add alirezarezvani/claude-skills --skill run --agent codex

npx skills add alirezarezvani/claude-skills --skill run --agent opencode

npx skills add alirezarezvani/claude-skills --skill run --agent github-copilot

npx skills add alirezarezvani/claude-skills --skill run --agent windsurf

More install options

Shorthand — useful for multi-skill repos:

npx skills add alirezarezvani/claude-skills --skill run

Manual — clone the repo and drop the folder into your agent's skills directory:

git clone https://github.com/alirezarezvani/claude-skills.git

cp -r claude-skills/engineering/autoresearch-agent/skills/run ~/.claude/skills/

How to use: Once installed, ask your agent to "use the run skill" or describe what you want (e.g. "Run a single experiment iteration. Edit the target file, evaluate, keep or discard."). Requires Node.js 18+.

SKILL.md source

---
name: run
description: Run a single experiment iteration. Edit the target file, evaluate, keep or discard.
---

# /ar:run — Single Experiment Iteration

Run exactly ONE experiment iteration: review history, decide a change, edit, commit, evaluate.

## Usage

```
/ar:run engineering/api-speed              # Run one iteration
/ar:run                                     # List experiments, let user pick
```

## What It Does

### Step 1: Resolve experiment

If no experiment is specified, run `python {skill_path}/scripts/setup_experiment.py --list` and ask the user to pick.
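For illustration, the discovery that `--list` performs might look like the sketch below. It is a hypothetical helper, not the code in `setup_experiment.py`, and it assumes each experiment lives at `.autoresearch/{domain}/{name}/` and is marked by a `config.cfg` file, matching the paths used in Step 2.

```python
from pathlib import Path


def list_experiments(root: str = ".autoresearch") -> list[str]:
    """Return experiment ids such as 'engineering/api-speed', sorted.

    Hypothetical helper: an experiment is any {root}/{domain}/{name}/
    directory that contains a config.cfg file.
    """
    base = Path(root)
    if not base.is_dir():
        return []  # no experiments have been set up yet
    found = []
    for cfg in base.glob("*/*/config.cfg"):
        domain, name = cfg.parent.parent.name, cfg.parent.name
        found.append(f"{domain}/{name}")
    return sorted(found)
```

The returned ids are in the same `{domain}/{name}` form that `/ar:run` accepts, so the user's pick can be passed straight back in.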

### Step 2: Load context

```bash
# Read experiment config
cat .autoresearch/{domain}/{name}/config.cfg

# Read strategy and constraints
cat .autoresearch/{domain}/{name}/program.md

# Read experiment history
cat .autoresearch/{domain}/{name}/results.tsv

# Check out the experiment branch
git checkout autoresearch/{domain}/{name}
```
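If the context needs to be loaded programmatically rather than with `cat`, a minimal sketch could read the same three files and derive the branch name. `load_context` is a hypothetical name, and treating a missing `results.tsv` as an empty history (no runs yet) is an assumption; the branch checkout itself is still done with `git checkout`.

```python
from pathlib import Path


def load_context(domain: str, name: str, root: str = ".autoresearch") -> dict:
    """Read config.cfg, program.md and results.tsv for one experiment.

    Hypothetical helper mirroring the cat commands above. A missing
    results.tsv is assumed to mean no runs have been recorded yet.
    """
    exp_dir = Path(root) / domain / name
    history = exp_dir / "results.tsv"
    return {
        "config": (exp_dir / "config.cfg").read_text(),
        "program": (exp_dir / "program.md").read_text(),
        "history": history.read_text() if history.exists() else "",
        # Branch naming follows the git checkout command above.
        "branch": f"autoresearch/{domain}/{name}",
    }
```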

### Step 3: Decide what to try

Review results.tsv:
- What changes were kept? What pattern do they share?
- What was discarded? Avoid repeating those approaches.
- What crashed? Understand why.
- How many runs so far? (Escalate strategy accordingly)
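The review questions above can be answered by grouping past runs by outcome. The sketch below assumes, hypothetically, that `results.tsv` has a header row with `status` and `description` columns and that `status` holds `keep`, `discard`, or `crash`; check the real header written by `run_experiment.py` and adjust the column names.

```python
import csv
import io
from collections import defaultdict


def summarize_history(tsv_text: str) -> dict:
    """Group past runs by outcome (hypothetical column names)."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    by_status = defaultdict(list)
    for row in rows:
        # Normalize case so 'KEEP' and 'keep' land in the same bucket.
        by_status[row["status"].strip().lower()].append(row["description"])
    return {
        "runs": len(rows),                   # drives strategy escalation
        "kept": by_status["keep"],           # look for a shared pattern
        "discarded": by_status["discard"],   # avoid repeating these
        "crashed": by_status["crash"],       # understand why they failed
    }
```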

**Strategy escalation:**
- Runs 1-5: Low-hanging fruit (obvious improvements)
- Runs 6-15: Systematic exploration (vary one parameter)
- Runs 16-30: Structural changes (algorithm swaps)
- Runs 31+: Radical experiments (completely different approaches)
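The escalation tiers amount to a lookup on the number of the run about to be attempted. A minimal sketch, where `strategy_phase` is a hypothetical name; it assigns run 30 to structural changes and every later run to radical experiments.

```python
def strategy_phase(completed_runs: int) -> str:
    """Map the count of finished runs to the tier for the next run."""
    run = completed_runs + 1  # the iteration about to be attempted
    if run <= 5:
        return "low-hanging fruit"
    if run <= 15:
        return "systematic exploration"
    if run <= 30:
        return "structural changes"
    return "radical experiments"
```

For example, with five runs recorded the next one is run 6, so the sketch moves from obvious improvements to varying one parameter at a time.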

### Step 4: Make ONE change

Edit only the target file specified in config.cfg. Change one thing. Keep it simple.
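Before committing, it can help to confirm that the edit touched only the target file and left the evaluator alone, as the Rules section requires. The sketch below is a hypothetical guard, not part of the skill's scripts. `git diff --name-only HEAD` lists tracked files that differ from `HEAD` (staged or not); untracked files are not reported.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(repo: str = ".") -> list[str]:
    """Tracked paths that differ from HEAD, staged or unstaged."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def check_single_target(changed: list[str], target: str,
                        evaluator_name: str = "evaluate.py") -> None:
    """Raise ValueError unless the target is the only modified path."""
    # git reports paths with forward slashes, so PurePosixPath is safe here.
    if any(PurePosixPath(p).name == evaluator_name for p in changed):
        raise ValueError("the evaluator was modified; revert it before committing")
    extra = sorted(set(changed) - {target})
    if extra:
        raise ValueError(f"files other than {target} changed: {extra}")
    if target not in changed:
        raise ValueError(f"{target} is unchanged; there is nothing to evaluate")
```

Usage, with `target` taken from `config.cfg`: `check_single_target(changed_files(), target)` before `git add {target}`.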

### Step 5: Commit and evaluate

```bash
git add {target}
git commit -m "experiment: {short description of what changed}"

python {skill_path}/scripts/run_experiment.py \
  --experiment {domain}/{name} --single
```
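`run_experiment.py` makes the keep/discard call itself; the sketch below only illustrates the kind of comparison involved. It is an assumption, not the script's logic: `value` is `None` when evaluation failed, `best` is `None` before any run has been kept, and `lower_is_better` stands in for whatever metric direction the experiment config declares. It handles only the metric comparison; the "equal performance with simpler code" rule still needs judgment.

```python
from typing import Optional


def decide(value: Optional[float], best: Optional[float],
           lower_is_better: bool = True) -> str:
    """Classify one evaluation as KEEP, DISCARD or CRASH (sketch)."""
    if value is None:
        return "CRASH"      # evaluation produced no metric
    if best is None:
        return "KEEP"       # first successful run becomes the baseline
    improved = value < best if lower_is_better else value > best
    return "KEEP" if improved else "DISCARD"  # ties are discarded here
```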

### Step 6: Report result

Read the script output. Tell the user:
- **KEEP**: "Improvement! {metric}: {value} ({delta} from previous best)"
- **DISCARD**: "No improvement. {metric}: {value} vs best {best}. Reverted."
- **CRASH**: "Evaluation failed: {reason}. Reverted."
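The three message templates can be filled from the script's output with a small formatter. `format_report` is a hypothetical helper; the signed delta (`+g` formatting) and the `n/a` fallback when no previous best exists are illustrative choices, not part of the skill.

```python
from typing import Optional


def format_report(status: str, metric: str, value: Optional[float] = None,
                  best: Optional[float] = None, reason: str = "") -> str:
    """Render the KEEP / DISCARD / CRASH message for the user."""
    if status == "KEEP":
        delta = "n/a" if best is None else f"{value - best:+g}"
        return f"Improvement! {metric}: {value} ({delta} from previous best)"
    if status == "DISCARD":
        return f"No improvement. {metric}: {value} vs best {best}. Reverted."
    if status == "CRASH":
        return f"Evaluation failed: {reason}. Reverted."
    raise ValueError(f"unknown status: {status}")
```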

### Step 7: Self-improvement check

After every 10th experiment (check results.tsv line count), update the Strategy section of program.md with patterns learned.
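The "every 10th experiment" check reduces to a modulo on the number of recorded runs. A minimal sketch, assuming (hypothetically) that `results.tsv` starts with one header row that must not be counted as a run and that blank lines are ignored:

```python
def due_for_strategy_update(tsv_text: str, every: int = 10,
                            has_header: bool = True) -> bool:
    """True when the recorded run count is a positive multiple of `every`."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    runs = len(lines) - (1 if has_header and lines else 0)
    return runs > 0 and runs % every == 0
```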

## Rules

- ONE change per iteration. Don't change 5 things at once.
- NEVER modify the evaluator (evaluate.py). It's ground truth.
- Simplicity wins. Equal performance with simpler code is an improvement.
- No new dependencies.
