NEW Browse AI tools across categories — updated daily. See what's new →
★ Featured Development

Setup

Set up a new autoresearch experiment interactively. Collects domain, target file, eval command, metric, direction, and evaluator.

Version1.0.0
LicenseMIT
Token count~713
UpdatedJun 4, 2026

Install

Quick install

via npx skills · works with 57+ agents
npx skills add https://github.com/alirezarezvani/claude-skills/tree/main/engineering/autoresearch-agent/skills/setup
Or pick agent:
npx skills add alirezarezvani/claude-skills --skill setup --agent claude-code
npx skills add alirezarezvani/claude-skills --skill setup --agent cursor
npx skills add alirezarezvani/claude-skills --skill setup --agent codex
npx skills add alirezarezvani/claude-skills --skill setup --agent opencode
npx skills add alirezarezvani/claude-skills --skill setup --agent github-copilot
npx skills add alirezarezvani/claude-skills --skill setup --agent windsurf
More install options

Shorthand — useful for multi-skill repos:

npx skills add alirezarezvani/claude-skills --skill setup

Manual — clone the repo and drop the folder into your agent's skills directory:

git clone https://github.com/alirezarezvani/claude-skills.git
cp -r claude-skills/engineering/autoresearch-agent/skills/setup ~/.claude/skills/
How to use: Once installed, ask your agent to "use the setup skill" or describe what you want (e.g. "Set up a new autoresearch experiment interactively. Collects domain, target file"). Requires Node.js 18+.

/ar:setup — Create New Experiment

Set up a new autoresearch experiment with all required configuration.

Usage

/ar:setup                                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators

What It Does

If arguments provided

Pass them directly to the setup script:

python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]

If no arguments (interactive mode)

Collect each parameter one at a time:

  1. Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
  2. Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
  3. Target file — Ask: "Which file to optimize?" Verify it exists.
  4. Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
  5. Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
  6. Direction — Ask: "Is lower or higher better?"
  7. Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
  8. Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"

Then run setup_experiment.py with the collected parameters.

Listing

# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators

Built-in Evaluators

| Name | Metric | Use Case |
|------|--------|----------|
| benchmark_speed | p50_ms (lower) | Function/API execution time |
| benchmark_size | size_bytes (lower) | File, bundle, Docker image size |
| test_pass_rate | pass_rate (higher) | Test suite pass percentage |
| build_speed | build_seconds (lower) | Build/compile/Docker build time |
| memory_usage | peak_mb (lower) | Peak memory during execution |
| llm_judge_content | ctr_score (higher) | Headlines, titles, descriptions |
| llm_judge_prompt | quality_score (higher) | System prompts, agent instructions |
| llm_judge_copy | engagement_score (higher) | Social posts, ad copy, emails |

After Setup

Report to the user:


  • Experiment path and branch name

  • Whether the eval command worked and the baseline metric

  • Suggest: "Run /ar:run {domain}/{name} to start iterating, or /ar:loop {domain}/{name} for autonomous mode."

SKILL.md source

---
name: setup
description: Set up a new autoresearch experiment interactively. Collects domain, target file, eval command, metric, direction, and evaluator.
---

# /ar:setup — Create New Experiment

Set up a new autoresearch experiment with all required configuration.

## Usage

```
/ar:setup                                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators
```

## What It Does

### If arguments provided

Pass them directly to the setup script:

```bash
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```

### If no arguments (interactive mode)

Collect each parameter one at a time:

1. **Domain** — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
2. **Name** — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
3. **Target file** — Ask: "Which file to optimize?" Verify it exists.
4. **Eval command** — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
5. **Metric** — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
6. **Direction** — Ask: "Is lower or higher better?"
7. **Evaluator** (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
8. **Scope** — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"

Then run `setup_experiment.py` with the collected parameters.

### Listing

```bash
# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators
```

## Built-in Evaluators

| Name | Metric | Use Case |
|------|--------|----------|
| `benchmark_speed` | `p50_ms` (lower) | Function/API execution time |
| `benchmark_size` | `size_bytes` (lower) | File, bundle, Docker image size |
| `test_pass_rate` | `pass_rate` (higher) | Test suite pass percentage |
| `build_speed` | `build_seconds` (lower) | Build/compile/Docker build time |
| `memory_usage` | `peak_mb` (lower) | Peak memory during execution |
| `llm_judge_content` | `ctr_score` (higher) | Headlines, titles, descriptions |
| `llm_judge_prompt` | `quality_score` (higher) | System prompts, agent instructions |
| `llm_judge_copy` | `engagement_score` (higher) | Social posts, ad copy, emails |

## After Setup

Report to the user:
- Experiment path and branch name
- Whether the eval command worked and the baseline metric
- Suggest: "Run `/ar:run {domain}/{name}` to start iterating, or `/ar:loop {domain}/{name}` for autonomous mode."

Related skills 6

caveman

★ Featured

Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra, wenyan-lite, wenyan-full, wenyan-ultra. Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested.

juliusbrussee 167k
Development

secure-linux-web-hosting

★ Featured

Use when setting up, hardening, or reviewing a cloud server for self-hosting, including DNS, SSH, firewalls, Nginx, static-site hosting, reverse-proxying an app, HTTPS with Let's Encrypt or ACME clients, safe HTTP-to-HTTPS redirects, or optional post-launch network tuning such as BBR.

xixu-me 155k
Development

readme-i18n

★ Featured

Use when the user wants to translate a repository README, make a repo multilingual, localize docs, add a language switcher, internationalize the README, or update localized README variants in a GitHub-style repository.

xixu-me 155k
Development

lark-shared

★ Featured

Use when first setting up lark-cli, running auth login, switching user/bot identity (--as), handling permission denied or scope errors, needing to update lark-cli, or seeing _notice in JSON output.

larksuite 155k
Development

improve-codebase-architecture

★ Featured

Find deepening opportunities in a codebase, informed by the domain language in CONTEXT.md and the decisions in docs/adr/. Use when the user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable and AI-navigable.

mattpocock 151k
Development

paper-context-resolver

★ Featured

Optional RigorPilot helper for README-first deep learning repo reproduction. Use only when the README and repository files leave a narrow reproduction-critical gap and the task is to resolve a specific paper detail such as dataset split, preprocessing, evaluation protocol, checkpoint mapping, or runtime assumption from primary paper sources while recording conflicts. Do not use for general paper summary, repo scanning, environment setup, command execution, title-only paper lookup, or replacin...

lllllllama 127k
Development