Browser Automation
Use when the user asks to automate browser tasks, scrape websites, fill forms, capture screenshots, extract structured data from web pages, or build web automation workflows. NOT for testing — use ...
Install
Quick install
npx skills add https://github.com/alirezarezvani/claude-skills/tree/main/engineering/skills/browser-automation
npx skills add alirezarezvani/claude-skills --skill browser-automation --agent claude-code
npx skills add alirezarezvani/claude-skills --skill browser-automation --agent cursor
npx skills add alirezarezvani/claude-skills --skill browser-automation --agent codex
npx skills add alirezarezvani/claude-skills --skill browser-automation --agent opencode
npx skills add alirezarezvani/claude-skills --skill browser-automation --agent github-copilot
npx skills add alirezarezvani/claude-skills --skill browser-automation --agent windsurf
More install options
Shorthand — useful for multi-skill repos:
npx skills add alirezarezvani/claude-skills --skill browser-automation
Manual — clone the repo and drop the folder into your agent's skills directory:
git clone https://github.com/alirezarezvani/claude-skills.git
cp -r claude-skills/engineering/skills/browser-automation ~/.claude/skills/
SKILL.md source
---
name: browser-automation
description: Use when the user asks to automate browser tasks, scrape websites, fill forms, capture screenshots, extract structured data from web pages, or build web automation workflows. NOT for testing — use ...
---
# Browser Automation - POWERFUL
## Overview
The Browser Automation skill provides comprehensive tools and knowledge for building production-grade web automation workflows using Playwright. This skill covers data extraction, form filling, screenshot capture, session management, and anti-detection patterns for reliable browser automation at scale.
**When to use this skill:**
- Scraping structured data from websites (tables, listings, search results)
- Automating multi-step browser workflows (login, fill forms, download files)
- Capturing screenshots or PDFs of web pages
- Extracting data from SPAs and JavaScript-heavy sites
- Building repeatable browser-based data pipelines
**When NOT to use this skill:**
- Writing browser tests or E2E test suites — use **playwright-pro** instead
- Testing API endpoints — use **api-test-suite-builder** instead
- Load testing or performance benchmarking — use **performance-profiler** instead
**Why Playwright over Selenium or Puppeteer:**
- **Auto-wait built in** — no explicit `sleep()` or `waitForElement()` needed for most actions
- **Multi-browser from one API** — Chromium, Firefox, WebKit with zero config changes
- **Network interception** — block ads, mock responses, capture API calls natively
- **Browser contexts** — isolated sessions without spinning up new browser instances
- **Codegen** — `playwright codegen` records your actions and generates scripts
- **Async-first** — Python async/await for high-throughput scraping
## Core Competencies
### 1. Web Scraping Patterns
**Selector priority (most to least reliable):**
1. `data-testid`, `data-id`, or custom data attributes — stable across redesigns
2. `#id` selectors — unique but may change between deploys
3. Semantic selectors: `article`, `nav`, `main`, `section` — resilient to CSS changes
4. Class-based: `.product-card`, `.price` — brittle if classes are generated (e.g., CSS modules)
5. Positional: `nth-child()`, `nth-of-type()` — last resort, breaks on layout changes
Use XPath only when CSS cannot express the relationship (e.g., ancestor traversal, text-based selection).
**Pagination strategies:** next-button, URL-based (`?page=N`), infinite scroll, load-more button. See [data_extraction_recipes.md](references/data_extraction_recipes.md) for complete pagination handlers and scroll patterns.
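For the URL-based strategy, the list of page URLs can be built up front. The following is a minimal sketch using only the standard library; the helper name `page_urls` and the default parameter name `page` are assumptions for illustration, and a real site may use a different parameter or offset-based paging.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_urls(base_url, first=1, last=3, param="page"):
    """Yield base_url with the page parameter set to each N in [first, last]."""
    parts = urlparse(base_url)
    # Keep the other query parameters and drop any existing page value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    for n in range(first, last + 1):
        yield urlunparse(parts._replace(query=urlencode(query + [(param, str(n))])))
```

Each generated URL can then be passed to `page.goto()` in turn, stopping early when a page returns no items.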
### 2. Form Filling & Multi-Step Workflows
Break multi-step forms into discrete functions per step. Each function fills fields, clicks "Next"/"Continue", and waits for the next step to load (URL change or DOM element).
Key patterns: login flows, multi-page forms, file uploads (including drag-and-drop zones), native and custom dropdown handling. See [playwright_browser_api.md](references/playwright_browser_api.md) for complete API reference on `fill()`, `select_option()`, `set_input_files()`, and `expect_file_chooser()`.
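The one-function-per-step idea might look like the sketch below. All selectors, field names, and function names here are hypothetical placeholders; only `fill()`, `click()`, `wait_for_selector()`, and `wait_for_url()` are Playwright page methods.

```python
async def fill_contact_step(page, data):
    """Step 1: fill contact fields, click Next, wait for the address step to appear."""
    await page.fill("#email", data["email"])            # hypothetical selector
    await page.fill("#full-name", data["name"])         # hypothetical selector
    await page.click("button:has-text('Next')")
    await page.wait_for_selector("#address-form")       # DOM marker of the next step


async def fill_address_step(page, data):
    """Step 2: fill the address, click Continue, wait for the review URL."""
    await page.fill("#city", data["city"])              # hypothetical selector
    await page.click("button:has-text('Continue')")
    await page.wait_for_url("**/review")                 # URL change marks the next step


async def run_form(page, data, steps=(fill_contact_step, fill_address_step)):
    """Run the step functions in order against one page."""
    for step in steps:
        await step(page, data)
```

Keeping each step separate means a failure points at one step, and a single step can be retried without replaying the whole form.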
### 3. Screenshot & PDF Capture
- **Full page:** `await page.screenshot(path="full.png", full_page=True)`
- **Element:** `await page.locator("div.chart").screenshot(path="chart.png")`
- **PDF (Chromium only):** `await page.pdf(path="out.pdf", format="A4", print_background=True)`
- **Visual regression:** Take screenshots at known states, store baselines in version control with naming: `{page}_{viewport}_{state}.png`
See [playwright_browser_api.md](references/playwright_browser_api.md) for full screenshot/PDF options.
### 4. Structured Data Extraction
Core extraction patterns:
- **Tables to JSON** — Extract `<thead>` headers and `<tbody>` rows into dictionaries
- **Listings to arrays** — Map repeating card elements using a field-selector map (supports `::attr()` for attributes)
- **Nested/threaded data** — Recursive extraction for comments with replies, category trees
See [data_extraction_recipes.md](references/data_extraction_recipes.md) for complete extraction functions, price parsing, data cleaning utilities, and output format helpers (JSON, CSV, JSONL).
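As a rough illustration of the tables-to-JSON pattern and of price parsing, the sketch below works on cell text that has already been read from the page. The helper names `rows_to_dicts` and `parse_price` are assumptions for this example and are not necessarily the functions in the reference file; `parse_price` also assumes `.` is the decimal mark.

```python
import re
from decimal import Decimal


def rows_to_dicts(headers, rows):
    """Pair each row's cells with the header names; skip rows of the wrong width."""
    return [dict(zip(headers, row)) for row in rows if len(row) == len(headers)]


def parse_price(text):
    """Extract a Decimal from strings like '$1,299.00'; return None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return Decimal(match.group().replace(",", "")) if match else None
```

With Playwright, the header texts can be read with something like `await page.locator("thead th").all_inner_texts()`, and each row's cells the same way from `tbody tr`.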
### 5. Cookie & Session Management
- **Save/restore cookies:** `context.cookies()` and `context.add_cookies()`
- **Full storage state** (cookies + localStorage): `context.storage_state(path="state.json")` to save, `browser.new_context(storage_state="state.json")` to restore
**Best practice:** Save state after login, reuse across scraping sessions. Check session validity before starting a long job — make a lightweight request to a protected page and verify you are not redirected to login. See [playwright_browser_api.md](references/playwright_browser_api.md) for cookie and storage state API details.
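The session validity check described above could be sketched as follows. The function name `session_is_valid` and the `/login` path are assumptions; adjust the redirect test to whatever the target site actually does when a session has expired.

```python
async def session_is_valid(context, protected_url, login_path="/login"):
    """Return True if a protected page loads without redirecting to the login page."""
    page = await context.new_page()
    try:
        response = await page.goto(protected_url, wait_until="domcontentloaded")
        # page.url holds the final URL after any redirects.
        redirected = login_path in page.url
        return response is not None and response.ok and not redirected
    finally:
        await page.close()
```

If this returns False, discard the saved state file and log in again before starting the long job.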
### 6. Anti-Detection Patterns
Modern websites detect automation through multiple vectors. Apply these in priority order:
1. **WebDriver flag removal** — Override `navigator.webdriver` (reported as `true` under automation) via an init script (critical)
2. **Custom user agent** — Rotate through real browser UAs; never use the default headless UA
3. **Realistic viewport** — Set 1920x1080 or similar real-world dimensions instead of relying on the default (Playwright's default viewport is 1280x720)
4. **Request throttling** — Add `random.uniform()` delays between actions
5. **Proxy support** — Per-browser or per-context proxy configuration
See [anti_detection_patterns.md](references/anti_detection_patterns.md) for the complete stealth stack: navigator property hardening, WebGL/canvas fingerprint evasion, behavioral simulation (mouse movement, typing speed, scroll patterns), proxy rotation strategies, and detection self-test URLs.
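A minimal sketch of the first four items, not the full stealth stack from the reference file. The names `HIDE_WEBDRIVER_JS`, `context_options`, and `human_delay` are illustrative assumptions; `add_init_script()` and the `user_agent` and `viewport` options are regular Playwright context APIs.

```python
import asyncio
import random

# Runs before any page script, so navigator.webdriver reads as undefined.
HIDE_WEBDRIVER_JS = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"


def context_options(user_agents, rng=random):
    """Build kwargs for browser.new_context(): a rotated UA and a realistic viewport."""
    return {
        "user_agent": rng.choice(user_agents),
        "viewport": {"width": 1920, "height": 1080},
    }


async def human_delay(min_ms=800, max_ms=2000):
    """Sleep for a random interval between actions; return the delay in seconds."""
    seconds = random.uniform(min_ms, max_ms) / 1000
    await asyncio.sleep(seconds)
    return seconds


# Usage with Playwright (user_agents is a list of real browser UA strings you supply):
#   context = await browser.new_context(**context_options(user_agents))
#   await context.add_init_script(HIDE_WEBDRIVER_JS)
```

Masking one property does not defeat fingerprinting on its own; treat this as the baseline that the reference file builds on.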
### 7. Dynamic Content Handling
- **SPA rendering:** Wait for content selectors (`wait_for_selector`), not the page load event
- **AJAX/Fetch waiting:** Use `page.expect_response("**/api/data*")` to intercept and wait for specific API calls
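- **Shadow DOM:** Playwright's CSS and text locators pierce open Shadow DOM by default, so `page.locator("custom-element .inner-class")` matches inside the shadow root; the `>>` operator only chains selectors, and XPath does not pierce shadow roots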
- **Lazy-loaded images:** Scroll elements into view with `scroll_into_view_if_needed()` to trigger loading
See [playwright_browser_api.md](references/playwright_browser_api.md) for wait strategies, network interception, and Shadow DOM details.
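The AJAX/Fetch waiting pattern can be sketched like this: start listening for the response before the action that triggers it, then read the JSON body. The function name `click_and_capture_json` is an assumption for this example, and the URL pattern is the one from the bullet above.

```python
async def click_and_capture_json(page, click_selector, url_pattern="**/api/data*"):
    """Click an element, wait for the matching API response, and return its JSON body."""
    # Register the wait first so a fast response is not missed.
    async with page.expect_response(url_pattern) as response_info:
        await page.click(click_selector)
    response = await response_info.value
    return await response.json()
```

Reading the API payload directly is often more stable than scraping the DOM that the page renders from it.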
### 8. Error Handling & Retry Logic
- **Retry with backoff:** Wrap page interactions in retry logic with exponential backoff (e.g., 1s, 2s, 4s)
- **Fallback selectors:** On `TimeoutError`, try alternative selectors before failing
- **Error-state screenshots:** Capture `page.screenshot(path="error-state.png")` on unexpected failures for debugging
- **Rate limit detection:** Check for HTTP 429 responses and respect `Retry-After` headers
See [anti_detection_patterns.md](references/anti_detection_patterns.md) for the complete exponential backoff implementation and rate limiter class.
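A compact sketch of retry with exponential backoff and `Retry-After` handling, independent of the fuller implementation in the reference file. The names `retry_with_backoff` and `retry_after_seconds` are assumptions; the header lookup assumes lower-cased header names, which is how Playwright exposes `response.headers`.

```python
import asyncio


async def retry_with_backoff(action, attempts=4, base_delay=1.0,
                             sleep=asyncio.sleep, retry_on=(Exception,)):
    """Await action() up to `attempts` times, sleeping 1s, 2s, 4s, ... between failures."""
    for attempt in range(attempts):
        try:
            return await action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await sleep(base_delay * 2 ** attempt)


def retry_after_seconds(headers, default=5.0):
    """Read a numeric Retry-After header in seconds; use `default` for dates or absence."""
    value = headers.get("retry-after", "").strip()
    return float(value) if value.isdigit() else default
```

With Playwright this might be called as `await retry_with_backoff(lambda: page.click("#submit"), retry_on=(PlaywrightTimeoutError,))`, where `PlaywrightTimeoutError` is `TimeoutError` imported from `playwright.async_api` and `#submit` is a placeholder selector.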
## Workflows
### Workflow 1: Single-Page Data Extraction
**Scenario:** Extract product data from a single page with JavaScript-rendered content.
**Steps:**
1. Launch browser in headed mode during development (`headless=False`), switch to headless for production
2. Navigate to URL and wait for content selector
3. Extract data using `query_selector_all` with field mapping
4. Validate extracted data (check for nulls, expected types)
5. Output as JSON
```python
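from playwright.async_api import async_playwright

# extract_listings() is not defined here; it is assumed to be the listings
# helper from references/data_extraction_recipes.md.


async def extract_single_page(url, selectors):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent="Mozilla/5.0 ...",
            )
            page = await context.new_page()
            await page.goto(url, wait_until="networkidle")
            # Step 2: wait for the content container itself, not just a quiet network
            await page.wait_for_selector(selectors["container"])
            return await extract_listings(page, selectors["container"], selectors["fields"])
        finally:
            # Always release the browser, even if extraction raises
            await browser.close()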
```
### Workflow 2: Multi-Page Scraping with Pagination
**Scenario:** Scrape search results across 50+ pages.
**Steps:**
1. Launch browser with anti-detection settings
2. Navigate to first page
3. Extract data from current page
4. Check if "Next" button exists and is enabled
5. Click next, wait for new content to load (not just navigation)
6. Repeat until no next page or max pages reached
7. Deduplicate results by unique key
8. Write output incrementally (don't hold everything in memory)
```python
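import asyncio
import random

from playwright.async_api import async_playwright

# extract_listings() is not defined here; it is assumed to be the listings
# helper from references/data_extraction_recipes.md.


async def scrape_paginated(base_url, selectors, max_pages=100):
    all_data = []
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await (await browser.new_context()).new_page()
            await page.goto(base_url)
            for _ in range(max_pages):
                items = await extract_listings(page, selectors["container"], selectors["fields"])
                all_data.extend(items)
                next_btn = page.locator(selectors["next_button"])
                # Stop when there is no enabled "Next" button
                if await next_btn.count() == 0 or await next_btn.is_disabled():
                    break
                await next_btn.click()
                await page.wait_for_selector(selectors["container"])
                # Random 0.8-2.0 s pause; stands in for the undefined
                # human_delay(800, 2000), assuming its arguments are milliseconds
                await asyncio.sleep(random.uniform(0.8, 2.0))
        finally:
            await browser.close()
    return all_data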
```
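Steps 7 and 8 (deduplicate, write incrementally) are not covered by the function above. One way to sketch them, assuming each item is a dict with a unique `url` field and that JSON Lines is an acceptable output format; the helper name `append_new_items` is hypothetical:

```python
import json


def append_new_items(items, seen, out_file, key="url"):
    """Write items whose key has not been seen yet as JSON Lines; return the count written."""
    written = 0
    for item in items:
        value = item.get(key)
        if value is None or value in seen:
            continue  # skip duplicates and items without a usable key
        seen.add(value)
        out_file.write(json.dumps(item, ensure_ascii=False) + "\n")
        written += 1
    return written
```

Calling this once per page inside the pagination loop, with the file opened in append mode, keeps memory use flat and preserves partial results if the job fails midway.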
### Workflow 3: Authenticated Workflow Automation
**Scenario:** Log into a portal, navigate a multi-step form, download a report.
**Steps:**
1. Check for existing session state file
2. If no session, perform login and save state
3. Navigate to target page using saved session
4. Fill multi-step form with provided data
5. Wait for download to trigger
6. Save downloaded file to target directory
```python
import os

from playwright.async_api import async_playwright

# login(), fill_step_1() and fill_step_2() are site-specific helpers that are
# not defined here.


async def authenticated_workflow(credentials, form_data, download_dir):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            state_file = "session_state.json"
            # Restore or create session
            if os.path.exists(state_file):
                context = await browser.new_context(storage_state=state_file)
            else:
                context = await browser.new_context()
                login_page = await context.new_page()
                await login(login_page, credentials["url"], credentials["user"], credentials["pass"])
                await context.storage_state(path=state_file)
                await login_page.close()
            page = await context.new_page()
            await page.goto(form_data["target_url"])
            # Fill form steps
            for step_fn in [fill_step_1, fill_step_2]:
                await step_fn(page, form_data)
            # Handle download: start listening before the click that triggers it
            async with page.expect_download() as dl_info:
                await page.click("button:has-text('Download Report')")
            download = await dl_info.value
            await download.save_as(os.path.join(download_dir, download.suggested_filename))
        finally:
            await browser.close()
```
## Tools Reference
| Script | Purpose | Key Flags | Output |
|--------|---------|-----------|--------|
| `scraping_toolkit.py` | Generate Playwright scraping script skeleton | `--url`, `--selectors`, `--paginate`, `--output` | Python script or JSON config |
| `form_automation_builder.py` | Generate form-fill automation script from field spec | `--fields`, `--url`, `--output` | Python automation script |
| `anti_detection_checker.py` | Audit a Playwright script for detection vectors | `--file`, `--verbose` | Risk report with score |
All scripts are stdlib-only. Run `python3 <script> --help` for full usage.
## Anti-Patterns
### Hardcoded Waits
**Bad:** `await page.wait_for_timeout(5000)` before every action.
**Good:** Use `wait_for_selector`, `wait_for_url`, `expect_response`, or `wait_for_load_state`. Hardcoded waits are flaky and slow.
### No Error Recovery
**Bad:** Linear script that crashes on first failure.
**Good:** Wrap each page interaction in try/except. Take error-state screenshots. Implement retry with exponential backoff.
### Ignoring robots.txt
**Bad:** Scraping without checking robots.txt directives.
**Good:** Fetch and parse robots.txt before scraping. Respect `Crawl-delay`. Skip disallowed paths. Add your bot name to User-Agent if running at scale.
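Python's standard library can do the parsing. A small sketch, using an inline robots.txt body so it runs offline; the bot name `my-bot` and the example URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse a robots.txt body that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""

rules = load_rules(SAMPLE_ROBOTS)
allowed = rules.can_fetch("my-bot", "https://example.com/products")    # path is not disallowed
blocked = rules.can_fetch("my-bot", "https://example.com/private/x")   # under Disallow: /private/
delay = rules.crawl_delay("my-bot")                                    # Crawl-delay value, or None
```

Against a live site, `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` fetches and parses the file in one step.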
### Storing Credentials in Scripts
**Bad:** Hardcoding usernames and passwords in Python files.
**Good:** Use environment variables, `.env` files (gitignored), or a secrets manager. Avoid passing secrets as CLI arguments, which are visible in shell history and process listings.
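Reading credentials from the environment might look like the sketch below. The variable names `PORTAL_USER` and `PORTAL_PASS` and the helper name `load_credentials` are assumptions for illustration.

```python
import os


def load_credentials(prefix="PORTAL", env=os.environ):
    """Read <PREFIX>_USER and <PREFIX>_PASS from the environment; fail loudly if missing."""
    names = {"user": f"{prefix}_USER", "pass": f"{prefix}_PASS"}
    missing = [name for name in names.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[name] for key, name in names.items()}
```

Failing at startup when a variable is missing is usually preferable to discovering it halfway through a login flow.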
### No Rate Limiting
**Bad:** Hammering a site with 100 requests/second.
**Good:** Add random delays between requests (1-3s for polite scraping). Monitor for 429 responses. Implement exponential backoff.
### Selector Fragility
**Bad:** Relying on auto-generated class names (`.css-1a2b3c`) or deep nesting (`div > div > div > span:nth-child(3)`).
**Good:** Use data attributes, semantic HTML, or text-based locators. Test selectors in browser DevTools first.
### Not Cleaning Up Browser Instances
**Bad:** Launching browsers without closing them, leading to resource leaks.
**Good:** Always use `try/finally` or async context managers to ensure `browser.close()` is called.
### Running Headed in Production
**Bad:** Using `headless=False` in production/CI.
**Good:** Develop in headed mode for debugging, deploy with `headless=True`. Use an environment variable to toggle: `headless = os.environ.get("HEADLESS", "true") == "true"`.
## Cross-References
- **playwright-pro** — Browser testing skill. Use for E2E tests, test assertions, test fixtures. Browser Automation is for data extraction and workflow automation, not testing.
- **api-test-suite-builder** — When the website has a public API, hit the API directly instead of scraping the rendered page. Faster, more reliable, less detectable.
- **performance-profiler** — If your automation scripts are slow, profile the bottlenecks before adding concurrency.
- **env-secrets-manager** — For securely managing credentials used in authenticated automation workflows.