Testing & Quality skills.
Test frameworks, debugging, code review, linting.
agent-browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. Also use...
OpenAI / develop-web-game
Build and test web games iteratively using Playwright with time-stepping
Browserbase / cookie-sync
Export cookies from local Chrome into a Browserbase persistent context
Flutter / flutter-testing-apps
Implement unit, widget, and integration tests
Trail of Bits / property-based-testing
Property-based testing for multiple languages and smart contracts
Anthropic / webapp-testing
Test local web applications using Playwright
Browserbase / browserbase-cli
CLI wrapper around the Browserbase platform
Browserbase / fetch
Fetch HTML, JSON, headers, and status codes through the Browserbase API
Browserbase / functions
Deploy browser automation scripts as serverless cloud functions
Browserbase / search
Search the web via the Browserbase API with structured results
Browserbase / ui-test
Run adversarial UI tests by analyzing git diffs in a real browser
Expo / expo-dev-client
Build and distribute Expo dev clients locally or via TestFlight
HashiCorp / terraform-test
Built-in testing framework for Terraform configurations with .tftest.hcl files
OpenAI / gh-address-comments
Address review and issue comments on open GitHub PRs via CLI
OpenAI / security-best-practices
Review code for language-specific security vulnerabilities
Datadog Labs / dd-llmo-eval-trace-rca
Root-cause LLM app failures using eval traces
Datadog Labs / dd-monitors
Manage Datadog monitors through the pup CLI
Browserbase / browser
Automate web browser interactions through natural language CLI commands
Cloudflare / workers-best-practices
Review and author Workers code against production best practices and wrangler.jsonc conventions
HashiCorp / provider-test-patterns
Acceptance test patterns for Terraform providers using terraform-plugin-testing
HashiCorp / run-acceptance-tests
Run acceptance tests for Terraform providers using Go's test runner
OpenAI / aspnet-core
Build, review, and architect ASP.NET Core apps (Blazor, MVC, Minimal APIs, etc.)
OpenAI / gh-fix-ci
Debug and fix failing GitHub Actions PR checks using log inspection
Binance / spot
Place and manage spot trading orders on Binance via API key authentication, supporting mainnet and testnet
Datadog Labs / dd-apm
Query Datadog APM data directly from your editor
Datadog Labs / dd-docs
Look up Datadog documentation via the LLM-optimized docs index
Datadog Labs / dd-llmo-eval-bootstrap
Analyze production LLM traces and generate evaluators
Datadog Labs / dd-llmo-experiment-analyzer
Analyze single or comparative LLM experiment results
Datadog Labs / dd-logs
Search, filter, and archive Datadog logs through pup CLI
Datadog Labs / dd-pup
Rust-based CLI (pup) for talking to the Datadog API
MongoDB / atlas-stream-processing
Build, operate, and debug Atlas Stream Processing pipelines with Kafka, S3, and Lambda integrations
Trail of Bits / differential-review
Security-focused diff review with git history analysis
Trail of Bits / modern-python
Modern Python tooling with uv, ruff, ty, and pytest best practices
Trail of Bits / semgrep-rule-variant-creator
Port existing Semgrep rules to new target languages with test-driven validation
Trail of Bits / testing-handbook-skills
Testing Handbook skills: fuzzers, static analysis, sanitizers
grill-me
Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
grill-with-docs
Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions.
minimal-run-and-audit
RigorPilot trusted execution and reporting skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize evidence from the selected smoke test or documented inference or evaluation command and write standardized `repro_outputs/` files, including patch notes when repository files changed. Do not use for training execution, initial repo intake, generic environment setup, paper lookup, target selection, hidden scientific-meaning changes, or end...
polish
Performs a final quality pass fixing alignment, spacing, consistency, and micro-detail issues before shipping. Use when the user mentions polish, finishing touches, pre-launch review, something looks off, or wants to go from good to great.
critique
Evaluate design from a UX perspective, assessing visual hierarchy, information architecture, emotional resonance, cognitive load, and overall quality with quantitative scoring, persona-based testing, automated anti-pattern detection, and actionable feedback. Use when the user asks to review, critique, evaluate, or give feedback on a design or component.
audit
Run technical quality checks across accessibility, performance, theming, responsive design, and anti-patterns. Generates a scored report with P0-P3 severity ratings and actionable plan. Use when the user wants an accessibility check, performance audit, or technical quality review.
quieter
Tones down visually aggressive or overstimulating designs, reducing intensity while preserving quality. Use when the user mentions too bold, too loud, overwhelming, aggressive, garish, or wants a calmer, more refined aesthetic.
video-outpainting
Video outpainting on RunComfy via the `runcomfy` CLI — extend the spatial canvas of a video, change aspect ratio (9:16 vertical to 16:9 horizontal or vice versa), add environment beyond the original frame while preserving the central action. Routes prompt-shaped spatial extension through Wan 2-7 edit-video and points the agent at dedicated ComfyUI outpaint workflows when seam quality matters for hero delivery. Triggers on "video outpaint", "video outpainting", "extend video canvas", "expand v...
elevenlabs-music-generation
Generate full songs and instrumental tracks with ElevenLabs Music on RunComfy via the `runcomfy` CLI. ElevenLabs Music turns a style description plus structured lyrics into studio-quality 44.1 kHz stereo audio — 5 seconds to 5 minutes — with section-level control (Intro / Verse / Chorus / Bridge), multilingual vocals, and commercial-friendly output. Generate a backing track, a full vocal song, a jingle, a podcast intro, a game loop, or an instrumental bed. Calls `runcomfy run elevenlabs/eleve...
hyperframes-cli
HyperFrames CLI dev loop — `npx hyperframes` for scaffolding (init), validation (lint, inspect), preview, render, and environment troubleshooting (doctor, browser, info, upgrade). Use when running any of these commands or troubleshooting the HyperFrames build/render environment. For asset preprocessing commands (`tts`, `transcribe`, `remove-background`), invoke the `hyperframes-media` skill instead.
playwright-cli
Automate browser interactions, test web pages and work with Playwright tests.
documentation-writer
Diátaxis Documentation Expert. An expert technical writer specializing in creating high-quality software documentation, guided by the principles and structure of the Diátaxis technical documentation authoring framework.
prd
Generate high-quality Product Requirements Documents (PRDs) for software systems and AI-powered features. Includes executive summaries, user stories, technical specifications, and risk analysis.
playwright-generate-test
Generate a Playwright test based on a scenario using Playwright MCP
javascript-typescript-jest
Best practices for writing JavaScript/TypeScript tests using Jest, including mocking strategies, test structure, and common patterns.
pytest-coverage
Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%.
java-junit
Get best practices for JUnit 5 unit testing, including data-driven tests
prompt-builder
Guide users through creating high-quality GitHub Copilot prompts with proper structure, tools, and best practices.
csharp-xunit
Get best practices for XUnit unit testing, including data-driven tests
breakdown-plan
Issue Planning and Automation prompt that generates comprehensive project plans with Epic > Feature > Story/Enabler > Test hierarchy, dependencies, priorities, and automated tracking.
breakdown-test
Test Planning and Quality Assurance prompt that generates comprehensive test strategies, task breakdowns, and quality validation plans for GitHub projects.
csharp-nunit
Get best practices for NUnit unit testing, including data-driven tests
quasi-coder
Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code.
csharp-mstest
Get best practices for MSTest 3.x/4.x unit testing, including modern assertion APIs and data-driven tests
csharp-tunit
Get best practices for TUnit unit testing, including data-driven tests
data-visualization
Create effective data visualizations with Python (matplotlib, seaborn, plotly). Use when building charts, choosing the right chart type for a dataset, creating publication-quality figures, or applying design principles like accessibility and color theory.
claude-md-improver
Audit and improve CLAUDE.md files in repositories. Use when user asks to check, audit, update, improve, or fix CLAUDE.md files. Scans for all CLAUDE.md files, evaluates quality against templates, outputs quality report, then makes targeted updates. Also use when the user mentions "CLAUDE.md maintenance" or "project memory optimization".
tech-debt
Identify, categorize, and prioritize technical debt. Trigger with "tech debt", "technical debt audit", "what should we refactor", "code health", or when the user asks about code quality, refactoring priorities, or maintenance backlog.
testing-strategy
Design test strategies and test plans. Trigger with "how should we test", "test strategy for", "write tests for", "test plan", "what tests do we need", or when the user needs help with testing approaches, coverage, or test architecture.
explore-data
Profile and explore a dataset to understand its shape, quality, and patterns. Use when encountering a new table or file, checking null rates and column distributions, spotting data quality issues like duplicates or suspicious values, or deciding which dimensions and metrics to analyze.
create-viz
Create publication-quality visualizations with Python. Use when turning query results or a DataFrame into a chart, selecting the right chart type for a trend or comparison, generating a plot for a report or presentation, or needing an interactive chart with hover and zoom.
playwright-dev
Explains how to develop Playwright - add APIs, MCP tools, CLI commands, and vendor dependencies.
dev
Development workflows for the playwright-cli repository. Use when the user asks about rolling dependencies, releasing, or other repo maintenance tasks.
research-synthesis
Synthesize user research into themes, insights, and recommendations. Use when you have interview transcripts, survey results, usability test notes, support tickets, or NPS responses that need to be distilled into patterns, user segments, and prioritized next steps.
gtm-product-led-growth
Build self-serve acquisition and expansion motions. Use when deciding PLG vs sales-led, optimizing activation, driving freemium conversion, building growth equations, or recognizing when product complexity demands human touch. Includes the parallel test where sales-led won 10x on revenue.
gtm-0-to-1-launch
Launch new products from idea to first customers. Use when launching products, finding early adopters, building launch week playbooks, diagnosing why adoption stalls, or learning that press coverage does not equal growth. Includes the three-layer diagnosis, the 2-week experiment cycle, and the launch that got 50K impressions and 12 signups.
earnings-analysis
Create professional equity research earnings update reports (8-12 pages, 3,000-5,000 words) analyzing quarterly results for companies already under coverage. Fast-turnaround format focusing on beat/miss analysis, key metrics, updated estimates, and revised thesis. Includes 1-3 summary tables and 8-12 charts. Use when user requests "earnings update", "quarterly update", "earnings analysis", "Q1/Q2/Q3/Q4 results", or post-earnings report.
email-sequence
Design and draft multi-email sequences with full copy, timing, branching logic, exit conditions, and performance benchmarks. Use when building onboarding, lead nurture, re-engagement, win-back, or product launch flows, when you need a complete drip campaign with A/B test suggestions, or when mapping a sequence end-to-end with a flow diagram.
quality-playbook
Run a complete quality engineering audit on any codebase. Derives behavioral requirements from the code, generates spec-traced functional tests, runs a three-pass code review with regression tests, executes a multi-model spec audit (Council of Three), and produces a consolidated bug report with TDD-verified patches. Finds the 35% of real defects that structural code review alone cannot catch. Works with any language. Trigger on 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to...
hygiene
Use when making code changes to ensure they pass VS Code's hygiene checks. Covers the pre-commit hook, unicode restrictions, string quoting rules, copyright headers, indentation, formatting, ESLint, and stylelint. Run the hygiene check before declaring work complete.
update-screenshots
Download screenshot baselines from the latest CI run and commit them. Use when asked to update, accept, or refresh component screenshot baselines from CI, or after the screenshot-test GitHub Action reports differences. This skill should be run as a subagent.
threat-model-analyst
Full STRIDE-A threat model analysis and incremental update skill for repositories and systems. Supports two modes: (1) Single analysis — full STRIDE-A threat model of a repository, producing architecture overviews, DFD diagrams, STRIDE-A analysis, prioritized findings, and executive assessments. (2) Incremental analysis — takes a previous threat model report as baseline, compares the codebase at the latest (or a given commit), and produces an updated report with change tracking (new, resolved...
arize-prompt-optimization
Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
arize-dataset
Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.
arize-experiment
Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.
math-olympiad
Solve competition math problems (IMO, Putnam, USAMO, AIME) with adversarial verification that catches the errors self-verification misses. Activates when asked to 'solve this IMO problem', 'prove this olympiad inequality', 'verify this competition proof', 'find a counterexample', 'is this proof correct', or for any problem with 'IMO', 'Putnam', 'USAMO', 'olympiad', or 'competition math' in it. Uses pure reasoning (no tools) — then a fresh-context adversarial verifier attacks the proof using s...
initiating-coverage
Create institutional-quality equity research initiation reports through a 5-task workflow. Tasks must be executed individually with verified prerequisites - (1) company research, (2) financial modeling, (3) valuation analysis, (4) chart generation, (5) final report assembly. Each task produces specific deliverables (markdown docs, Excel models, charts, or DOCX reports). Tasks 3-5 have dependencies on earlier tasks.
audit-xls
Audit a spreadsheet for formula accuracy, errors, and common mistakes. Scopes to a selected range, a single sheet, or the entire model (including financial-model integrity checks like BS balance, cash tie-out, and logic sanity). Triggers on "audit this sheet", "check my formulas", "find formula errors", "QA this spreadsheet", "sanity check this", "debug model", "model check", "model won't balance", "something's off in my model", "model review".
hatch-pet
Create, repair, validate, visually QA, and package Codex-compatible animated pets and pet spritesheets from character art, generated images, company or prospect brand cues, or visual references. Use when a user wants a lightweight-worker Codex pet workflow, a non-pixel custom pet style, a prospect or company mascot pet, or a full 8x9 animated pet atlas with transparent unused cells, QA contact sheets, and pet.json packaging. This skill composes the installed $imagegen system skill for visual...
Want the ready-to-ship business bundle?
500+ agent skills, $15 one-time, lifetime access. 20 categories spanning content, marketing, sales, finance, legal, ops, SEO & more — finished deliverables, not drafts. Works with Claude Code, Codex, Cursor & every agent runtime.