Testing & Quality Skills — Free AI Agent Skills

agent-browser

★ Featured Official

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. Also use...

vercel-labs 297k

Testing & Quality

OpenAI / develop-web-game

★ Featured Official

Build and test web games iteratively using Playwright with time-stepping

OpenAI 3

AI & ML

Browserbase / cookie-sync

★ Featured Official

Export cookies from local Chrome into a Browserbase persistent context

Browserbase 2

Testing & Quality

Flutter / flutter-testing-apps

★ Featured Official

Implement unit, widget, and integration tests

Flutter 2

Mobile

Trail of Bits / property-based-testing

★ Featured Official

Property-based testing for multiple languages and smart contracts

Trail of Bits 2

Security

Anthropic / webapp-testing

★ Featured Official

Test local web applications using Playwright

Anthropic 1

Testing & Quality

Browserbase / browserbase-cli

★ Featured Official

CLI wrapper around the Browserbase platform

Browserbase 1

Testing & Quality

Browserbase / fetch

★ Featured Official

Fetch HTML, JSON, headers, and status codes through the Browserbase API

Browserbase 1

Testing & Quality

Browserbase / functions

★ Featured Official

Deploy browser automation scripts as serverless cloud functions

Browserbase 1

Testing & Quality

Browserbase / search

★ Featured Official

Search the web via the Browserbase API with structured results

Browserbase 1

Testing & Quality

Browserbase / ui-test

★ Featured Official

Run adversarial UI tests by analyzing git diffs in a real browser

Browserbase 1

Testing & Quality

Expo / expo-dev-client

★ Featured Official

Build and distribute Expo dev clients locally or via TestFlight

Expo 1

Mobile

HashiCorp / terraform-test

★ Featured Official

Built-in testing framework for Terraform configurations with .tftest.hcl files

HashiCorp 1

DevOps & Infrastructure

OpenAI / gh-address-comments

★ Featured Official

Address review and issue comments on open GitHub PRs via CLI

OpenAI 1

AI & ML

OpenAI / security-best-practices

★ Featured Official

Review code for language-specific security vulnerabilities

OpenAI 1

AI & ML

Datadog Labs / dd-llmo-eval-trace-rca

★ Featured Official

Root-cause LLM app failures using eval traces

Datadog Labs 1

DevOps & Infrastructure

Datadog Labs / dd-monitors

★ Featured Official

Manage Datadog monitors through the pup CLI

Datadog Labs 1

DevOps & Infrastructure

Browserbase / browser

★ Featured Official

Automate web browser interactions through natural language CLI commands

Browserbase

Testing & Quality

Cloudflare / workers-best-practices

★ Featured Official

Review and author Workers code against production best practices and wrangler.jsonc conventions

Cloudflare

DevOps & Infrastructure

HashiCorp / provider-test-patterns

★ Featured Official

Acceptance test patterns for Terraform providers using terraform-plugin-testing

HashiCorp

DevOps & Infrastructure

HashiCorp / run-acceptance-tests

★ Featured Official

Run acceptance tests for Terraform providers using Go's test runner

HashiCorp

DevOps & Infrastructure

OpenAI / aspnet-core

★ Featured Official

Build, review, and architect ASP.NET Core apps (Blazor, MVC, Minimal APIs, etc.)

OpenAI

AI & ML

OpenAI / gh-fix-ci

★ Featured Official

Debug and fix failing GitHub Actions PR checks using log inspection

OpenAI

AI & ML

Binance / spot

★ Featured Official

Place and manage spot trading orders on Binance via API key authentication, supporting mainnet and testnet

Binance

Finance & Crypto

Datadog Labs / dd-apm

★ Featured Official

Query Datadog APM data directly from your editor

Datadog Labs

DevOps & Infrastructure

Datadog Labs / dd-docs

★ Featured Official

Look up Datadog documentation via the LLM-optimized docs index

Datadog Labs

DevOps & Infrastructure

Datadog Labs / dd-llmo-eval-bootstrap

★ Featured Official

Analyze production LLM traces and generate evaluators

Datadog Labs

DevOps & Infrastructure

Datadog Labs / dd-llmo-experiment-analyzer

★ Featured Official

Analyze single or comparative LLM experiment results

Datadog Labs

DevOps & Infrastructure

Datadog Labs / dd-logs

★ Featured Official

Search, filter, and archive Datadog logs through pup CLI

Datadog Labs

DevOps & Infrastructure

Datadog Labs / dd-pup

★ Featured Official

Rust-based CLI (pup) for talking to the Datadog API

Datadog Labs

DevOps & Infrastructure

MongoDB / atlas-stream-processing

★ Featured Official

Build, operate, and debug Atlas Stream Processing pipelines with Kafka, S3, and Lambda integrations

MongoDB

Backend & Database

Trail of Bits / differential-review

★ Featured Official

Security-focused diff review with git history analysis

Trail of Bits

Security

Trail of Bits / modern-python

★ Featured Official

Modern Python tooling with uv, ruff, ty, and pytest best practices

Trail of Bits

Security

Trail of Bits / semgrep-rule-variant-creator

★ Featured Official

Port existing Semgrep rules to new target languages with test-driven validation

Trail of Bits

Security

Trail of Bits / testing-handbook-skills

★ Featured Official

Testing Handbook skills: fuzzers, static analysis, sanitizers

Trail of Bits

Security

grill-me

★ Featured

Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".

mattpocock 192k

Testing & Quality

grill-with-docs

★ Featured

Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions.

mattpocock 138k

Testing & Quality

minimal-run-and-audit

★ Featured

RigorPilot trusted execution and reporting skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize evidence from the selected smoke test or documented inference or evaluation command and write standardized `repro_outputs/` files, including patch notes when repository files changed. Do not use for training execution, initial repo intake, generic environment setup, paper lookup, target selection, hidden scientific-meaning changes, or end...

lllllllama 127k

Testing & Quality

polish

★ Featured

Performs a final quality pass fixing alignment, spacing, consistency, and micro-detail issues before shipping. Use when the user mentions polish, finishing touches, pre-launch review, something looks off, or wants to go from good to great.

pbakaus 86k

Testing & Quality

critique

★ Featured

Evaluate design from a UX perspective, assessing visual hierarchy, information architecture, emotional resonance, cognitive load, and overall quality with quantitative scoring, persona-based testing, automated anti-pattern detection, and actionable feedback. Use when the user asks to review, critique, evaluate, or give feedback on a design or component.

pbakaus 83k

Testing & Quality

audit

★ Featured

Run technical quality checks across accessibility, performance, theming, responsive design, and anti-patterns. Generates a scored report with P0-P3 severity ratings and an actionable plan. Use when the user wants an accessibility check, performance audit, or technical quality review.

pbakaus 82k

Testing & Quality

quieter

★ Featured

Tones down visually aggressive or overstimulating designs, reducing intensity while preserving quality. Use when the user mentions too bold, too loud, overwhelming, aggressive, garish, or wants a calmer, more refined aesthetic.

pbakaus 79k

Testing & Quality

video-outpainting

★ Featured

Video outpainting on RunComfy via the `runcomfy` CLI — extend the spatial canvas of a video, change aspect ratio (9:16 vertical to 16:9 horizontal or vice versa), add environment beyond the original frame while preserving the central action. Routes prompt-shaped spatial extension through Wan 2-7 edit-video and points the agent at dedicated ComfyUI outpaint workflows when seam quality matters for hero delivery. Triggers on "video outpaint", "video outpainting", "extend video canvas", "expand v...

agentspace-so 61k

Testing & Quality

elevenlabs-music-generation

★ Featured

Generate full songs and instrumental tracks with ElevenLabs Music on RunComfy via the `runcomfy` CLI. ElevenLabs Music turns a style description plus structured lyrics into studio-quality 44.1 kHz stereo audio — 5 seconds to 5 minutes — with section-level control (Intro / Verse / Chorus / Bridge), multilingual vocals, and commercial-friendly output. Generate a backing track, a full vocal song, a jingle, a podcast intro, a game loop, or an instrumental bed. Calls `runcomfy run elevenlabs/eleve...

agentspace-so 61k

Testing & Quality

hyperframes-cli

★ Featured

HyperFrames CLI dev loop — `npx hyperframes` for scaffolding (init), validation (lint, inspect), preview, render, and environment troubleshooting (doctor, browser, info, upgrade). Use when running any of these commands or troubleshooting the HyperFrames build/render environment. For asset preprocessing commands (`tts`, `transcribe`, `remove-background`), invoke the `hyperframes-media` skill instead.

heygen-com 58k

Testing & Quality

playwright-cli

Official

Automate browser interactions, test web pages and work with Playwright tests.

microsoft 39k

Testing & Quality

documentation-writer

Official

Diátaxis Documentation Expert. An expert technical writer specializing in creating high-quality software documentation, guided by the principles and structure of the Diátaxis technical documentation authoring framework.

github 19k

Testing & Quality

prd

Official

Generate high-quality Product Requirements Documents (PRDs) for software systems and AI-powered features. Includes executive summaries, user stories, technical specifications, and risk analysis.

github 18k

Testing & Quality

playwright-generate-test

Official

Generate a Playwright test based on a scenario using Playwright MCP

github 13k

Testing & Quality

javascript-typescript-jest

Official

Best practices for writing JavaScript/TypeScript tests using Jest, including mocking strategies, test structure, and common patterns.

github 11k

Testing & Quality

pytest-coverage

Official

Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%.

github 10k

Testing & Quality

java-junit

Official

Get best practices for JUnit 5 unit testing, including data-driven tests

github 9.9k

Testing & Quality

prompt-builder

Official

Guide users through creating high-quality GitHub Copilot prompts with proper structure, tools, and best practices.

github 9.4k

Testing & Quality

csharp-xunit

Official

Get best practices for xUnit unit testing, including data-driven tests

github 9.0k

Testing & Quality

breakdown-plan

Official

Issue Planning and Automation prompt that generates comprehensive project plans with Epic > Feature > Story/Enabler > Test hierarchy, dependencies, priorities, and automated tracking.

github 8.8k

Testing & Quality

breakdown-test

Official

Test Planning and Quality Assurance prompt that generates comprehensive test strategies, task breakdowns, and quality validation plans for GitHub projects.

github 8.7k

Testing & Quality

csharp-nunit

Official

Get best practices for NUnit unit testing, including data-driven tests

github 8.5k

Testing & Quality

quasi-coder

Official

Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code.

github 8.5k

Testing & Quality

csharp-mstest

Official

Get best practices for MSTest 3.x/4.x unit testing, including modern assertion APIs and data-driven tests

github 8.5k

Testing & Quality

csharp-tunit

Official

Get best practices for TUnit unit testing, including data-driven tests

github 8.4k

Testing & Quality

data-visualization

Official

Create effective data visualizations with Python (matplotlib, seaborn, plotly). Use when building charts, choosing the right chart type for a dataset, creating publication-quality figures, or applying design principles like accessibility and color theory.

anthropics 7.2k

Testing & Quality

claude-md-improver

Official

Audit and improve CLAUDE.md files in repositories. Use when user asks to check, audit, update, improve, or fix CLAUDE.md files. Scans for all CLAUDE.md files, evaluates quality against templates, outputs quality report, then makes targeted updates. Also use when the user mentions "CLAUDE.md maintenance" or "project memory optimization".

anthropics 5.1k

Testing & Quality

tech-debt

Official

Identify, categorize, and prioritize technical debt. Trigger with "tech debt", "technical debt audit", "what should we refactor", "code health", or when the user asks about code quality, refactoring priorities, or maintenance backlog.

anthropics 2.9k

Testing & Quality

testing-strategy

Official

Design test strategies and test plans. Trigger with "how should we test", "test strategy for", "write tests for", "test plan", "what tests do we need", or when the user needs help with testing approaches, coverage, or test architecture.

anthropics 2.8k

Testing & Quality

explore-data

Official

Profile and explore a dataset to understand its shape, quality, and patterns. Use when encountering a new table or file, checking null rates and column distributions, spotting data quality issues like duplicates or suspicious values, or deciding which dimensions and metrics to analyze.

anthropics 2.6k

Testing & Quality

create-viz

Official

Create publication-quality visualizations with Python. Use when turning query results or a DataFrame into a chart, selecting the right chart type for a trend or comparison, generating a plot for a report or presentation, or needing an interactive chart with hover and zoom.

anthropics 2.4k

Testing & Quality

playwright-dev

Official

Explains how to develop Playwright - add APIs, MCP tools, CLI commands, and vendor dependencies.

microsoft 1.9k

Testing & Quality

dev

Official

Development workflows for the playwright-cli repository. Use when the user asks about rolling dependencies, releasing, or other repo maintenance tasks.

microsoft 1.6k

Testing & Quality

research-synthesis

Official

Synthesize user research into themes, insights, and recommendations. Use when you have interview transcripts, survey results, usability test notes, support tickets, or NPS responses that need to be distilled into patterns, user segments, and prioritized next steps.

anthropics 1.5k

Testing & Quality

gtm-product-led-growth

Official

Build self-serve acquisition and expansion motions. Use when deciding PLG vs sales-led, optimizing activation, driving freemium conversion, building growth equations, or recognizing when product complexity demands human touch. Includes the parallel test where sales-led won 10x on revenue.

github 1.4k

Testing & Quality

gtm-0-to-1-launch

Official

Launch new products from idea to first customers. Use when launching products, finding early adopters, building launch week playbooks, diagnosing why adoption stalls, or learning that press coverage does not equal growth. Includes the three-layer diagnosis, the 2-week experiment cycle, and the launch that got 50K impressions and 12 signups.

github 1.3k

Testing & Quality

earnings-analysis

Official

Create professional equity research earnings update reports (8-12 pages, 3,000-5,000 words) analyzing quarterly results for companies already under coverage. Fast-turnaround format focusing on beat/miss analysis, key metrics, updated estimates, and revised thesis. Includes 1-3 summary tables and 8-12 charts. Use when user requests "earnings update", "quarterly update", "earnings analysis", "Q1/Q2/Q3/Q4 results", or post-earnings report.

anthropics 1.3k

Testing & Quality

email-sequence

Official

Design and draft multi-email sequences with full copy, timing, branching logic, exit conditions, and performance benchmarks. Use when building onboarding, lead nurture, re-engagement, win-back, or product launch flows, when you need a complete drip campaign with A/B test suggestions, or when mapping a sequence end-to-end with a flow diagram.

anthropics 1.2k

Testing & Quality

quality-playbook

Official

Run a complete quality engineering audit on any codebase. Derives behavioral requirements from the code, generates spec-traced functional tests, runs a three-pass code review with regression tests, executes a multi-model spec audit (Council of Three), and produces a consolidated bug report with TDD-verified patches. Finds the 35% of real defects that structural code review alone cannot catch. Works with any language. Trigger on 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to...

github 1.1k

Testing & Quality

hygiene

Official

Use when making code changes to ensure they pass VS Code's hygiene checks. Covers the pre-commit hook, unicode restrictions, string quoting rules, copyright headers, indentation, formatting, ESLint, and stylelint. Run the hygiene check before declaring work complete.

microsoft 1.0k

Testing & Quality

update-screenshots

Official

Download screenshot baselines from the latest CI run and commit them. Use when asked to update, accept, or refresh component screenshot baselines from CI, or after the screenshot-test GitHub Action reports differences. This skill should be run as a subagent.

microsoft 1.0k

Testing & Quality

threat-model-analyst

Official

Full STRIDE-A threat model analysis and incremental update skill for repositories and systems. Supports two modes: (1) Single analysis — full STRIDE-A threat model of a repository, producing architecture overviews, DFD diagrams, STRIDE-A analysis, prioritized findings, and executive assessments. (2) Incremental analysis — takes a previous threat model report as baseline, compares the codebase at the latest (or a given commit), and produces an updated report with change tracking (new, resolved...

github 991

Testing & Quality

arize-prompt-optimization

Official

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

github 864

Testing & Quality

arize-dataset

Official

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.

github 835

Testing & Quality

arize-experiment

Official

Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.

github 833

Testing & Quality

math-olympiad

Official

Solve competition math problems (IMO, Putnam, USAMO, AIME) with adversarial verification that catches the errors self-verification misses. Activates when asked to 'solve this IMO problem', 'prove this olympiad inequality', 'verify this competition proof', 'find a counterexample', 'is this proof correct', or for any problem with 'IMO', 'Putnam', 'USAMO', 'olympiad', or 'competition math' in it. Uses pure reasoning (no tools) — then a fresh-context adversarial verifier attacks the proof using s...

anthropics 703

Testing & Quality

initiating-coverage

Official

Create institutional-quality equity research initiation reports through a 5-task workflow. Tasks must be executed individually with verified prerequisites - (1) company research, (2) financial modeling, (3) valuation analysis, (4) chart generation, (5) final report assembly. Each task produces specific deliverables (markdown docs, Excel models, charts, or DOCX reports). Tasks 3-5 have dependencies on earlier tasks.

anthropics 623

Testing & Quality

audit-xls

Official

Audit a spreadsheet for formula accuracy, errors, and common mistakes. Scopes to a selected range, a single sheet, or the entire model (including financial-model integrity checks like BS balance, cash tie-out, and logic sanity). Triggers on "audit this sheet", "check my formulas", "find formula errors", "QA this spreadsheet", "sanity check this", "debug model", "model check", "model won't balance", "something's off in my model", "model review".

anthropics 588

Testing & Quality

hatch-pet

Official

Create, repair, validate, visually QA, and package Codex-compatible animated pets and pet spritesheets from character art, generated images, company or prospect brand cues, or visual references. Use when a user wants a lightweight-worker Codex pet workflow, a non-pixel custom pet style, a prospect or company mascot pet, or a full 8x9 animated pet atlas with transparent unused cells, QA contact sheets, and pet.json packaging. This skill composes the installed $imagegen system skill for visual...

openai 585

Testing & Quality
