Test Skill With Subagents
Test any Claude skill using RED-GREEN-REFACTOR cycle with subagent pressure testing to verify the skill resists agent rationalization and bypass attempts
Install
Quick install
npx skills add https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/customaize-agent/skills/test-skillnpx skills add NeoLabHQ/context-engineering-kit --skill "Test Skill with Subagents" --agent claude-codenpx skills add NeoLabHQ/context-engineering-kit --skill "Test Skill with Subagents" --agent cursornpx skills add NeoLabHQ/context-engineering-kit --skill "Test Skill with Subagents" --agent codexnpx skills add NeoLabHQ/context-engineering-kit --skill "Test Skill with Subagents" --agent opencodenpx skills add NeoLabHQ/context-engineering-kit --skill "Test Skill with Subagents" --agent github-copilotnpx skills add NeoLabHQ/context-engineering-kit --skill "Test Skill with Subagents" --agent windsurfMore install options
Shorthand — useful for multi-skill repos:
npx skills add NeoLabHQ/context-engineering-kit --skill "Test Skill with Subagents"Manual — clone the repo and drop the folder into your agent's skills directory:
git clone https://github.com/NeoLabHQ/context-engineering-kit.gitcp -r context-engineering-kit/plugins/customaize-agent/skills/test-skill ~/.claude/skills/Test Skill with Subagents
Test any Claude skill using RED-GREEN-REFACTOR cycle with subagent pressure testing to verify the skill resists agent rationalization and bypass attempts
What is it?
Testing skills is just TDD applied to process documentation.
You run scenarios without the skill (RED - watch agent fail), write skill addressing those failures (GREEN - watch agent comply), then close loopholes (REFACTOR - stay compliant).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures.
REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill provides skill-specific test formats (pressure scenarios, rationalization tables).
Complete worked example: See examples/CLAUDE_MD_TESTING.md for a full test campaign testing CLAUDE.md documentation variants.
How to use it?
`IMPORTANT: This is a real scenario. You must choose and act.
Don't ask hypothetical questions - make the actual decision.
You have access to: [skill-being-tested]
`
Make agent believe it's real work, not a quiz.
Key Features
- Test any Claude skill using RED-GREEN-REFACTOR cycle with subagent pressure testing to verify the skill resists agent rationalization and bypass attempts
- Seamless integration with Claude's development workflow
- Comprehensive guidelines and best practices for test skill with subagentsView on GitHub
GitHub Stats
StarsForksLast UpdateAuthorNeoLabHQLicenseGPL-3.0Version1.0.0Categories
AI & MLTestingTags
skill-testingtddpressure-testingsubagentquality-assuranceFeatures
Related Skills
More from AI & MLTest Any Prompt
Universal prompt testing methodology using RED-GREEN-REFACTOR cycle with subagents, supporting A/B testing and regression testing for commands, hooks, skills, and production prompts433NeoLabHQAI & MLTesting00
Review Local Changes
Multi-agent code review system for uncommitted changes with 6 specialized reviewer roles (security, bug, quality, contract, testing, history), confidence scoring and false positive filtering433NeoLabHQDeveloper ToolsTesting00
Review Pull Request
Multi-agent PR review system with specialized reviewers, inline comments, and automatic PR description generation using GitHub CLI433NeoLabHQDeveloper ToolsTesting00
---
Source: https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/customaize-agent/skills/test-skill
Author: NeoLabHQ
License: https://www.gnu.org/licenses/gpl-3.0.html
GitHub Stars: 433
Tags: skill-testing, tdd, pressure-testing, subagent, quality-assurance
SKILL.md source
--- name: Test Skill with Subagents description: Test any Claude skill using RED-GREEN-REFACTOR cycle with subagent pressure testing to verify the skill resists agent rationalization and bypass attempts --- # Test Skill with Subagents Test any Claude skill using RED-GREEN-REFACTOR cycle with subagent pressure testing to verify the skill resists agent rationalization and bypass attempts What is it? Testing skills is just TDD applied to process documentation. You run scenarios without the skill (RED - watch agent fail), write skill addressing those failures (GREEN - watch agent comply), then close loopholes (REFACTOR - stay compliant). Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill prevents the right failures. REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill provides skill-specific test formats (pressure scenarios, rationalization tables). Complete worked example: See examples/CLAUDE_MD_TESTING.md for a full test campaign testing CLAUDE.md documentation variants. ## How to use it? ``` `IMPORTANT: This is a real scenario. You must choose and act. Don't ask hypothetical questions - make the actual decision. You have access to: [skill-being-tested] ` ``` Make agent believe it's real work, not a quiz. ## Key Features * Test any Claude skill using RED-GREEN-REFACTOR cycle with subagent pressure testing to verify the skill resists agent rationalization and bypass attempts * Seamless integration with Claude's development workflow * Comprehensive guidelines and best practices for test skill with subagentsView on GitHub ### GitHub Stats StarsForksLast UpdateAuthorNeoLabHQLicenseGPL-3.0Version1.0.0 ### Categories AI & MLTesting ### Tags skill-testingtddpressure-testingsubagentquality-assurance ### Features ## Related Skills More from AI & ML ### Test Any Prompt Universal prompt testing methodology using RED-GREEN-REFACTOR cycle with subagents, supporting A/B testing and regression testing for commands, hooks, skills, and production prompts 433NeoLabHQAI & MLTesting00 ### Review Local Changes Multi-agent code review system for uncommitted changes with 6 specialized reviewer roles (security, bug, quality, contract, testing, history), confidence scoring and false positive filtering 433NeoLabHQDeveloper ToolsTesting00 ### Review Pull Request Multi-agent PR review system with specialized reviewers, inline comments, and automatic PR description generation using GitHub CLI 433NeoLabHQDeveloper ToolsTesting00 --- **Source**: https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/customaize-agent/skills/test-skill **Author**: NeoLabHQ **License**: https://www.gnu.org/licenses/gpl-3.0.html **GitHub Stars**: 433 **Tags**: skill-testing, tdd, pressure-testing, subagent, quality-assurance
Related skills 6
agent-browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction. Also use...
grill-me
Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
grill-with-docs
Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions.
minimal-run-and-audit
RigorPilot trusted execution and reporting skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize evidence from the selected smoke test or documented inference or evaluation command and write standardized `repro_outputs/` files, including patch notes when repository files changed. Do not use for training execution, initial repo intake, generic environment setup, paper lookup, target selection, hidden scientific-meaning changes, or end...
polish
Performs a final quality pass fixing alignment, spacing, consistency, and micro-detail issues before shipping. Use when the user mentions polish, finishing touches, pre-launch review, something looks off, or wants to go from good to great.
critique
Evaluate design from a UX perspective, assessing visual hierarchy, information architecture, emotional resonance, cognitive load, and overall quality with quantitative scoring, persona-based testing, automated anti-pattern detection, and actionable feedback. Use when the user asks to review, critique, evaluate, or give feedback on a design or component.