Install
Quick install
npx skills add https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/sadd/skills/judgenpx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent claude-codenpx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent cursornpx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent codexnpx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent opencodenpx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent github-copilotnpx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent windsurfMore install options
Shorthand — useful for multi-skill repos:
npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator"Manual — clone the repo and drop the folder into your agent's skills directory:
git clone https://github.com/NeoLabHQ/context-engineering-kit.gitcp -r context-engineering-kit/plugins/sadd/skills/judge ~/.claude/skills/LLM-as-Judge Evaluator
Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments
What is it?
Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments Built for use cases involving llm-as-judge, evaluation, context-isolation, multi-dimensional-scoring, evidence-based.
How to use it?
Install this skill in your Claude environment to enhance llm-as-judge evaluator capabilities. Once installed, Claude will automatically apply the skill's guidelines when relevant tasks are detected. You can also explicitly invoke it by referencing its name in your prompts.The full source and documentation is available on GitHub.
Key Features
- Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments
- Seamless integration with Claude's development workflow
- Comprehensive guidelines and best practices for llm-as-judge evaluatorView on GitHub
GitHub Stats
StarsForksLast UpdateAuthorNeoLabHQLicenseGPL-3.0Version1.0.0Categories
AI & MLDeveloper ToolsTags
llm-as-judgeevaluationcontext-isolationmulti-dimensional-scoringevidence-basedFeatures
Related Skills
More from AI & MLMulti-Agent Architecture Patterns
Reference guide for multi-agent architecture patterns including Supervisor/Orchestrator, Peer-to-Peer/Swarm, and Hierarchical, with context isolation principles and Claude Code implementation433NeoLabHQAI & MLDeveloper Tools00
Agent Evaluation Framework
Comprehensive Claude Code agent evaluation framework with multi-dimensional scoring, LLM-as-Judge mode, and research-backed performance variance analysis433NeoLabHQAI & MLDeveloper Tools00
Multi-Perspective Critique
Multi-perspective review system using Multi-Agent Debate and LLM-as-Judge patterns with 3 specialized judges, debate rounds, and consensus building433NeoLabHQAI & MLDeveloper Tools00
---
Source: https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/sadd/skills/judge
Author: NeoLabHQ
License: https://www.gnu.org/licenses/gpl-3.0.html
GitHub Stars: 433
Tags: llm-as-judge, evaluation, context-isolation, multi-dimensional-scoring, evidence-based
SKILL.md source
--- name: LLM-as-Judge Evaluator description: Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments --- # LLM-as-Judge Evaluator Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments What is it? Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments Built for use cases involving llm-as-judge, evaluation, context-isolation, multi-dimensional-scoring, evidence-based. ## How to use it? Install this skill in your Claude environment to enhance llm-as-judge evaluator capabilities. Once installed, Claude will automatically apply the skill's guidelines when relevant tasks are detected. You can also explicitly invoke it by referencing its name in your prompts. The full source and documentation is available on GitHub. ## Key Features * Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments * Seamless integration with Claude's development workflow * Comprehensive guidelines and best practices for llm-as-judge evaluatorView on GitHub ### GitHub Stats StarsForksLast UpdateAuthorNeoLabHQLicenseGPL-3.0Version1.0.0 ### Categories AI & MLDeveloper Tools ### Tags llm-as-judgeevaluationcontext-isolationmulti-dimensional-scoringevidence-based ### Features ## Related Skills More from AI & ML ### Multi-Agent Architecture Patterns Reference guide for multi-agent architecture patterns including Supervisor/Orchestrator, Peer-to-Peer/Swarm, and Hierarchical, with context isolation principles and Claude Code implementation 433NeoLabHQAI & MLDeveloper Tools00 ### Agent Evaluation Framework Comprehensive Claude Code agent evaluation framework with multi-dimensional scoring, LLM-as-Judge mode, and research-backed performance variance analysis 433NeoLabHQAI & MLDeveloper Tools00 ### Multi-Perspective Critique Multi-perspective review system using Multi-Agent Debate and LLM-as-Judge patterns with 3 specialized judges, debate rounds, and consensus building 433NeoLabHQAI & MLDeveloper Tools00 --- **Source**: https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/sadd/skills/judge **Author**: NeoLabHQ **License**: https://www.gnu.org/licenses/gpl-3.0.html **GitHub Stars**: 433 **Tags**: llm-as-judge, evaluation, context-isolation, multi-dimensional-scoring, evidence-based
Related skills 6
running-claude-code-via-litellm-copilot
Use when routing Claude Code through a local LiteLLM proxy to GitHub Copilot, reducing direct Anthropic spend, configuring ANTHROPIC_BASE_URL or ANTHROPIC_MODEL overrides, or troubleshooting Copilot proxy setup failures such as model-not-found, no localhost traffic, or GitHub 401/403 auth errors.
skills-cli
Use when users ask to discover, install, list, check, update, remove, back up, restore, sync, or initialize Agent Skills, mention `bunx skills`, `npx skills`, `skills.sh`, or `skills-lock.json`, ask "find a skill for X", or want help extending agent capabilities with installable skills.
repo-intake-and-plan
Narrow RigorPilot helper for README-first deep learning repo reproduction. Use when the task is specifically to scan a repository, read the README and common project files, extract documented commands, classify inference, evaluation, and training candidates, and return the smallest trustworthy reproduction plan to the main orchestrator. Do not use for environment setup, asset download, command execution, final reporting, paper lookup, or end-to-end orchestration.
image-to-video
Animate any still image on RunComfy — this skill is a smart router that matches the user's intent to the right i2v model in the RunComfy catalog. Picks HappyHorse 1.0 I2V (Arena #1, native audio, identity preservation) for general animations, Wan 2.7 with `audio_url` for custom-voiceover lip-sync, or Seedance 2.0 Pro for multi-modal animation from image + reference video + reference audio. Bundles each model's documented prompting patterns so the caller gets sharper output without burning ite...
video-edit
Edit existing video on RunComfy — this skill is a smart router that matches the user's intent to the right edit model in the RunComfy catalog. Picks Wan 2.7 Edit-Video (general restyle / background swap / packaging swap, identity + motion preservation), Kling 2.6 Pro Motion Control (transfer precise motion from a reference video to a target character), or Lucy Edit Restyle (lightweight identity-stable restyle / outfit swap). Bundles each model's documented prompting patterns so the skill gets...
nano-banana-2
Generate images with Google Nano Banana 2 (Gemini-family flash-tier text-to-image) on RunComfy — bundled with the model's documented prompting patterns so the skill gets sharper output than naive prompting against the same model. Documents Nano Banana 2's strengths (rapid iteration, in-image typography rendering, predictable framing, optional web-grounded context), the resolution-tier pricing, the safety-tolerance dial, and when to route to Nano Banana Pro / GPT Image 2 / Flux 2 / Seedream in...