AI & ML

LLM-as-Judge Evaluator

Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments

Author: NeoLabHQ

Version: 1.0.0

License: MIT

Token count: ~604

Updated: Jun 5, 2026

Install

Quick install

via npx skills · works with 57+ agents

npx skills add https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/sadd/skills/judge

Or pick agent:

npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent claude-code

npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent cursor

npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent codex

npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent opencode

npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent github-copilot

npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator" --agent windsurf

More install options

Shorthand — useful for multi-skill repos:

npx skills add NeoLabHQ/context-engineering-kit --skill "LLM-as-Judge Evaluator"

Manual — clone the repo and drop the folder into your agent's skills directory:

git clone https://github.com/NeoLabHQ/context-engineering-kit.git

cp -r context-engineering-kit/plugins/sadd/skills/judge ~/.claude/skills/

How to use: Once installed, ask your agent to "use the LLM-as-Judge Evaluator skill" or describe what you want (e.g. "Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought"). Requires Node.js 18+.

LLM-as-Judge Evaluator

What is it?

Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, a multi-dimensional weighted rubric, and evidence-backed assessments. Built for use cases involving LLM-as-Judge evaluation, context isolation, multi-dimensional scoring, and evidence-based assessment.

How to use it?

Install this skill in your Claude environment to add LLM-as-Judge evaluation capabilities. Once installed, Claude applies the skill's guidelines automatically when it detects a relevant task. You can also invoke it explicitly by referencing its name in your prompts.

The full source and documentation are available on GitHub.

Key Features

Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments
Seamless integration with Claude's development workflow
Comprehensive guidelines and best practices for LLM-as-Judge evaluation

GitHub Stats

Author: NeoLabHQ · License: GPL-3.0 · Version: 1.0.0

Related Skills

Multi-Agent Architecture Patterns

Reference guide for multi-agent architecture patterns including Supervisor/Orchestrator, Peer-to-Peer/Swarm, and Hierarchical, with context isolation principles and Claude Code implementation

433 stars · NeoLabHQ · AI & ML · Developer Tools

Agent Evaluation Framework

Comprehensive Claude Code agent evaluation framework with multi-dimensional scoring, LLM-as-Judge mode, and research-backed performance variance analysis

433 stars · NeoLabHQ · AI & ML · Developer Tools

Multi-Perspective Critique

Multi-perspective review system using Multi-Agent Debate and LLM-as-Judge patterns with 3 specialized judges, debate rounds, and consensus building

433 stars · NeoLabHQ · AI & ML · Developer Tools

---

Source: https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/sadd/skills/judge
Author: NeoLabHQ
License: https://www.gnu.org/licenses/gpl-3.0.html
GitHub Stars: 433
Tags: llm-as-judge, evaluation, context-isolation, multi-dimensional-scoring, evidence-based

SKILL.md source

---
name: LLM-as-Judge Evaluator
description: Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments
---

# LLM-as-Judge Evaluator

## What is it?
Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, a multi-dimensional weighted rubric, and evidence-backed assessments. Built for use cases involving LLM-as-Judge evaluation, context isolation, multi-dimensional scoring, and evidence-based assessment.

## How to use it?
Install this skill in your Claude environment to add LLM-as-Judge evaluation capabilities. Once installed, Claude applies the skill's guidelines automatically when it detects a relevant task. You can also invoke it explicitly by referencing its name in your prompts.

The full source and documentation are available on GitHub.

## Key Features

* Standalone LLM-as-Judge evaluation tool with context isolation, Chain-of-Thought scoring, multi-dimensional weighted rubric, and evidence-backed assessments
* Seamless integration with Claude's development workflow
* Comprehensive guidelines and best practices for LLM-as-Judge evaluation
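
To make the "multi-dimensional weighted rubric" and "evidence-backed" ideas above concrete, here is a minimal Python sketch of how per-dimension judge scores might be combined into one number. It is an illustration only, not the skill's actual implementation: the `DimensionScore` type, the `weighted_score` function, the dimension names, the weights, and the 1-5 scale are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScore:
    """One rubric dimension as a judge might report it (hypothetical shape)."""
    name: str       # rubric dimension, e.g. "correctness" (illustrative name)
    weight: float   # relative importance; weights need not sum to 1
    score: float    # judge's score on an assumed 1-5 scale
    evidence: str   # quote or reference from the evaluated output backing the score


def weighted_score(scores: list[DimensionScore]) -> float:
    """Return the weight-normalized average, rejecting scores without evidence."""
    if not scores:
        raise ValueError("at least one rubric dimension is required")
    for s in scores:
        if not s.evidence.strip():
            raise ValueError(f"dimension {s.name!r} has no supporting evidence")
        if not 1 <= s.score <= 5:
            raise ValueError(f"dimension {s.name!r} score must be between 1 and 5")
        if s.weight <= 0:
            raise ValueError(f"dimension {s.name!r} weight must be positive")
    total_weight = sum(s.weight for s in scores)
    return sum(s.weight * s.score for s in scores) / total_weight


# Illustrative input: three made-up dimensions with made-up weights and evidence.
example = [
    DimensionScore("correctness", 5, 4, "Function returns the expected value for the sample input."),
    DimensionScore("completeness", 3, 3, "Error handling for empty input is missing."),
    DimensionScore("clarity", 2, 5, "Each step is named and commented."),
]
print(round(weighted_score(example), 2))  # (5*4 + 3*3 + 2*5) / 10 = 3.9
```

Refusing to aggregate a score that has no evidence string is one simple way to enforce the evidence-backed requirement in code; a real judge setup could apply a different check.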

### GitHub Stats
* Author: NeoLabHQ
* License: GPL-3.0
* Version: 1.0.0

### Categories
AI & ML, Developer Tools

### Tags
llm-as-judge, evaluation, context-isolation, multi-dimensional-scoring, evidence-based
