google-agents-cli-eval
This skill should be used when the user wants to "run an evaluation", "evaluate my ADK agent", "write an evalset", "debug eval scores", "compare eval results", or needs guidance on ADK (Agent Development Kit) evaluation methodology and the eval-fix loop. Covers eval metrics, evalset schema, LLM-as-judge, tool trajectory scoring, and common failure causes. Part of the Google ADK (Agent Development Kit) skills suite. Do NOT use for API code patterns (use google-agents-cli-adk-code), deployment...
This skill ships only metadata — no inline instructions. See the source repo for details.
Install this skill
One command (all agents)
Runs the npx skills CLI, which auto-detects every AI coding agent you have installed (Claude Code, Cursor, Codex, OpenCode, Windsurf, Copilot, and 51 more).
npx skills add https://github.com/google/agents-cli
Alternative: shorthand form
npx skills add google/agents-cli
Install to a specific agent
Pick the agent you use. The CLI writes the skill to that agent's standard skill directory.
npx skills add google/agents-cli --agent claude-code
npx skills add google/agents-cli --agent cursor
npx skills add google/agents-cli --agent codex
npx skills add google/agents-cli --agent opencode
npx skills add google/agents-cli --agent github-copilot
npx skills add google/agents-cli --agent windsurf
Manual install (no CLI)
Prefer to skip the CLI? Clone the repo and drop the skill folder into your agent's skills directory.
git clone https://github.com/google/agents-cli.git
cp -r agents-cli ~/.claude/skills/
For other agents, replace ~/.claude/skills/ with their skill directory; see the full list.
Use it
Once installed, ask your agent to "use the google-agents-cli-eval skill" or describe what you want (e.g. "run an evaluation" or "evaluate my ADK agent"). Most agents auto-discover the skill from its SKILL.md description, so no slash command is needed.
Skill files are MIT-style permissive by default; check the source repo for the actual license.
SKILL.md source
---
name: google-agents-cli-eval
description: This skill should be used when the user wants to "run an evaluation", "evaluate my ADK agent", "write an evalset", "debug eval scores", "compare eval results", or needs guidance on ADK (Agent Developm
---