Caio Review
/cs:caio-review <plan> — Eval-demanding Chief AI Officer interrogation of any plan that involves AI: model selection, risk classification, cost economics, or AI hiring.
Install
Quick install
npx skills add https://github.com/alirezarezvani/claude-skills/tree/main/c-level-advisor/c-level-agents/skills/caio-review
npx skills add alirezarezvani/claude-skills --skill caio-review --agent claude-code
npx skills add alirezarezvani/claude-skills --skill caio-review --agent cursor
npx skills add alirezarezvani/claude-skills --skill caio-review --agent codex
npx skills add alirezarezvani/claude-skills --skill caio-review --agent opencode
npx skills add alirezarezvani/claude-skills --skill caio-review --agent github-copilot
npx skills add alirezarezvani/claude-skills --skill caio-review --agent windsurf
More install options
Shorthand (useful for multi-skill repos):
npx skills add alirezarezvani/claude-skills --skill caio-review
Manual (clone the repo and drop the folder into your agent's skills directory):
git clone https://github.com/alirezarezvani/claude-skills.git
cp -r claude-skills/c-level-advisor/c-level-agents/skills/caio-review ~/.claude/skills/
/cs:caio-review: CAIO Forcing Questions
Command: /cs:caio-review <plan>
The eval-demanding CAIO pressure-tests any plan that involves AI. Six questions before any AI feature ships, any multi-year vendor commitment, or any AI team expansion.
When to Run
- Before shipping any new AI-powered feature
- Before signing a multi-year AI vendor contract (API or self-hosted infra)
- Before EU launch of any AI feature
- Before a major AI team hire (especially ML engineer or research scientist)
- Before a fine-tuning project commitment
- Before adopting AI in a regulated domain (employment, credit, healthcare, education, etc.)
- When the founder uses the word "AI" near "competitive advantage" or "moat"
The Six CAIO Questions
1. What does this AI need to be good at, and how would you measure it?
No eval set = no ship. Before any AI feature deploys, define the eval criteria.
- 50-100 representative inputs minimum
- Expected outputs OR rubric for grading
- Edge cases: ambiguous, adversarial, format-edge
- If you can't write down what "good" looks like, you don't have a feature; you have a vibe.
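The checklist above can be sketched as a small ship gate. This is a minimal illustration, not part of the skill: `EvalCase`, `pass_rate`, and `ready_to_ship` are hypothetical names, exact-match grading stands in for whatever rubric you actually use, and the 0.9 pass threshold is an assumption (only the 50-case minimum comes from the text).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str
    kind: str = "representative"  # or "ambiguous", "adversarial", "format-edge"


def pass_rate(cases: list[EvalCase], model: Callable[[str], str]) -> float:
    """Fraction of cases where the model output matches the expected text."""
    if not cases:
        raise ValueError("no eval set = no ship")
    passed = sum(
        model(case.prompt).strip().lower() == case.expected.strip().lower()
        for case in cases
    )
    return passed / len(cases)


def ready_to_ship(cases, model, min_cases: int = 50, threshold: float = 0.9) -> bool:
    """Gate: enough cases AND a pass rate at or above the assumed threshold."""
    if len(cases) < min_cases:
        return False
    return pass_rate(cases, model) >= threshold


# Toy demo: a stub "model" that simply echoes its prompt.
cases = [EvalCase(prompt=f"item {i}", expected=f"item {i}") for i in range(50)]
echo_model = lambda prompt: prompt
print(ready_to_ship(cases, echo_model))  # True
```

In practice the `model` callable would wrap the real API call, and free-form outputs would need a rubric-based grader rather than string equality.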
2. What's the SLO on hallucination / error rate, and what's the fallback?
Every AI feature has a failure mode. Plan for it.
- Quantified SLO: "<5% hallucination on factual queries"
- Detection mechanism: monitoring, sampling, customer feedback loop
- Fallback: human-in-loop review, lower-risk default response, refuse-to-answer
- Blast radius if SLO breached: how many users affected, what is the cost?
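One way to make the SLO concrete is to grade a sample of responses and compare the observed rate against the threshold. The sketch below assumes a hand-graded sample (True means the response was judged a hallucination). `SloReport` and `check_hallucination_slo` are hypothetical names, and the blast-radius estimate is a plain extrapolation from the sample to monthly volume.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    observed_rate: float
    breached: bool
    estimated_bad_responses: int
    action: str


def check_hallucination_slo(
    sample_flags: list[bool], monthly_responses: int, slo: float = 0.05
) -> SloReport:
    """Compare the sampled hallucination rate against the SLO threshold."""
    if not sample_flags:
        raise ValueError("need at least one graded sample")
    rate = sum(sample_flags) / len(sample_flags)
    breached = rate >= slo  # the SLO reads "<5%", so reaching 5% is a breach
    estimated_bad = round(rate * monthly_responses)  # rough blast radius
    action = "fallback: human-in-loop review" if breached else "serve normally"
    return SloReport(rate, breached, estimated_bad, action)


# 8 hallucinations in a 100-response sample, 10,000 responses per month.
flags = [True] * 8 + [False] * 92
report = check_hallucination_slo(flags, monthly_responses=10_000)
print(report.breached, report.estimated_bad_responses)  # True 800
```

A 100-response sample gives only a coarse estimate; a larger sample or a confidence interval would be needed before treating a borderline result as a pass.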
3. What's the risk tier under EU AI Act, and is conformity assessment required?
Run ai_risk_classifier.py if any EU residents are affected OR domain is regulated.
- PROHIBITED → cannot launch in EU; re-scope
- HIGH → conformity assessment + EU DB registration + 10 Articles of obligations (3-12 months, $50-200K)
- LIMITED → transparency obligations (chatbot disclosure, AI-generated content marking)
- MINIMAL → no specific obligations; NIST AI RMF voluntary
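The four tiers above map naturally onto a lookup table. The sketch below only encodes the bullets as written; it is not the logic of `ai_risk_classifier.py`, whose inputs and outputs are not shown here, and `RiskTier`, `needs_classification`, and `launch_gate` are hypothetical names.

```python
from enum import Enum


class RiskTier(Enum):
    PROHIBITED = "PROHIBITED"
    HIGH = "HIGH"
    LIMITED = "LIMITED"
    MINIMAL = "MINIMAL"


# Obligations per tier, copied from the list above.
OBLIGATIONS = {
    RiskTier.PROHIBITED: ["cannot launch in EU; re-scope"],
    RiskTier.HIGH: [
        "conformity assessment",
        "EU DB registration",
        "10 Articles of obligations",
    ],
    RiskTier.LIMITED: ["chatbot disclosure", "AI-generated content marking"],
    RiskTier.MINIMAL: [],  # no specific obligations; NIST AI RMF is voluntary
}


def needs_classification(eu_residents_affected: bool, regulated_domain: bool) -> bool:
    """Trigger from the text: classify if either condition holds."""
    return eu_residents_affected or regulated_domain


def launch_gate(tier: RiskTier) -> dict:
    """Summarize what a given tier implies for an EU launch."""
    return {
        "can_launch_in_eu": tier is not RiskTier.PROHIBITED,
        "conformity_assessment_required": tier is RiskTier.HIGH,
        "obligations": OBLIGATIONS[tier],
    }


print(launch_gate(RiskTier.LIMITED)["obligations"])
```

Assigning the tier itself is a legal judgment; the table only records what follows once a tier has been decided.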
4. API, fine-tune, or build?
Run model_buildvsbuy_calculator.py for the specific use case.
- 80% of B2B SaaS use cases: API
- 15%: fine-tune (when domain-specific behavior + labeled data + ML team + high volume)
- <1%: build from scratch
- Decision must consider economic breakeven AND practical feasibility (data, team, compliance)
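The fine-tune conditions and the "economic breakeven AND practical feasibility" rule can be read as a conjunction of prerequisites. The sketch below is one possible reading, not the behavior of `model_buildvsbuy_calculator.py`. `UseCase` and `recommend_path` are hypothetical names, and BUILD is left out on purpose because the text gives no criteria for it.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    domain_specific_behavior: bool
    has_labeled_data: bool
    has_ml_team: bool
    high_volume: bool
    fine_tune_cheaper_than_api: bool  # economic breakeven, computed elsewhere


def recommend_path(uc: UseCase) -> tuple[str, list[str]]:
    """Return (recommendation, unmet prerequisites for fine-tuning)."""
    prerequisites = {
        "domain-specific behavior": uc.domain_specific_behavior,
        "labeled data": uc.has_labeled_data,
        "ML team": uc.has_ml_team,
        "high volume": uc.high_volume,
        "economic breakeven": uc.fine_tune_cheaper_than_api,
    }
    missing = [name for name, met in prerequisites.items() if not met]
    # Default to API unless every fine-tune prerequisite is satisfied.
    return ("API" if missing else "FINE_TUNE", missing)


# Economics favor fine-tuning, but there is no ML team yet.
print(recommend_path(UseCase(True, True, False, True, True)))
# ('API', ['ML team'])
```

Returning the unmet prerequisites alongside the recommendation makes it visible when a plan is blocked on feasibility rather than on cost.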
5. What's the 12-month cost trajectory at expected scale?
Run ai_cost_economics.py for the workload.
- API: variable, scales linearly
- Self-hosted: mostly fixed, breakeven typically 1-10B tokens/month for 70B-class
- Hidden costs of self-hosted: ops, monitoring, model updates, capacity, failover, security
- Hidden costs of API: vendor lock-in, capability drift, rate limits, data residency
- Prompt caching is the most underrated lever; check provider support
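The breakeven arithmetic behind these bullets is short enough to write out. The sketch below is not `ai_cost_economics.py`, and all prices are made-up placeholders: a blended API price of $10 per million tokens and $30,000 per month of fixed self-hosting cost. Applying the caching discount to all tokens is a simplification, since providers typically discount cached input tokens only.

```python
def effective_api_price(price_per_million: float,
                        cached_fraction: float = 0.0,
                        cache_discount: float = 0.0) -> float:
    """Blended API price per 1M tokens after prompt-caching savings."""
    return price_per_million * (1 - cached_fraction * cache_discount)


def api_monthly_cost(tokens: float, price_per_million: float,
                     cached_fraction: float = 0.0,
                     cache_discount: float = 0.0) -> float:
    """API cost is variable and scales linearly with token volume."""
    price = effective_api_price(price_per_million, cached_fraction, cache_discount)
    return tokens / 1_000_000 * price


def breakeven_tokens(price_per_million: float, fixed_monthly: float,
                     marginal_per_million: float = 0.0,
                     cached_fraction: float = 0.0,
                     cache_discount: float = 0.0):
    """Monthly token volume at which self-hosting matches the API bill."""
    price = effective_api_price(price_per_million, cached_fraction, cache_discount)
    saving_per_million = price - marginal_per_million
    if saving_per_million <= 0:
        return None  # self-hosting never pays off at these prices
    return fixed_monthly / saving_per_million * 1_000_000


# Placeholder prices: $10 per 1M tokens via API, $30,000/month fixed self-hosted.
no_cache = breakeven_tokens(10.0, 30_000.0)
with_cache = breakeven_tokens(10.0, 30_000.0, cached_fraction=0.4, cache_discount=0.5)
print(f"{no_cache / 1e9:.2f}B vs {with_cache / 1e9:.2f}B tokens/month")
```

With these placeholder numbers the breakeven lands at roughly 3B tokens per month without caching and moves to about 3.75B with it. A cheaper effective API price pushes the self-hosting breakeven further out, which is why caching support is worth checking before committing to infrastructure.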
6. What role unblocks this — and have we hired prerequisites first?
Map AI capability to specific role. Founders confuse AI engineer / ML engineer / research scientist.
- AI engineer: applied + full-stack + prompts + evals + deployment (most startups need this)
- ML engineer: fine-tuning + retraining infra (only after platform engineer + labeled data)
- Research scientist: model invention (only if model IS the product)
- Don't hire research scientist as first AI hire — they need infrastructure to be productive
Workflow
# 1. Model selection check
python ../../../skills/chief-ai-officer-advisor/scripts/model_buildvsbuy_calculator.py use_case.json
# 2. Regulatory classification
python ../../../skills/chief-ai-officer-advisor/scripts/ai_risk_classifier.py use_case.json
# 3. Cost projection
python ../../../skills/chief-ai-officer-advisor/scripts/ai_cost_economics.py workload.json
Output Format
# CAIO Review: <plan>
**Date:** YYYY-MM-DD
## The Decision Being Made
[one sentence — which CAIO decision: model selection | risk classification | economics | next hire]
## Eval Discipline
- Eval set committed: yes/no
- SLO defined: <metric> < <threshold>
- Fallback behavior: <one line>
## Model Selection (if applicable)
- Recommended: API / FINE_TUNE / BUILD
- 3-year TCO: $X (chosen path) vs $Y (alternatives)
- Breakeven: <volume>
## Risk Classification (if applicable)
- EU AI Act tier: PROHIBITED / HIGH / LIMITED / MINIMAL
- Conformity assessment required: yes/no
- US state triggers: [list]
- Required controls open: N
## Cost Economics (if applicable)
- Monthly cost at current volume: $X
- Breakeven for self-hosted migration: <volume>
- Migration cost if applicable: $X (3-6 months)
## Org (if applicable)
- Next hire: <role>
- Why this, not the alternative: <one line>
- Prerequisite hires in place: yes/no
## Verdict
🟢 SHIP | 🟡 SHARPEN | 🔴 BLOCK
## Next Steps
[3 concrete actions]
Routing
- /cs:cdo-review: for any training-data implications
- /cs:gc-review: for AI vendor contracts, output liability, training-data licensing
- /cs:ciso-review: for prompt injection / jailbreak / training-data poisoning threat model
- /cs:cfo-review: for multi-year vendor or GPU commitment TCO
- /cs:chro-review: for AI team hires (comp, ladder, leveling)
- /cs:decide: log the verdict
- /cs:freeze 60: on multi-year AI commitments
Related
- Agent: [cs-caio-advisor](../../agents/cs-caio-advisor.md)
- Skill: [chief-ai-officer-advisor](../../../skills/chief-ai-officer-advisor/SKILL.md)
- Adjacent: ../../../skills/chief-data-officer-advisor/ (training data rights, data strategy)
---
Version: 1.0.0