SEO NAVIGATOR · HR Implementation Plan 2026
HR + Operations · Bích Ngọc · Updated May 2026

HR Implementation Plan 2026
Built for the Centaur Agency

A two-track HR system that hires and manages talent for the AI-native agency that SEO Navigator is becoming. Track A turns ClickUp + Claude into a recruitment pipeline. Track B turns ClickUp + Claude into an OKR performance system. Both are built around the Agent Manager role definition, the new shape of every job in this agency.

9 Role Cards · 2 Tracks · 45% Rubric AI-Native · 7 Departments · 90-Day Rollout · 14 Days Tighter
The Operating Model
The Centaur Split — Applied to Hiring & Performance

Same doctrine as the rest of the agency. Claude does the throughput — scoring, drafting, surfacing patterns, summarizing data. Humans own the judgment — final hire decisions, performance conversations, escalation, culture. Email automation lives in ClickUp, never in Claude.

🎯

HUMAN — The Reins

Hire / no-hire decisions. Performance conversations. Salary negotiation. Culture coaching. Escalation. Final approval on every rubric, summary, and OKR. Ngọc reviews every Claude draft before it reaches a manager or candidate.

⚙️

AI — The Throughput

CV scoring against rubric. Calibration-based ranking. Auto-flag detection. Pre-retro ClickUp data analysis. Monthly performance summary drafts. Pattern surfacing across batches. Draft-only outputs — never sent to anyone without Ngọc's review.

What this plan revises from the original

The original HR plan operationally executes the centaur model well. Three corrections are folded in here. First, role cards (Sprint 1, G3 Role Redefinition) now precede rubrics, OKRs, and SOWs — so every downstream artifact references a validated role definition, not a guess.

Second, the hiring rubric reserves 45% of every score for AI-native dimensions: AI fluency, Agent Manager capability, Ownership Operator mindset, and QA discipline. The remaining 55% goes to role-specific skills. Third, performance KPIs now measure agent leverage — hours recovered, override rate, agents managed — not just output volume.

01
Sequencing Fix

Role Cards First

Phase 0.5 inserted before original Phase 1. One pilot role card is drafted and signed in 5 days; the remaining 8 are drafted in parallel with Phase 3 scaling.

02
Rubric Fix

45% AI-Native Baseline

Every rubric reserves 45% for AI fluency, Agent Manager capability, Ownership Operator mindset, and QA discipline — across all role families.

03
KPI Fix

Agent Leverage Metrics

Monthly summaries measure hours recovered through agents, override rate, and agent oversight quality — not just output volume.

The Dependency Chain
Why Order Matters — Role Cards → SOWs → OKRs → Rubrics

Every downstream artifact must reference an upstream definition. Build a rubric without a role card and you're scoring for the wrong competencies. Build OKRs without a SOW and you're targeting the wrong outcomes. Build a KPI scorecard without OKRs and you're measuring noise.

Dependency Chain — Visual Flow

  1. NEW · Phase 0.5 Role Cards (G3, Sprint 1 · Days 1–7): defines what each role does in the AI-native model; 9 cards covering agents managed, QA, escalation
  2. NEW · B0.0 SOWs (WS6 hard dependency · Days 5–14): departmental accountability
  3. Track B · OKRs (B0.1–B0.3 · Weeks 5–6): measurable team targets
  4. Track A · Rubrics (Phase 1.1 · Days 6–10): CV scoring dimensions
  5. Track A · Calibration examples (Phase 1.2 · Days 11–14): anchor for Claude scoring
  6. Output · Pilot run (Phase 2 · Weeks 2–3): first batch scored at quality

Each node depends on the one before it; break this chain and the artifacts misalign.
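The chain can be checked mechanically before work starts. Below is a minimal Python sketch. It assumes each artifact is tracked under a hypothetical key and counts as "signed" once its gate is passed. The upstream links follow the plan's own text: rubrics and SOWs reference role cards, OKRs reference SOWs, and KPI scorecards reference OKRs.

```python
from typing import Optional

# Hypothetical artifact keys mapped to the upstream definition each must reference.
UPSTREAM: dict[str, Optional[str]] = {
    "role_card": None,           # Phase 0.5: root of the chain
    "sow": "role_card",          # B0.0: each SOW references its role card(s)
    "okrs": "sow",               # B0: OKRs are anchored to signed SOWs
    "kpi_scorecard": "okrs",     # B2: scorecards measure against OKRs
    "rubric": "role_card",       # 1.1: rubric anchored to the signed role card
    "calibration": "rubric",     # 1.2: examples scored against the rubric
    "pilot_run": "calibration",  # Phase 2: needs the calibration set
}


def can_start(artifact: str, signed: set[str]) -> bool:
    """An artifact may start only once its upstream definition is signed."""
    upstream = UPSTREAM[artifact]
    return upstream is None or upstream in signed


def blocked(signed: set[str]) -> list[str]:
    """Unsigned artifacts that cannot start yet, in chain order."""
    return [a for a in UPSTREAM if a not in signed and not can_start(a, signed)]


if __name__ == "__main__":
    # With only the pilot role card signed, SOWs and rubrics can begin.
    print(blocked({"role_card"}))
```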
Side-by-Side
Original Phase 1 vs Revised Phase 0.5 + 1

Original Sequence

  1. Build evaluation rubric for pilot role
  2. Compile 3 calibration examples
  3. Define auto-flag list
  4. Run pilot CV batch
  5. Lead reviews ranking
  6. Scale to other role families
  7. Lock OKR taxonomy
  8. Collect OKRs from 5 missing teams

Risk: rubric and OKRs are built before role cards exist, so rework is guaranteed once Sprint 1 delivers role definitions on Day 30.

Revised Sequence

  1. NEW: Draft pilot role card (Days 1–4)
  2. NEW: Validate with lead + Jake (Day 5)
  3. Build rubric anchored to role card
  4. Compile calibration examples (now scoring 4 baseline + role-specific dimensions)
  5. Auto-flag list (extended for AI-native fit)
  6. Run pilot CV batch
  7. NEW: Confirm or write SOWs for all 7 departments
  8. Lock OKR taxonomy + collect OKRs (anchored to SOWs)

Adds ~5–7 days to Phase 1, but eliminates rework and ensures every downstream artifact is grounded in a validated upstream definition.

Sprint Alignment
How HR Phases Map onto Transformation Sprint 1 / 2 / 3

12-Week HR Timeline · Overlaid with Transformation Sprints

Sprints (W1–W12):
  • Sprint 1 · Structure the Knowledge · Days 1–30 (W1–W4)
  • Sprint 2 · Deploy Agent Swarm · Days 31–60 (W5–W8)
  • Sprint 3 · Measure & Iterate · Days 61–90 (W9–W12)
Track A · Recruitment: P0.5 Role card (NEW) · P0 prerequisites · P1 Rubric + calibration · P2 Pilot batch · ✓ Gate · P3 Scale to other roles + 8 role cards
Track B · Performance: B0.0 SOWs (NEW) · B0 OKR setup · ✓ Gate · B1 Bi-weekly retro cadence · B2 Monthly performance summaries
Note: the G6 Velocity baseline (Day 30) uses existing ClickUp throughput data and does not block on OKRs.

Risk this revision doesn't eliminate

Sprint 1's G6 Velocity baseline expects measurement by Day 30. This revised sequence pushes OKR data flow into Week 6+, which is too late to feed G6. Recommended workaround: use task-throughput data from existing ClickUp lists (tasks completed per person per week, last 4 weeks) as the velocity baseline. It's grounded in real work happening today, not aspirational OKRs. Push G6 closure to Day 45 only if Jake disagrees.
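Below is a minimal Python sketch of that baseline. It assumes the completed tasks can be exported from the existing ClickUp lists as (assignee, completion date) pairs. The export shape, the sample names, and the function name are hypothetical; none of them are ClickUp's API.

```python
from collections import Counter
from datetime import date, timedelta


def velocity_baseline(
    completed: list[tuple[str, date]], as_of: date, weeks: int = 4
) -> dict[str, float]:
    """Average tasks completed per person per week over the trailing window.

    `completed` holds (assignee, completion date) pairs from a hypothetical
    ClickUp export. Only completions in (as_of - weeks, as_of] are counted.
    """
    start = as_of - timedelta(weeks=weeks)
    counts = Counter(person for person, done in completed if start < done <= as_of)
    return {person: n / weeks for person, n in counts.items()}


if __name__ == "__main__":
    tasks = [
        ("An", date(2026, 5, 4)),
        ("An", date(2026, 5, 11)),
        ("Binh", date(2026, 5, 20)),
        ("An", date(2026, 3, 2)),  # outside the 4-week window, ignored
    ]
    print(velocity_baseline(tasks, as_of=date(2026, 5, 28)))
```

People with no completed tasks in the window do not appear in the result. List them explicitly with 0 so they are not silently dropped from the baseline.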

The New Phase
Phase 0.5 — Role Cards Before Anything Else

A role card defines what each role does in the AI-native model: which agents the person manages, what only humans can decide, what they review and approve, when to escalate, and what they're accountable for. Without role cards, every downstream artifact (rubrics, OKRs, KPIs) is scoring or measuring against a guess.

NEW · Inserted Before Original Phase 1

Phase 0.5 — Role Cards (Days 1–7)

Sprint 1 of the transformation plan promises 9 role cards covering "agents managed, human decisions, QA, escalation." This phase delivers the pilot role card so HR Track A can begin Phase 1.1 with a validated reference. The remaining 8 cards are drafted in parallel with Phase 3 scaling, weeks 3–5.

A
Identity

Who the person is

  • Role title + reports-to
  • Mission (one sentence — why this role exists)
  • Career path / level definitions
B
AI Operating System

What they manage

  • Agents managed (PM Pulse, Sentinel, Catalyst…)
  • Tools used (ClickUp, Slack, GHL, Skills, MCP)
  • Cadence (daily / weekly / monthly rhythm)
C
Human Layer

What only humans decide

  • Human-only decisions — never delegated
  • QA responsibilities — what they review/approve
  • Escalation rules — when, to whom, on what timeline
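The three groups map directly onto a structured record. Below is a minimal Python sketch, assuming the card is stored as markdown in the Knowledge Base. The field names, the `missing_sections` completeness check, and the markdown layout are illustrative, not an existing tool.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RoleCard:
    # A · Identity
    title: str
    reports_to: str
    mission: str
    career_path: list[str] = field(default_factory=list)
    # B · AI Operating System
    agents_managed: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    cadence: str = ""
    # C · Human Layer
    human_only_decisions: list[str] = field(default_factory=list)
    qa_responsibilities: list[str] = field(default_factory=list)
    escalation_rules: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_markdown(self) -> str:
        """Render the card as markdown: a header block, then one section per field."""
        lines = [f"# {self.title}", f"Reports to: {self.reports_to}", f"Mission: {self.mission}"]
        for f in fields(self)[3:]:  # skip title, reports_to, mission
            value = getattr(self, f.name)
            items = value if isinstance(value, list) else [value]
            lines.append(f"\n## {f.name.replace('_', ' ').capitalize()}")
            lines.extend(f"- {item}" for item in items if item)
        return "\n".join(lines)
```

A card whose `missing_sections()` result is non-empty is not ready for the Day 5 sign-off.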
Day-by-Day Execution
Pilot Role Card — 7-Day Schedule

Day 1 · Mon

Pick the pilot

Confirm with Jake which role goes first. Recommend SEO Executive — most active hiring + highest agent leverage (Content Catalyst + Sentinel both touch this role).

Day 2 · Tue

Working session

90-min session with Phuong Anh (SEO Team Leader). Walk through all 11 sections of the role card template. Capture verbatim — don't polish yet.

Day 3 · Wed

Draft v1

Polish session notes into structured markdown. Confirm which agents the role manages from the Sprint 2 plan (Sentinel + Catalyst confirmed). Share with Jake EOD.

Day 4 · Thu

Jake review

Validate against Pillar 6 transformation plan. Check Agent Manager framing, Ownership Operator culture, AEO awareness. Capture edits.

Day 5 · Fri

Sign-off + KB

Apply edits. Final sign-off from Jake + Phuong Anh. Save to the Knowledge Base at /agency-os/role-cards/seo-executive.md.

Day 6 · Sat

Begin Phase 1

Start original Phase 1.1 — but rubric is now anchored to the signed role card. Use the Agent Manager Rubric Template as the structural baseline.

Day 7 · Sun

Calibration prep

Begin compiling 3 calibration examples in the new format (4 baseline dimensions + role-specific). Pull past hires from ClickUp HR list.

The Full Set
9 Role Cards — Pilot + Parallel Drafting

Pilot in week 1. Remaining 8 across weeks 3–5, parallel with Phase 3 scaling. Each follows the same template, same gate, same Knowledge Base location.

1
PILOT

SEO Executive

Phuong Anh oversight · Manages Sentinel + Catalyst · Drafts to Ngọc by Day 5.

2
Week 3

SEO Team Leader

Phuong Anh herself · Coordinator-level role · Manages PM Pulse + oversees specialists.

3
Week 3

Content Executive

Manages Content Catalyst directly · Highest writing-skill weight · Editorial judgment.

4
Week 4

Google Ads Specialist

Tung's role · Manages Ad Arbitrage agent · Stewards Google Ads skill chain.

5
Week 4

Meta Ads Specialist

Manages 8-agent Campaign Accelerator pipeline · 3 human approval gates.

6
Week 4

CRM / GHL Specialist

Manages Revenue Relay · Owns SNMS 6-Pillar deployment · GHL workflow QA.

7
Week 5

UI/UX Designer

Tram (freelance) · Brand systems · Asset oversight · Manager TBD.

8
Week 5

AI Automation Developer

Trung's role + new hire · Builds the agents others manage · Highest AI Fluency weight.

9
Week 5

HR / Operations

Bích Ngọc's own role · Agent Manager for the agency itself · Track A + B owner.

Open question for Jake

Two questions need answers before drafting begins. Designer: Tram is freelance, so who does she report to, and is she actually in scope for an Agent Manager role card, or does she sit outside the agent fleet? Department leads: for the 5 teams currently missing OKRs (per original B0.2), is the gap missing leadership, missing SOWs, or missing role definitions? The role card is downstream of that diagnosis.

Track A
Recruitment Pipeline — ClickUp Form → Claude Scoring → Status Triggers

CVs enter through a ClickUp form. Claude scores each CV against a rubric anchored to the role card, in a single thread per JD that retains context across batches. ClickUp automation handles all candidate emails through status triggers — Claude never sends.

Track A Recruitment Flow

  1. ClickUp form · CV intake, 5–7 fields mapped to rubric dimensions
  2. Claude, one thread per JD · scores against the rubric + calibration set, in the same thread across batches
  3. Claude output · ranked list + flags + reasons, with per-dimension scores and justifications
  4. Human, lead · top-3 approval at ≥80% agreement; each disagreement becomes a new calibration example
  5. ClickUp status · Ngọc moves the card to Shortlisted, which triggers the update
  6. Automation · email sent by a status-based trigger
Claude never sends emails. Automation handles all candidate communication via ClickUp status triggers.
Phase 0
Prerequisites — Before Week 1
0.1

ClickUp MCP access for Ngọc

Status — Done · Verified

Claude cannot read ClickUp candidate data without this connection. Verified Ngọc's ClickUp account has read access to all hiring-related Lists. MCP active in Claude Cowork. Track A cannot run without this.

  • MCP reads ClickUp data successfully
  • Tested: Claude returns task lists from one Hiring List
0.2

ClickUp form fields mapped to rubric

Status — Done for AI Auto Dev role

Form questions directly determine how fast and accurately Claude can assess candidates without opening every CV. Reviewed and updated for the current open role; the same pattern is reused for future role families.

  • 5–7 questions max to preserve completion rate
  • One question per rubric dimension that can't be inferred from CV alone
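Both rules can be checked before the form goes live. Below is a minimal Python sketch, assuming each form question is tagged with the single rubric dimension it informs. The tagging scheme and the dimension names are hypothetical.

```python
from collections import Counter


def form_issues(questions: dict[str, str], rubric_dims: set[str]) -> list[str]:
    """Check a ClickUp form draft against the two rules above.

    `questions` maps question text to the rubric dimension it informs.
    """
    issues = []
    if not 5 <= len(questions) <= 7:
        issues.append(f"{len(questions)} questions; keep to 5-7")
    per_dim = Counter(questions.values())
    for dim, n in sorted(per_dim.items()):
        if dim not in rubric_dims:
            issues.append(f"'{dim}' is not a rubric dimension")
        elif n > 1:
            issues.append(f"{n} questions target '{dim}'; use one")
    return issues
```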
Phase 1 (Revised)
Calibration Content — Now Anchored to Role Card

Same calibration discipline as the original plan. The change: every step now opens the role card first, and the rubric uses the Agent Manager template baseline (45% reserved for AI-native fit).

1.1

Build evaluation rubric (anchored to role card)

Days 6–7 of revised sequence

Open the signed role card. Build the rubric to score for the competencies the role card lists under agents managed, human decisions, QA responsibilities, and outcomes. Use the Agent Manager Rubric Template as the structural baseline. 45% reserved for AI-native dimensions; 55% allocated to role-specific competencies named in the role card.

  • Rubric file: /agency-os/hiring/rubrics/[role].md
  • Jake sign-off on weights before any candidate is scored
  • Linked to role card path in front matter
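A quick mechanical check before the weights go to Jake catches arithmetic slips. Below is a minimal Python sketch; the dimension keys are hypothetical. It treats 45% as a floor rather than an exact figure. That is an assumption based on the worked examples, where the AI Automation Developer rubric raises the baseline.

```python
# Hypothetical keys for the four AI-native baseline dimensions.
BASELINE = {"ai_fluency", "agent_manager", "ownership_operator", "qa_discipline"}


def weight_errors(weights: dict[str, int]) -> list[str]:
    """Return problems with a rubric's weights (percent); empty list means OK."""
    errors = []
    total = sum(weights.values())
    if total != 100:
        errors.append(f"weights sum to {total}%, expected 100%")
    missing = sorted(BASELINE - set(weights))
    if missing:
        errors.append(f"missing baseline dimensions: {', '.join(missing)}")
    baseline = sum(weights.get(dim, 0) for dim in BASELINE)
    if baseline < 45:
        errors.append(f"AI-native baseline is {baseline}%, expected at least 45%")
    return errors


if __name__ == "__main__":
    seo_executive = {
        "ai_fluency": 15, "agent_manager": 15, "ownership_operator": 10,
        "qa_discipline": 5, "tech_seo": 20, "writing": 10,
        "tools": 5, "english": 10, "salary_alignment": 10,
    }
    print(weight_errors(seo_executive))  # []
```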
1.2

Compile 3 calibration examples (new format)

Days 8–10

Pull 3 past candidates: one excellent hire, one poor fit, one almost-passed-but-failed. New format scores each on the 4 baseline dimensions plus role-specific dimensions. For past hires where AI-native scores can't be reconstructed from the CV, estimate from observed first-90-day performance and note the estimate.

  • Anonymise · 2–3 sentence outcome reasoning per example
  • Stored alongside rubric file
  • Single biggest accuracy lever — without these, scoring is generic
1.3

Auto-flag list (extended for AI-native fit)

Day 10

Original flags retained (career gaps, salary mismatch, English mismatch, no measurable results, job-hopping, missing portfolio). Extended with 5 new AI-native flags: zero AI exposure + no curiosity (deal-breaker), can't describe correcting AI output, treats AI as magic-or-useless, all work history task-assigned, no self-review process.

  • 3+ flags from AI-native list = automatic rejection
  • Separate from deal-breakers (which ARE disqualifying — Jake sign-off)
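The rejection rule reduces to a few lines. Below is a minimal Python sketch. The flag labels are hypothetical shorthand for the five AI-native flags above. Only the zero-exposure flag is treated as a deal-breaker, because it is the only one the plan names as such; the original flags (career gaps, salary mismatch, and so on) do not count toward the 3+ threshold.

```python
# Hypothetical labels for the five AI-native flags in 1.3.
AI_NATIVE_FLAGS = {
    "zero_ai_exposure_no_curiosity",  # also a deal-breaker on its own
    "cannot_describe_correcting_ai",
    "ai_magic_or_useless",
    "all_work_task_assigned",
    "no_self_review_process",
}
# Deal-breaker list needs Jake's sign-off; this entry is the one the plan names.
DEAL_BREAKERS = {"zero_ai_exposure_no_curiosity"}


def screen(flags: set[str]) -> str:
    """Return 'reject' or 'human_review' for a candidate's raised flags."""
    if flags & DEAL_BREAKERS:
        return "reject"        # deal-breakers disqualify on their own
    if len(flags & AI_NATIVE_FLAGS) >= 3:
        return "reject"        # 3+ AI-native flags = automatic rejection
    return "human_review"      # everything else stays with the lead
```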
1.4

Pipeline status flow + automation

Day 11

Map stages: New Application → Screening → Shortlisted → Interview 1 → Interview 2 → Offer → Hired / Rejected. For each transition, decide whether it triggers an automated email; if it does, draft the template. ClickUp automations fire on status changes. Claude does NOT send emails.

  • Status flow documented
  • Automation triggers tested in ClickUp
  • Email templates per transition reviewed by Jake
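Writing the status flow down as data makes it easy to review one transition at a time. Below is a minimal Python sketch, assuming a candidate moves forward one stage at a time and can be rejected from any open stage. Which transitions send email, and the template file names, are hypothetical placeholders for the decisions this step asks for. The emails themselves are still sent by ClickUp automations, not by this code and not by Claude.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    NEW_APPLICATION = "New Application"
    SCREENING = "Screening"
    SHORTLISTED = "Shortlisted"
    INTERVIEW_1 = "Interview 1"
    INTERVIEW_2 = "Interview 2"
    OFFER = "Offer"
    HIRED = "Hired"
    REJECTED = "Rejected"


# Forward path; Rejected can be reached from any open stage.
FORWARD = [
    Stage.NEW_APPLICATION, Stage.SCREENING, Stage.SHORTLISTED,
    Stage.INTERVIEW_1, Stage.INTERVIEW_2, Stage.OFFER, Stage.HIRED,
]

# Hypothetical: stages whose entry triggers an email, and the template used.
EMAIL_ON_ENTER = {
    Stage.SHORTLISTED: "shortlisted.md",
    Stage.INTERVIEW_1: "interview-invite.md",
    Stage.OFFER: "offer.md",
    Stage.REJECTED: "rejection.md",
}


def allowed(src: Stage, dst: Stage) -> bool:
    """True if a card may move from src to dst."""
    if src in (Stage.HIRED, Stage.REJECTED):
        return False  # closed stages do not move
    if dst is Stage.REJECTED:
        return True
    return FORWARD.index(dst) == FORWARD.index(src) + 1


def email_template(src: Stage, dst: Stage) -> Optional[str]:
    """Template the ClickUp automation should send for this move, if any."""
    if not allowed(src, dst):
        raise ValueError(f"{src.value} -> {dst.value} is not a valid transition")
    return EMAIL_ON_ENTER.get(dst)
```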
Phase 1 Gate — Rubric signed off, 3 calibration examples exist, flag list approved (including 5 new AI-native flags), ClickUp automation tested end-to-end. Then proceed to Phase 2 pilot.
2

Pilot Batch — Validate Scoring Quality

Week 2–3 · ~5–10 candidates

Open a Claude thread for the pilot JD. Paste the full context: JD, rubric, 3 calibration examples, flag list, ClickUp candidate list URL. Save the thread URL and reuse it for all future batches for this JD. Run scoring, then ask the lead: "Would you advance these 3 for an interview? Is anyone missing?" If agreement is ≥80%, the system works. If not, adjust the rubric weights and turn each disagreement into a new calibration example.

  • One Claude thread per JD; never open a second thread for the same JD
  • Lead review uses rankings only, not full scoring tables
  • Every disagreement → new calibration entry
  • Email automation re-tested with one real candidate moved through all statuses
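The plan does not pin down how "≥80% agreement" is measured. The Python sketch below uses one reasonable reading, which is an assumption. For every candidate in the batch, it compares whether Claude placed them in its top 3 with whether the lead would advance them, then counts the matches. Every mismatch is a candidate for a new calibration entry.

```python
def review_pilot(
    ranked: list[str], lead_advances: set[str], top_k: int = 3, threshold: float = 0.8
) -> tuple[float, bool, list[str]]:
    """Compare Claude's ranking with the lead's advance decisions.

    ranked: every candidate in the batch, best first (Claude's order).
    lead_advances: candidates the lead would advance to interview.
    Returns (agreement, passes, disagreements).
    """
    if not ranked:
        raise ValueError("empty batch")
    claude_advances = set(ranked[:top_k])
    disagreements = [
        c for c in ranked if (c in claude_advances) != (c in lead_advances)
    ]
    agreement = (len(ranked) - len(disagreements)) / len(ranked)
    return agreement, agreement >= threshold, disagreements
```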
3

Scale to Other Role Families

Week 3–5 · Parallel with role card #2–9

For each role family (Ads, Dev/IT, Design, PM, Sales/Strategy): repeat steps 1.1–1.2 with the relevant lead. 2–3 calibration examples minimum per family. Prioritise roles with active or frequent hiring first. JD template + interview guide templates finalised. Salary benchmarks documented per role with source dates.

  • Rubric file per role family — anchored to its role card
  • Interview guide: 8 questions covering each rubric dim + 1 follow-up
  • Salary benchmarks dated, source-referenced, refreshed quarterly
Rubric Template
Agent Manager Rubric — 45% AI-Native, 55% Role-Specific

Every rubric across every role family reserves 45% of the score for the four AI-native dimensions: AI Fluency, Agent Manager Capability, Ownership Operator Mindset, QA Discipline. The remaining 55% is allocated by role to technical skills, communication, culture, and salary alignment.

The Score Composition · 45% Baseline + 55% Role-Specific

AI-native baseline · 45%
  • AI Fluency · 15% · has used AI in real work
  • Agent Manager Capability · 15% · reviews + corrects AI output
  • Ownership Operator · 10% · owns outcomes, not tasks
  • QA · 5% · trust-but-verify
Role-specific · 55%, varies by role
  • Technical skills · Communication · Culture · Salary alignment
Same baseline across every role family; it filters for Agent Manager fit before role-specific competence.
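When every dimension is scored on the 1–10 bands that follow, the composite is a weighted average. Below is a minimal Python sketch. The keys are hypothetical, and collapsing the role-specific 55% into a single score is a simplification for illustration.

```python
def composite(scores: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted average of 1-10 dimension scores; weights are percentages summing to 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if set(scores) != set(weights):
        raise ValueError("every weighted dimension needs exactly one score")
    return sum(weights[dim] * scores[dim] for dim in weights) / 100


WEIGHTS = {
    "ai_fluency": 15,
    "agent_manager": 15,
    "ownership_operator": 10,
    "qa_discipline": 5,
    "role_specific": 55,  # simplification: one lumped role-specific score
}

if __name__ == "__main__":
    candidate = {"ai_fluency": 8, "agent_manager": 7, "ownership_operator": 6,
                 "qa_discipline": 9, "role_specific": 7}
    print(composite(candidate, WEIGHTS))  # 7.15
```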
Dimension 1 / 4
AI Fluency · 15%

What it measures

Real AI usage · iteration · mental model

Has the candidate actually used AI in real work, iterated prompts, and developed a working mental model of where AI helps vs. where it fails? Daily users score 7–8. Builders score 9–10. No exposure + no curiosity = deal-breaker.

9–10
Has shipped AI-assisted workflows. Names tools + what each is good at. Iterates prompts systematically. Distinguishes hallucination vs context drift vs bad prompting.
7–8
Daily AI user. Has examples from last role. Walks through past tasks. Knows when to trust output and when to verify.
5–6
Casual user — light tasks only. Hasn't pushed into core workflow. Understands prompting concept but hasn't iterated.
3–4
Aware of tools. Hasn't used them in work context. Open to learning.
1–2
No exposure. Skeptical or dismissive without basis.

Interview Probes

Use 1–2 per interview · prep follow-ups
Walk me through the last task you used AI for. What was your first prompt? What did you change to get a better answer?
Where have you found AI is useless or actively wrong? Give me a specific example.
If I gave you Claude with no instructions and a vague task, where would you start?

Auto-flag — Zero AI exposure AND no curiosity to learn. Deal-breaker for any role.

Dimension 2 / 4
Agent Manager Capability · 15%

What it measures

AI oversight · delegation · iteration

Can they oversee AI output, catch errors, define what to delegate vs. own, and refine prompts to improve quality? This is the core competency for the new role shape — every team member becomes an Agent Manager regardless of function.

9–10
Has overseen AI in real workflow. Caught AI errors. Knows what to delegate vs own. Refined prompts over multiple iterations. Thinks in system design terms.
7–8
Reviews AI work and makes it shippable. Treats AI output as draft, not deliverable. Articulates why AI got things wrong.
5–6
Uses AI output but hasn't critically reviewed at scale. Trusts too easily, or distrusts without diagnosing why.
3–4
Treats AI as a black box. Ships without review, or rejects wholesale.
1–2
No conception of AI as something to manage.

Interview Probes

Probe specifics · examples beat philosophy
Show me a piece of AI output you fixed before sending. What did the AI get wrong, and why do you think it got it wrong?
If Claude gives you a plausible but wrong answer, how do you catch it?
If you had to design a workflow where AI does 70% and a human does 30%, what would the human's 30% be?

Auto-flag — Cannot describe a single time they corrected AI output. Likely ships unreviewed work.

Dimension 3 / 4
Ownership Operator Mindset · 10%

What it measures

Outcome ownership · proactivity · escalation

Do they own outcomes (not tasks), drive things to completion without being chased, and surface problems before they escalate? This is the agency culture filter — directly from Pillar 6 of the transformation plan.

9–10
Past examples of solving problems unprompted. Owns outcomes. Drives projects through obstacles. Surfaces blockers early. Has escalated up the chain to get unblocked.
7–8
Strong initiative. Asks "why" before "how". Doesn't wait for direction. Reports progress without being chased.
5–6
Reliable executor on defined work. Does what's asked, well. Doesn't expand scope.
3–4
Needs frequent direction. Task-oriented vs outcome-oriented. Reports a blocker but won't try to unblock.
1–2
Passive. Will sit on a problem rather than escalate.

Interview Probes

Look for self-initiated stories
Tell me about a time you saw something broken or sub-optimal at work and fixed it without anyone asking.
When you hit a blocker, what do you do in the first 24 hours?
Describe a project you owned end-to-end. Who held you accountable?

Auto-flag — Every example given is a task assigned by a manager. No self-initiated work.

Dimension 4 / 4
QA · Trust-But-Verify Discipline · 5%

What it measures

Self-review · error catching · process discipline

Do they review work before shipping, catch inconsistencies, and ask "how do we know this is right?" Weighted lower than the first three because it partly correlates with Agent Manager Capability, but distinct enough to score separately.

9–10
Documented QA processes from past roles. Catches inconsistencies in data, copy, logic. Has caught significant errors others missed.
7–8
Reviews own work systematically. Asks for proofreading on important work. Spots typos, broken logic, factual gaps.
5–6
Generally careful. Mistakes happen but don't repeat.
3–4
Quality is inconsistent. Needs supervision on details.
1–2
Ships without checking. Doesn't catch errors.

Interview Probes

Look for repeatable QA habits
Walk me through your last review of your own work before sending it out. What did you check?
What's the most embarrassing mistake you've shipped, and how do you avoid that class of mistake now?

Auto-flag — No examples of self-review. No process for catching mistakes.

Worked Examples
Weight Allocation by Role Family

Same 45% baseline as a minimum (raised for the AI Auto Developer, marked ↑). Remaining weight is redistributed by role. Use these as starting points; adjust with the relevant team lead.

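  • SEO Executive: AI Fluency 15% (baseline) · Agent Manager Capability 15% · Ownership Operator 10% · QA Discipline 5% · Role-specific tech skills 20% (Tech SEO) · Secondary skill 10% (Writing) · Tools / domain 5% (Screaming Frog, GSC) · English 10% · Salary alignment 10% · Total 100%
  • Google Ads Specialist: AI Fluency 15% · Agent Manager Capability 15% · Ownership Operator 10% · QA Discipline 5% · Role-specific tech skills 20% (Platform) · Secondary skill 10% (Analytics) · English 10% · Numeracy 5% · Salary alignment 10% · Total 100%
  • Content Executive: AI Fluency 15% · Agent Manager Capability 15% · Ownership Operator 10% · QA Discipline 5% · Role-specific tech skills 25% (Writing) · Secondary skill 10% (Editorial) · Tools / domain 5% (Vertical) · English (assessed in writing) · Salary alignment 15% · Total 100%
  • AI Auto Developer: AI Fluency 20% ↑ · Agent Manager Capability 20% ↑ · Ownership Operator 10% · QA Discipline 10% ↑ · Role-specific tech skills 20% (Python/MCP) · Secondary skill 10% (Systems) · English 5% · Salary alignment 5% · Total 100%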

Calibration Example — Updated Format

Each calibration example must score the four baseline dimensions plus role-specific. Format: anonymous ID, outcome (excellent / poor / almost-passed), role family, baseline scores with one-sentence reasons each, role-specific scores, and a 2–3 sentence "why this outcome" closer that includes how the AI-native dimensions played out in practice. For past hires where AI-native scores can't be reconstructed from the CV: estimate from observed first-90-day performance and note the estimate.
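A fixed structure for each example makes gaps visible before it goes into the Claude thread. Below is a minimal Python sketch of that format. The class names, dimension keys, and `problems` check are hypothetical. The `estimated` flag stands in for the "note the estimate" rule.

```python
from dataclasses import dataclass
from typing import Literal

BASELINE_DIMS = ("ai_fluency", "agent_manager", "ownership_operator", "qa_discipline")


@dataclass
class DimScore:
    score: int               # 1-10, using the rubric's bands
    reason: str              # one sentence
    estimated: bool = False  # True if inferred from first-90-day performance


@dataclass
class CalibrationExample:
    anon_id: str
    outcome: Literal["excellent", "poor", "almost-passed"]
    role_family: str
    baseline: dict[str, DimScore]
    role_specific: dict[str, DimScore]
    why_this_outcome: str    # 2-3 sentences, incl. how AI-native dims played out

    def problems(self) -> list[str]:
        """List format gaps; an empty list means the example is usable."""
        issues = [f"missing baseline score: {d}" for d in BASELINE_DIMS
                  if d not in self.baseline]
        if not self.role_specific:
            issues.append("no role-specific scores")
        for name, s in {**self.baseline, **self.role_specific}.items():
            if not 1 <= s.score <= 10:
                issues.append(f"{name}: score {s.score} outside 1-10")
            if not s.reason.strip():
                issues.append(f"{name}: missing reason")
        if not self.why_this_outcome.strip():
            issues.append("missing outcome reasoning")
        return issues
```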

Track B
OKR Performance Management — KR Updates → Claude Pre-Read → Bi-Weekly Retro

Custom fields in ClickUp capture KR progress. KR owners update Current Value before each retro. Claude reads the data the day before the meeting and surfaces patterns. The retro itself is a human conversation. Monthly summaries are drafted per team member, reviewed by Ngọc, and used by managers as 1:1 prep.

Track B Performance Flow · Bi-Weekly Cadence

  1. Human, KR owner · updates the KR (Current Value + Status field) by Friday 5pm
  2. ClickUp data · OKR List with locked fields; field names never change
  3. Claude, day before · pre-read of patterns + flags, including no-update detection and overdue actions
  4. Ngọc review · quality check before sharing, then share to Slack 2hr before the retro
  5. Human meeting · 45-min bi-weekly retro; decisions are made by humans
  6. ClickUp · action items, each with owner + due date
Claude analysis is INPUT to the meeting. Leads decide; Claude does not run the retro.
Phase B0.0 (NEW)
SOWs Before OKRs — The Hard Dependency

Why this is inserted

The Agency OS WS6 rubric is explicit: "Department SOW exists and is signed" is a hard dependency before KPI scorecards begin. The original B0 jumped straight to OKR collection without checking SOW status. If a department lead can't articulate their SOW in 60 minutes, that's the deeper issue — and it likely also explains why their OKRs are missing.

B0.0.1

SOW Audit

2 days

Audit which of the 7 departments have signed SOWs. Document: department, owner, signed date, scope summary, gaps.

B0.0.2

Draft Sessions

1 week

For departments without SOWs: schedule 60-min session with the lead. Each SOW references the relevant role card(s). If the lead can't articulate the SOW in the session, escalate.

B0.0.3

Sign-off + KB

2 days

Jake sign-off per SOW. Store at /agency-os/sows/[department].md, linked from each role card.

Gate B0.0 — Every department has a signed SOW. Until then, no OKR collection. If 5 departments are still missing OKRs at this gate, that's a leadership/scoping issue, not a data-collection issue.
Phase B0 → B2
OKR Setup · Bi-Weekly Retro · Monthly Summaries
B0

OKR Structure Setup

Week 5–6 · After SOWs

Lock OKR custom field taxonomy in ClickUp. Field names never change after this point. Collect OKRs from all 7 teams (anchored to SOWs + role cards). Build CEO dashboard with progress bars, status breakdown, last-updated timestamps.

  • Custom fields: Objective, KR, Owner, Target, Current, Progress %, Status, Cadence
  • NEW field: Linked Role Card — text path to role card markdown
  • Single ClickUp dashboard view shared with Jake
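The field set maps naturally onto one record per KR. Below is a minimal Python sketch, assuming Progress % is derived as Current ÷ Target and capped at 100%. That formula is an assumption, since the plan does not define it. It also does not suit lower-is-better KRs such as CPA, which would need their own rule.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    objective: str
    kr: str
    owner: str
    target: float
    current: float
    status: str             # e.g. On Track / At Risk
    cadence: str            # e.g. bi-weekly
    linked_role_card: str   # text path to the role card markdown

    @property
    def progress_pct(self) -> float:
        """Assumed formula: Current / Target as a percentage, capped at 100."""
        if self.target <= 0:
            return 0.0
        return round(min(self.current / self.target, 1.0) * 100, 1)
```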
B1

Bi-Weekly Retro Cadence

Week 6+ · Recurring

Recurring 45-min meeting. KR owners post pre-read updates by 5pm on the Friday before. Claude runs the pre-read analysis on the morning of the retro, and Ngọc reviews it before sharing. Fixed agenda, split 10/20/10/5 minutes: On Track, At Risk, Actions, OKR updates. Action items are captured live and entered as ClickUp tasks with owner + due date.

  • KR owner updates Current Value + Status before retro
  • Slack reminder 24hr before each retro
  • Claude analysis is INPUT — leads decide
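The mechanical half of the pre-read can also be computed directly from the ClickUp data: which KRs have no update and which actions are overdue. That leaves Claude and Ngọc to interpret patterns. Below is a minimal Python sketch. The record shapes and the rule "no update since the previous retro" are assumptions.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class ActionItem:
    title: str
    owner: str
    due: date
    done: bool = False


def pre_read_flags(
    kr_last_updated: dict[str, datetime],
    previous_retro: datetime,
    actions: list[ActionItem],
    today: date,
) -> dict[str, list[str]]:
    """Flag KRs with no update since the last retro and open actions past due."""
    return {
        "no_update": sorted(
            kr for kr, ts in kr_last_updated.items() if ts <= previous_retro
        ),
        "overdue_actions": [
            f"{a.title} ({a.owner})" for a in actions if not a.done and a.due < today
        ],
    }
```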
B2

Monthly Performance Summaries

Week 8+ · After cadence stable

Per-role KPI definitions signed off by leads. Monthly template: KR progress, task completion, wins (with evidence), slippage (factual), 1:1 talking points, recommended focus. Claude drafts; Ngọc reviews before manager sees; manager uses as prep.

  • Claude drafts. Manager owns the conversation.
  • Ngọc review = HITL gate before delivery
  • Claude flags sparse data rather than inferring
Updated KPI Set
Now Includes Agent Leverage Metrics

The original KPI examples (briefs delivered, on-time rate, CQS, satisfaction) measured output volume only. The revised set adds agent leverage — the metrics that align individual reviews with the agency's north-star claim of ~165 weekly hours recovered through AI agents.

  • SEO Executive · Output (kept): Briefs delivered/mo, On-time rate, CQS score MoM · Agent leverage (new): Hours recovered via Sentinel + Catalyst, Override rate, Agent QA pass rate
  • Content Executive · Output (kept): Articles published, Editorial revisions count, Voice consistency · Agent leverage (new): Catalyst draft acceptance rate, Time per published article (Claude vs human)
  • Google Ads Specialist · Output (kept): Campaign launches, CPA delta, ROAS · Agent leverage (new): Ad Arbitrage report quality, Skill chain runs/mo, Wasted spend caught by AI
  • CRM Specialist · Output (kept): Workflows shipped, Lead-to-appointment rate · Agent leverage (new): Revenue Relay messages reviewed, Conversation AI override rate, Reviews AI accuracy
  • SEO Team Leader · Output (kept): Team velocity, Client retention, Project on-time · Agent leverage (new): PM Pulse coordination quality, Cross-team blockers resolved, Team agent fluency score
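The plan names the new KPIs but not their formulas. The Python sketch below uses one possible definition, which is an assumption. Override rate is the share of reviewed agent outputs that a human overrode. Hours recovered is the manual time a task used to take minus the human time it still takes. The `AgentOutput` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentOutput:
    agent: str            # e.g. "Sentinel", "Content Catalyst"
    overridden: bool      # a human replaced or substantially rewrote the output
    hours_before: float   # typical manual time for this task before the agent
    hours_now: float      # human time still spent reviewing or correcting it


def leverage(outputs: list[AgentOutput]) -> dict[str, Optional[float]]:
    """Override rate and hours recovered for one person's reviewed agent outputs."""
    if not outputs:
        return {"override_rate": None, "hours_recovered": 0.0}
    overrides = sum(o.overridden for o in outputs)
    recovered = sum(o.hours_before - o.hours_now for o in outputs)
    return {"override_rate": overrides / len(outputs), "hours_recovered": recovered}
```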
Action Plan
What Bích Ngọc Does This Week

Concrete day-by-day for Week 1 of the revised sequence. If the AI Automation Developer hiring is currently running on the original rubric, pause that pipeline and re-score against the Agent Manager Rubric Template before any candidate advances. Better to lose three days than hire someone whose strongest skill the rubric didn't measure.

Day 1 · Mon

Pilot role decision

30-min sync with Jake. Confirm SEO Executive as pilot (recommended). Schedule Day 2 session with Phuong Anh. If AI Auto Dev hiring is mid-flight: pause and queue for re-score.

Day 2 · Tue

Role card session

90-min working session with Phuong Anh. Walk through 11 sections. Capture verbatim. Confirm which agents the SEO Executive role manages (Sentinel + Catalyst per Sprint 2 plan).

Day 3 · Wed

Draft v1 + review prep

Polish notes into structured markdown. Sections: Identity, Mission, Agents Managed, Human Decisions, QA, Escalation, Outcomes, Cadence, Tools, KPIs, Career Path. Share with Jake EOD with specific review questions.

Day 4 · Thu

Jake review session

30-min review. Validate against Pillar 6: Agent Manager framing, Ownership Operator culture, AEO awareness. Capture edits.

Day 5 · Fri

Sign-off + Knowledge Base

Apply Jake's edits. Final sign-off from Jake + Phuong Anh. Save to /agency-os/role-cards/seo-executive.md. Begin Phase 1.1 immediately — rubric anchored to signed role card.

Day 6 · Sat

Rubric build

Open the Agent Manager Rubric Template. 45% baseline locked. Allocate 55% to SEO Executive role-specific dimensions per the role card. Send to Jake for weight sign-off.

Day 7 · Sun

Calibration prep

Pull 3 past SEO Executive candidates from ClickUp HR list. Begin compiling calibration examples in new format (4 baseline + role-specific scores + outcome reasoning).

Open Questions for Jake
Worth Resolving Before Week 2
Q1
Sequencing

AI Auto Dev pipeline status

Is current AI Automation Developer hiring being scored against the original rubric? If yes, do we pause and re-score against the Agent Manager Rubric Template, or proceed and hire under the old framework? Recommend pause.

Q2
Role Cards

Designer role card scope

Tram is freelance UI/UX. Does she sit inside the agent fleet (with a role card + KPIs) or outside it (vendor-style relationship, scope-of-work only)? Affects whether she gets a role card in week 5.

Q3
Track B

Missing OKR diagnosis

Original B0.2 flagged 5 teams missing OKRs. Is the gap missing leadership, missing SOWs, or missing role definitions? The fix differs for each. Diagnose before chasing OKR entries.

Q4
G6 Velocity

Sprint 1 velocity baseline

Recommend using existing ClickUp throughput data (last 4 weeks, tasks per person) for G6 baseline by Day 30. Alternative is pushing G6 closure to Day 45 to wait for OKR data. Jake's call.

North-star outcome at end of 90 days

By Day 90: 9 role cards signed, 5+ rubrics calibrated and in active use, 1 successful pilot CV batch with ≥80% lead agreement, all 7 departments with signed SOWs, all 7 teams with OKRs in ClickUp, 6+ bi-weekly retros completed with Claude pre-reads, 2 monthly performance summary cycles delivered. Track A operational at scale. Track B operational with monthly cadence. Together they form the HR layer of the centaur agency — every new hire scored for Agent Manager fit, every existing team member measured on agent leverage.