HR Implementation Plan 2026
Built for the Centaur Agency
A two-track HR system that hires and manages talent for the AI-native model SEO Navigator is becoming. Track A turns ClickUp + Claude into a recruitment pipeline. Track B turns ClickUp + Claude into an OKR performance system. Both are built around the Agent Manager role definition — the new shape of every job in this agency.
Same doctrine as the rest of the agency. Claude does the throughput — scoring, drafting, surfacing patterns, summarizing data. Humans own the judgment — final hire decisions, performance conversations, escalation, culture. Email automation lives in ClickUp, never in Claude.
HUMAN — The Reins
Hire / no-hire decisions. Performance conversations. Salary negotiation. Culture coaching. Escalation. Final approval on every rubric, summary, and OKR. Ngọc reviews every Claude draft before it reaches a manager or candidate.
AI — The Throughput
CV scoring against rubric. Calibration-based ranking. Auto-flag detection. Pre-retro ClickUp data analysis. Monthly performance summary drafts. Pattern surfacing across batches. Draft-only outputs — never sent to anyone without Ngọc's review.
What this plan revises from the original
The original HR plan operationally executes the centaur model well. Three corrections are folded in here. First, role cards (Sprint 1, G3 Role Redefinition) now precede rubrics, OKRs, and SOWs — so every downstream artifact references a validated role definition, not a guess.
Second, the hiring rubric reserves 45% of every score for AI-native dimensions: AI fluency, Agent Manager capability, Ownership Operator mindset, and QA discipline. The remaining 55% goes to role-specific skills. Third, performance KPIs now measure agent leverage — hours recovered, override rate, agents managed — not just output volume.
Role Cards First
Phase 0.5 inserted before original Phase 1. One pilot role card drafted and signed in 5 days. The remaining 8 in parallel with Phase 3 scale.
45% AI-Native Baseline
Every rubric reserves 45% for AI fluency, Agent Manager capability, Ownership Operator mindset, and QA discipline — across all role families.
Agent Leverage Metrics
Monthly summaries measure hours recovered through agents, override rate, and agent oversight quality — not just output volume.
Every downstream artifact must reference an upstream definition. Build a rubric without a role card and you're scoring for the wrong competencies. Build OKRs without a SOW and you're targeting the wrong outcomes. Build a KPI scorecard without OKRs and you're measuring noise.
Dependency Chain — Visual Flow
Original Sequence
- Build evaluation rubric for pilot role
- Compile 3 calibration examples
- Define auto-flag list
- Run pilot CV batch
- Lead reviews ranking
- Scale to other role families
- Lock OKR taxonomy
- Collect OKRs from 5 missing teams
Risk: rubric and OKRs are built before role cards exist. Re-work guaranteed once Sprint 1 lands role definitions on Day 30.
Revised Sequence
- NEW: Draft pilot role card (Days 1–4)
- NEW: Validate with lead + Jake (Day 5)
- Build rubric anchored to role card
- Compile calibration examples (now scoring 4 baseline + role-specific dimensions)
- Auto-flag list (extended for AI-native fit)
- Run pilot CV batch
- NEW: Confirm or write SOWs for all 7 departments
- Lock OKR taxonomy + collect OKRs (anchored to SOWs)
Adds ~5–7 days to Phase 1 — eliminates re-work and ensures every artifact downstream is grounded in a validated upstream definition.
12-Week HR Timeline · Overlaid with Transformation Sprints
Risk this revision doesn't eliminate
Sprint 1's G6 Velocity baseline expects measurement by Day 30. This revised sequence pushes OKR data flow into Week 6+, which is too late to feed G6. Recommended workaround: use task-throughput data from existing ClickUp lists (tasks completed per person per week, last 4 weeks) as the velocity baseline. It's grounded in real work happening today, not aspirational OKRs. Push G6 closure to Day 45 only if Jake disagrees.
A role card defines what each role does in the AI-native model: which agents the person manages, what only humans can decide, what they review and approve, when to escalate, and what they're accountable for. Without role cards, every downstream artifact (rubrics, OKRs, KPIs) is scoring or measuring against a guess.
Phase 0.5 — Role Cards (Days 1–7)
Sprint 1 of the transformation plan promises 9 role cards covering "agents managed, human decisions, QA, escalation." This phase delivers the pilot role card so HR Track A can begin Phase 1.1 with a validated reference. The remaining 8 cards are drafted in parallel with Phase 3 scaling, weeks 3–5.
Who the person is
- Role title + reports-to
- Mission (one sentence — why this role exists)
- Career path / level definitions
What they manage
- Agents managed (PM Pulse, Sentinel, Catalyst…)
- Tools used (ClickUp, Slack, GHL, Skills, MCP)
- Cadence (daily / weekly / monthly rhythm)
What only humans decide
- Human-only decisions — never delegated
- QA responsibilities — what they review/approve
- Escalation rules — when, to whom, on what timeline
Day 1 · Mon
Pick the pilot
Confirm with Jake which role goes first. Recommend SEO Executive — most active hiring + highest agent leverage (Content Catalyst + Sentinel both touch this role).
Day 2 · Tue
Working session
90-min session with Phuong Anh (SEO Team Leader). Walk through all 11 sections of the role card template. Capture verbatim — don't polish yet.
Day 3 · Wed
Draft v1
Polish session notes into structured markdown. Confirm which agents the role manages from the Sprint 2 plan (Sentinel + Catalyst confirmed). Share with Jake EOD.
Day 4 · Thu
Jake review
Validate against Pillar 6 transformation plan. Check Agent Manager framing, Ownership Operator culture, AEO awareness. Capture edits.
Day 5 · Fri
Sign-off + KB
Apply edits. Final sign-off from Jake + Phuong Anh. Save to Knowledge Base at /agency-os/role-cards/seo-executive.md
Day 6 · Sat
Begin Phase 1
Start original Phase 1.1 — but rubric is now anchored to the signed role card. Use the Agent Manager Rubric Template as the structural baseline.
Day 7 · Sun
Calibration prep
Begin compiling 3 calibration examples in the new format (4 baseline dimensions + role-specific). Pull past hires from ClickUp HR list.
Pilot in week 1. Remaining 8 across weeks 3–5, parallel with Phase 3 scaling. Each follows the same template, same gate, same Knowledge Base location.
SEO Executive
Phuong Anh oversight · Manages Sentinel + Catalyst · Drafts to Ngọc by Day 5.
SEO Team Leader
Phuong Anh herself · Coordinator-level role · Manages PM Pulse + oversees specialists.
Content Executive
Manages Content Catalyst directly · Highest writing-skill weight · Editorial judgment.
Google Ads Specialist
Tung's role · Manages Ad Arbitrage agent · Stewards Google Ads skill chain.
Meta Ads Specialist
Manages 8-agent Campaign Accelerator pipeline · 3 human approval gates.
CRM / GHL Specialist
Manages Revenue Relay · Owns SNMS 6-Pillar deployment · GHL workflow QA.
UI/UX Designer
Tram (freelance) · Brand systems · Asset oversight · Manager TBD.
AI Automation Developer
Trung's role + new hire · Builds the agents others manage · Highest AI Fluency weight.
HR / Operations
Bích Ngọc's own role · Agent Manager for the agency itself · Track A + B owner.
Open question for Jake
Two of these roles need clarification before drafting begins. Designer — Tram is freelance; who does she report to, and is she actually in scope for an Agent Manager role card, or does she sit outside the agent fleet? Department leads — for the 5 teams currently missing OKRs (per original B0.2), is the gap missing leadership, missing SOWs, or missing role definitions? The role card is downstream of that diagnosis.
CVs enter through a ClickUp form. Claude scores each CV against a rubric anchored to the role card, in a single thread per JD that retains context across batches. ClickUp automation handles all candidate emails through status triggers — Claude never sends.
Track A Recruitment Flow
ClickUp MCP access for Ngọc
Claude cannot read ClickUp candidate data without this connection. Verified Ngọc's ClickUp account has read access to all hiring-related Lists. MCP active in Claude Cowork. Track A cannot run without this.
- MCP reads ClickUp data successfully
- Tested: Claude returns task lists from one Hiring List
ClickUp form fields mapped to rubric
Form questions directly determine how fast and accurately Claude can assess candidates without opening every CV. Reviewed and updated for current open role; same pattern reused for future role families.
- 5–7 questions max to preserve completion rate
- One question per rubric dimension that can't be inferred from CV alone
Same calibration discipline as the original plan. The change: every step now opens the role card first, and the rubric uses the Agent Manager template baseline (45% reserved for AI-native fit).
Build evaluation rubric (anchored to role card)
Open the signed role card. Build the rubric to score for the competencies the role card lists under agents managed, human decisions, QA responsibilities, and outcomes. Use the Agent Manager Rubric Template as the structural baseline. 45% reserved for AI-native dimensions; 55% allocated to role-specific competencies named in the role card.
- Rubric file:
/agency-os/hiring/rubrics/[role].md - Jake sign-off on weights before any candidate is scored
- Linked to role card path in front matter
Compile 3 calibration examples (new format)
Pull 3 past candidates: one excellent hire, one poor fit, one almost-passed-but-failed. New format scores each on the 4 baseline dimensions plus role-specific dimensions. For past hires where AI-native scores can't be reconstructed from the CV, estimate from observed first-90-day performance and note the estimate.
- Anonymise · 2–3 sentence outcome reasoning per example
- Stored alongside rubric file
- Single biggest accuracy lever — without these, scoring is generic
Auto-flag list (extended for AI-native fit)
Original flags retained (career gaps, salary mismatch, English mismatch, no measurable results, job-hopping, missing portfolio). Extended with 5 new AI-native flags: zero AI exposure + no curiosity (deal-breaker), can't describe correcting AI output, treats AI as magic-or-useless, all work history task-assigned, no self-review process.
- 3+ flags from AI-native list = automatic rejection
- Separate from deal-breakers (which ARE disqualifying — Jake sign-off)
Pipeline status flow + automation
Map stages: New Application → Screening → Shortlisted → Interview 1 → Interview 2 → Offer → Hired / Rejected. Per transition: does it trigger an automated email? If yes, draft template. ClickUp automation triggers per status change. Claude does NOT send emails.
- Status flow documented
- Automation triggers tested in ClickUp
- Email templates per transition reviewed by Jake
Pilot Batch — Validate Scoring Quality
Open a Claude thread for the pilot JD. Paste full context: JD, rubric, 3 calibration examples, flag list, ClickUp candidate list URL. Save thread URL — reuse for all future batches for this JD. Run scoring; ask lead "Would you advance these 3 for an interview? Is anyone missing?" If ≥80% agreement: system works. If not: rubric weights adjusted; new disagreement becomes a new calibration example.
- One Claude thread per JD — never re-open
- Lead review uses rankings only, not full scoring tables
- Every disagreement → new calibration entry
- Email automation re-tested with one real candidate moved through all statuses
Scale to Other Role Families
For each role family (Ads, Dev/IT, Design, PM, Sales/Strategy): repeat steps 1.1–1.2 with the relevant lead. 2–3 calibration examples minimum per family. Prioritise roles with active or frequent hiring first. JD template + interview guide templates finalised. Salary benchmarks documented per role with source dates.
- Rubric file per role family — anchored to its role card
- Interview guide: 8 questions covering each rubric dim + 1 follow-up
- Salary benchmarks dated, source-referenced, refreshed quarterly
Every rubric across every role family reserves 45% of the score for the four AI-native dimensions: AI Fluency, Agent Manager Capability, Ownership Operator Mindset, QA Discipline. The remaining 55% is allocated by role to technical skills, communication, culture, and salary alignment.
The Score Composition · 45% Baseline + 55% Role-Specific
What it measures
Has the candidate actually used AI in real work, iterated prompts, and developed a working mental model of where AI helps vs. where it fails? Daily users score 7–8. Builders score 9–10. No exposure + no curiosity = deal-breaker.
Interview Probes
Auto-flag — Zero AI exposure AND no curiosity to learn. Deal-breaker for any role.
What it measures
Can they oversee AI output, catch errors, define what to delegate vs. own, and refine prompts to improve quality? This is the core competency for the new role shape — every team member becomes an Agent Manager regardless of function.
Interview Probes
Auto-flag — Cannot describe a single time they corrected AI output. Likely ships unreviewed work.
What it measures
Do they own outcomes (not tasks), drive things to completion without being chased, and surface problems before they escalate? This is the agency culture filter — directly from Pillar 6 of the transformation plan.
Interview Probes
Auto-flag — Every example given is a task assigned by a manager. No self-initiated work.
What it measures
Do they review work before shipping, catch inconsistencies, and ask "how do we know this is right?" Lower weight than the first three because partly correlated with Agent Manager Capability — but distinct enough to score separately.
Interview Probes
Auto-flag — No examples of self-review. No process for catching mistakes.
Same 45% baseline. Remaining 55% redistributed by role. Use as starting points; adjust with the relevant team lead.
| Dimension | SEO Executive | Google Ads Specialist | Content Executive | AI Auto Developer |
|---|---|---|---|---|
| AI Fluency | 15% (baseline) | 15% | 15% | 20% ↑ |
| Agent Manager Capability | 15% | 15% | 15% | 20% ↑ |
| Ownership Operator | 10% | 10% | 10% | 10% |
| QA Discipline | 5% | 5% | 5% | 10% ↑ |
| Role-specific tech skills | 20% (Tech SEO) | 20% (Platform) | 25% (Writing) | 20% (Python/MCP) |
| Secondary skill | 10% (Writing) | 10% (Analytics) | 10% (Editorial) | 10% (Systems) |
| Tools / domain | 5% (Screaming Frog, GSC) | — | 5% (Vertical) | — |
| English | 10% | 10% | (in writing) | 5% |
| Numeracy | — | 5% | — | — |
| Salary alignment | 10% | 10% | 15% | 5% |
| TOTAL | 100% | 100% | 100% | 100% |
Calibration Example — Updated Format
Each calibration example must score the four baseline dimensions plus role-specific. Format: anonymous ID, outcome (excellent / poor / almost-passed), role family, baseline scores with one-sentence reasons each, role-specific scores, and a 2–3 sentence "why this outcome" closer that includes how the AI-native dimensions played out in practice. For past hires where AI-native scores can't be reconstructed from the CV: estimate from observed first-90-day performance and note the estimate.
Custom fields in ClickUp capture KR progress. KR owners update Current Value before each retro. Claude reads the data the day before the meeting and surfaces patterns. The retro itself is a human conversation. Monthly summaries draft per team member, reviewed by Ngọc, used by managers as 1:1 prep.
Track B Performance Flow · Bi-Weekly Cadence
Why this is inserted
The Agency OS WS6 rubric is explicit: "Department SOW exists and is signed" is a hard dependency before KPI scorecards begin. The original B0 jumped straight to OKR collection without checking SOW status. If a department lead can't articulate their SOW in 60 minutes, that's the deeper issue — and it likely also explains why their OKRs are missing.
SOW Audit
Audit which of the 7 departments have signed SOWs. Document: department, owner, signed date, scope summary, gaps.
Draft Sessions
For departments without SOWs: schedule 60-min session with the lead. Each SOW references the relevant role card(s). If the lead can't articulate the SOW in the session, escalate.
Sign-off + KB
Jake sign-off per SOW. Store at /agency-os/sows/[department].md Linked from each role card.
OKR Structure Setup
Lock OKR custom field taxonomy in ClickUp. Field names never change after this point. Collect OKRs from all 7 teams (anchored to SOWs + role cards). Build CEO dashboard with progress bars, status breakdown, last-updated timestamps.
- Custom fields: Objective, KR, Owner, Target, Current, Progress %, Status, Cadence
- NEW field: Linked Role Card — text path to role card markdown
- Single ClickUp dashboard view shared with Jake
Bi-Weekly Retro Cadence
Recurring 45-min meeting. Pre-read updates by KR owners by Friday 5pm prior. Claude pre-read analysis morning of (Ngọc reviews before sharing). Fixed agenda: 10/20/10/5 split — On Track, At Risk, Actions, OKR updates. Action items captured live, entered as ClickUp tasks with owner + due date.
- KR owner updates Current Value + Status before retro
- Slack reminder 24hr before each retro
- Claude analysis is INPUT — leads decide
Monthly Performance Summaries
Per-role KPI definitions signed off by leads. Monthly template: KR progress, task completion, wins (with evidence), slippage (factual), 1:1 talking points, recommended focus. Claude drafts; Ngọc reviews before manager sees; manager uses as prep.
- Claude drafts. Manager owns the conversation.
- Ngọc review = HITL gate before delivery
- Claude flags sparse data rather than inferring
The original KPI examples (briefs delivered, on-time rate, CQS, satisfaction) measured output volume only. The revised set adds agent leverage — the metrics that align individual reviews with the agency's north-star claim of ~165 weekly hours recovered through AI agents.
| Role | Output KPIs (kept) | Agent Leverage KPIs (new) |
|---|---|---|
| SEO Executive | Briefs delivered/mo · On-time rate · CQS score MoM | Hours recovered via Sentinel + Catalyst · Override rate · Agent QA pass rate |
| Content Executive | Articles published · Editorial revisions count · Voice consistency | Catalyst draft acceptance rate · Time per published article (Claude vs human) |
| Google Ads Specialist | Campaign launches · CPA delta · ROAS | Ad Arbitrage report quality · Skill chain runs/mo · Wasted spend caught by AI |
| CRM Specialist | Workflows shipped · Lead-to-appointment rate | Revenue Relay messages reviewed · Conversation AI override rate · Reviews AI accuracy |
| SEO Team Leader | Team velocity · Client retention · Project on-time | PM Pulse coordination quality · Cross-team blockers resolved · Team agent fluency score |
Concrete day-by-day for Week 1 of the revised sequence. If the AI Automation Developer hiring is currently running on the original rubric, pause that pipeline and re-score against the Agent Manager Rubric Template before any candidate advances. Better to lose three days than hire someone whose strongest skill the rubric didn't measure.
Day 1 · Mon
Pilot role decision
30-min sync with Jake. Confirm SEO Executive as pilot (recommended). Schedule Day 2 session with Phuong Anh. If AI Auto Dev hiring is mid-flight: pause and queue for re-score.
Day 2 · Tue
Role card session
90-min working session with Phuong Anh. Walk through 11 sections. Capture verbatim. Confirm which agents the SEO Executive role manages (Sentinel + Catalyst per Sprint 2 plan).
Day 3 · Wed
Draft v1 + review prep
Polish notes into structured markdown. Sections: Identity, Mission, Agents Managed, Human Decisions, QA, Escalation, Outcomes, Cadence, Tools, KPIs, Career Path. Share with Jake EOD with specific review questions.
Day 4 · Thu
Jake review session
30-min review. Validate against Pillar 6: Agent Manager framing, Ownership Operator culture, AEO awareness. Capture edits.
Day 5 · Fri
Sign-off + Knowledge Base
Apply Jake's edits. Final sign-off from Jake + Phuong Anh. Save to /agency-os/role-cards/seo-executive.md. Begin Phase 1.1 immediately — rubric anchored to signed role card.
Day 6 · Sat
Rubric build
Open the Agent Manager Rubric Template. 45% baseline locked. Allocate 55% to SEO Executive role-specific dimensions per the role card. Send to Jake for weight sign-off.
Day 7 · Sun
Calibration prep
Pull 3 past SEO Executive candidates from ClickUp HR list. Begin compiling calibration examples in new format (4 baseline + role-specific scores + outcome reasoning).
AI Auto Dev pipeline status
Is current AI Automation Developer hiring being scored against the original rubric? If yes, do we pause and re-score against the Agent Manager Rubric Template, or proceed and hire under the old framework? Recommend pause.
Designer role card scope
Tram is freelance UI/UX. Does she sit inside the agent fleet (with a role card + KPIs) or outside it (vendor-style relationship, scope-of-work only)? Affects whether she gets a role card in week 5.
Missing OKR diagnosis
Original B0.2 flagged 5 teams missing OKRs. Is the gap missing leadership, missing SOWs, or missing role definitions? The fix differs for each. Diagnose before chasing OKR entries.
Sprint 1 velocity baseline
Recommend using existing ClickUp throughput data (last 4 weeks, tasks per person) for G6 baseline by Day 30. Alternative is pushing G6 closure to Day 45 to wait for OKR data. Jake's call.
North-star outcome at end of 90 days
By Day 90: 9 role cards signed, 5+ rubrics calibrated and in active use, 1 successful pilot CV batch with ≥80% lead agreement, all 7 departments with signed SOWs, all 7 teams with OKRs in ClickUp, 6+ bi-weekly retros completed with Claude pre-reads, 2 monthly performance summary cycles delivered. Track A operational at scale. Track B operational with monthly cadence. Together they form the HR layer of the centaur agency — every new hire scored for Agent Manager fit, every existing team member measured on agent leverage.