7 gaps × 13 actions + 7 workstreams → sequenced into weekly sprints with time allocations. Target: agent fleet operational by July 2026.
You own 24 of 62 tasks, but most of your work is decisions and reviews, not execution. Total Jake commitment: ~57 hours across 12 weeks (~4.75 hrs/week). Total delegated: ~117 hours across your team.
| Sprint | Week | Gaps | Focus | Jake | Team |
|---|---|---|---|---|---|
| S1 | W1 | G2 Knowledge Base · G3 Role Redefinition | SOP Audit · Schema · 9 Role Cards · Workshop Prep | 9.5h | 7h |
| S1 | W2 | G3 Role Redefinition · G5 AEO Layer · G6 Velocity | Workshop · Check-ins · AEO + Velocity Approval | 6h | 22h |
| S1 | W3 | G2 Knowledge Base · G3 Role Redefinition | SOP Validation 1–5 · Remaining Check-ins | 4h | 24h |
| S1 | W4 | G2 Knowledge Base · G1 Agent Fleet | SOP Validation 6–10 · Agent Architecture Sketch | 4.5h | 5h |
| S2 | W5 | G1 Agent Fleet Architecture | System Prompts: All 6 Agents | 6h | 6h |
| S2 | W6 | G1 Agent Fleet · G7 Outbound + Brand | Agent Testing · Outbound Flywheel | 6h | 12h |
| S2 | W7 | G4 Hybrid OS · G1 Agent Fleet | Hybrid OS Package · Test Review · Registry | 6h | 10h |
| S2 | W8 | G4 Hybrid OS · G7 Outbound + Brand | Beta Enrollment · Case Studies · Sprint Close | 3h | 8h |
| S3 | W9–10 | G1 Agent Fleet · G7 Outbound + Brand | Agent Health Report · Positioning | 6h | 11h |
| S3 | W11–12 | G4 Hybrid OS · G7 Outbound + Brand | Beta Go/No-Go · Video · Final Scorecard | 5h | 11h |
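
As a quick arithmetic check on the allocations above, the weekly rows can be summed in a few lines. This is a minimal sketch: the `SPRINT_ROWS` name and tuple layout are assumptions made for illustration, with the hours copied from the sprint table.

```python
# Hours copied from the sprint table: (sprint, week, jake_hours, team_hours).
SPRINT_ROWS = [
    ("S1", "W1", 9.5, 7), ("S1", "W2", 6, 22), ("S1", "W3", 4, 24), ("S1", "W4", 4.5, 5),
    ("S2", "W5", 6, 6), ("S2", "W6", 6, 12), ("S2", "W7", 6, 10), ("S2", "W8", 3, 8),
    ("S3", "W9-10", 6, 11), ("S3", "W11-12", 5, 11),
]

def totals_by_sprint(rows):
    """Return {sprint: (jake_hours, team_hours)} summed over the rows."""
    out = {}
    for sprint, _week, jake, team in rows:
        j, t = out.get(sprint, (0, 0))
        out[sprint] = (j + jake, t + team)
    return out

per_sprint = totals_by_sprint(SPRINT_ROWS)
jake_total = sum(j for j, _ in per_sprint.values())
team_total = sum(t for _, t in per_sprint.values())
print(per_sprint)
print(jake_total, team_total)
```

The rows sum to 56 Jake-hours and 116 team-hours, in line with the rounded ~57 and ~117 figures.
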
| Role | S1 | S2 | S3 | Total | Key Work |
|---|---|---|---|---|---|
| Senior Team Lead | 24h | — | — | 24h | SOP conversion (all 10 to structured markdown) |
| PM / Account Strategist | 15h | 7h | 9h | 31h | Velocity tracking, ClickUp config, Agent Registry, beta onboarding, feedback |
| IT Lead | — | 15h | 3h | 18h | Agent wiring, n8n automation, GHL Snapshot, prompt refinements |
| Content Lead | — | 8h | 4h | 12h | One-pager, case studies, social publishing, video production |
| SEO Lead | 9h | — | — | 9h | AEO gate, Pillar 3 SOP, team training, validation |
| Each Pillar Lead | — | 6h | 6h | 12h | Agent testing, performance data collection |
| All Team Members | 2h | — | — | 2h | Workshop participation |
After Week 12, the transformation infrastructure is built. Roughly 57 hours of Jake-time and 117 hours of team-time build the full operating system. Everything after is iteration: refining Skills, expanding orchestration, and scaling Hybrid OS if greenlit.
The 7 internal workstreams are the operational engine that runs alongside the 7 strategic gaps. Each workstream feeds into one or more gaps — its AI tasks become the execution layer for the transformation. Below, each workstream shows which gaps it directly supports, what week it activates, and which tasks are AI-automatable vs. human-owned.
| Workstream | G1 Agent Fleet | G2 Knowledge Base | G3 Role Redef. | G4 Hybrid OS | G5 AEO | G6 Velocity | G7 Outbound |
|---|---|---|---|---|---|---|---|
| WS01 Workflow + SOP | ● | ●●● | ●● | | | | |
| WS02 Onboarding | ● | ●● | | | | | |
| WS03 Communication | ●● | ● | | | | | |
| WS04 Internal Brain | ●●● | ●● | | | | | |
| WS05 Templates | ● | ●●● | ● | | | | |
| WS06 Performance | ●● | ● | ● | | | | |
| WS07 Culture | ●● | ● | | | | | |
●●● = primary driver · ●● = strong support · ● = contributes
The 7 workstreams are the internal operational backbone. The 7 gaps are the strategic transformation targets. They're not separate — every workstream directly feeds at least 2 gaps. Execute the workstreams and the gaps close automatically.
Claude Managed Agents replaces your current DIY agent infrastructure. Instead of building your own agent loops, sandbox, tool wiring, and error recovery, Anthropic runs all of that on their infrastructure. You define the agent (system prompt, tools, MCP servers, Skills), define the environment (packages, network), and start sessions. The agent runs autonomously for hours, persists through disconnections, and self-recovers from errors.
Managed Agents compresses Sprint 2 (Agent Fleet deployment) from 4 weeks to ~1 week. The infrastructure you were going to build (sandbox, error recovery, state management, tool orchestration) is now handled by Anthropic. Your Sprint 2 work becomes: define agents → test → deploy. That frees up ~15 hours of Jake-time and ~20 hours of IT Lead time.
| Day | Agent | Deploy Time | Owner | What Changes vs. Original Plan |
|---|---|---|---|---|
| 31 | PM Pulse (P6) | 1 day | Jake + PM | Deploy first — it's the coordinator. Daily health digests start immediately. |
| 32 | SEO Sentinel (P1) | 1 day | Jake + SEO Lead | Highest hours recovered. Overnight rank monitoring sessions start Day 33. |
| 33 | Content Catalyst (P2) | 1 day | Jake + Content Lead | Content briefs + AEO audit run as single session. No separate AEO step needed. |
| 34–35 | Revenue Relay (P3) | 2 days | Jake + CRM Lead | GHL MCP auth setup takes an extra day. Pipeline monitoring starts Day 36. |
| 36–37 | Ad Arbitrage (P4) | 2 days | Jake + Ads Lead | Most complex (10 Skills). Multi-agent coordination: coordinator + sub-agents. |
| 38 | Build Bot (P5) | 1 day | Jake + IT Lead | Simplest agent. Lighthouse CLI in environment. Schema audits on-demand. |
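
The day ranges in the deployment table follow from the deploy times alone. A minimal sketch, assuming agents deploy back to back starting on Day 31; the `AGENTS` list and `schedule` helper are illustrative names, not part of any tool.

```python
# (agent, deploy_days) in deployment order, copied from the table above.
AGENTS = [
    ("PM Pulse (P6)", 1), ("SEO Sentinel (P1)", 1), ("Content Catalyst (P2)", 1),
    ("Revenue Relay (P3)", 2), ("Ad Arbitrage (P4)", 2), ("Build Bot (P5)", 1),
]

def schedule(agents, first_day=31):
    """Assign consecutive day ranges, starting at first_day."""
    plan, day = [], first_day
    for name, days in agents:
        plan.append((name, day, day + days - 1))
        day += days
    return plan

plan = schedule(AGENTS)
for name, start, end in plan:
    label = str(start) if start == end else f"{start}-{end}"
    print(f"Day {label}: {name}")
```

The six deploy times add up to 8 days, ending on Day 38.
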
Managed Agents' multi-agent coordination (research preview) maps directly to your Agency OS coordinator/specialist architecture. The PM Pulse agent becomes the meta-coordinator that can delegate to specialist agents.
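
The coordinator/specialist split can be pictured as a simple routing table. The sketch below is plain Python for illustration only; it does not use or represent the Managed Agents API, and the `SPECIALISTS` mapping, the `delegate` function, and the task fields are all hypothetical.

```python
# Hypothetical routing table: pillar code -> specialist agent (names from the deployment table).
SPECIALISTS = {
    "P1": "SEO Sentinel",
    "P2": "Content Catalyst",
    "P3": "Revenue Relay",
    "P4": "Ad Arbitrage",
    "P5": "Build Bot",
}

def delegate(task):
    """PM Pulse (P6) keeps coordination tasks and hands pillar tasks to the matching specialist."""
    return SPECIALISTS.get(task.get("pillar"), "PM Pulse")

tasks = [
    {"title": "Overnight rank check", "pillar": "P1"},
    {"title": "Daily health digest", "pillar": "P6"},
]
assignments = {t["title"]: delegate(t) for t in tasks}
print(assignments)
```
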
Claude Managed Agents is the infrastructure answer to Gap 1 (Agent Fleet Architecture). Instead of spending Sprint 2 building sandboxes and error recovery, you spend 8 days defining and deploying 6 agents on Anthropic's hosted infrastructure. The remaining Sprint 2 time shifts to testing, refinement, and Hybrid OS work — accelerating the entire 90-day timeline by 2–3 weeks.
We mapped every UAT checklist item across all 7 teams (334 total checks) against Claude Managed Agent capabilities. Result: 239 fully automated, 77 semi-automated, 18 manual-only. Each department gets a dedicated UAT agent that is triggered by a ClickUp status change, gates publishing, and reports to Slack. Total cost per full-site UAT: ~$9.65.
Status legend: AUTO = fully automated · SEMI = semi-automated · MANUAL = manual-only. SEO covers 9 SOP categories + QueryMind pipelines · Content includes QueryMind Planner + Jobs.
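
To put the split in proportion, the three buckets can be converted to shares of the 334 checks. A small sketch using only the counts stated above; the variable names are illustrative.

```python
# Check counts from the UAT mapping above.
COUNTS = {"auto": 239, "semi": 77, "manual": 18}
TOTAL_CHECKS = 334

# The three buckets should account for every check.
assert sum(COUNTS.values()) == TOTAL_CHECKS

shares = {k: round(100 * v / TOTAL_CHECKS, 1) for k, v in COUNTS.items()}
print(shares)
```

That works out to about 71.6% automated, 23.1% semi-automated, and 5.4% manual.
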
| Pre-Publish — On-Page + Schema (12) | Status | Agent Method |
|---|---|---|
| Topic validated against QueryMind coverage gaps + GSC opportunity | AUTO | Pulls GSC data + QueryMind gap_label, ranks by search volume × gap priority |
| URL structure (no duplicate/incorrect slugs) | AUTO | Crawls sitemap, checks duplicates, trailing slashes, casing |
| Title & Meta Description complete + unique | AUTO | Parses <title> + meta, checks length, flags duplicates |
| Heading hierarchy (H1→H2→H3) | AUTO | Validates single H1, sequential nesting, no skips |
| Sitemap.xml accessible + valid | AUTO | Fetches /sitemap.xml, validates XML, checks URLs return 200 |
| Robots.txt not blocking important pages | AUTO | Parses Disallow rules vs. sitemap URLs |
| Canonical tags correct (self-referencing) | AUTO | Checks self-referencing canonical per page |
| Internal links working (no 404s) | AUTO | Crawls all links, reports 404s/500s with source |
| Images have alt text | AUTO | Parses <img> tags, flags missing/empty alt |
| Page speed acceptable (Lighthouse) | SEMI | Runs Lighthouse — human confirms threshold per project |
| Schema markup valid (JSON-LD) | SEMI | Validates JSON-LD syntax — human checks business logic |
| Index/noindex correct | AUTO | Checks meta robots + X-Robots-Tag headers |
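
The heading-hierarchy check in the table above (single H1, sequential nesting, no skips) could look something like this. It is a minimal sketch using only the standard library; `HeadingCollector` and `heading_issues` are illustrative names, and a production agent might rely on a full HTML parser instead.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect heading levels (1-6) in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))

def heading_issues(html):
    """Report an H1 count other than one, or a jump of more than one level deeper."""
    parser = HeadingCollector()
    parser.feed(html)
    issues = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: H{prev} -> H{cur}")
    return issues
```

Moving back up the hierarchy (for example H3 followed by H2) is treated as valid; only downward jumps that skip a level are flagged.
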
| Post-Publish (5) | Status | Agent Method |
|---|---|---|
| Request indexing via GSC API + update ClickUp status | AUTO | GSC URL Inspection API → request indexing, update ClickUp task to "Index Requested" |
| Share published page on GBP | SEMI | Agent drafts GBP post copy + image — human publishes (no GBP write API) |
| Verify redirects & canonical in production | AUTO | Follows redirect map, checks 301 + canonical on live URL |
| CQS audit triggered on published content | AUTO | Chains to auditor-uat-agent → ContentAudit record created |
| Monitor ranking changes for published pages (7-day window) | AUTO | SEO Utils MCP, flags drops >3 positions vs. baseline |
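
The ">3 positions vs. baseline" rule from the ranking monitor can be sketched as a pure function. The input shape (URL to position dictionaries) is an assumption; how the positions are pulled from SEO Utils is out of scope here.

```python
def rank_drops(baseline, current, threshold=3):
    """Return {url: drop} where the position worsened by more than `threshold` places.

    Positions are 1-based; a larger number is a worse rank.
    """
    drops = {}
    for url, base_pos in baseline.items():
        cur_pos = current.get(url)
        if cur_pos is not None and cur_pos - base_pos > threshold:
            drops[url] = cur_pos - base_pos
    return drops

flagged = rank_drops({"/a": 4, "/b": 10, "/c": 2}, {"/a": 9, "/b": 12, "/c": 5})
print(flagged)
```

A drop of exactly 3 positions is not flagged, matching the ">3" wording.
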
| GBP Monthly Operations (4) | Status | Agent Method |
|---|---|---|
| Check GBP performance (Maps, Local Pack, keyword movement) | AUTO | SEO Utils MCP grid reports, WoW rank delta, flags drops >3 positions → Slack summary |
| Optimize services, products & categories | SEMI | Agent audits GBP vs. competitors (web scrape), proposes changes — human applies in GBP dashboard |
| Manage GBP content (posts, photos, videos) | SEMI | Agent generates post copy from templates + Content AI — human publishes via GBP or GHL Social Planner |
| Monitor reviews status | AUTO | Checks GHL Reputation dashboard or scrapes Google reviews, flags new negatives → Slack alert. Chains to SNMS P6 Reviews AI. |
| GSC Monthly Operations (4) | Status | Agent Method |
|---|---|---|
| Monitor performance & trends (clicks, impressions, CTR, position) | AUTO | GSC API → compares vs. prior period, flags anomalies >15% change → Slack + ClickUp |
| Check indexing & sitemap status | AUTO | GSC Sitemaps API + URL Inspection API for key pages, flags "Not Indexed" with reasons |
| Audit technical issues (404s, redirects, crawl budget) | SEMI | Agent crawls site (bash + curl), checks 404s, redirect chains, duplicate titles — human implements fixes |
| Monitor crawl errors (ongoing alerts) | AUTO | GSC crawl stats API, alerts on new 404s/5xx errors → Slack |
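
The ">15% change vs. prior period" anomaly rule for GSC metrics might be expressed as below. This is a sketch under the assumption that both periods arrive as metric-to-value dictionaries; fetching them from the GSC API is not shown.

```python
def anomalies(prior, current, threshold=0.15):
    """Flag metrics whose relative change vs. the prior period exceeds the threshold."""
    flagged = {}
    for metric, old in prior.items():
        new = current.get(metric)
        if new is None or old == 0:
            continue  # no meaningful comparison possible
        change = (new - old) / old
        if abs(change) > threshold:
            flagged[metric] = round(change, 3)
    return flagged

result = anomalies({"clicks": 1000, "impressions": 20000},
                   {"clicks": 800, "impressions": 21000})
print(result)
```

Both increases and decreases are flagged, since a sudden spike can be as worth reviewing as a drop.
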
| GMC Monthly Operations (5) | Status | Agent Method |
|---|---|---|
| Sync Shopify inventory & Google Sheets | AUTO | Shopify API → export inventory → write to Google Sheets. Fully scriptable in managed container. |
| Check item availability (stock ≤0 detection) | AUTO | Queries Shopify inventory API, flags items with stock ≤0, cross-references GMC feed |
| Out of stock update (SKU reconciliation GMC ↔ Shopify) | AUTO | GMC Content API export → Shopify check → match SKUs → update GMC status → log to report sheet |
| Fix GMC errors (missing fields, policy violations) | SEMI | Pulls GMC diagnostics API, proposes fixes — human applies policy-related fixes |
| Product optimization (AI Carousel Optimizer) | AUTO | AI Carousel Optimizer skill: EAV descriptions, custom label mapping, title optimization. Batch 40 products/session. |
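
The stock ≤0 detection and SKU reconciliation steps reduce to comparing two lookups. A minimal sketch, assuming inventory and feed status have already been exported to dictionaries; the status strings and the `reconcile` name are placeholders, not values confirmed from either API.

```python
def reconcile(shopify_stock, gmc_status):
    """Return SKUs whose GMC availability disagrees with Shopify stock.

    shopify_stock: {sku: quantity}; gmc_status: {sku: "in_stock" | "out_of_stock"}.
    """
    updates = {}
    for sku, qty in shopify_stock.items():
        expected = "out_of_stock" if qty <= 0 else "in_stock"
        if sku in gmc_status and gmc_status[sku] != expected:
            updates[sku] = expected
    return updates

updates = reconcile({"A": 0, "B": 5, "C": -1},
                    {"A": "in_stock", "B": "in_stock", "C": "out_of_stock"})
print(updates)
```

SKUs missing from the GMC feed are skipped here; a real run would likely log them to the report sheet as well.
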
| Monthly Report Generation (3) | Status | Agent Method |
|---|---|---|
| Review task overview + embed ClickUp deliverables on CRM | AUTO | ClickUp MCP → query completed tasks, generate summary, embed in GHL custom field or Google Doc |
| Export rank tracking report (SEO Utils) & embed on CRM | AUTO | SEO Utils MCP → rank tracking data export → generate report (xlsx/HTML) → upload to Drive → embed link in CRM |
| Rerun grid reports, export & embed on CRM | AUTO | SEO Utils MCP → trigger grid re-run → wait for completion → export → embed. Chains with previous step for single-pass monthly report. |
| Citation Operations (2) | Status | Agent Method |
|---|---|---|
| Audit current citations (NAP consistency across directories) | AUTO | Local SEO Automation toolkit: scrapes major directories, compares NAP, generates audit report |
| Submit new citations to directories | MANUAL | Requires CAPTCHA, email verification, human login on 20+ directories. Agent prepares data package only. |
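
NAP consistency checking comes down to normalizing each listing before comparing it with the reference record. The sketch below assumes listings have already been scraped into tuples; the normalization rules and function names are illustrative, and real address matching usually needs more care (abbreviations, suite numbers).

```python
import re

def normalize_nap(name, address, phone):
    """Lowercase, strip punctuation, collapse whitespace; keep only digits in the phone."""
    def clean(text):
        return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()
    return (clean(name), clean(address), re.sub(r"\D", "", phone))

def nap_mismatches(reference, listings):
    """Return directory names whose normalized NAP differs from the reference."""
    ref = normalize_nap(*reference)
    return [d for d, nap in listings.items() if normalize_nap(*nap) != ref]

mismatched = nap_mismatches(
    ("Acme Detailing", "12 Main St.", "(555) 010-2000"),
    {
        "dirA": ("ACME Detailing", "12 Main St", "555-010-2000"),
        "dirB": ("Acme Detailing", "14 Main St", "555 010 2000"),
    },
)
print(mismatched)
```
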
| SEO NEO Operations (4) | Status | Agent Method |
|---|---|---|
| Project pre-assembly & setup (ClickUp + tool config) | SEMI | Agent creates ClickUp structure from template, populates config — human provides strategy inputs |
| Set up & run heatmap | SEMI | Agent triggers heatmap tool via API if available — human reviews visual output |
| Run SEO NEO campaigns (GBP Blast, Snipper, DAS, RD100) | SEMI | Agent orchestrates via tool APIs where available — human monitors execution |
| Monitor grid ranking changes post-campaign | AUTO | SEO Utils MCP → automated grid pulls + delta tracking vs. pre-campaign baseline |
| New Project Audit (6) | Status | Agent Method |
|---|---|---|
| KW analysis (seed → clustering via QueryMind topical map) | AUTO | QueryMind pipeline: seed → expansion → clustering → CORE/OUTER. Chains to topical-map-uat-agent. |
| City analysis (local SEO grid per market) | AUTO | SEO Utils MCP + Local SEO Automation: city-level grid data, generates comparison report |
| Grid report + Whitespark/SEO Utils rank report | AUTO | SEO Utils MCP → automated grid pull + rank tracking export. Being consolidated into single report. |
| LLM visibility check (ChatGPT, Gemini, Perplexity) | AUTO | AI Visibility Audit skill: queries AI search engines for business presence, scores visibility |
| AI ranking check across AI search engines | AUTO | AI Visibility Audit skill: checks ranking position in AI-generated results |
| Analyze data & build SEO game plan | SEMI | Agent synthesizes all audit data into structured report with recommendations — human makes strategic decisions on priorities/budget |
| GBP New Project Optimization (4) | Status | Agent Method |
|---|---|---|
| Foundation content setup + competitor analysis | AUTO | Local SEO Automation: scrapes competitor GBP data (categories, hours, services, reviews), generates gap report |
| GBP content preparation (posts, service descriptions, Q&A) | SEMI | Agent drafts all content from templates + Content AI — human reviews before publishing |
| GBP optimization (fields, categories, attributes) | SEMI | Agent generates optimized field values — human applies in GBP dashboard (no write API) |
| Final review & sign-off | SEMI | Agent generates checklist completion report — human strategic sign-off |
| CSI & Clustering Quality (10) | Status | Agent Method |
|---|---|---|
| CSI defined: CE, SC, CSI fields populated in workspace | AUTO | Queries workspace settings → checks CE/SC/CSI non-null |
| Seed keyword produces ≥50 expanded keywords | AUTO | Counts ClusterKeyword records → flags <50 |
| No single-keyword clusters (min 3 per cluster) | AUTO | Queries cluster sizes → flags any <3 |
| No mega-cluster (>30% of total in one cluster) | AUTO | Calculates max cluster % → flags >30% |
| All clusters have CORE/OUTER classification | AUTO | Checks cluster.type field → flags NULL |
| Cluster names are descriptive (not generic) | SEMI | Checks for generic patterns — human validates semantic quality |
| No orphan keywords (all assigned to a cluster) | AUTO | Counts keywords with NULL cluster_id, expects 0 |
| Priority labels (P1–P4) assigned from gap analysis | AUTO | Checks priority field distribution, flags >50% NULL |
| CORE clusters cover central entity's primary topics | SEMI | Compares CORE names vs. CSI — human confirms relevance |
| CSV import handles edge cases (duplicates, encoding, empty rows) | AUTO | Uploads test CSV with known issues, checks import count |
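
Three of the clustering checks above (minimum 3 keywords per cluster, no cluster above 30% of the total, no orphans) can be run from a single keyword-to-cluster mapping. A sketch under the assumption that the share is measured against assigned keywords; the function name and input shape are hypothetical.

```python
from collections import Counter

def cluster_issues(keyword_clusters, min_size=3, max_share=0.30):
    """keyword_clusters maps keyword -> cluster id (None means orphan)."""
    issues = []
    orphans = [k for k, c in keyword_clusters.items() if c is None]
    if orphans:
        issues.append(f"{len(orphans)} orphan keyword(s)")
    sizes = Counter(c for c in keyword_clusters.values() if c is not None)
    total = sum(sizes.values())
    for cluster, size in sorted(sizes.items()):
        if size < min_size:
            issues.append(f"cluster {cluster} has only {size} keyword(s)")
        if total and size / total > max_share:
            issues.append(f"cluster {cluster} holds {size / total:.0%} of keywords")
    return issues
```

An empty list means all three checks pass.
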
| Coverage & Gap Tracking (8) | Status | Agent Method |
|---|---|---|
| Coverage baseline calculated (CORE published / total CORE) | AUTO | Validates coverage % against manual count |
| Content gaps identified (covered/gap/unique labels) | AUTO | Checks gap_label populated ≥80% of keywords |
| Published URLs linked to ClusterKeyword records | AUTO | Cross-references published_url with site crawl |
| Coverage % updates on new publish | AUTO | Simulates publish → checks recalculation fires |
| Visualization renders all clusters | AUTO | Checks Vue Flow node count matches DB cluster count |
| CORE clusters show correct keyword counts | AUTO | Compares UI badges against database counts |
| Gap keywords visually distinct from covered | SEMI | Checks CSS status classes — human validates clarity |
| Search volume + KD populated where available | AUTO | Checks search_volume NULL rate, flags >30% missing |
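
The coverage baseline (CORE published / total CORE) is a single ratio. A minimal sketch, assuming each keyword record is a dictionary with `type` and `published_url` keys; those field names are borrowed loosely from the checklist and may not match the actual schema.

```python
def coverage_percentage(keywords):
    """Coverage = published CORE keywords / total CORE keywords, as a percentage."""
    core = [k for k in keywords if k["type"] == "CORE"]
    if not core:
        return 0.0
    published = sum(1 for k in core if k.get("published_url"))
    return round(100 * published / len(core), 1)

sample = [
    {"type": "CORE", "published_url": "https://example.com/a"},
    {"type": "CORE", "published_url": None},
    {"type": "CORE"},
    {"type": "CORE", "published_url": ""},
    {"type": "OUTER", "published_url": "https://example.com/b"},
]
print(coverage_percentage(sample))
```

OUTER keywords are ignored on both sides of the ratio, so publishing OUTER content does not move the baseline.
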
| Pre-Audit Validation (4) | Status | Agent Method |
|---|---|---|
| Source URL accessible (200, content extractable ≥500 words) | AUTO | Fetches URL, checks status + text length |
| Content extraction successful (Readability output valid) | AUTO | Checks extraction output length and quality |
| SERP benchmark fetched (≥5 competitors if keyword provided) | AUTO | Counts SERP results + competitor scrape success |
| Degradation level recorded if services unavailable | AUTO | Checks ContentAuditStep.degradation_level |
| CQS Scoring (8) | Status | Agent Method |
|---|---|---|
| All 6 dimensions scored (CSI, CoR, Density, SRL, TF-IDF, EEAT) | AUTO | Counts ContentAuditScore records, expects 6 |
| Each dimension score 0–10 range | AUTO | Validates score bounds on all 6 records |
| CQS formula correct: (CSI×0.25 + CoR×0.20 + Density×0.15 + SRL×0.10 + TF-IDF×0.10 + EEAT×0.20) × 10 | AUTO | Recalculates from dimensions, compares against stored score |
| Weights sum to 1.0 | AUTO | Sums weight fields, expects 1.00 |
| CQS 0–100 + AI Citability 0–10 in valid range | AUTO | Validates bounds on both scores |
| Dimension weights match spec | AUTO | Checks each: CSI=0.25, CoR=0.20, etc. |
| Score stored on ClusterKeyword.cqs_score | AUTO | Checks ClusterKeyword updated |
| Re-audit shows delta from previous score | SEMI | Checks for prior audit, calculates delta — human validates |
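
The CQS formula from the scoring table can be recomputed directly, which is what the "recalculates from dimensions" check amounts to. A minimal sketch using the weights stated in the formula; the `cqs` function and `WEIGHTS` dictionary are illustrative names.

```python
# Weights from the CQS formula in the table above.
WEIGHTS = {"CSI": 0.25, "CoR": 0.20, "Density": 0.15, "SRL": 0.10, "TF-IDF": 0.10, "EEAT": 0.20}

def cqs(scores):
    """Weighted sum of six 0-10 dimension scores, scaled to 0-100."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("expected exactly the six CQS dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside 0-10")
    return round(sum(scores[d] * w for d, w in WEIGHTS.items()) * 10, 2)

example = cqs({"CSI": 8, "CoR": 7, "Density": 6, "SRL": 5, "TF-IDF": 5, "EEAT": 9})
print(example)
```

Because the weights sum to 1.0 and each dimension is capped at 10, the result always lands in the 0 to 100 range.
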
| Report Quality (8) | Status | Agent Method |
|---|---|---|
| BEFORE/AFTER examples per dimension | AUTO | Parses report for before/after fields — flags missing |
| SRL transformations (CE as Agent rewrites) | AUTO | Checks SRL section, validates CE as grammatical agent |
| Headings marked [OK] / [CHANGE] / [NEW] | AUTO | Parses heading audit — all headings have marker |
| BLUF suggestions per H2 section | AUTO | Checks BLUF section ≥1 suggestion per H2 |
| E-E-A-T blocks (4 dimensions with ready-to-paste content) | AUTO | Validates 4 EEAT sub-sections exist |
| Priority system (CRITICAL/HIGH/MEDIUM/BONUS/SKIP) | AUTO | Checks recommendations have priority labels |
| Report in target language | SEMI | Language detection — human confirms quality |
| TF-IDF term map with section assignments | AUTO | Checks term-to-section mapping exists with ≥10 terms |
| Pre-Publish (9) | Status | Agent Method |
|---|---|---|
| No typos or grammar issues | AUTO | Claude reads full text, flags errors with corrections |
| Clear CTAs (Book / Contact / Sign up) | SEMI | Identifies CTA elements — human judges effectiveness |
| Content aligns with brand voice | SEMI | Compares against brand guide — human confirms |
| No placeholders (lorem ipsum, TBD) | AUTO | Regex + Claude scan for lorem ipsum, TBD, TODO |
| Internal & external links working | AUTO | Crawls all links, checks status codes |
| Layout correct (desktop & mobile) | SEMI | Screenshots at viewports — human reviews |
| Images / videos load properly | AUTO | Checks media src URLs for 200, dimensions >0 |
| Legal pages available (Privacy, Terms) | AUTO | Checks footer links to /privacy-policy, /terms → 200 |
| No duplicate content | AUTO | Similarity scoring against other site pages |
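
The regex half of the placeholder check (lorem ipsum, TBD, TODO) is straightforward to sketch. The pattern below is an assumption about what the scan might look for; the Claude pass mentioned in the table would catch subtler cases.

```python
import re

# Case-insensitive patterns for leftover placeholder text.
PLACEHOLDER_RE = re.compile(r"lorem ipsum|\bTBD\b|\bTODO\b", re.IGNORECASE)

def find_placeholders(text):
    """Return (line_number, matched_text) pairs, with 1-based line numbers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

hits = find_placeholders("Welcome\nLorem ipsum dolor\nPrice: TBD")
print(hits)
```

Word boundaries keep ordinary words that merely contain those letters, such as "outbid", from being flagged.
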
| Post-Publish (3) | Status | Agent Method |
|---|---|---|
| Validate content on production | AUTO | Re-runs pre-publish checks on production URL |
| Monitor user behavior (scroll, bounce) | MANUAL | Needs GA4/Hotjar data — human interprets |
| Optimize content for conversion | MANUAL | Strategic decision: performance data + creative judgment |
| Pipeline Step Validation (12) | Status | Agent Method |
|---|---|---|
| Topic research: CSI foundations populated (CE, SC, CSI, semantic frame, query fanout) | AUTO | Checks ContentBrief.csi_foundations JSON — all required fields non-null |
| Competitor analysis: ≥5 competitors scraped with EAV triples | AUTO | Counts competitor URLs in eav_matrix, EAV count ≥30 |
| URR classification applied (UNIQUE/ROOT/RARE distribution) | AUTO | Checks URR labels, validates ROOT = highest count |
| H1 follows formula: CE + UNIQUE attribute + SC context | SEMI | Parses H1 against CSI — human validates creative quality |
| H2 headings map to ROOT attributes | AUTO | Cross-references H2s against ROOT attributes |
| H3 headings map to RARE attributes or FAQ | AUTO | Cross-references H3s against RARE + query fanout |
| BLUF per H2 section (≤50 words) | AUTO | Extracts BLUF, counts words, flags >50 |
| RAG chunks: 200–500 words, no cross-references | AUTO | Counts words, scans for "as mentioned above" |
| All 9 brief sections populated | AUTO | Checks all 9 JSON columns — flags NULL |
| Content gaps prioritized P1–P4 | AUTO | Validates content_gaps has priority labels |
| Copywriter checklist has 15 items | AUTO | Counts items in copywriter_checklist array |
| Degradation level recorded | AUTO | Checks ContentBriefStep.degradation_level |
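
Two of the pipeline checks above are simple text measurements: BLUF length (≤50 words) and RAG chunk size (200–500 words, no cross-references). A sketch with hypothetical helper names; only "as mentioned above" comes from the checklist, and the other cross-reference phrases are assumed additions.

```python
import re

# "as mentioned above" is from the checklist; the other phrases are assumptions.
CROSS_REF_RE = re.compile(r"as mentioned (above|earlier|previously)|see (above|below)", re.IGNORECASE)

def bluf_too_long(bluf, limit=50):
    """True when the BLUF exceeds the word limit."""
    return len(bluf.split()) > limit

def chunk_issues(chunk, min_words=200, max_words=500):
    """Check one RAG chunk for length bounds and references to other chunks."""
    issues = []
    words = len(chunk.split())
    if not min_words <= words <= max_words:
        issues.append(f"{words} words (expected {min_words}-{max_words})")
    if CROSS_REF_RE.search(chunk):
        issues.append("contains a cross-reference")
    return issues
```

Words are counted by whitespace splitting, which is a rough but predictable measure for this kind of gate.
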
| Brief Quality (6) | Status | Agent Method |
|---|---|---|
| Brief in correct target language | AUTO | Language detection vs. topical map language_name |
| UNIQUE differentiators ≥2 angles | AUTO | Counts items in unique_differentiators |
| TF-IDF terms + LSI keywords ≥10 | AUTO | Checks keywords_and_terms non-empty |
| Internal links target other topical map keywords | AUTO | Cross-references linking suggestions vs. ClusterKeyword table |
| Quality metrics has target CQS per dimension | AUTO | Checks quality_metrics for 6 CQS dimensions |
| Brief is actionable for content writer | SEMI | Claude reviews for clarity — human confirms |
| Post-Approval Handoff (4) | Status | Agent Method |
|---|---|---|
| Approved brief creates job with source_type='topical_map' | AUTO | Checks ContentJob: source_type, content_brief_id fields |
| Job runs only steps 9–12 (not full 12-step) | SEMI | Checks step log — human confirms no unnecessary steps |
| Brief data injected into draft prompt | AUTO | Checks draft metadata for brief reference |
| Batch planning respects P1→P4 ordering | SEMI | Checks queue order vs. priorities — human reviews |
| Draft Quality (8) | Status | Agent Method |
|---|---|---|
| Topical Map path: only steps 9–12 executed | AUTO | Checks step log — steps 1–8 skipped for topical_map source |
| Draft incorporates all 9 brief sections | AUTO | Checks draft metadata for section references |
| Draft follows H1/H2/H3 from approved brief | AUTO | Parses headings, compares against article_structure |
| BLUF in each H2 (first 50 words = direct answer) | AUTO | Extracts first 50 words per H2, checks answer pattern |
| EAV triples from brief referenced in draft | SEMI | Scans for key entities — human confirms coverage |
| Humanization removes AI patterns | AUTO | Flags "in conclusion", "it's important to note", excess passive |
| Formatting follows brand guide | SEMI | Checks structure — human validates voice |
| QA catches placeholder content | AUTO | Regex for TBD, lorem, [insert] |
| Post-Publish Coverage (8) | Status | Agent Method |
|---|---|---|
| ClusterKeyword.published_url set + returns 200 | AUTO | Checks field + fetches URL |
| ClusterKeyword.content_job_id linked | AUTO | Validates FK relationship |
| Coverage % recalculated | AUTO | Independent recalculation vs. TopicalMap.coverage_percentage |
| Published keyword shows green on map | AUTO | Checks status = 'published' |
| Internal links from brief in published content | AUTO | Fetches URL, checks for internal links to map pages |
| Published content passes SEO UAT | AUTO | Triggers seo-uat-agent on published URL |
| CQS audit triggered on published content | AUTO | Checks ContentAudit record created |
| Content quality acceptable (human final review) | MANUAL | Human confirms content meets client quality standards |
| Standard CRM — Lead Capture & Flow (11) | Status | Agent Method |
|---|---|---|
| All forms submittable (contact, booking, register) | AUTO | Submits test data to each form, checks 200 + redirect |
| Data mapping correct (name, phone, email) | AUTO | Submits payload → checks GHL contact via API → validates fields |
| No duplicate leads on re-submit | AUTO | Submits same lead twice → checks GHL for duplicates |
| Leads go to correct pipeline/stage | AUTO | Queries GHL pipeline API → confirms stage match |
| Workflows triggered on lead creation | AUTO | Checks GHL workflow execution log for test contact |
| Leads assigned to correct reps | AUTO | Queries contact assignment vs. routing rules |
| Email/SMS delivery successful | AUTO | Checks GHL conversation history for test contact |
| No delivery failures in logs | AUTO | Queries activity log for bounce/failure events |
| Retry mechanism works | SEMI | Simulates failure — retry window human-defined |
| Alerts for unassigned leads | AUTO | Creates unassigned contact → checks alert fires |
| DND / opt-out flags validated | SEMI | Checks DND flags — compliance needs legal review |
| P1 — Speed-to-Lead (<30 sec) (5) | Status | Agent Method |
|---|---|---|
| Primary Bot ("SNMS Lead Intake") in Auto-Pilot mode, 1s wait | AUTO | GHL API: checks bot config — mode=auto-pilot, wait_time=1 |
| Chat Widget (All-in-One) installed on shop website | AUTO | Fetches shop URL, checks for GHL chat widget script in source |
| Bot responds to test message in <30 seconds | AUTO | Sends test SMS → measures response timestamp delta |
| WF-01 creates opportunity + auto-tags source | AUTO | Creates test contact → checks opp created + src_* tag applied |
| SLA timer fires internal notification at 5 min | AUTO | Creates contact with no bot engagement → checks notification log |
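
The "<30 seconds" bot response check is a timestamp delta. A minimal sketch, assuming the send and reply times are available as ISO 8601 strings; how they are read from GHL conversation history is not shown, and the function names are illustrative.

```python
from datetime import datetime

def response_seconds(sent_at, replied_at):
    """Seconds between the test message and the bot reply (ISO 8601 timestamps)."""
    delta = datetime.fromisoformat(replied_at) - datetime.fromisoformat(sent_at)
    return delta.total_seconds()

def meets_sla(sent_at, replied_at, limit_seconds=30):
    """True when the reply arrived after the message and inside the limit."""
    elapsed = response_seconds(sent_at, replied_at)
    return 0 <= elapsed < limit_seconds

print(meets_sla("2026-04-01T10:00:00", "2026-04-01T10:00:12"))
```

A negative delta (reply timestamp before the send) is treated as a failure, since it usually means the wrong message was matched.
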
| P2 — NEPQ Qualification via Flow Builder (5) | Status | Agent Method |
|---|---|---|
| Flow Builder: 4 NEPQ nodes configured (Situation→Problem→Implication→Need Payoff) | AUTO | GHL API: checks bot flow structure — 4 Capture Information nodes exist |
| Custom fields populated (vehicle, pain_point, services) | AUTO | Runs test conversation → checks contact custom fields populated |
| AI Splitter branches correctly (Qualified → Book / Not → Nurture) | AUTO | Tests both branches: qualified contact gets booking link, unqualified enters WF-03 |
| Appointment Booking action sends link to correct calendar | AUTO | Checks booking link URL matches service-specific calendar |
| Knowledge Base trained (website crawled + 10 FAQ + objection doc) | AUTO | GHL API: checks KB sources count ≥3, FAQ count ≥10 |
| P3 — 90-Day Nurture (4) | Status | Agent Method |
|---|---|---|
| 5 service-specific campaigns loaded (Stop on Reply = ON) | AUTO | GHL API: checks 5 campaigns exist with stop_on_reply=true |
| Re-Engagement Bot activates on nurture reply | AUTO | Simulates reply in nurture → checks Re-Engagement bot responds |
| Tag-based segmentation: hot/warm/cold applied by behavior | AUTO | Simulates reply + no-reply scenarios → checks correct tags applied |
| 450+ O3 templates loaded in SMS + Email library | AUTO | GHL API: counts SMS + email templates, flags if <450 |
| P4–P6 — Upsell, VIP, Reviews (10) | Status | Agent Method |
|---|---|---|
| Upsell WF-11 triggers on Won + service tag + correct wait days | AUTO | Simulates Won opp with svc_ppf → checks WF-11 fires at Day 30 |
| Membership products configured in GHL Payments/Stripe | AUTO | GHL API: checks recurring products exist (Monthly Detail Club, Annual Plan) |
| B2B pipeline separate with 6 stages | AUTO | Checks pipeline config: 6 stages from Prospecting → Closed |
| VIP tier tags auto-applied (Bronze on 1st win) | AUTO | Simulates first Won → checks vip_bronze tag applied |
| 3-year post-purchase WF-08 fires per service type | SEMI | Checks WF-08 structure has service-tag branching — human validates sequence timing |
| Win-back WF-09 triggers after 90-day inactivity | AUTO | Checks workflow date filter: last activity >90 days + has Won opp |
| Reviews AI Auto-Pilot active (4–5 stars, 2h delay) | AUTO | GHL API: checks Reviews AI config — mode=auto-pilot, star_filter=4-5, wait=2h |
| Negative reviews (1–3 stars) NOT auto-replied — task created for Sales Manager | AUTO | Simulates 2-star review event → checks task created, no auto-reply |
| Review request WF-07 fires after "Completed" stage | AUTO | Moves opp to Completed → checks SMS with review link sent after 2h |
| Referral trigger links working + attribution tags applied | AUTO | Clicks test trigger link → checks src_referral + referred_by_* tags |
| Post-Publish — Production Validation (7) | Status | Agent Method |
|---|---|---|
| Re-test all forms in production | AUTO | Same form tests against production URL |
| Full happy path: lead → AI → qualify → book → service → review → referral | SEMI | Agent runs automated flow — human monitors for edge cases |
| Full sad path: lead → no reply → nurture → cold → win-back | SEMI | Agent triggers flow — human validates timing/messaging quality |
| Voice AI Agent answers test calls (3+ scenarios) | SEMI | Agent dials test number — human evaluates conversation quality |
| Monitor first response time + contact rate (48h window) | SEMI | Agent pulls GHL data after 48h — human validates sample size |
| Stuck leads + automation failures checked daily | AUTO | Daily session: contacts stuck >24h in same stage |
| All 14 workflows firing correctly in production | MANUAL | Requires 5-day soft launch monitoring with real leads |
| Tracking + Tagging (8) | Status | Agent Method |
|---|---|---|
| Google Tag / GTM properly installed | AUTO | Checks page source for gtag.js or GTM container |
| Conversion: form submissions tracked | AUTO | Submits test form, checks dataLayer push / GA4 event |
| Conversion: button clicks tracked | AUTO | Simulates clicks, checks dataLayer events |
| Conversion: purchase/booking tracked | SEMI | Validates tracking code — actual purchase needs human |
| Remarketing tags active (Google Ads pixel) | AUTO | Checks for remarketing pixel in page source |
| Google Ads linked with Google Analytics | AUTO | Queries GA4 Admin API for linked accounts |
| UTM tracking correct on all ad URLs | AUTO | Parses ad URLs for utm params, validates consistency |
| GHL pipeline sync configured (GCLID tracking) | AUTO | Conversion Tracker skill: checks GHL webhook → Ads offline import setup |
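
The UTM check on ad URLs can be done with the standard library's URL parser. In this sketch the set of required parameters (`utm_source`, `utm_medium`, `utm_campaign`) is an assumption; adjust it to whatever naming convention the account actually uses.

```python
from urllib.parse import urlparse, parse_qs

# Assumed minimum set of parameters every ad URL should carry.
REQUIRED_UTM = ("utm_source", "utm_medium", "utm_campaign")

def missing_utm(url):
    """Return the required UTM parameters that are absent or empty in the URL."""
    params = parse_qs(urlparse(url).query)
    return [p for p in REQUIRED_UTM if not params.get(p, [""])[0].strip()]

print(missing_utm("https://example.com/?utm_source=google&utm_medium="))
```

Empty values count as missing, because `parse_qs` drops blank parameters by default.
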
| Keyword Architect + Copy Lab (8) | Status | Agent Method |
|---|---|---|
| Keyword clusters generated with match types assigned | AUTO | Keyword Architect output: validates clusters have broad/phrase/exact assignments |
| Negative keyword list created (shared + campaign-level) | AUTO | Checks negative list exists with ≥50 terms, no keyword cannibalization |
| Ad group structure follows keyword clustering (no cross-pollination) | AUTO | Validates each ad group's keywords belong to a single semantic cluster |
| RSA headlines: 15 per ad group, unique, within 30 chars | AUTO | Copy Lab output: counts headlines, checks char limits, flags duplicates |
| RSA descriptions: 4 per ad group, within 90 chars | AUTO | Counts descriptions, validates char limits |
| Ad strength rating ≥ "Good" on all RSAs | SEMI | Agent checks Ads API ad_strength field — human reviews if "Average" |
| Sitelinks, callouts, call extensions configured | AUTO | Copy Lab output: checks extension count ≥4 sitelinks + ≥4 callouts |
| A/B test framework defined (pin positions documented) | SEMI | Agent checks for pin strategy doc — human validates creative rationale |
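
The RSA headline and description rules above (15 unique headlines within 30 characters, 4 descriptions within 90 characters) translate into a short validator. A sketch with a hypothetical `rsa_issues` function; it checks plain strings and does not call the Ads API.

```python
def rsa_issues(headlines, descriptions):
    """Check counts, uniqueness, and character limits from the checklist."""
    issues = []
    if len(headlines) != 15:
        issues.append(f"{len(headlines)} headlines (expected 15)")
    if len({h.strip().lower() for h in headlines}) != len(headlines):
        issues.append("duplicate headlines")
    issues += [f"headline over 30 chars: {h!r}" for h in headlines if len(h) > 30]
    if len(descriptions) != 4:
        issues.append(f"{len(descriptions)} descriptions (expected 4)")
    issues += [f"description over 90 chars: {d!r}" for d in descriptions if len(d) > 90]
    return issues
```

Uniqueness is compared case-insensitively, so "Book Now" and "book now" count as duplicates.
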
| Campaign Builder + Radius Optimizer (8) | Status | Agent Method |
|---|---|---|
| Campaign structure matches Campaign Builder spec | AUTO | Compares live Ads structure against Campaign Builder output |
| Bid strategy correctly set (Maximize Conversions / Target CPA) | AUTO | Ads API: checks bid_strategy field matches spec |
| Budget allocation matches specified tier | AUTO | Sums campaign budgets, compares against total spend plan |
| Proximity/radius rings configured with bid modifiers | AUTO | Radius Optimizer output: validates tiered rings in Ads location targeting |
| Google Ads Scripts installed for radius automation | SEMI | Checks script exists in Ads account — human validates logic |
| Landing page loads fast (Lighthouse ≥70) | AUTO | Runs Lighthouse on all landing page URLs from campaigns |
| Ad scheduling matches business hours | AUTO | Ads API: checks ad_schedule settings against client hours |
| Remarketing audiences created (RLSA + Customer Match) | AUTO | Remarketing Engine: checks audience lists exist in Ads account |
| Post-Launch — Performance + Crosswalk (8) | Status | Agent Method |
|---|---|---|
| Conversions verified in real-time via GA4 | SEMI | Queries GA4 Real-Time API — human confirms volume |
| Quality Score audit: all keywords ≥5 | AUTO | Performance Auditor: pulls QS per keyword, flags <5 |
| Wasted spend identified (search terms report audit) | AUTO | Performance Auditor: analyzes search terms, flags irrelevant with spend |
| Tracking not dropping (conversion count vs 7-day avg) | AUTO | Daily check: flags >30% conversion drop |
| Paid-Organic Crosswalk analysis generated | AUTO | Crosswalk skill: merges Ads data + organic rankings, identifies overlap |
| Cost savings opportunities identified from organic overlap | SEMI | Agent flags keywords ranking #1-3 organically with paid spend — human decides |
| Impression share >70% on brand terms | AUTO | Ads API: checks brand campaign impression share |
| Value-based conversion assignment per service type | MANUAL | Conversion Tracker: validates values match client revenue — requires business input |
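
The "tracking not dropping" row compares today's conversion count with the 7-day average and flags a drop above 30%. A minimal sketch of that comparison follows; the function names are hypothetical, and returning "no drop" when the baseline is zero is an assumption rather than documented agent behavior.

```python
from statistics import mean

DROP_THRESHOLD = 0.30  # checklist row: flag a conversion drop greater than 30%


def conversion_drop(today: int, previous_7_days: list[int]) -> float:
    """Fractional drop of today's conversions vs. the trailing 7-day average."""
    if len(previous_7_days) != 7:
        raise ValueError("expected exactly 7 daily counts")
    baseline = mean(previous_7_days)
    if baseline == 0:
        # Assumption: with no baseline there is nothing to compare against.
        return 0.0
    return (baseline - today) / baseline


def tracking_is_dropping(today: int, previous_7_days: list[int]) -> bool:
    """True when today's count is more than 30% below the 7-day average."""
    return conversion_drop(today, previous_7_days) > DROP_THRESHOLD
```
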
| Pixel + Events (6) | Status | Agent Method |
|---|---|---|
| Meta Pixel base code + PageView event active | AUTO | Fetches page, checks for fbq('track','PageView') |
| Lead event firing on form submit | AUTO | Submits test form → checks Lead event registered |
| Purchase/CompleteRegistration event | SEMI | Validates code exists — actual purchase needs human test |
| Domain verified in Business Manager | AUTO | Checks DNS TXT record or meta tag for FB verification |
| Aggregated Event Measurement configured | AUTO | Queries Business Manager API for AEM config |
| Custom conversions set with correct URL rules | AUTO | Pulls custom conversion list, validates URL rules |
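
The PageView row checks the fetched page for `fbq('track','PageView')`. A rough sketch of that static check is below. It assumes the HTML has already been fetched, `has_pageview_event` is a hypothetical name, and a match only shows the call is present in the markup, not that the event fires.

```python
import re

# Matches fbq('track', 'PageView') with single or double quotes and optional spaces.
PAGEVIEW_CALL = re.compile(
    r"""fbq\(\s*['"]track['"]\s*,\s*['"]PageView['"]\s*\)"""
)


def has_pageview_event(html: str) -> bool:
    """Static check: does the page source contain the PageView tracking call?"""
    return PAGEVIEW_CALL.search(html) is not None
```
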
| Campaign Accelerator — Pre-Launch (10) | Status | Agent Method |
|---|---|---|
| Buyer persona generated (VoC mining + competitor intel) | AUTO | Accelerator pipeline: checks persona doc exists with demographics + psychographics |
| 4-Campaign architecture built (C-1 ASC+, C-2 Manual ABO, C-3 Creative Lab, C-4 Retargeting) | AUTO | Checks Meta Ads account for 4 campaigns matching architecture spec |
| 6 Hook Archetypes applied across ad creatives | SEMI | Agent checks creative copy against hook patterns — human validates creative quality |
| Anchor Offer defined (irresistible lead magnet for detailing) | SEMI | Agent checks offer exists in ad copy — human validates offer strength |
| Messenger flow configured (if using Messenger ads) | AUTO | Lead Gen Engine: checks Messenger automation exists in business page |
| Instant Forms built with correct field mapping to GHL | AUTO | Lead Gen Engine: checks form fields + webhook/CRM integration active |
| Post ID strategy documented (graduation criteria) | SEMI | Checks for Post ID graduation doc — human validates criteria |
| Audience targeting matches buyer persona (interests, lookalikes, custom) | AUTO | Compares campaign audience settings against persona demographics |
| Pre-launch checklist passed (Diagnostics skill) | AUTO | Diagnostics: runs 5-layer check (Objective→Targeting→Creative→Bidding→LP) |
| Lead delivery to GHL pipeline working | AUTO | Submits test lead via Instant Form → checks GHL contact + opp created |
| Post-Launch — Performance + Optimization (10) | Status | Agent Method |
|---|---|---|
| Real-time event tracking validated | SEMI | Checks Events Manager test events — human confirms behavior |
| Lead events match CRM data (Meta → GHL count alignment) | AUTO | Cross-references Meta lead count with GHL contact creation timestamps |
| CPL within target range | SEMI | Performance Hub: pulls spend + leads — human judges vs. target CPL |
| Campaign scoring Green/Yellow/Red applied | AUTO | Performance Hub: scores each campaign against benchmarks |
| Budget allocation quadrant analysis (Stars/Hidden Gems/Cash Cows/Money Pits) | AUTO | Optimizer: maps campaigns to quadrant, flags Money Pits for review |
| Audience saturation check (frequency <3) | AUTO | Optimizer: checks frequency metrics, flags saturation |
| Creative fatigue detection (CTR declining >20%) | AUTO | Diagnostics: compares WoW CTR, flags declining creatives |
| Funnel analysis: lead → appointment → show → close | SEMI | Optimizer: pulls funnel data — human validates stage conversion accuracy |
| Phone call tracking matched (if call ads) | SEMI | Lead Gen Engine: checks call tracking integration — human confirms accuracy |
| 90-day roadmap generated from performance data | MANUAL | Strategic planning requiring human judgment on scaling direction |
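
The saturation and fatigue rows reduce to two threshold checks: frequency below 3 and week-over-week CTR decline of no more than 20%. The sketch below uses hypothetical function names and computes frequency as impressions divided by reach; how the Optimizer and Diagnostics skills actually pull these metrics is not shown here.

```python
CTR_DECLINE_THRESHOLD = 0.20  # checklist row: CTR declining >20%
MAX_FREQUENCY = 3.0           # checklist row: frequency <3


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as a fraction; 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0


def is_fatigued(last_week_ctr: float, this_week_ctr: float) -> bool:
    """True when week-over-week CTR fell by more than 20%."""
    if last_week_ctr <= 0:
        # Assumption: without a prior CTR there is no decline to measure.
        return False
    decline = (last_week_ctr - this_week_ctr) / last_week_ctr
    return decline > CTR_DECLINE_THRESHOLD


def is_saturated(impressions: int, reach: int) -> bool:
    """Frequency = impressions / reach; flagged at 3 or above."""
    return reach > 0 and impressions / reach >= MAX_FREQUENCY
```
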
| Visual & Brand Consistency (10) | Status | Agent Method |
|---|---|---|
| UI matches Figma design (if Design-First) | SEMI | Figma MCP: export baseline → Playwright screenshot → side-by-side comparison |
| Colors consistent with brand palette | AUTO | Extracts CSS color values, compares against brand palette doc |
| Typography correct (font, size, line height) | AUTO | Audits computed styles vs. design system specs |
| Spacing & alignment consistent | SEMI | Checks CSS consistency — visual alignment needs human |
| Buttons/forms/cards consistent across pages | AUTO | Audits all component CSS classes for consistency |
| No visual mismatch between pages | SEMI | Multi-page screenshot comparison |
| Icon style consistent (no mixed libraries) | SEMI | Detects icon libraries in use, flags mixed sets |
| No placeholder assets (dummy images, stock IDs) | AUTO | Checks filenames/alt for "placeholder", "dummy", "sample" |
| Images high quality (not blurry/upscaled) | SEMI | Checks resolution vs. display size — human reviews quality |
| Design token consistency (CSS variables match Figma variables) | AUTO | Figma MCP: get_variable_defs → compare against CSS custom properties |
| Playwright — Visual Regression (6) | Status | Agent Method |
|---|---|---|
| Full-page screenshot regression test passes (≤1% diff) | AUTO | npx playwright test visual — fullPage comparison against baseline |
| Per-section regression: Hero, Services, Testimonials, CTA, Contact, Footer | AUTO | Per-section element screenshots vs. baselines |
| Desktop (1440px) visual test passes | AUTO | Playwright project: Desktop Chrome viewport |
| Mobile (375px) visual test passes | AUTO | Playwright project: iPhone 14 viewport |
| Tablet (768px) visual test passes | AUTO | Playwright project: iPad viewport |
| Figma baseline comparison (Design-First only) | SEMI | Claude multimodal: reads Figma export + implementation screenshot, reports diffs |
| Playwright — Responsive Layout (8) | Status | Agent Method |
|---|---|---|
| Mobile: nav shows hamburger, cards stack vertically | AUTO | Playwright: checks #menu-toggle visible, nav hidden, cards Y-stacked |
| Desktop: nav shows links, cards in grid | AUTO | Playwright: checks nav visible, cards same-Y position |
| No broken layout on smaller screens (no overflow) | AUTO | Detects horizontal scroll, text overflow, clipping |
| Text readable on all devices (≥14px mobile) | AUTO | Checks computed font-size, WCAG AA contrast |
| CTA buttons mobile-friendly (tap target ≥48×48px) | AUTO | Validates tap target sizes per Google guidelines |
| No dead-end pages (all pages have nav + CTA) | AUTO | Crawls pages, flags any without navigation links |
| Real device testing (beyond emulation) | SEMI | Emulated tests auto — real iOS/Android needs human with device |
| Cross-browser: Chrome, Safari, Edge | SEMI | Chromium auto — Safari/Edge need BrowserStack or manual |
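
The WCAG AA contrast check referenced in the readability row follows the standard contrast-ratio formula: linearize each sRGB channel, compute relative luminance, then take (lighter + 0.05) / (darker + 0.05). A minimal sketch, with hypothetical function names and colors given as 0–255 RGB tuples:

```python
RGB = tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """Ratio from 1.0 (identical colors) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: RGB, bg: RGB, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, `passes_aa((118, 118, 118), (255, 255, 255))` passes while one shade lighter, `(119, 119, 119)`, does not.
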
| Playwright — Interactions + Accessibility (10) | Status | Agent Method |
|---|---|---|
| Mobile menu opens/closes correctly | AUTO | Playwright: click toggle → menu visible → click link → menu closes |
| Hamburger has aria-expanded attribute toggling | AUTO | Checks aria-expanded false→true→false on toggle |
| CTA buttons have hover styles (visual change) | AUTO | Screenshots default + hover state, validates difference |
| Contact form accepts input and validates | AUTO | Playwright: fills all fields, verifies values, screenshots filled form |
| Heading hierarchy correct (single H1, sequential H2→H3) | AUTO | Counts H1 (must be 1), validates heading sequence |
| All images have alt text or role="presentation" | AUTO | Iterates all <img>, checks alt or role attribute |
| Form inputs have associated labels | AUTO | Checks label[for=id] exists for each input |
| Keyboard navigation reaches all interactive elements | AUTO | Tab through all elements, counts focused interactive elements |
| Color contrast WCAG AA compliant | AUTO | WCAG contrast check on all text/background combinations |
| Font size accessible (≥12px minimum) | AUTO | Min font-size check across breakpoints |
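
The heading-hierarchy and alt-text rows can be approximated with a static parse of the page source. The sketch below uses Python's standard `html.parser` rather than Playwright, so it is a stand-in for the agent's method, not a copy of it; `AccessibilityAudit` and `audit` are hypothetical names.

```python
from html.parser import HTMLParser


class AccessibilityAudit(HTMLParser):
    """Collects heading levels and counts images lacking alt or role=presentation."""

    def __init__(self) -> None:
        super().__init__()
        self.heading_levels: list[int] = []
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        elif tag == "img":
            # An empty alt="" still counts as present (decorative image).
            if "alt" not in attr_map and attr_map.get("role") != "presentation":
                self.images_missing_alt += 1


def audit(html: str) -> dict:
    parser = AccessibilityAudit()
    parser.feed(html)
    levels = parser.heading_levels
    # A skipped level is any step down the outline of more than one, e.g. H2 -> H4.
    skipped = [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]
    return {
        "h1_count": levels.count(1),
        "skipped_levels": skipped,
        "images_missing_alt": parser.images_missing_alt,
    }
```

A page passes these two rows when `h1_count` is 1, `skipped_levels` is empty, and `images_missing_alt` is 0.
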
| Build + Deploy Pipeline (8) | Status | Agent Method |
|---|---|---|
| Astro build compiles without errors (npm run build) | AUTO | Runs build command, checks exit code 0 |
| GitHub Actions CI pipeline passes | AUTO | Checks workflow run status via GitHub API |
| Vercel/Netlify deployment successful | AUTO | Checks deployment status API, validates live URL returns 200 |
| SEO: meta tags, sitemap.xml, robots.txt present | AUTO | Chains to seo-uat-agent for standard SEO checks |
| User journey smooth (landing → form → submit → thank you) | MANUAL | Requires user testing or session recording |
| No confusing steps in conversion flow | MANUAL | Agent maps flow but can't judge confusion |
| Collect feedback from client/stakeholders | MANUAL | Human communication — can't automate |
| Performance: Lighthouse ≥70 mobile, ≥80 desktop | AUTO | Runs Lighthouse via Playwright, validates scores |
| Post-Deploy — Production Monitoring (20) | Status | Agent Method |
|---|---|---|
| Live UI matches approved design | SEMI | Side-by-side screenshots — human sign-off |
| UI inconsistencies in production detected | AUTO | Re-runs all Playwright tests on production URL |
| Fix spacing/alignment issues | MANUAL | Requires dev work — agent flags only |
| Users understand flow (session recording) | MANUAL | Needs Hotjar/FullStory analysis |
| No unexpected drop-off | SEMI | Pulls GA4 funnel data — human interprets |
| CTA visibility strong | SEMI | Above-fold CTA check — human reviews impact |
| Touch interactions work on real devices | MANUAL | Touch/swipe needs real device testing |
| Identify UX improvement opportunities | SEMI | Analyzes data patterns — human decides priorities |
| Propose UI/UX enhancements | SEMI | Drafts recommendations — human reviews |
| Users complete key actions without friction | MANUAL | Requires real user testing |

| Pre-Publish — Infrastructure (4) | Status | Agent Method |
|---|---|---|
| Production env configured (domain, SSL) | AUTO | DNS resolution, SSL cert validity + expiry, headers |
| HTTPS working (no mixed content) | AUTO | Crawls pages, flags http:// resource loads |
| Env variables correct (API keys, DB) | SEMI | Tests API connectivity — human verifies secrets |
| No staging/dev configs remain | AUTO | Checks for staging URLs, debug flags, console.log |
| Pre-Publish — APIs & Auth (5) | Status | Agent Method |
|---|---|---|
| All APIs functioning | AUTO | Health-check each endpoint, check 200 + valid response |
| Timeout & error handling | AUTO | Sends slow/invalid requests, checks graceful handling |
| Retry logic works | AUTO | Simulates failure, monitors retry in logs |
| Auth (OAuth/token) works | AUTO | Tests valid + invalid creds, checks token refresh |
| Webhooks sending & receiving | AUTO | Triggers event, checks receipt on endpoint |
| Pre-Publish — User Flows + Security (8) | Status | Agent Method |
|---|---|---|
| Full user flow works (landing→form→submit→thanks) | AUTO | Puppeteer navigates full flow, validates each step |
| Form validation correct | AUTO | Empty/invalid/valid submissions, checks messages |
| Error messages clear | SEMI | Captures messages — human judges end-user clarity |
| reCAPTCHA working | SEMI | Checks script loads + renders — can't solve (by design) |
| No exposed API keys/secrets | AUTO | Scans source + JS bundles for key patterns |
| Input validation (XSS/SQLi) | AUTO | Submits common payloads, checks sanitization |
| Security headers (CORS, CSP) | AUTO | Checks CSP, X-Frame, X-Content-Type, HSTS |
| SSL/TLS secure | AUTO | TLS version, cipher suite, cert chain |
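
The security-headers row (CSP, X-Frame, X-Content-Type, HSTS) amounts to checking a response for four header names. A minimal sketch, assuming the response headers are already available as a dict; `missing_security_headers` is a hypothetical name, and the check covers presence only, not whether each policy value is strict enough.

```python
REQUIRED_HEADERS = (
    "content-security-policy",    # CSP
    "x-frame-options",            # X-Frame
    "x-content-type-options",     # X-Content-Type
    "strict-transport-security",  # HSTS
)


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return required header names absent from the response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name not in present]
```
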
| Pre-Publish — Performance + Testing (7) | Status | Agent Method |
|---|---|---|
| Page load <3 seconds | AUTO | Lighthouse TTFB + FCP + LCP measurement |
| No JS/CSS errors | AUTO | Captures console errors via Puppeteer |
| API response time acceptable | AUTO | Times each call, flags >500ms |
| Images optimized | AUTO | Flags images >200KB, checks WebP/AVIF usage |
| Mobile / tablet / desktop testing | AUTO | Emulates 6+ viewports, captures errors |
| Cross-browser (Chrome, Safari, Edge) | SEMI | Tests Chromium — Safari/Edge need manual or BrowserStack |
| Error logging + alerts configured | AUTO | Triggers errors, checks logs appear in monitoring |
| Post-Publish (11) | Status | Agent Method |
|---|---|---|
| Server uptime | AUTO | Pings every 5 min, alerts Slack on downtime |
| Error logs with real traffic | AUTO | Groups by type/frequency, flags new errors |
| API failure rates | AUTO | Queries monitoring for failure %, alerts >1% |
| User journeys not broken | SEMI | Re-runs flows — edge cases need real monitoring |
| No unexpected crashes | AUTO | Monitors 5xx + process restarts |
| CRM receives data | AUTO | Test form → check GHL contact creation |
| Payment works in production | MANUAL | Real payment test needs human with test card |
| Webhooks not failing | AUTO | Monitors delivery logs for failures |
| Traffic spike handling | AUTO | Basic load test, checks response degradation |
| Memory / CPU stable | AUTO | Queries hosting metrics API |
| Error rate within threshold | AUTO | 24h error rate vs. threshold (<0.5%) |
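
The two rate-based rows above (API failure rate alerting above 1%, 24h error rate under 0.5%) can be sketched as one evaluation step. The function names and alert labels are hypothetical, and the counts are assumed to come from whatever monitoring source the agent queries.

```python
API_FAILURE_ALERT = 0.01  # checklist row: alert when failure rate exceeds 1%
ERROR_RATE_LIMIT = 0.005  # checklist row: 24h error rate must stay below 0.5%


def rate(failures: int, total: int) -> float:
    """Failure fraction; 0.0 when there was no traffic."""
    return failures / total if total else 0.0


def post_publish_alerts(
    api_failures: int, api_calls: int, errors_24h: int, requests_24h: int
) -> list[str]:
    """Return the labels of any thresholds that were breached."""
    alerts: list[str] = []
    if rate(api_failures, api_calls) > API_FAILURE_ALERT:
        alerts.append("api_failure_rate")
    if rate(errors_24h, requests_24h) >= ERROR_RATE_LIMIT:
        alerts.append("error_rate")
    return alerts
```
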
When a client site reaches final staging, one ClickUp status change to "Full UAT" triggers all 12 agents via n8n. Each agent runs its checklist independently. The site goes live only when all departments pass.
| Agent | Checks | Auto | Semi | Manual | Time | Cost |
|---|---|---|---|---|---|---|
| seo-uat-agent | 17 | 14 | 3 | 0 | ~5–10 min | $0.50 |
| seo-ops-agent | 32 | 20 | 11 | 1 | ~15–25 min | $0.60 |
| crm-uat-agent | 42 | 34 | 7 | 1 | ~15–25 min | $1.40 |
| content-uat-agent | 12 | 7 | 3 | 2 | ~3–5 min | $0.30 |
| gads-uat-agent | 32 | 25 | 6 | 1 | ~12–18 min | $1.10 |
| meta-uat-agent | 26 | 18 | 7 | 1 | ~10–15 min | $0.95 |
| web-uat-agent | 62 | 30 | 22 | 10 | ~15–25 min | $1.30 |
| it-uat-agent | 35 | 28 | 6 | 1 | ~15–30 min | $1.20 |
| topical-map-uat-agent (SEO) | 18 | 15 | 3 | 0 | ~5–8 min | $0.45 |
| planner-uat-agent (CTN) | 22 | 18 | 4 | 0 | ~8–12 min | $0.70 |
| auditor-uat-agent (SEO) | 20 | 17 | 3 | 0 | ~6–10 min | $0.65 |
| content-job-uat-agent (CTN) | 16 | 13 | 2 | 1 | ~5–8 min | $0.50 |
| TOTAL (12 Agents) | 334 | 239 | 77 | 18 | ~30–40 min | ~$9.65 |
Full-site UAT becomes a roughly $9.65, 30–40 minute automated run instead of a 20-hour manual effort. The 239 automated checks never get skipped, the 77 semi-automated checks are flagged for human judgment, and the 18 manual items (UX testing, payment, user feedback) stay human because they should be.
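
The totals in the summary table and the go-live rule can be checked with a short script. The per-agent figures are copied from the table; `totals` and `ready_for_launch` are hypothetical names, and treating each agent's pass/fail as its department's verdict is an assumption about how the gate is wired.

```python
from decimal import Decimal

# (agent, checks, auto, semi, manual, cost in USD), copied from the summary table
AGENTS = [
    ("seo-uat-agent", 17, 14, 3, 0, "0.50"),
    ("seo-ops-agent", 32, 20, 11, 1, "0.60"),
    ("crm-uat-agent", 42, 34, 7, 1, "1.40"),
    ("content-uat-agent", 12, 7, 3, 2, "0.30"),
    ("gads-uat-agent", 32, 25, 6, 1, "1.10"),
    ("meta-uat-agent", 26, 18, 7, 1, "0.95"),
    ("web-uat-agent", 62, 30, 22, 10, "1.30"),
    ("it-uat-agent", 35, 28, 6, 1, "1.20"),
    ("topical-map-uat-agent", 18, 15, 3, 0, "0.45"),
    ("planner-uat-agent", 22, 18, 4, 0, "0.70"),
    ("auditor-uat-agent", 20, 17, 3, 0, "0.65"),
    ("content-job-uat-agent", 16, 13, 2, 1, "0.50"),
]


def totals(rows):
    """Sum checks, auto, semi, manual, and cost across all agents."""
    checks, auto, semi, manual = (sum(r[i] for r in rows) for i in range(1, 5))
    cost = sum((Decimal(r[5]) for r in rows), Decimal("0"))
    return checks, auto, semi, manual, cost


def ready_for_launch(results: dict[str, bool]) -> bool:
    """Go live only when every agent has reported and every report is a pass."""
    expected = {row[0] for row in AGENTS}
    return expected.issubset(results) and all(results[name] for name in expected)


if __name__ == "__main__":
    print(totals(AGENTS))
```

A missing report counts as a failure here, so an agent that never ran cannot let a site through by omission.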