Related: For the high-level SEO Sentinel agent spec, see Manage agent SEO. This doc is the detailed 19-section PRD; that doc is the high-level agent spec.
Also: Agent Orchestration Harness PRD (defines how agents get triggered).
Also: Local SEO Automation Playbook (n8n workflow path for the same modules).
SEO Navigator · PRD · SEO Sentinel v1
Product Requirements Document · v1.0 · April 19, 2026

PRD — SEO Sentinel v1: Local SEO Automation agent

First Managed Agent deployment for SEO Navigator. Scope is deliberately narrow: the 5 Local SEO Automation modules from Workflow 1 (GBP Analyzer, On-Page Intelligence, Geographic Intelligence, Citation Intelligence, AI Visibility Tracker). This PRD is the execution spec for Trung (IT), with clearly labelled inputs needed from Jake and the SEO team.

Owner: Jake (accountable) · Trung (responsible)
Target ship date: End of Sprint 1 W3 (Day 22)
Est. IT effort: ~14 hours over 3 weeks
Depends on: Anthropic Managed Agents Public Beta
🔗 Paired document: The orchestration layer (triggers, session lifecycle, delivery routing, VPS deployment) is a separate reusable PRD — prd_orchestration_harness.html. This Sentinel PRD focuses on what the agent does. The orchestration PRD focuses on how it gets invoked. Trung reads both; SEO team mostly needs this one.
⚠️ What I'm confident about vs. what needs verification

Confident (grounded in platform.claude.com/docs/en/managed-agents): Agent / Environment / Session lifecycle, built-in toolset, MCP config, event streaming, pricing model ($0.08/session-hour + standard tokens), beta header requirements.

Needs verification with Anthropic before Trung finalizes config:

These are flagged inline throughout the PRD and collected in Section 14: Open Questions. It's okay to ship without perfect answers — just don't guess.

Contents

  1. Executive summary
  2. Scope & out of scope
  3. Architecture at a glance
  4. What Jake owns (strategic inputs)
  5. What SEO team owns (domain inputs)
  6. What Trung owns (technical build)
  7. Agent definition
  8. Environment definition
  9. Secrets & API credentials
  10. MCP servers
  11. Trigger mechanism & orchestration
  12. Test plan (T1 & T2)
  13. Deployment sequence (day-by-day)
  14. Open questions & verification items
  15. Risks & mitigations
  16. Cost model
  17. Definition of Done
  18. RACI matrix
  19. Appendix: full curl reference

1 · Executive summary · The one-paragraph version

Deploy one Claude Managed Agent — SEO Sentinel v1 — that runs the 5 Local SEO Automation modules (GBP Analyzer, On-Page Intelligence, Geographic Intelligence, Citation Intelligence, AI Visibility Tracker) against a single client on demand. The agent is triggered by a ClickUp task (status change to "Ready") or a Slack slash command. It runs in an isolated Anthropic-managed cloud container, reads client context from a handoff payload JSON, calls external APIs (Apify, Firecrawl, Google Maps, OpenAI, Gemini, Perplexity) via bash, queries SEO Utils via MCP, and writes a structured audit report to /mnt/session/outputs/. Results post to Slack and update the ClickUp task. Target: ~$1.50–2.00 per run, ~20–30 min wall-clock, replacing ~3 hours of manual analyst work for the initial audit.

5 · Modules in scope
~$1.75 · Est. cost per run
~25m · Est. wall-clock per run
~3h · Manual analyst time replaced

2 · Scope & out of scope · Ruthless containment

In scope for v1

Module | What the agent does | Data source
1. GBP Analyzer | Runs 44-point Google Business Profile audit for client location, benchmarks against 3 competitors, scores completeness + optimization. | Apify GBP scraper actor + Claude reasoning
2. On-Page Intelligence | Crawls client homepage + top 5 service pages, categorizes URLs, analyzes on-page SEO (titles, H1s, schema, internal links, content depth). | Firecrawl API
3. Geographic Intelligence | Generates ranking grid (7×7 or 13×13) around client address, scrapes rankings for 5 target keywords, runs DBSCAN clustering, outputs heatmap data (JSON for Leaflet). | Google Maps API + Apify SERP actor + scikit-learn
4. Citation Intelligence | Scrapes 40+ major directory citations, validates NAP (Name/Address/Phone) consistency, flags mismatches. | Apify citation scraper actor
5. AI Visibility Tracker | Queries ChatGPT, Gemini, Perplexity (+ optionally Claude) for 5 client-relevant queries; scores whether client name appears in the response and in what position. | OpenAI API, Gemini API, Perplexity API
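To make the Geographic Intelligence row concrete, here is a minimal sketch of how a 7×7 or 13×13 ranking grid could be laid out around a client location. It is not the actual geo_grid.py: the function name build_grid, the flat-earth approximation, and the example coordinates are all assumptions for illustration, and geocoding the address to a latitude/longitude is assumed to have happened already.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def build_grid(center_lat, center_lng, radius_miles, size):
    """Return size*size (lat, lng) points on a square grid centred on the business.

    The grid spans radius_miles in each direction, so a 13x13 grid at a
    5-mile radius yields 169 points spaced 10/12 of a mile apart.
    """
    if size < 2:
        raise ValueError("size must be at least 2")
    step_miles = (2 * radius_miles) / (size - 1)
    # One mile of latitude is a fixed angle; a mile of longitude grows as we
    # move away from the equator, hence the cos(latitude) correction.
    lat_per_mile = math.degrees(1 / EARTH_RADIUS_MILES)
    lng_per_mile = lat_per_mile / math.cos(math.radians(center_lat))

    points = []
    for row in range(size):          # row 0 is the northern edge
        north_offset = radius_miles - row * step_miles
        for col in range(size):      # col 0 is the western edge
            east_offset = -radius_miles + col * step_miles
            points.append((
                round(center_lat + north_offset * lat_per_mile, 6),
                round(center_lng + east_offset * lng_per_mile, 6),
            ))
    return points


# Illustrative centre point only (downtown Columbus, OH).
grid = build_grid(39.9612, -82.9988, radius_miles=5, size=13)
print(len(grid))  # 169
```

The approximation is adequate at a 5-mile radius; a production helper might prefer geopy's geodesic utilities, which are already in the planned package list.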

Explicitly out of scope for v1

💡 Why this scope

Workflow 1 is the highest-leverage first agent for three reasons: (1) every new client needs it (high frequency), (2) it's read-only (low blast radius if the agent makes mistakes), (3) it exercises all the infrastructure Trung needs to build anyway (bash, MCP, external APIs, file outputs). If we can ship this, we can ship anything else in the roadmap.

3 · Architecture at a glance · Four boxes, three arrows

# High-level flow — SEO Sentinel v1 run

[1] ClickUp status → "Ready" ──webhook──▶ [2] Orchestration script (VPS)
                                                  │
                                                  ▼
                                           POST /v1/sessions
                                                  │
                                                  ▼
[3] SEO Sentinel session (Anthropic container)
 │
 ├─ reads handoff payload (client context JSON)
 ├─ runs 5 modules in sequence via bash:
 │    ├─ apify_run.sh gbp-scraper         (module 1)
 │    ├─ firecrawl_crawl.sh               (module 2)
 │    ├─ geo_grid.py                      (module 3)
 │    ├─ apify_run.sh citation-scraper    (module 4)
 │    └─ ai_visibility.py                 (module 5)
 ├─ MCP calls to SEO Utils (for rank baseline reference)
 ├─ Claude synthesizes final report
 └─ writes /mnt/session/outputs/sentinel-audit-{client_id}.json + .md
 │
 ▼
[4] Files API fetch → Slack post + ClickUp task update + Drive upload
🧠 Brain vs Hands reminder

The agent's brain is Claude Sonnet 4.6 doing reasoning — deciding which module to run, interpreting Apify output, writing the audit narrative. The agent's hands are bash scripts calling external APIs inside the container. Anthropic runs the loop; Trung wires the hands.

4 · What Jake owns · Strategic decisions only Jake can make

Jake Must complete before Day 8

Total Jake effort: ~2.5h across a week.

5 · What SEO team owns · Domain knowledge only the SEO team can provide

SEO Lead + Senior TL Must complete before Day 14

Total SEO team effort: ~17h across 2 weeks.

6 · What Trung owns · Technical build — the core PRD

Trung (IT Lead) Day 8 through Day 22

The rest of the PRD (Sections 7–19) is effectively Trung's spec. Headlines:

Total Trung effort: ~14h baseline, budget 20h for iteration.

7 · Agent definition · The configuration itself

An Agent in Managed Agents is a reusable, versioned bundle of: model + system prompt + tools + MCP + skills. Referenced by ID from every session.

# Agent config for SEO Sentinel v1
{
  "name": "SEO Sentinel",
  "model": "claude-sonnet-4-6",
  "description": "Local SEO Automation agent. v1 scope: 5 modules (GBP, On-Page, Geographic, Citations, AI Visibility).",
  "system": "<SEE SECTION 5 — SEO Lead writes this. Rough skeleton below.>",
  "tools": [
    { "type": "agent_toolset_20260401" }
    // All built-in tools enabled: bash, read, write, edit, glob, grep, web_fetch, web_search
    // If we need to lock down later, use configs[] to disable individuals
  ],
  "mcp_servers": [
    { "type": "url", "url": "https://mcp.clickup.com/mcp", "name": "clickup" },
    { "type": "url", "url": "https://mcp.slack.com/mcp", "name": "slack" },
    { "type": "url", "url": "https://<TUNNEL-URL>.cfargotunnel.com", "name": "seo-utils" }
  ],
  "skills": [
    // Progressive-disclosure skills — uploaded via Files API, referenced by file_id
    // Plan to mount:
    //   - koray-city-page-auditor (file_id_1)
    //   - ai-visibility-audit (file_id_2)
    //   - consensus-content-audit (file_id_3)
    //   - seo-utils-mcp-guide (file_id_4)
    //   - seo-navigator-agency-os (file_id_5)
  ],
  "metadata": {
    "owner": "seo-nav",
    "workflow": "workflow-1-local-seo",
    "version_notes": "v1 initial"
  }
}
// Response includes: id (agent_...), version (starts at 1, increments on update)

System prompt — v1 DRAFT (for SEO Lead + Senior TL to iterate)

📝 How to use this draft

Below is a working first draft Claude wrote to give SEO Lead and Senior TL something concrete to edit, not a blank page. Target final length ~1,500–2,500 words after your pass (mine is ~1,100 — intentionally spare, your methodology detail will fill it out).

What I got right: structure, output contract, guardrails, tone guidance, module playbook skeleton.
What you need to fill in: the actual Koray methodology depth, the specific 44-point GBP rubric logic, your detailing-vertical specifics, examples of "good" recommendation phrasing, and the self-check questions that catch common failure modes.

Don't overdo it. This goes in the agent's system field — it's re-processed every session turn. Keep it tight. Skills (mounted as files) carry the deep methodology reference; the system prompt should be the executable playbook, not the textbook.

# ═══════════════════════════════════════════════════════════════
# SEO SENTINEL · SYSTEM PROMPT v1 DRAFT
# Author: Claude draft, 2026-04-19
# Review: SEO Lead + Senior TL before Day 12
# ═══════════════════════════════════════════════════════════════

# IDENTITY

You are SEO Sentinel, SEO Navigator's Local SEO Automation agent. You produce comprehensive, accurate, actionable local SEO audits for local service businesses — primarily automotive detailing shops, also roofing, HVAC, dentistry, and similar owner-operated local businesses.

You work for SEO Navigator, a boutique local SEO agency. Your outputs feed into human deliverables the agency sends to paying clients. Quality matters more than speed. A wrong recommendation loses the client's trust in the agency; a slow audit loses nothing.

# YOUR JOB ON EVERY RUN

On each run, you will receive a client handoff payload as JSON. You will:

1. Validate the payload has every required field. Halt if missing.
2. Execute five audit modules in sequence (detailed below).
3. Synthesize findings into a structured report.
4. Write two output files to /mnt/session/outputs/:
   - sentinel-audit-{client_id}.json (machine-readable, for downstream tools)
   - sentinel-audit-{client_id}.md (human-readable, for the SEO team)
5. Self-check the report before ending your turn. Revise if deficient.

You end your turn only after both files exist and pass self-check.

# METHODOLOGY YOU OPERATE FROM

Local SEO has three ranking signals in Google's local pack: proximity, relevance, and prominence. Your audits surface what the business can actually influence — relevance (on-page, categories, services) and prominence (citations, reviews, backlinks, authority). Proximity is fixed by the client's physical address; you note it only as context.

For on-page work, you apply Koray Tuğberk's semantic content network framework — pages are evaluated by source context fit, topical coverage, entity relationships, and query fan-out. See the koray-city-page-auditor skill mounted at /mnt/skills/ for detail. Don't re-derive it; reference it.

For GBP work, you apply SEO Navigator's 44-point rubric at /mnt/skills/gbp-44-point-rubric.json. Every point is binary or 0-10 scored per the rubric spec. Don't invent new criteria.

For AI visibility, the question is not "does the client rank in Google" but "does the client appear in generative answers to buyer-intent prompts." See the ai-visibility-audit skill.

# THE FIVE MODULES — YOUR PLAYBOOK

## Module 1: GBP Analyzer

What you produce: 44-point GBP score + competitor benchmark + top-5 gaps.

How you get the data:
- Run bash helpers/apify_run.sh gbp-scraper <client_gbp_url>
- For each of the 3 competitors in the payload, run the same script.
- Parse the JSON output each returns.

How you score:
- Load /mnt/skills/gbp-44-point-rubric.json.
- For each of the 44 points, compute the score from the Apify output.
- Sum weighted points → composite score 0-100.
- For each competitor, compute the same composite.
- Identify the top 5 gaps (rubric points where client underperforms the competitor average by the largest margin).

Good output characteristics:
- Specific: "Client lists 4 services; competitors average 11" — not "Services could be expanded."
- Actionable: each gap names the exact GBP field to update.
- Honest: if a rubric point couldn't be scored (data missing), mark null, don't guess.

## Module 2: On-Page Intelligence

What you produce: Audit of the client's homepage + top 5 service pages.

How you get the data:
- Run bash helpers/firecrawl_crawl.sh <client_website_url>
- Identify the 6 priority pages (homepage + top 5 service pages by internal link count).

What you check per page:
- Title tag (present, under 60 chars, contains primary keyword, unique)
- Meta description (present, under 160 chars, compelling CTA)
- H1 (single, matches page intent)
- Heading hierarchy (H2s nested under H1, no skipped levels)
- Schema.org markup (LocalBusiness, Service, FAQPage where relevant)
- Internal links (count, anchor text variety)
- Word count (thin < 300, thorough 800+, varies by page type)
- Image alt text coverage %
- Primary keyword in first 100 words

Good output characteristics:
- One table per page with all checks + pass/fail/flag for each.
- Site-wide patterns called out ("4 of 6 pages missing LocalBusiness schema" is more useful than noting it 4 times).

## Module 3: Geographic Intelligence

What you produce: Ranking heatmap data + visibility percentage per keyword + cluster analysis.

How you get the data:
- Run python helpers/geo_grid.py --address "{business_address}" --radius 5 --keywords <5 keywords from payload>
- Script generates a 13×13 grid (169 points) around the address at 5-mile radius, scrapes rank for each keyword at each point via Apify SERP actor, runs DBSCAN to cluster hot/cold zones.
- Output: GeoJSON file + summary JSON per keyword.

What you report:
- Per keyword: average rank, % of grid in top 3, % in top 10, % unranked.
- Hot zones (clusters where rank ≤ 3) and cold zones (clusters where rank > 10 or unranked) — plain-English description of each.
- Comparison to any existing baseline from SEO Utils MCP (see below).

Good output characteristics:
- Lead with the number, not the methodology. "12% top-3 coverage for 'ceramic coating near me'" is the insight.
- Always cross-reference baseline. If no baseline exists, note that this run establishes it.

## Module 4: Citation Intelligence

What you produce: NAP consistency scan across 40+ directories.

How you get the data:
- Ground truth NAP = client_name + business_address + contact_phone_primary from payload.
- Run bash helpers/apify_run.sh citation-scraper "{client_name}"
- Compare each directory listing's NAP to ground truth.

What you report:
- Directory matrix: directory name | listed | NAP match | mismatches.
- Priority fix list: start with top-tier directories (Apple Maps, Yelp, Facebook, Bing Places, Yellow Pages) and any where phone/address mismatch (hurts rankings more than a missing listing).

## Module 5: AI Visibility Tracker

What you produce: Client visibility score across ChatGPT, Gemini, Perplexity.

How you get the data:
- Derive 5 buyer-intent queries from client's priority services + city (example: "best ceramic coating in Columbus OH", "paint protection film installer near me"). Document the queries in your output.
- Run python helpers/ai_visibility.py --queries <queries.json>
- For each LLM × query combination, the script returns the response text.
- Parse each response. Record: does the client appear? Competitors mentioned? In what position?

What you report:
- Per-LLM matrix: query | client mentioned (Y/N) | position | competitors mentioned.
- Composite visibility score: (% of queries where client appears) × (inverse of average position when mentioned).
- Gap analysis: if competitors appear but client doesn't, note why (thin content, missing entity markup, missing review volume).

# INTERACTING WITH TOOLS

Bash tool: Your primary execution surface. Run helper scripts under /mnt/skills/helpers/. Capture stdout + stderr. Parse JSON output with jq. Write intermediate data to /mnt/session/outputs/raw/.

Read / write / edit tools: File operations. Use these for constructing the final JSON + Markdown outputs.

Web search + web fetch: Use sparingly and only for (a) verifying a competitor's current website content, (b) checking a local news source cited in an audit, (c) disambiguating a business name. Never use for core audit data — the helper scripts are the source of truth.

MCP — seo-utils: Query existing rank tracking baselines. Per the seo-utils-mcp-guide skill: use query_database on organic_rank_tracker_* tables — NOT the DataForSEO action tools unless the payload explicitly asks for fresh keyword research.

MCP — clickup: Post run progress + results to the triggering task comment. Update task status when called out by the orchestration layer.

MCP — slack: Post module completion updates to #seo-automation.

# OUTPUT CONTRACT

You write two files to /mnt/session/outputs/ before ending your turn.

File 1: sentinel-audit-{client_id}.json

{
  "client_id": "...",
  "client_name": "...",
  "audit_date": "ISO-8601 timestamp",
  "agent_version": "1.0.0",
  "overall_score": 0-100,
  "modules": {
    "gbp": { score, rubric_scores, competitor_benchmarks, top_5_gaps },
    "onpage": { pages_audited, per_page_findings, sitewide_patterns },
    "geographic": { keywords, per_keyword_metrics, heatmap_geojson_path },
    "citations": { directory_matrix, priority_fixes },
    "ai_visibility": { per_llm_matrix, visibility_score, gap_analysis }
  },
  "priority_recommendations": [
    { "priority": 1, "area": "...", "action": "...", "rationale": "...", "effort": "..." },
    ...
  ],
  "errors_encountered": [ ... ]   // empty if all modules succeeded
}

File 2: sentinel-audit-{client_id}.md

Structure:
1. Executive Summary — 3-5 bullets that fit on one screen. Lead with the number.
   "Overall score: 71/100. Biggest issue: 8 missing GBP services."
   Not: "This audit examines multiple dimensions of the client's local SEO..."
2. Score Dashboard — overall + per-module scores in a table.
3. Module 1: GBP — findings + top 5 gaps + competitor benchmark table.
4. Module 2: On-Page — per-page findings + sitewide patterns.
5. Module 3: Geographic Intelligence — keyword visibility + hot/cold zones.
6. Module 4: Citations — directory matrix + priority fixes.
7. Module 5: AI Visibility — per-LLM matrix + gap analysis.
8. Priority Recommendations — P1 (do this week), P2 (this month), P3 (someday). Each with: specific action, why it matters, rough effort estimate.
9. Appendix — queries used, raw data file paths, methodology notes.

# GUARDRAILS — WHAT YOU MUST NOT DO

- Never post content to the client's GBP, website, social channels, or any external system on their behalf. You are read-only.
- Never send emails on the client's or agency's behalf.
- Never modify client assets (GBP listings, website files, directory listings, etc.).
- Never use web_search or LLM API calls as a substitute for the helper scripts. The scripts are deterministic, reproducible, and accountable.
- Never fabricate data. If a scraper returns empty, flag that module with "status": "partial" and continue.
- Never pad the report. If a section has nothing worth saying, say so and move on.

# GUARDRAILS — WHAT YOU MUST DO

- Read the handoff payload first. Confirm every required field is present. If any are missing, halt and post a Slack message listing exactly what's missing, then stop.
- Write raw module outputs to /mnt/session/outputs/raw/ as you go. This lets a human re-inspect individual module data without re-running.
- After writing both output files, re-read them and self-check:
  · Are all 5 modules populated?
  · Are all priority_recommendations specific (contain concrete next steps, not "improve SEO")?
  · Is the overall_score internally consistent with module scores?
  · Does the Markdown executive summary match the JSON data?
  If self-check fails, revise once. If it still fails, mark the output with "self_check_failed": true and end your turn — the human will review.

# WHEN TO PROCEED AUTONOMOUSLY VS. WHEN TO HALT

Proceed autonomously:
- Payload complete, helper scripts returning expected data, MCP calls succeeding.
- Individual module fails — flag that module partial, continue with remaining four.

Halt and ask the human (post to Slack, stop):
- Payload missing required fields.
- 3+ modules failing (systemic issue — network, credentials, infrastructure).
- Competitor selection ambiguous: no 3 clear competitors identifiable within the client's service radius.
- Self-check fails after one revision attempt.

# YOUR TONE IN OUTPUTS

- Direct. Skip hedging language. State what's true.
- Specific. Numbers and named gaps, not vague descriptions.
- Prioritized. Every recommendation gets P1/P2/P3, not a flat list.
- Honest. If something couldn't be audited, say so explicitly. A short report with real findings beats a padded report with guesses.

Think of your reader as a senior SEO strategist reviewing your work in 20 minutes. They need to know what's wrong, what to fix first, and why, with enough evidence that they can defend the recommendations to the client. They do not need to know how you got there — that's the raw data in the appendix.

# END OF SYSTEM PROMPT v1 DRAFT
# Next: SEO Lead expands methodology sections, Senior TL validates
# output contract against existing deliverable templates.
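The Module 5 composite in the draft prompt, "(% of queries where client appears) × (inverse of average position when mentioned)", leaves the scale and the no-mention case open. One possible reading is sketched below, under stated assumptions: the score is computed per LLM, scaled to 0–100, defined as 0 when the client is never mentioned, and position is a 1-based integer. The function and field names are hypothetical and are not taken from ai_visibility.py.

```python
from collections import defaultdict


def visibility_scores(rows):
    """Compute a 0-100 visibility score per LLM.

    Each row is one LLM x query result with keys:
      llm (str), mentioned (bool), position (1-based int, or None if absent).
    """
    by_llm = defaultdict(list)
    for row in rows:
        by_llm[row["llm"]].append(row)

    scores = {}
    for llm, llm_rows in by_llm.items():
        positions = [r["position"] for r in llm_rows if r["mentioned"]]
        if not positions:
            scores[llm] = 0.0  # assumption: never mentioned means zero visibility
            continue
        mention_rate = len(positions) / len(llm_rows)
        avg_position = sum(positions) / len(positions)
        scores[llm] = round(100 * mention_rate / avg_position, 1)
    return scores


rows = [
    {"llm": "chatgpt", "query": "q1", "mentioned": True, "position": 1},
    {"llm": "chatgpt", "query": "q2", "mentioned": True, "position": 3},
    {"llm": "chatgpt", "query": "q3", "mentioned": False, "position": None},
    {"llm": "chatgpt", "query": "q4", "mentioned": False, "position": None},
    {"llm": "perplexity", "query": "q1", "mentioned": False, "position": None},
]
print(visibility_scores(rows))  # {'chatgpt': 25.0, 'perplexity': 0.0}
```

With this reading, a client mentioned first in every query scores 100, and the score falls both when mentions are rarer and when they appear lower in the response. Whether that weighting matches how the SEO team judges visibility is for the SEO Lead to confirm.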
✏️ SEO Lead iteration guide

When you edit this draft, focus your pass on:

  1. Methodology depth (section "Methodology you operate from"). I kept it spare; your version should pull in the specific Koray concepts that matter most to detailing + service-area businesses.
  2. Module scoring logic. Each module section has a "Good output characteristics" subsection — expand with 2–3 concrete examples from past audits of what "direct, specific, prioritized" looks like.
  3. Self-check questions. I wrote 4. You probably have 8 more from the times audits have failed in review. Add them — they're the cheapest way to catch failure modes before the report reaches a human.
  4. The competitor selection rule. I left it as "3 competitors from the payload." But who should those 3 be — top GBP-ranked? Closest geographic? Most similar service menu? State the rule explicitly.

Target word count after your pass: 1,500–2,500. If you go over 3,000, we have a skill-mounting problem — consider moving deep methodology into a skill file referenced from the prompt.
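Some of the self-check questions can also be enforced mechanically by the orchestration layer after the run, as a backstop to the agent's own review. The sketch below covers only the structural checks against the JSON output contract. The function name, the list of vague phrases, and the example report are hypothetical, and judging whether the Markdown summary matches the JSON is left to the agent and the human reviewer.

```python
REQUIRED_MODULES = ("gbp", "onpage", "geographic", "citations", "ai_visibility")
REQUIRED_REC_FIELDS = ("area", "action", "rationale", "effort")
# Hypothetical examples of non-actionable phrasing; the SEO team would own the real list.
VAGUE_ACTIONS = {"improve seo", "optimize the website", "build more links"}


def self_check(report):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []

    modules = report.get("modules") or {}
    for name in REQUIRED_MODULES:
        if not modules.get(name):
            problems.append(f"module '{name}' is missing or empty")

    score = report.get("overall_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0 <= score <= 100:
        problems.append("overall_score must be a number from 0 to 100")

    recs = report.get("priority_recommendations") or []
    if not recs:
        problems.append("priority_recommendations is empty")
    for index, rec in enumerate(recs, start=1):
        for field in REQUIRED_REC_FIELDS:
            if not str(rec.get(field) or "").strip():
                problems.append(f"recommendation {index} has no '{field}'")
        action = str(rec.get("action") or "").strip().lower()
        if action in VAGUE_ACTIONS:
            problems.append(f"recommendation {index} action is too vague")

    return problems


# Placeholder report: module bodies are stubs, not real audit data.
report = {
    "overall_score": 71,
    "modules": {name: {"status": "complete"} for name in REQUIRED_MODULES},
    "priority_recommendations": [
        {"priority": 1, "area": "gbp",
         "action": "Add the 8 missing services to the GBP services list",
         "rationale": "Competitors average 11 listed services", "effort": "1h"},
        {"priority": 2, "area": "onpage", "action": "Improve SEO",
         "rationale": "", "effort": "2h"},
    ],
}
for problem in self_check(report):
    print(problem)
```

Each new self-check question the SEO Lead adds can become one more rule here, provided it can be tested against the JSON without judgment.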

⚠️ Verify token count before finalizing

Once SEO Lead finishes the system prompt and we mount the 5 skills, measure total system context tokens. If >80K tokens, this will affect cost per run significantly: the system prompt is re-processed on every turn, and prompt caching only discounts repeat reads while the cache entry is still warm (5-minute lifetime by default, refreshed on each hit). Action: Trung runs a dry session after the agent is created, captures usage.input_tokens from the first turn, reports back. If over budget, we split skills to Catalyst or trim the system prompt.

8 · Environment definition · The container template

An Environment is the container template — packages, networking, mounts. Reusable across sessions. Multiple sessions can share one environment but each session gets its own isolated container instance (filesystem state is NOT shared across sessions).

# Environment config for SEO Sentinel v1
{
  "name": "seonav-prod",
  "config": {
    "type": "cloud",
    "packages": {
      "pip": [
        "requests",         // HTTP for external APIs
        "pandas",           // tabular data handling
        "numpy",            // grid math
        "scikit-learn",     // DBSCAN for Geographic Intelligence
        "beautifulsoup4",   // on-page HTML parsing fallback
        "lxml",             // fast XML parsing for sitemaps
        "geopy"             // distance + geo utils for grid generation
      ],
      "apt": [
        "jq",               // JSON manipulation in bash
        "curl"              // should be default, but explicit is safer
      ]
    },
    "networking": {
      "type": "unrestricted"
      // v1 uses unrestricted for simplicity. Lock to "limited" in v2:
      // "type": "limited",
      // "allowed_hosts": [
      //   "api.apify.com", "api.firecrawl.dev",
      //   "api.openai.com", "generativelanguage.googleapis.com",
      //   "api.perplexity.ai", "maps.googleapis.com"
      // ],
      // "allow_mcp_servers": true,
      // "allow_package_managers": true
    }
  }
}
// Response includes: id (env_...)
// Packages are pre-installed before the agent starts. Cached across sessions.

Files to mount (via Files API upload, then reference in agent's skills[] or manually via bash)

File | Purpose | Owner
koray-city-page-auditor.md | Koray methodology reference | Senior TL pulls from existing skill
ai-visibility-audit.md | AI visibility scoring methodology | Senior TL
seo-navigator-agency-os.md | Agency methodology | Senior TL
gbp-44-point-rubric.json | Scoring matrix for GBP module | SEO Lead (Section 5)
output-report-template.md | Final audit report structure | SEO Lead (Section 5)
handoff-schema.json | Client context payload shape | PM (already defined in 90-day plan W1)
helpers/apify_run.sh | Generic Apify actor runner | Trung writes
helpers/firecrawl_crawl.sh | Firecrawl wrapper | Trung writes
helpers/geo_grid.py | Grid generator + DBSCAN clustering | Trung writes
helpers/ai_visibility.py | Multi-LLM visibility query runner | Trung writes
💡 Why helpers as scripts, not MCP servers

Apify, Firecrawl, OpenAI, Gemini, Perplexity all have REST APIs. We could wrap them as custom MCP servers, but that's 5 more services Trung has to maintain. For v1, curl them from bash scripts inside the container. Faster to ship, easier to debug. Revisit if the pattern repeats across agents and the maintenance burden of ad-hoc scripts becomes larger than hosting an MCP.

9 · Secrets & API credentials · The verification-needed item

🛑 Verification required before Day 8

The Environment schema I've seen in the public-beta docs (packages, networking, type) does NOT show an explicit field for secrets or environment variables. I don't want to invent a mechanism that doesn't exist.

Trung action: Before Day 8, verify the secrets mechanism with Anthropic via one of:

  1. Anthropic sales/support contact — confirm how API keys are passed to the container.
  2. Test API call: attempt to include env_vars in the environment config payload and inspect error response. Schema validators typically give useful hints.
  3. Search platform.claude.com/docs/en/managed-agents/environments full page (I may not have seen it all).

Plan A — if Managed Agents supports environment-level secrets

Store these on the environment (or wherever the API dictates):

Bash scripts read from $APIFY_API_TOKEN etc. Standard 12-factor app pattern.

Plan B — fallback if no native secrets mechanism

Pass the keys as part of the initial user.message content from the orchestration script, bundled inside the handoff payload. The agent's first bash action is to export them into environment variables for the duration of the session. Downsides:

Plan B is acceptable for v1 testing. Not acceptable for v2 production at scale. If Anthropic confirms no native secrets, we file a feature request with their sales team and commit to migrating once available.

Plan C — most conservative

Build a thin secrets-proxy MCP server that the agent calls to fetch credentials per request, hosted on the same VPS as the orchestration script. MCP auth tokens themselves become the "secret" passed into the agent. Over-engineered for v1; mentioning only for completeness.

🔑 Recommended for v1

Start with Plan A attempt → Plan B fallback if Plan A doesn't exist. Document the decision in ClickUp and revisit at end of Sprint 1. Never commit API keys to git.

10 · MCP servers · What the agent can talk to natively

MCP | URL | Purpose for Sentinel v1 | Status
ClickUp | https://mcp.clickup.com/mcp | Read client task details + update task status. Post results to task comments. | Live
Slack | https://mcp.slack.com/mcp | Post audit results to #seo-automation channel. | Live
SEO Utils | https://<tunnel>.cfargotunnel.com | Read existing rank tracking data as baseline reference. Per seo-utils-mcp-guide skill, use query_database on organic_rank_tracker_* tables. | Depends on Mac Mini

SEO Utils MCP — the fragile one

⚠️ Single point of failure

SEO Utils runs locally on one Mac Mini, tunneled via cloudflared tunnel --url http://localhost:19515. If the Mac Mini reboots, loses network, or the tunnel process dies, Sentinel can still run modules 1–5 but loses the rank baseline reference. Action items for Trung:

  1. Wrap the tunnel in a launchd service with auto-restart (launchd is the service manager on the Mac Mini; systemd applies only if the tunnel host moves to Linux).
  2. Add a health-check: simple curl to the tunnel URL every 5min from a monitoring script, Slack alert on 3 consecutive fails.
  3. In the agent system prompt, instruct: "If SEO Utils MCP is unavailable, continue with modules 1–5, flag in output that baseline rank data is missing, do not fail the overall run."
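The health check in item 2 could be sketched as below. This is an illustration under assumptions, not the monitoring script itself: the class and function names are hypothetical, treating any 2xx or 3xx response as healthy is a guess (an MCP endpoint may answer a plain GET differently), and the alert callback stands in for whatever posts to Slack. The script is meant to be invoked on a 5-minute schedule by cron or launchd, with one tick() per invocation cycle.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Return True if the URL answers with a 2xx/3xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


class TunnelMonitor:
    """Counts consecutive failed checks and alerts once per outage."""

    def __init__(self, check, alert, threshold=3):
        self.check = check          # callable returning True when healthy
        self.alert = alert          # callable taking a message string
        self.threshold = threshold
        self.failures = 0
        self.alerted = False

    def tick(self):
        """Run one health check; call this once every 5 minutes."""
        healthy = self.check()
        if healthy:
            self.failures = 0
            self.alerted = False    # re-arm so the next outage alerts again
            return True
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alert(f"SEO Utils tunnel failed {self.failures} checks in a row")
            self.alerted = True
        return False


# Wiring example (not executed here): a long-running process would keep one
# monitor instance and call monitor.tick() on its schedule.
# monitor = TunnelMonitor(check=lambda: probe(TUNNEL_URL), alert=post_to_slack)
```

Alerting once per outage rather than on every failed check keeps a long Mac Mini outage from flooding the channel. TUNNEL_URL and post_to_slack in the comment are placeholders.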

Explicitly NOT used by Sentinel v1

11 · Trigger mechanism & orchestration · How the agent actually gets invoked

🔗 Orchestration is its own PRD — see the paired document

Because the orchestration layer is reusable across every future agent (Content Catalyst, Revenue Relay, Ad Arbitrage, Build Bot, PM Pulse), it has its own dedicated PRD: prd_orchestration_harness.html.

That PRD covers: the config-file pattern that lets new agents plug in without code changes, the session lifecycle state machine, SSE event handling with reconnect logic, HITL gate handling (stubbed for v1, real in Sprint 2 for PM Pulse), idempotency, logging, VPS deployment, and a 7-test integration suite.

This section (11) remains here as a Sentinel-specific summary so SEO team and Jake have enough context to understand what Sentinel depends on, without needing to read the full orchestration spec. The deep spec is Trung's domain.

v1 triggers (Trung to implement)

  1. Manual CLI trigger (Day 14–18, for testing): Trung runs ./sentinel-run.sh <client_id> on the VPS. Script reads client payload from local JSON, creates session, streams SSE, prints output.
  2. ClickUp webhook trigger (Day 22 onward): ClickUp task in the Local SEO Onboarding list changes status to "Ready (Automate)". Webhook hits orchestration script's /webhook/sentinel endpoint. Script resolves client from task's custom fields, creates session.
  3. Slack slash command (optional stretch): /sentinel audit <client-name> in #seo-team. Useful for ad-hoc reruns.

Orchestration script responsibilities

// orchestrator/sentinel.js — Node.js on VPS, v1 target
const Anthropic = require('@anthropic-ai/sdk');

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env

// Placeholder helpers so the script runs standalone; the orchestration
// harness is expected to supply its own logger.
const log = (...args) => console.log(new Date().toISOString(), ...args);

// Assumes agent.message events carry a content[] array of { type, text } blocks.
const extractText = (event) =>
  (event.content || [])
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');

async function runSentinel(clientPayload) {
  // 1. Create session
  const session = await client.beta.sessions.create(
    {
      agent: process.env.SENTINEL_AGENT_ID,
      environment_id: process.env.SEONAV_ENV_ID,
      title: `Sentinel audit · ${clientPayload.client_name}`
    },
    { headers: { 'anthropic-beta': 'managed-agents-2026-04-01' } }
  );

  // 2. Send initial event BEFORE opening stream (critical!)
  await client.beta.sessions.events.create(session.id, {
    events: [{
      type: 'user.message',
      content: [{
        type: 'text',
        text: `Run a full Local SEO audit for this client. Execute all 5 modules. Write structured output to /mnt/session/outputs/sentinel-audit-${clientPayload.client_id}.json and .md.

CLIENT PAYLOAD:
${JSON.stringify(clientPayload, null, 2)}`
      }]
    }]
  });

  // 3. Open SSE stream, process events
  const stream = await client.beta.sessions.stream(session.id);
  for await (const event of stream) {
    switch (event.type) {
      case 'agent.message':
        log('Agent:', extractText(event));
        break;
      case 'agent.tool_use':
        log('Tool:', event.name);
        break;
      case 'session.status_idle':
        if (event.stop_reason?.type === 'end_turn') {
          await deliverOutputs(session.id, clientPayload);
          return;
        }
        if (event.stop_reason?.type === 'requires_action') {
          // v1 has no custom tools or confirmation gates — shouldn't happen.
          // If it does, log and alert.
          log('Unexpected requires_action', event);
        }
        break;
    }
  }
}

async function deliverOutputs(sessionId, payload) {
  // Fetch files written by the agent
  const files = await client.beta.files.list(
    { scope_id: sessionId },
    { headers: { 'anthropic-beta': 'files-api-2025-04-14,managed-agents-2026-04-01' } }
  );
  // Download, upload to Drive, post to Slack, update ClickUp
  // ... (implementation per normal Drive/Slack/ClickUp APIs)
}

Deliverable routing

12 · Test plan — T1 & T2 · Two gates before declaring done

Test | When | Input | Pass criteria | Owner
T1 — synthetic client | Day 18 | Fake client: "SN Test Detailing", Ho Chi Minh City, generic detailing service menu, 3 synthetic competitors. Safe to break. | All 5 modules execute without fatal error. Output file exists at correct path, valid JSON, covers all sections per output template. Session cost <$3. Run time <45 min. | Trung (execute) → SEO Lead (validate)
T2 — real low-risk client | Day 22 | Shamrock Detailing Columbus OH or Procam Detailing Bullhead City AZ (per Jake's pick in Section 4). Real GBP, real competitors, real rank data. | All T1 criteria +: SEO Lead side-by-side review vs. prior manual audit scores ≥ 80% coverage + 0 factual errors + useful recommendations. Output passes "would I send this to the client after light editing?" test. | Trung (execute) → SEO Lead (validate)
🧪 Running T1 three times, not once

Per the 90-day plan exit criteria: T1 must pass 3 consecutive clean runs before we move on. A single clean run could be luck. Three rules out most randomness. If run 3 fails but runs 1–2 passed, we don't promote — we investigate why it failed.
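The machine-checkable part of the T1 gate can be sketched as below. The thresholds come from the T1 pass criteria in this section; the field names on each `run` record are hypothetical, and the SEO Lead's validation of output quality stays a manual step that this check does not replace.

```javascript
// Sketch of the automated T1 gate. Field names on `run` are hypothetical.
const T1_LIMITS = { maxCostUsd: 3, maxRuntimeMin: 45, requiredModules: 5 };

// A run is clean when every automated T1 pass criterion holds.
function isCleanRun(run) {
  return (
    run.modulesCompleted === T1_LIMITS.requiredModules &&
    run.outputIsValidJson === true &&
    run.costUsd < T1_LIMITS.maxCostUsd &&
    run.runtimeMin < T1_LIMITS.maxRuntimeMin
  );
}

// T1 passes only if the most recent `needed` runs are all clean,
// so a failure on run 3 resets the streak.
function t1Passed(runs, needed = 3) {
  if (runs.length < needed) return false;
  return runs.slice(-needed).every(isCleanRun);
}
```

Checking only the trailing runs means an early failure followed by three clean runs still passes, while two clean runs followed by a failure does not.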

13 Deployment sequence: day by day
What happens when

| Day | Owner | Milestone | Exit check |
| --- | --- | --- | --- |
| 1–3 | Jake | Section 4 items 1–5 complete | Budget approved, test clients selected, DoD signed |
| 1–7 | SEO Lead + TL | Section 5 items 1–5 complete | System prompt draft + 44-point rubric + keyword logic + output template all exist |
| 8 | Trung | Prereqs verified | Test API call to /v1/agents returns 200. Tunnel uptime confirmed. |
| 8 | Trung | Secrets plan decided (Section 9) | Plan A confirmed working OR Plan B locked in. Documented in ClickUp. |
| 9 | Trung | Environment env_seonav_prod created | env_… ID captured. Dry session spawns successfully + packages installed. |
| 10–11 | Trung | Files uploaded to environment mount | All skill files, rubrics, helper scripts exist in container at known paths. |
| 12–13 | Trung + SEO Lead | Agent SEO Sentinel created + v1 system prompt in place | agent_… ID captured. Token count verified <80K on first turn. |
| 14 | Trung | MCP servers configured + tested | Agent can read a known ClickUp task, can post to Slack, can query SEO Utils. |
| 15–17 | Trung | Orchestration script v1 working (manual CLI mode) | Manual ./sentinel-run.sh completes end-to-end on synthetic data. |
| 18 | Trung → SEO Lead | T1: 3 clean runs on synthetic client | 3 consecutive passes. Output validated by SEO Lead. |
| 19–21 | Trung | ClickUp webhook integration | Task status change triggers session. Slack + ClickUp postback works. |
| 22 | Trung → SEO Lead → Jake | T2: real client run | SEO Lead sign-off. Jake notified. Sprint 1 exit complete for this agent. |
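The Day 8 exit check ("test API call to /v1/agents returns 200") could be scripted roughly as follows. The endpoint and headers are copied from the curl reference in this PRD; using GET for the check is an assumption, since the PRD does not name the method, and the function names are hypothetical. It needs Node 18+ for the global `fetch`.

```javascript
// Sketch of the Day 8 prerequisite check (Node 18+ for global fetch).
// ASSUMPTION: a GET on /v1/agents is a valid, side-effect-free probe.
function buildAgentsCheck(apiKey) {
  return {
    url: 'https://api.anthropic.com/v1/agents',
    options: {
      method: 'GET',
      headers: {
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
        'anthropic-beta': 'managed-agents-2026-04-01',
      },
    },
  };
}

// Resolves to true only when the API answers 200.
// `fetchImpl` is injectable so the check can be tested without network access.
async function prereqsOk(apiKey, fetchImpl = fetch) {
  const { url, options } = buildAgentsCheck(apiKey);
  const res = await fetchImpl(url, options);
  return res.status === 200;
}
```

A 401 or 403 here would point at the API key or beta access rather than at the tunnel, which is worth separating before Day 9.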

14 Open questions & verification items
The things I don't know. Don't guess.

Per user preferences: these are explicit "I don't know, let's verify at the source" items. None should block Day 1 work, but all should be answered before T2.

  1. Secrets mechanism for Managed Agents environments. (See Section 9.) Path: ask Anthropic support or inspect the Environment API response schema for env_vars / secrets fields. Blocker for: clean secret handling. Workaround: Plan B (pass in user.message).
  2. Is idle time while waiting for Apify runs billed as session-hour? Apify GBP scraper can take 60–90s per run. If we're running 3 competitors × 5 keywords of grid data, that's several minutes of the agent waiting on Apify. Does billing accrue during this wait? The skill says "idle waiting for human input doesn't bill" but waiting on external APIs may differ. Blocker for: accurate cost estimate. Workaround: measure actual cost on T1 runs, update estimate.
  3. Maximum session duration cap. Is there a hard kill at 2h, 6h, 24h? Full Sentinel run should be ~20-30min, so well under any reasonable cap, but worth knowing for future. Non-blocker.
  4. Container specs (CPU/memory/disk). DBSCAN clustering on a 13×13 grid × 5 keywords = 845 points. Trivial on any modern machine, but worth confirming container has at least 512MB memory for numpy/pandas operations. Non-blocker, likely fine.
  5. Data residency. Docs mention inference_geo parameter — does it apply to Managed Agents? Some detailing clients might require US-only inference. Non-blocker for first 3 test clients.
  6. MCP call billing against session tokens. When Sentinel calls ClickUp MCP or SEO Utils MCP, are the MCP response tokens counted against the agent's input tokens? Almost certainly yes, but worth measuring how much they add on top of the system prompt and skills. Non-blocker.
  7. The `skills` field in agent config — how does progressive disclosure actually work? The docs describe it, but I haven't seen the exact Files API upload flow in the excerpts I have. Trung should test uploading one skill file first, confirm it appears mounted in the container, before uploading all 5. Blocker for: skill mounting. Low risk, just test early.
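The arithmetic behind questions 2 and 4 can be made explicit with the figures already in this PRD. The sketch below assumes the worst case for question 2: one Apify run per competitor-keyword pair, executed one after another. The PRD does not state the actual run count or whether runs are parallel, so treat the wait figures as an upper bound.

```javascript
// Worked arithmetic for open questions 2 and 4, using figures from this PRD.
const SESSION_RATE_USD_PER_HOUR = 0.08;

// Question 4: points fed to clustering (13×13 grid × 5 keywords).
const gridPoints = 13 * 13 * 5;

// Question 2, worst case.
// ASSUMPTION: 3 competitors × 5 keywords = 15 sequential Apify runs.
const apifyRuns = 3 * 5;
const waitSeconds = { min: apifyRuns * 60, max: apifyRuns * 90 };

// Session-hour cost of that wait if idle-on-external-API time is billed.
function waitCostUsd(seconds) {
  return (seconds / 3600) * SESSION_RATE_USD_PER_HOUR;
}

console.log(gridPoints);                   // 845
console.log(waitSeconds);                  // { min: 900, max: 1350 }
console.log(waitCostUsd(waitSeconds.max)); // roughly 0.03
```

Even in this sequential worst case the billed wait is about 2 to 3 cents per run, so question 2 affects the accuracy of the estimate more than the budget.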

15 Risks & mitigations
What could break the ship date

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Secrets mechanism isn't what I assumed; Plan B leaks keys into session logs | High | Verify Day 8. If Plan B is only option, restrict Claude Console access to Jake + Trung only for now. Rotate keys monthly. |
| System prompt + skill mount is too large (>80K tokens) | High | Measure token count on first dry run. If over, trim by: splitting skills between Sentinel and future Catalyst, removing agency-os skill from Sentinel (not needed for read-only audits). |
| SEO Utils cloudflared tunnel down during a run | Medium | Agent degrades gracefully: flags missing baseline data in output, completes modules 1–5 anyway. Health check + Slack alert on tunnel drop. |
| Apify or Firecrawl rate-limits us during a run | Medium | Upgrade to paid tier that matches our run frequency. Retry logic in bash helpers (3 retries with exponential backoff). Flag in output if a module partially failed. |
| SEO Lead's system prompt doesn't ship in time for Day 12 | Medium | Trung uses placeholder system prompt to unblock infrastructure testing. Real system prompt is a PATCH (version bump) on the agent, doesn't require recreation. |
| Output quality is inconsistent across runs | Medium | Baseline against reference audit in T1. If inconsistent, tighten system prompt with more explicit output contract (exact section headers, exact field names). |
| Orchestration script is buggy, sessions leak or double-trigger | Low-Med | Idempotency key in webhook handler (use ClickUp task ID + timestamp). Max 1 active Sentinel session per client at a time, enforced client-side. |
| Cost overruns: retry loops inflate token spend | Low-Med | Session cost alert: Slack ping if any single session >$5. Daily total alert if >$30. |
| Container lacks a Python package we need | Low | Agent can pip install inside the container at runtime. Slows first run but fixes itself. Add to environment packages list on next update. |
| Google Maps API quota exceeded during grid generation | Low | Cache geocoded addresses in a local SQLite at /mnt/session/. Budget: 2500 free requests/day; grid of 169 points × few clients/day stays well under. |
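The double-trigger mitigation (idempotency key plus at most one active session per client) can be sketched as an in-memory guard. The payload field names are hypothetical. Note that the timestamp in the key has to be the status-change time carried by the event itself; if it were the time the webhook was received, a redelivered event would produce a fresh key and slip through. State held in memory is lost on restart, so a persistent store would be needed for anything beyond v1.

```javascript
// Sketch of the double-trigger guard for the webhook handler.
// ASSUMPTION: the webhook payload carries the task ID and the time of the
// status change; the field names used here are hypothetical.
const seenKeys = new Set();      // idempotency keys already handled
const activeClients = new Set(); // clients with a running Sentinel session

function idempotencyKey(taskId, statusChangedAt) {
  return `${taskId}:${statusChangedAt}`;
}

// Returns true if the caller may start a session, false if it must skip.
function tryAcquire({ taskId, statusChangedAt, clientId }) {
  const key = idempotencyKey(taskId, statusChangedAt);
  if (seenKeys.has(key)) return false;           // duplicate delivery
  if (activeClients.has(clientId)) return false; // one session per client
  seenKeys.add(key);
  activeClients.add(clientId);
  return true;
}

// Call from a finally block once the session ends, whatever the outcome.
function release(clientId) {
  activeClients.delete(clientId);
}
```

An event rejected because the client is busy is not recorded as seen, so it can be retried after `release` without being mistaken for a duplicate.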

16 Cost model
What this actually costs per run

🧮 These are estimates

Real numbers come after 3–5 T1 runs. These are prior-belief starting points, not commitments. Update the model with measured numbers after Day 18.

| Cost component | Est. per run | Assumptions |
| --- | --- | --- |
| Claude tokens (input) | ~$0.30 | ~80K input tokens (system + skills + handoff payload), mostly cached after first minute. Cache reads at 10% of base rate. |
| Claude tokens (output) | ~$0.45 | ~30K output tokens (audit report content + module narrations) |
| Session-hour | ~$0.035 | 25 min active runtime × $0.08/hr. Actual may be lower if idle-on-external-API doesn't bill. |
| Web search (built-in tool) | ~$0.10 | ~10 searches × $0.01 each ($10/1000) |
| Apify runs | ~$0.40 | GBP scraper + citation scraper + SERP scraper. Free tier likely covers testing. |
| Firecrawl | ~$0.05 | Single site crawl, ~20 pages |
| Google Maps API | ~$0.00 | Under free tier (2500 req/day) |
| OpenAI (ChatGPT for AI visibility) | ~$0.05 | 5 queries × ~500 output tokens |
| Gemini API | ~$0.00 | Free tier |
| Perplexity API | ~$0.05 | 5 queries |
| Total per run: | ~$1.45–1.90 | |

At 8 new clients onboarded per quarter and ~$1.75 per run, production cost comes to ~$14 per quarter. Ongoing external API subscriptions (~$80/mo) come on top of that, but they are fixed costs that would exist anyway.
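To make the estimate easy to update after the measured T1 runs, the cost model can be kept as data. The sketch below sums the per-run figures from the table in tenths of a cent (integers, to avoid floating-point drift) and encodes the cost thresholds stated elsewhere in this PRD: the $3 ceiling from T1 and the Definition of Done, and the $5 single-session Slack alert. The names are hypothetical.

```javascript
// Per-run cost estimate from the cost model table, in tenths of a cent.
const COST_MILLS = {
  claudeInput: 300,
  claudeOutput: 450,
  sessionHour: 35,
  webSearch: 100,
  apify: 400,
  firecrawl: 50,
  googleMaps: 0,
  openai: 50,
  gemini: 0,
  perplexity: 50,
};

const totalMills = Object.values(COST_MILLS).reduce((sum, m) => sum + m, 0);
const perRunUsd = totalMills / 1000;

// Quarterly production cost at the planning figure of ~$1.75 per run.
const quarterlyUsd = 8 * 1.75;

// Thresholds from this PRD: T1 requires cost < $3; Slack alert above $5.
function classifyRunCost(costUsd) {
  if (costUsd > 5) return 'alert';
  if (costUsd >= 3) return 'over-budget';
  return 'ok';
}

console.log(perRunUsd);    // 1.435
console.log(quarterlyUsd); // 14
```

The component sum lands at the low end of the ~$1.45–1.90 range; the upper end presumably allows for retries and cache misses, which is what the T1 measurements should confirm.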

17 Definition of Done
Jake signs this off before work starts

✅ v1 ships when ALL of the following are true
  1. Agent exists. agent_… ID for "SEO Sentinel" is recorded in ClickUp and in version control.
  2. Environment exists. env_seonav_prod created, packages installed, files mounted, reusable.
  3. Secrets path decided. Either Plan A confirmed working or Plan B explicitly accepted by Jake, documented in ClickUp.
  4. Orchestration script runs on VPS. Can be triggered via manual CLI or ClickUp webhook. Posts deliverables to Slack + ClickUp.
  5. T1 passes 3 consecutive clean runs on synthetic client. SEO Lead validated output.
  6. T2 passes once on real low-risk client. SEO Lead would send output to client with light editing only.
  7. Cost per run measured and within $3 per run (2x buffer over estimate).
  8. Runbook exists in ClickUp. Trung documents: how to retrigger a run, how to check logs, how to rotate API keys, who to escalate to.
  9. Known issues logged. Any workarounds, rubric tuning needs, system prompt iterations captured for v2 planning.
  10. Jake reviews T2 output and signs off. Final gate. Green-light to use on next real client onboarding.

18 RACI matrix
Who does what, who signs off

R = Responsible (does the work) · A = Accountable (signs off) · C = Consulted · I = Informed

TaskJakeTrungSEO LeadSenior TLPM
Overall v1 ship decisionACCII
Budget approval (Claude + external APIs)RC
Research Preview applicationRI
Test client selectionAR
System prompt authoringACRC
44-point GBP rubricRC
Output report templateARR
Handoff schemaCCCR
Prerequisites verification (Day 8)R
Secrets mechanism verificationAR
Environment creationR
File mount (skills + rubrics + helpers)RCR
Agent creation + version managementR
MCP wiring (ClickUp, Slack, SEO Utils)R
Helper scripts (apify, firecrawl, geo_grid, ai_visibility)RC
Orchestration script (Node.js on VPS)R
ClickUp webhook integrationRC
T1 executionRC
T1 validation (output quality)RC
T2 execution on real clientIRC
T2 validation + final sign-offAIR
Runbook documentationIRC
Cost monitoring + alertingIR

19 Appendix: Full curl reference
Copy-pasteable for Trung

A1. Environment creation

# Day 9: create shared environment
ENV_ID=$(curl -fsSL https://api.anthropic.com/v1/environments \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -d '{
    "name": "seonav-prod",
    "config": {
      "type": "cloud",
      "packages": {
        "pip": ["requests","pandas","numpy","scikit-learn","beautifulsoup4","lxml","geopy"],
        "apt": ["jq"]
      },
      "networking": {"type": "unrestricted"}
    }
  }' | jq -r '.id')
echo "ENV_ID=$ENV_ID"  # save to .env file

A2. Agent creation (minimal — iterate later)

# Day 12: create agent (version 1, minimal scope)
SENTINEL_ID=$(curl -fsSL https://api.anthropic.com/v1/agents \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -d @sentinel-agent-config.json | jq -r '.id')
echo "SENTINEL_ID=$SENTINEL_ID"

A3. Manual test session

# Day 15: first dry run
SESSION=$(curl -fsSL https://api.anthropic.com/v1/sessions \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -d "{\"agent\":\"$SENTINEL_ID\",\"environment_id\":\"$ENV_ID\",\"title\":\"Sentinel T1 dry run\"}")
SESSION_ID=$(jq -r '.id' <<<"$SESSION")

# Send the initial event BEFORE opening the stream
curl -sS "https://api.anthropic.com/v1/sessions/$SESSION_ID/events" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -d @sentinel-kickoff-payload.json

# Open SSE stream
curl -sS -N "https://api.anthropic.com/v1/sessions/$SESSION_ID/stream" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "Accept: text/event-stream"

A4. Fetch session outputs

# After session.status_idle with stop_reason end_turn
curl -fsSL "https://api.anthropic.com/v1/files?scope_id=$SESSION_ID" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: files-api-2025-04-14,managed-agents-2026-04-01"

# Download a specific file
curl -fsSL "https://api.anthropic.com/v1/files/$FILE_ID/content" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: files-api-2025-04-14" \
  -o sentinel-audit.json

A5. Update agent (versioned)

# To update system prompt, pass current version number
curl -fsSL "https://api.anthropic.com/v1/agents/$SENTINEL_ID" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -X PATCH \
  -d '{"version": 1, "system": "<updated system prompt>"}'

# Response version will increment to 2. Existing sessions keep running with v1.
# New sessions use v2.
📚 Primary references