Handoff · SEO Lead + Senior TL · Sprint 1 Week 1–2

Your assignment for SEO Sentinel v1

We're deploying SEO Navigator's first Claude Managed Agent — SEO Sentinel — which automates the 5 Local SEO Automation modules (GBP, On-Page, Geographic grid, Citations, AI Visibility) for new client onboarding audits. This doc summarizes just the three things you own before Trung can finish deployment. Everything else is in Jake + Trung's lanes; you don't need to read the full 92KB PRD unless you want to.

You own: 3 deliverables · Deadline: Day 14 of the sprint · Est. effort: ~13 hours total · Partner: Senior TL on deliverables #2 and #3

🎯 Why this matters

The agent's output quality is capped by the clarity of what you write here. System prompt fuzzy = fuzzy audits. 44-point rubric ambiguous = inconsistent scoring. Output template vague = reports we can't stand behind to clients.

This handoff is designed so you can finish your three deliverables in ~10 hours across two weeks (the per-deliverable estimates below are ~4h + ~3h + ~3h), not spend a month. We err toward shipping v1, then iterating with real data.

1 Your three deliverables at a glance

#	Deliverable	Owner	Format	Est. hours	Due
1	System prompt (refined from draft)	You	`system-prompt.md` (plain text)	~4h	Day 10
2	44-point GBP rubric	You (+ Senior TL review)	`gbp-44-point-rubric.json`	~3h	Day 11
3	Output report template	You + Senior TL	`output-report-template.md`	~3h	Day 12
—	T1 validation (review agent's first synthetic run)	You	Pass/fail sign-off	~2h	Day 18
—	T2 validation (review agent's real-client run)	You	Pass/fail sign-off	~2h	Day 22

Deliverables 1–3 are front-loaded; the two validation tasks are spaced out while Trung builds the infrastructure.

2 Deliverable 1 — System prompt ~4 hours · due Day 10

📝 The situation

I (Claude) drafted a working first-pass system prompt so you have something concrete to edit, not a blank page. It's structurally complete — identity, methodology, module playbook, output contract, guardrails — but spare on methodology depth. That's where your expertise fills the gap.

Target length after your pass: 1,500–2,500 words. The full draft is below.

Your iteration brief: focus on these four areas

Methodology depth. The "Methodology you operate from" section is 3 paragraphs. Expand with the specific Koray concepts that matter most for detailing / service-area businesses — source context, entity relationships, query fan-out as they apply to local service verticals. You probably have 3-4 frameworks you apply consistently; name them explicitly.

Module scoring examples. Modules 1–3 in the draft each have a "Good output characteristics" subsection; Modules 4 and 5 don't have one yet, so add it. Then give every module 2–3 concrete examples of what "direct, specific, prioritized" language looks like. Example: instead of the abstract "specific: numbers not vague descriptions," write: "BAD: 'services could be expanded.' GOOD: 'Client lists 4 services; competitors average 11; priority adds: ceramic coating, paint correction, PPF, interior detail.'"

Competitor selection rule. The draft says "3 competitors from the payload" but doesn't say WHICH 3 if the client provides more. State the rule: top GBP-ranked within service radius? Most similar service menu? Closest proximity? Your call, but make it deterministic.

Self-check questions. The draft has 4 self-check questions the agent asks itself before submitting. You probably have 5-8 more from audits that failed in review (common misses, common blind spots). Add them. These are the cheapest insurance against bad output reaching humans.
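Whichever competitor rule you pick, "deterministic" means the same payload always yields the same three competitors. A minimal sketch of one option (top GBP rank within the service radius), assuming each competitor record carries `gbp_rank` and `distance_km`; both field names are hypothetical and not taken from the PRD:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Competitor:
    name: str
    gbp_rank: int        # hypothetical field: local-pack rank, 1 = best
    distance_km: float   # hypothetical field: distance from the client


def select_competitors(candidates, radius_km, limit=3):
    """Pick `limit` competitors inside the radius, always in the same order."""
    in_radius = [c for c in candidates if c.distance_km <= radius_km]
    # Sort by rank, then distance, then name, so ties never depend on input order.
    ranked = sorted(in_radius, key=lambda c: (c.gbp_rank, c.distance_km, c.name))
    return ranked[:limit]


candidates = [
    Competitor("Shine Bros", 2, 3.1),
    Competitor("Apex Detail", 1, 4.0),
    Competitor("Far Away Auto", 1, 12.5),   # outside the radius, dropped
    Competitor("Budget Wash", 2, 3.1),      # ties with Shine Bros on rank and distance
    Competitor("Gloss Garage", 5, 0.8),
]
picked = select_competitors(candidates, radius_km=8.0)
print([c.name for c in picked])
```

The name tie-breaker is what keeps the result stable when two competitors share rank and distance. If the rule can't produce exactly three, that is the "competitor selection ambiguous" case where the draft tells the agent to halt.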

⚠️ Don't overdo it

If you find yourself going past 3,000 words, stop and move the deep methodology into a separate skill file (a .md reference Trung mounts to the container). Skills are "reference material on the shelf." System prompts are "the executable playbook the agent re-reads every turn" — they should be lean.

Rule of thumb: every sentence in the system prompt must make the agent behave differently. If it doesn't change behavior, it's decoration.

The draft (copy-paste into your editor, edit freely)

✏️ How to edit this

Copy the block below into a new system-prompt.md file. Edit directly. When done, send the file to Trung — he pastes the final version into the agent's system field via scripts/create-agent.sh. Version 1 of the agent is immutable once created, but Trung can PATCH new versions as you iterate based on T1/T2 findings.

# ═══════════════════════════════════════════════════════════════
# SEO SENTINEL · SYSTEM PROMPT v1 DRAFT
# Edit freely. Target length 1,500–2,500 words after your pass.
# ═══════════════════════════════════════════════════════════════

# IDENTITY

You are SEO Sentinel, SEO Navigator's Local SEO Automation agent. You produce comprehensive, accurate, actionable local SEO audits for local service businesses: primarily automotive detailing shops, also roofing, HVAC, dentistry, and similar owner-operated local businesses.

You work for SEO Navigator, a boutique local SEO agency. Your outputs feed into human deliverables the agency sends to paying clients. Quality matters more than speed. A wrong recommendation loses the client's trust in the agency; a slow audit loses nothing.

# YOUR JOB ON EVERY RUN

On each run, you will receive a client handoff payload as JSON. You will:

1. Validate the payload has every required field. Halt if missing.
2. Execute five audit modules in sequence (detailed below).
3. Synthesize findings into a structured report.
4. Write two output files to /mnt/session/outputs/:
   - sentinel-audit-{client_id}.json (machine-readable)
   - sentinel-audit-{client_id}.md (human-readable)
5. Self-check the report before ending your turn. Revise if deficient.

# METHODOLOGY YOU OPERATE FROM

# 👉 YOUR EDIT: expand this section with Koray concepts that matter
# for detailing + service-area businesses. Name specific frameworks
# you apply. 3-4 paragraphs target.

Local SEO has three ranking signals in Google's local pack: proximity, relevance, and prominence. Your audits surface what the business can actually influence: relevance and prominence. Proximity is fixed by physical address; you note it only as context.

For on-page work, you apply Koray Tuğberk's semantic content network framework: pages are evaluated by source context fit, topical coverage, entity relationships, and query fan-out. See the koray-city-page-auditor skill mounted at /mnt/skills/ for detail. Don't re-derive it; reference it.

For GBP work, you apply SEO Navigator's 44-point rubric at /mnt/skills/gbp-44-point-rubric.json. Every point is binary or 0-10 scored per the rubric spec. Don't invent new criteria.

For AI visibility, the question is not "does the client rank in Google" but "does the client appear in generative answers to buyer-intent prompts."

# THE FIVE MODULES: YOUR PLAYBOOK

## Module 1: GBP Analyzer

What you produce: 44-point GBP score + competitor benchmark + top-5 gaps.

How you get the data:
- Run bash helpers/apify_run.sh gbp-scraper <client_gbp_url>
- For each of the 3 competitors in the payload, run the same script.
- Parse the JSON output.

How you score:
- Load /mnt/skills/gbp-44-point-rubric.json.
- For each of the 44 points, compute the score from Apify output.
- Sum weighted points → composite score 0-100.
- Compute the same composite for each competitor.
- Identify the top 5 gaps where client underperforms competitor average by the largest margin.

Good output characteristics:
- Specific: "Client lists 4 services; competitors average 11", not "Services could be expanded."
- Actionable: each gap names the exact GBP field to update.
- Honest: if a rubric point couldn't be scored (data missing), mark null, don't guess.

# 👉 YOUR EDIT: add 2-3 concrete BAD/GOOD examples here.

## Module 2: On-Page Intelligence

What you produce: Audit of homepage + top 4 service pages.

How you get the data:
- Run bash helpers/firecrawl_crawl.sh <client_website_url>
- Identify the 5 priority pages (homepage + top 4 service pages by internal link count).

What you check per page:
- Title tag (present, under 60 chars, primary keyword, unique)
- Meta description (present, under 160 chars, compelling CTA)
- H1 (single, matches page intent)
- Heading hierarchy (H2s nested under H1)
- Schema.org markup (LocalBusiness, Service, FAQPage where relevant)
- Internal links (count, anchor text variety)
- Word count (thin < 300, thorough 800+)
- Image alt text coverage %
- Primary keyword in first 100 words

Good output characteristics:
- One table per page with all checks + pass/fail/flag for each.
- Site-wide patterns called out ("4 of 5 pages missing LocalBusiness schema" is more useful than noting it 4 times).

## Module 3: Geographic Intelligence

What you produce: Ranking heatmap + visibility % per keyword + clusters.

How you get the data:
- Run python helpers/geo_grid.py --address "{business_address}" --radius 5 --keywords <5 keywords from payload>
- Script generates 13×13 grid (169 points), scrapes rank per keyword per point via Apify SERP actor, runs DBSCAN to cluster hot/cold zones.

What you report:
- Per keyword: average rank, % of grid in top 3, % in top 10, % unranked.
- Hot zones (rank ≤ 3) and cold zones (rank > 10 or unranked) in plain-English descriptions.
- Comparison to any existing SEO Utils baseline.

Good output characteristics:
- Lead with the number, not the methodology.
- Always cross-reference baseline. If none exists, note this run establishes it.

## Module 4: Citation Intelligence

What you produce: NAP consistency scan across 40+ directories.

How you get the data:
- Ground truth NAP = payload's client_name + business_address + contact_phone_primary
- Run bash helpers/apify_run.sh citation-scraper "{client_name}"
- Compare each directory listing to ground truth.

What you report:
- Directory matrix: name | listed | NAP match | mismatches.
- Priority fix list: top-tier directories first (Apple Maps, Yelp, Facebook, Bing Places, Yellow Pages) and any where phone/address mismatch (these hurt rankings more than a missing listing).

## Module 5: AI Visibility Tracker

What you produce: Visibility score across ChatGPT, Gemini, Perplexity.

How you get the data:
- Derive 5 buyer-intent queries from client's priority services + city (e.g., "best ceramic coating in Columbus OH"). Document them in output.
- Run python helpers/ai_visibility.py --queries <queries.json>
- For each LLM × query, parse response: client mentioned? Position? Competitors mentioned?

What you report:
- Per-LLM matrix: query | client mentioned | position | competitors.
- Composite score: (% queries where client appears) × (inverse average position when mentioned).
- Gap analysis: if competitors appear but client doesn't, note probable cause (thin content, missing entity markup, review volume).

# COMPETITOR SELECTION RULE

# 👉 YOUR EDIT: define the rule. Examples:
# "Use the 3 competitors from payload.competitors in order."
# "Pick top 3 by GBP rank within client's service radius."
# "Most similar service menu + location proximity weighted 60/40."
# Make it deterministic.

# OUTPUT CONTRACT

You write two files to /mnt/session/outputs/ before ending your turn.

File 1: sentinel-audit-{client_id}.json. See /mnt/skills/output-report-template.md for exact schema.

File 2: sentinel-audit-{client_id}.md, a narrative report with Executive Summary, Score Dashboard, Module 1–5 findings, Priority Recommendations (P1/P2/P3), Appendix. See template for exact structure.

# GUARDRAILS: WHAT YOU MUST NOT DO

- Never post content to client's GBP, website, social channels; you are read-only.
- Never send emails on client's or agency's behalf.
- Never modify client assets.
- Never use web_search or LLM APIs as a substitute for helper scripts.
- Never fabricate data. If a scraper returns empty, flag that module partial and continue.
- Never pad the report. If a section has nothing worth saying, say so and move on.

# GUARDRAILS: WHAT YOU MUST DO

- Read the handoff payload first. If required fields missing, halt and post a Slack message listing exactly what's missing.
- Write raw module outputs to /mnt/session/outputs/raw/ as you go.
- After writing both output files, re-read and self-check:
  · Are all 5 modules populated?
  · Are all priority_recommendations specific with concrete next steps?
  · Is overall_score internally consistent with module scores?
  · Does Markdown executive summary match JSON data?

# 👉 YOUR EDIT: add 5-8 more self-check questions from failure modes
# you've seen in past audits. Things like:
# · Did I cite the GBP category alongside each category recommendation?
# · Did I distinguish "missing citation" from "incorrect citation"?
# · etc.

If self-check fails, revise once. If still fails, mark output with "self_check_failed": true and end; human will review.

# WHEN TO PROCEED AUTONOMOUSLY VS. HALT

Proceed autonomously:
- Payload complete, scripts returning expected data, MCPs succeeding.
- Individual module fails: flag partial, continue with remaining.

Halt and ask human (Slack message, stop):
- Payload missing required fields.
- 3+ modules failing (systemic issue).
- Competitor selection ambiguous per the rule above.
- Self-check fails after one revision.

# YOUR TONE IN OUTPUTS

- Direct. Skip hedging. State what's true.
- Specific. Numbers and named gaps, not vague descriptions.
- Prioritized. Every recommendation gets P1/P2/P3.
- Honest. If something couldn't be audited, say so. Short report with real findings beats a padded report with guesses.

Think of your reader as a senior SEO strategist reviewing in 20 minutes. They need to know what's wrong, what to fix first, and why. They do not need to know how you got there; that's the raw data appendix.

# END OF SYSTEM PROMPT v1 DRAFT
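Module 5's composite formula in the draft leaves room for interpretation, so it may help to pin it down before T1. A rough sketch of one reading, assuming results are pooled across all LLMs and queries, "inverse average position" means 1 divided by the mean position when mentioned, and the result is scaled to 0-100. None of these choices is confirmed by the draft, and the dict keys are hypothetical:

```python
from statistics import mean


def ai_visibility_score(results):
    """results: one dict per LLM x query with 'mentioned' (bool) and 'position' (int or None)."""
    if not results:
        return None  # nothing was measured, so no score is reported
    positions = [r["position"] for r in results if r["mentioned"]]
    if not positions:
        return 0.0  # measured, but the client never appeared
    appearance_rate = len(positions) / len(results)
    inverse_avg_position = 1 / mean(positions)
    return round(100 * appearance_rate * inverse_avg_position, 1)


results = [
    {"llm": "chatgpt", "query": "q1", "mentioned": True, "position": 1},
    {"llm": "chatgpt", "query": "q2", "mentioned": False, "position": None},
    {"llm": "gemini", "query": "q1", "mentioned": True, "position": 3},
    {"llm": "perplexity", "query": "q1", "mentioned": False, "position": None},
]
print(ai_visibility_score(results))  # 2 of 4 mentions, mean position 2 -> 25.0
```

Under this reading a client mentioned first in every answer scores 100, and the score falls quickly with position. If that curve feels too harsh, the formula in the system prompt is the place to change it.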

3 Deliverable 2 — 44-point GBP rubric ~3 hours · due Day 11

The agent needs a deterministic scoring matrix for Google Business Profile audits. Right now, "44 points" is a number we throw around — to deploy, we need the actual list.

Format: single JSON file, gbp-44-point-rubric.json, mounted at /mnt/skills/. The agent reads this file at runtime.

Required structure

{
  "rubric_name": "gbp-44-point-v1",
  "version": "1.0.0",
  "max_score": 100,
  "categories": [
    { "id": "profile_completeness", "weight": 0.25, "label": "Profile Completeness" },
    { "id": "services_products", "weight": 0.20, "label": "Services & Products" },
    { "id": "reviews_reputation", "weight": 0.20, "label": "Reviews & Reputation" },
    { "id": "content_media", "weight": 0.15, "label": "Content & Media" },
    { "id": "engagement_posts", "weight": 0.10, "label": "Engagement & Posts" },
    { "id": "qa_messaging", "weight": 0.10, "label": "Q&A + Messaging" }
  ],
  "points": [
    {
      "id": "1.1",
      "category": "profile_completeness",
      "label": "Business name matches legal name exactly",
      "scoring": "binary", // or "scale_0_10"
      "weight": 2,
      "pass_condition": "business.name equals client.legal_name (case-insensitive, normalized whitespace)",
      "data_source": "apify.gbp_scraper.business.name",
      "notes": "Spammy suffixes (keywords, cities) violate Google ToS and we must flag them separately; don't penalize here."
    },
    {
      "id": "1.2",
      "category": "profile_completeness",
      "label": "Primary category matches primary service",
      "scoring": "scale_0_10",
      "weight": 3,
      "scoring_rubric": {
        "10": "exact match to client's #1 priority service",
        "7": "adjacent category (e.g., 'Auto repair shop' for detailing)",
        "4": "generic category",
        "0": "wrong vertical entirely"
      },
      "data_source": "apify.gbp_scraper.business.primary_category"
    }
    // ... 42 more points
  ]
}

The // comments above are annotations for this doc only. Leave them out of the real file: standard JSON has no comment syntax, and a strict parser will reject them.
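For reference, here is one way the agent's "sum weighted points → composite score 0-100" step could work against this structure. This is a sketch under assumptions, not the agreed scoring method: it normalizes each point to 0-1, averages within a category by point weight, then combines categories by category weight. The trimmed two-category rubric and its 0.6/0.4 weights are made up for the demo.

```python
def composite_score(rubric, scores):
    """Return a 0-100 composite; `scores` maps point id -> raw score or None."""
    max_raw = {"binary": 1, "scale_0_10": 10}
    total = 0.0
    for category in rubric["categories"]:
        earned = possible = 0.0
        for point in rubric["points"]:
            raw = scores.get(point["id"])
            if point["category"] != category["id"] or raw is None:
                continue  # unscored (null) points are skipped, not counted as zero
            earned += point["weight"] * raw / max_raw[point["scoring"]]
            possible += point["weight"]
        if possible:
            # Note: a category with no scored points contributes nothing here.
            total += category["weight"] * earned / possible
    return round(100 * total, 1)


# Trimmed demo rubric; the weights are illustrative, not the real ones.
rubric = {
    "categories": [
        {"id": "profile_completeness", "weight": 0.6},
        {"id": "services_products", "weight": 0.4},
    ],
    "points": [
        {"id": "1.1", "category": "profile_completeness", "scoring": "binary", "weight": 2},
        {"id": "1.2", "category": "profile_completeness", "scoring": "scale_0_10", "weight": 3},
        {"id": "2.1", "category": "services_products", "scoring": "binary", "weight": 1},
    ],
}
scores = {"1.1": 1, "1.2": 7, "2.1": 0}
print(composite_score(rubric, scores))
```

How null points and fully unscored categories are treated changes the composite noticeably, so it is worth deciding that explicitly with Senior TL rather than leaving it to the agent.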

Suggested category breakdown (44 points across 6 categories)

Category	Points	Weight	Example points (not exhaustive — fill in)
Profile Completeness	10	25%	Name · Category · Hours · Address · Phone · Website · Service area · Description · Opening date · Attributes
Services & Products	8	20%	Count vs competitors · Price transparency · Descriptions quality · Categorization · Menu for products · etc.
Reviews & Reputation	8	20%	Count · Avg rating · Velocity · Response rate · Response quality · Keyword coverage in reviews · etc.
Content & Media	8	15%	Photo count · Photo freshness · Video presence · Exterior/interior/team/product coverage · etc.
Engagement & Posts	5	10%	Post frequency · Post types · CTAs · Events · Offers · etc.
Q&A + Messaging	5	10%	Q&A populated · Owner responses · Messaging enabled · Response time · etc.
Total	44	100%

⚠️ Senior TL reviews the weights

Weights per category are opinionated. Senior TL should review them: they've seen more GBP audits than anyone, and their gut on "what actually matters for rankings" beats default weights. Book a 30-minute review call before finalizing.

✅ Definition of Done for the rubric

All 44 points have: id, category, label, scoring type (binary or scale_0_10), weight, pass_condition (or scoring_rubric for scale points), data_source.
Category weights sum to 100%, and point weights within each category pass the rubric validation in agents/sentinel.json.
Senior TL has signed off on category weights.
At least 3 points reference the Apify GBP scraper output by exact JSON path (so the agent knows where to read).
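A quick local check can catch most Definition of Done misses before the file goes to Trung. The sketch below is not the validator in agents/sentinel.json (its rules aren't described here); it only assumes that category weights should sum to 1.0 and that each point needs the fields listed above. The two demo points are the ones from the required structure, so the demo passes `expected_points=2`.

```python
import math

REQUIRED = {"id", "category", "label", "scoring", "weight", "data_source"}
DETAIL_FIELD = {"binary": "pass_condition", "scale_0_10": "scoring_rubric"}


def validate_rubric(rubric, expected_points=44):
    """Return a list of human-readable problems; an empty list means the checks pass."""
    problems = []
    category_ids = {c["id"] for c in rubric["categories"]}
    weight_sum = sum(c["weight"] for c in rubric["categories"])
    if not math.isclose(weight_sum, 1.0, abs_tol=1e-9):
        problems.append(f"category weights sum to {weight_sum:.2f}, expected 1.00")
    if len(rubric["points"]) != expected_points:
        problems.append(f"{len(rubric['points'])} points, expected {expected_points}")
    for point in rubric["points"]:
        pid = point.get("id", "<no id>")
        missing = REQUIRED - set(point)
        if missing:
            problems.append(f"{pid}: missing {sorted(missing)}")
        if point.get("category") not in category_ids:
            problems.append(f"{pid}: unknown category {point.get('category')!r}")
        detail = DETAIL_FIELD.get(point.get("scoring"))
        if detail is None:
            problems.append(f"{pid}: scoring must be binary or scale_0_10")
        elif detail not in point:
            problems.append(f"{pid}: {point['scoring']} point needs {detail}")
    return problems


rubric = {
    "categories": [
        {"id": "profile_completeness", "weight": 0.25},
        {"id": "services_products", "weight": 0.20},
        {"id": "reviews_reputation", "weight": 0.20},
        {"id": "content_media", "weight": 0.15},
        {"id": "engagement_posts", "weight": 0.10},
        {"id": "qa_messaging", "weight": 0.10},
    ],
    "points": [
        {"id": "1.1", "category": "profile_completeness", "scoring": "binary", "weight": 2,
         "label": "Business name matches legal name exactly",
         "pass_condition": "business.name equals client.legal_name",
         "data_source": "apify.gbp_scraper.business.name"},
        {"id": "1.2", "category": "profile_completeness", "scoring": "scale_0_10", "weight": 3,
         "label": "Primary category matches primary service",
         "scoring_rubric": {"10": "exact match", "0": "wrong vertical entirely"},
         "data_source": "apify.gbp_scraper.business.primary_category"},
    ],
}
print(validate_rubric(rubric, expected_points=2))
```

To run it against the real file, load it with `json.load` first; that step alone will also confirm the file is valid JSON.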

4 Deliverable 3 — Output report template ~3 hours · You + Senior TL · due Day 12

The exact structure the agent writes to sentinel-audit-{client_id}.md and the JSON schema for sentinel-audit-{client_id}.json. Think of this as the SOP for what an SEO Navigator audit looks like, formalized.

Required sections in the Markdown output

Executive Summary — 3-5 bullets, fit on one screen. Overall score + biggest issue + second biggest + what's strong.
Score Dashboard — table: overall + per-module scores with 0-100 values.
Module 1: GBP — 44-point summary, competitor benchmark table, top 5 gaps.
Module 2: On-Page — per-page findings tables, site-wide patterns.
Module 3: Geographic Intelligence — keyword visibility metrics, hot/cold zones, heatmap reference.
Module 4: Citations — directory matrix, priority fixes with effort estimates.
Module 5: AI Visibility — per-LLM matrix, composite score, gap narrative.
Priority Recommendations — P1 (this week), P2 (this month), P3 (someday). Each with: specific action, rationale, effort estimate.
Appendix — queries used, raw data file paths, methodology notes.
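When you define the table columns for Module 3, it helps to be exact about how each keyword visibility metric is computed. A minimal sketch, assuming the grid helper yields one rank per grid point with `None` for unranked, and that average rank is taken over ranked points only (an assumption; the draft doesn't say). The key names are hypothetical.

```python
from statistics import mean


def keyword_visibility(ranks):
    """ranks: one entry per grid point; an int rank, or None where the business is unranked."""
    total = len(ranks)
    if total == 0:
        raise ValueError("no grid points supplied")
    ranked = [r for r in ranks if r is not None]

    def pct(count):
        return round(100 * count / total, 1)

    return {
        "average_rank": round(mean(ranked), 1) if ranked else None,
        "pct_top_3": pct(sum(r <= 3 for r in ranked)),
        "pct_top_10": pct(sum(r <= 10 for r in ranked)),
        "pct_unranked": pct(total - len(ranked)),
    }


# Toy 13x13 grid (169 points): 40 at rank 2, 60 at rank 7, 29 at rank 15, 40 unranked.
ranks = [2] * 40 + [7] * 60 + [15] * 29 + [None] * 40
print(keyword_visibility(ranks))
```

Note that "top 10" here includes the top-3 points. Whether the report should show cumulative or exclusive bands is a template decision worth stating in the column header.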

Your job

Pull 1–2 existing high-quality audit deliverables we've shipped to clients recently. These become the visual template.
With Senior TL, codify their structure into output-report-template.md. Include: section names, subsection names, expected table columns, tone notes, length guidance per section.
Write the JSON schema that mirrors the Markdown. Every piece of data in the .md should be structured in the .json so downstream tools can parse it.
Include 1 fully-worked example — a mock audit for a fictional detailing shop, written out end-to-end. This becomes the agent's "north star" for what "good" looks like.

💡 Why the example matters most

Structural rules ("use H2 for module headers") are table stakes — the agent will follow them. What the agent needs to calibrate tone, specificity, and depth is one complete worked example. The difference between a mediocre audit and a great one is mostly in language, not structure. The example shows the language bar.

5 Validation checkpoints — T1 and T2 ~2h each · Day 18 + Day 22

Once Trung has deployed the agent and it runs, you're the final gate on quality.

T1 — Day 18 — Synthetic client run

Trung runs the agent against fixtures/synthetic-client.json (fake detailing shop, safe to break). You compare the agent's output against the reference audit you pulled for Deliverable 3's example.

Pass criteria:

All 5 modules populated — no empty sections.
Output structure matches output-report-template.md exactly (section names, table columns).
Recommendations are specific (name actual things to do, not vague advice).
No factual errors in what the agent states about the synthetic data (catches hallucination).
The output's self_check_failed flag is false (agent didn't give up).
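The structural criteria above (sections present, nothing empty, self-check flag) can be pre-screened mechanically so your 2 hours go to specificity and factual accuracy. A rough sketch, assuming the template uses `## ` H2 headings with exactly the section names listed for Deliverable 3; the real heading level and wording are whatever you and Senior TL put in output-report-template.md.

```python
import re

REQUIRED_SECTIONS = [
    "Executive Summary",
    "Score Dashboard",
    "Module 1: GBP",
    "Module 2: On-Page",
    "Module 3: Geographic Intelligence",
    "Module 4: Citations",
    "Module 5: AI Visibility",
    "Priority Recommendations",
    "Appendix",
]


def t1_precheck(markdown_text, report_json):
    """Return a list of issues; an empty list means the structural checks pass."""
    issues = []
    found = {}
    # Split on H2 headings; the first chunk is whatever precedes the first heading.
    for chunk in re.split(r"^## ", markdown_text, flags=re.MULTILINE)[1:]:
        title, _, body = chunk.partition("\n")
        found[title.strip()] = body.strip()
    for name in REQUIRED_SECTIONS:
        if name not in found:
            issues.append(f"missing section: {name}")
        elif not found[name]:
            issues.append(f"empty section: {name}")
    if report_json.get("self_check_failed", False):
        issues.append("self_check_failed is true")
    return issues


demo_md = "".join(f"## {name}\nfindings go here\n\n" for name in REQUIRED_SECTIONS)
print(t1_precheck(demo_md, {"self_check_failed": False}))
```

A clean pre-check only means the report has the right shape. Whether the recommendations are specific and the facts match the synthetic data still needs your read.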

Your time: ~2h — read the full output, score against criteria, write up issues as bulleted feedback for Trung to feed back to the agent via system prompt refinement.

T2 — Day 22 — Real low-risk client run

Jake picks a real client (likely Shamrock Detailing or Procam Detailing). Trung runs the agent end-to-end. You review the output as if you were about to send it to the client.

Pass criteria: all T1 criteria + the "would I send this to the client after light editing?" test. Light editing = grammar, tone smoothing, maybe adding 1-2 client-specific notes. Not: rewriting sections, adding missing analysis, fixing factual errors.

Your time: ~2h — side-by-side compare with the client's last manual audit (if one exists), score, write a go/no-go with specific issues.

6 Logistics & handoff

What	Where it lives	Who sees it
`system-prompt.md`	ClickUp task attachment → Trung downloads	You write, Trung deploys, Jake reviews
`gbp-44-point-rubric.json`	Same	You write, Senior TL reviews weights, Trung mounts
`output-report-template.md`	Same	You + Senior TL write, Jake reviews
T1 / T2 feedback	ClickUp comment thread on the pilot task	You, Trung, Jake

Your ClickUp tasks

SNAV-SENT-001 — System prompt authoring (due Day 10) — assigned to you
SNAV-SENT-002 — 44-point GBP rubric (due Day 11) — assigned to you, Senior TL reviewer
SNAV-SENT-003 — Output report template (due Day 12) — assigned to you + Senior TL
SNAV-SENT-004 — T1 validation (due Day 18) — assigned to you
SNAV-SENT-005 — T2 validation (due Day 22) — assigned to you

When you hit a blocker

Post in #seo-automation Slack with @jake @trung. Examples of real blockers:

Scoring a GBP point requires data we can't get from the Apify scraper → we need Trung to extend the scraper or drop the point.
The draft system prompt references a skill file that doesn't exist yet → flag it, Senior TL writes the skill file or Trung stubs one.
44 points doesn't divide cleanly — we naturally have 38 or 52 → rename it and pick the right number; the "44" branding is not load-bearing.

🏁 Success for you, on Day 22

The agent has shipped its first real-client audit. The client received a report at least as good as last month's manual version. You spent ~10 hours over two weeks writing the system prompt + rubric + template, plus ~4 hours validating T1 + T2. Instead of personally running 3-4 hours of manual audit work on every new client, you now review the agent's output in 30-60 minutes and make strategic calls the agent can't.

That's the shape of "SEO Lead" in the AI-native agency. Not less work, different work — higher-leverage judgment over lower-leverage execution.