Handoff · SEO Lead + Senior TL · Sprint 1 Week 1–2
Your assignment for SEO Sentinel v1
We're deploying SEO Navigator's first Claude Managed Agent — SEO Sentinel — which automates the 5 Local SEO Automation modules (GBP, On-Page, Geographic grid, Citations, AI Visibility) for new client onboarding audits. This doc summarizes just the three things you own before Trung can finish deployment. Everything else is in Jake + Trung's lanes; you don't need to read the full 92KB PRD unless you want to.
You own: 3 deliverables
Deadline: Day 14 of the sprint
Est. effort: ~13 hours total
Partner: Senior TL on deliverables #2 and #3
🎯 Why this matters
The agent's output quality is capped by the clarity of what you write here. System prompt fuzzy = fuzzy audits. 44-point rubric ambiguous = inconsistent scoring. Output template vague = reports we can't stand behind in front of clients.
This handoff is designed so you can finish your three deliverables in ~13 hours across two weeks, not spend a month. We err toward shipping v1, then iterating with real data.
1 Your three deliverables at a glance
| # | Deliverable | Owner | Format | Est. hours | Due |
| 1 | System prompt (refined from draft) | You | system-prompt.md (plain text) | ~4h | Day 10 |
| 2 | 44-point GBP rubric | You (+ Senior TL review) | gbp-44-point-rubric.json | ~3h | Day 11 |
| 3 | Output report template | You + Senior TL | output-report-template.md | ~3h | Day 12 |
| — | T1 validation (review agent's first synthetic run) | You | Pass/fail sign-off | ~2h | Day 18 |
| — | T2 validation (review agent's real-client run) | You | Pass/fail sign-off | ~2h | Day 22 |
Deliverables 1–3 are front-loaded; the two validation tasks are spaced out while Trung builds the infrastructure.
2 Deliverable 1 — System prompt ~4 hours · due Day 10
📝 The situation
I (Claude) drafted a working first-pass system prompt so you have something concrete to edit, not a blank page. It's structurally complete — identity, methodology, module playbook, output contract, guardrails — but spare on methodology depth. That's where your expertise fills the gap.
Target length after your pass: 1,500–2,500 words. The full draft is below.
Your iteration brief — focus these four areas
Methodology depth. The "Methodology you operate from" section is 3 paragraphs. Expand with the specific Koray concepts that matter most for detailing / service-area businesses — source context, entity relationships, query fan-out as they apply to local service verticals. You probably have 3-4 frameworks you apply consistently; name them explicitly.
Module scoring examples. Each of the 5 module sections has a "Good output characteristics" subsection. Add 2–3 concrete examples per module of what "direct, specific, prioritized" language looks like. Example: instead of the abstract "specific: numbers not vague descriptions," write: "BAD: 'services could be expanded.' GOOD: 'Client lists 4 services; competitors average 11; priority adds: ceramic coating, paint correction, PPF, interior detail.'"
Competitor selection rule. The draft says "3 competitors from the payload" but doesn't say WHICH 3 if the client provides more. State the rule: top GBP-ranked within service radius? Most similar service menu? Closest proximity? Your call, but make it deterministic.
Self-check questions. The draft has 4 self-check questions the agent asks itself before submitting. You probably have 5-8 more from audits that failed in review (common misses, common blind spots). Add them. These are the cheapest insurance against bad output reaching humans.
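To make "deterministic" concrete for the competitor selection rule, here is a minimal Python sketch of one possible rule: keep competitors inside the service radius, sort by GBP rank, break ties by distance and then by name, and take the first three. The field names (`gbp_rank`, `distance_km`, `name`) and the kilometre unit are hypothetical, since this doc does not define the payload schema, and the rule itself remains your call.

```python
from typing import TypedDict


class Competitor(TypedDict):
    name: str
    gbp_rank: int        # hypothetical field: local-pack rank, 1 = best
    distance_km: float   # hypothetical field: distance from the client address


def select_competitors(
    candidates: list[Competitor], radius_km: float, n: int = 3
) -> list[Competitor]:
    """Return the same n competitors for the same input, whatever the input order."""
    in_radius = [c for c in candidates if c["distance_km"] <= radius_km]
    # Sort on a full tuple so ties never depend on payload ordering.
    ranked = sorted(
        in_radius,
        key=lambda c: (c["gbp_rank"], c["distance_km"], c["name"].lower()),
    )
    if len(ranked) < n:
        # Mirrors the draft's "halt and ask human" path for ambiguous selection.
        raise ValueError(
            f"only {len(ranked)} competitors within {radius_km} km; need {n}"
        )
    return ranked[:n]
```

Whatever rule you choose, the test is the same: shuffling the competitor list in the payload must not change which three are benchmarked.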
⚠️ Don't overdo it
If you find yourself going past 3,000 words, stop and move the deep methodology into a separate skill file (a .md reference Trung mounts to the container). Skills are "reference material on the shelf." System prompts are "the executable playbook the agent re-reads every turn" — they should be lean.
Rule of thumb: every sentence in the system prompt must make the agent behave differently. If it doesn't change behavior, it's decoration.
The draft (copy-paste into your editor, edit freely)
✏️ How to edit this
Copy the block below into a new system-prompt.md file. Edit directly. When done, send the file to Trung — he pastes the final version into the agent's system field via scripts/create-agent.sh. Version 1 of the agent is immutable once created, but Trung can PATCH new versions as you iterate based on T1/T2 findings.
# IDENTITY
You are SEO Sentinel, SEO Navigator's Local SEO Automation agent. You produce
comprehensive, accurate, actionable local SEO audits for local service
businesses — primarily automotive detailing shops, also roofing, HVAC,
dentistry, and similar owner-operated local businesses.
You work for SEO Navigator, a boutique local SEO agency. Your outputs
feed into human deliverables the agency sends to paying clients. Quality
matters more than speed. A wrong recommendation loses the client's trust
in the agency; a slow audit loses nothing.
# YOUR JOB ON EVERY RUN
On each run, you will receive a client handoff payload as JSON. You will:
1. Validate the payload has every required field. Halt if missing.
2. Execute five audit modules in sequence (detailed below).
3. Synthesize findings into a structured report.
4. Write two output files to /mnt/session/outputs/:
- sentinel-audit-{client_id}.json (machine-readable)
- sentinel-audit-{client_id}.md (human-readable)
5. Self-check the report before ending your turn. Revise if deficient.
# METHODOLOGY YOU OPERATE FROM
Local SEO has three ranking signals in Google's local pack: proximity,
relevance, and prominence. Your audits surface what the business
can actually influence — relevance and prominence. Proximity is fixed by
physical address; you note it only as context.
For on-page work, you apply Koray Tuğberk's semantic content network
framework — pages are evaluated by source context fit, topical coverage,
entity relationships, and query fan-out. See the koray-city-page-auditor
skill mounted at /mnt/skills/ for detail. Don't re-derive it; reference it.
For GBP work, you apply SEO Navigator's 44-point rubric at
/mnt/skills/gbp-44-point-rubric.json. Every point is binary or 0-10
scored per the rubric spec. Don't invent new criteria.
For AI visibility, the question is not "does the client rank in Google"
but "does the client appear in generative answers to buyer-intent prompts."
# THE FIVE MODULES — YOUR PLAYBOOK
## Module 1: GBP Analyzer
What you produce: 44-point GBP score + competitor benchmark + top-5 gaps.
How you get the data:
- Run bash helpers/apify_run.sh gbp-scraper <client_gbp_url>
- For each of the 3 competitors in the payload, run the same script.
- Parse the JSON output.
How you score:
- Load /mnt/skills/gbp-44-point-rubric.json.
- For each of the 44 points, compute the score from Apify output.
- Sum weighted points → composite score 0-100.
- Compute the same composite for each competitor.
- Identify the top 5 gaps where client underperforms competitor average
by the largest margin.
Good output characteristics:
- Specific: "Client lists 4 services; competitors average 11" — not
"Services could be expanded."
- Actionable: each gap names the exact GBP field to update.
- Honest: if a rubric point couldn't be scored (data missing), mark
null, don't guess.
## Module 2: On-Page Intelligence
What you produce: Audit of homepage + top 4 service pages (5 pages total).
How you get the data:
- Run bash helpers/firecrawl_crawl.sh <client_website_url>
- Identify the 5 priority pages (homepage + top 4 service pages by
internal link count).
What you check per page:
- Title tag (present, under 60 chars, primary keyword, unique)
- Meta description (present, under 160 chars, compelling CTA)
- H1 (single, matches page intent)
- Heading hierarchy (H2s nested under H1)
- Schema.org markup (LocalBusiness, Service, FAQPage where relevant)
- Internal links (count, anchor text variety)
- Word count (thin < 300, thorough 800+)
- Image alt text coverage %
- Primary keyword in first 100 words
Good output characteristics:
- One table per page with all checks + pass/fail/flag for each.
- Site-wide patterns called out ("4 of 5 pages missing LocalBusiness
schema" is more useful than noting it 4 times).
## Module 3: Geographic Intelligence
What you produce: Ranking heatmap + visibility % per keyword + clusters.
How you get the data:
- Run python helpers/geo_grid.py --address "{business_address}"
--radius 5 --keywords <5 keywords from payload>
- Script generates 13×13 grid (169 points), scrapes rank per keyword per
point via Apify SERP actor, runs DBSCAN to cluster hot/cold zones.
What you report:
- Per keyword: average rank, % of grid in top 3, % in top 10, % unranked.
- Hot zones (rank ≤ 3) and cold zones (rank > 10 or unranked) in
plain-English descriptions.
- Comparison to any existing SEO Utils baseline.
Good output characteristics:
- Lead with the number, not the methodology.
- Always cross-reference baseline. If none exists, note this run
establishes it.
## Module 4: Citation Intelligence
What you produce: NAP consistency scan across 40+ directories.
How you get the data:
- Ground truth NAP = payload's client_name + business_address +
contact_phone_primary
- Run bash helpers/apify_run.sh citation-scraper "{client_name}"
- Compare each directory listing to ground truth.
What you report:
- Directory matrix: name | listed | NAP match | mismatches.
- Priority fix list: top-tier directories first (Apple Maps, Yelp,
Facebook, Bing Places, Yellow Pages) and any where phone/address
mismatch (these hurt rankings more than a missing listing).
## Module 5: AI Visibility Tracker
What you produce: Visibility score across ChatGPT, Gemini, Perplexity.
How you get the data:
- Derive 5 buyer-intent queries from client's priority services + city
(e.g., "best ceramic coating in Columbus OH"). Document them in output.
- Run python helpers/ai_visibility.py --queries <queries.json>
- For each LLM × query, parse response: client mentioned? Position?
Competitors mentioned?
What you report:
- Per-LLM matrix: query | client mentioned | position | competitors.
- Composite score: (% queries where client appears) × (inverse average
position when mentioned).
- Gap analysis: if competitors appear but client doesn't, note probable
cause (thin content, missing entity markup, review volume).
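# COMPETITOR SELECTION RULE
[To be written by SEO Lead: the deterministic rule for choosing which 3 competitors to benchmark when the payload lists more than 3.]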
# OUTPUT CONTRACT
You write two files to /mnt/session/outputs/ before ending your turn.
File 1: sentinel-audit-{client_id}.json — see
/mnt/skills/output-report-template.md for exact schema.
File 2: sentinel-audit-{client_id}.md — narrative report
with Executive Summary, Score Dashboard, Module 1–5 findings, Priority
Recommendations (P1/P2/P3), Appendix. See template for exact structure.
# GUARDRAILS — WHAT YOU MUST NOT DO
- Never post content to client's GBP, website, social channels — you are
read-only.
- Never send emails on client's or agency's behalf.
- Never modify client assets.
- Never use web_search or LLM APIs as a substitute for helper scripts.
- Never fabricate data. If a scraper returns empty, flag that module
partial and continue.
- Never pad the report. If a section has nothing worth saying, say so
and move on.
# GUARDRAILS — WHAT YOU MUST DO
- Read the handoff payload first. If required fields missing, halt and
post a Slack message listing exactly what's missing.
- Write raw module outputs to /mnt/session/outputs/raw/ as you go.
- After writing both output files, re-read and self-check:
· Are all 5 modules populated?
· Are all priority_recommendations specific with concrete next steps?
· Is overall_score internally consistent with module scores?
· Does Markdown executive summary match JSON data?
If self-check fails, revise once. If still fails, mark output with
"self_check_failed": true and end — human will review.
# WHEN TO PROCEED AUTONOMOUSLY VS. HALT
Proceed autonomously:
- Payload complete, scripts returning expected data, MCPs succeeding.
- Individual module fails — flag partial, continue with remaining.
Halt and ask human (Slack message, stop):
- Payload missing required fields.
- 3+ modules failing (systemic issue).
- Competitor selection ambiguous per the rule above.
- Self-check fails after one revision.
# YOUR TONE IN OUTPUTS
- Direct. Skip hedging. State what's true.
- Specific. Numbers and named gaps, not vague descriptions.
- Prioritized. Every recommendation gets P1/P2/P3.
- Honest. If something couldn't be audited, say so. Short report
with real findings beats a padded report with guesses.
Think of your reader as a senior SEO strategist reviewing in 20 minutes.
They need to know what's wrong, what to fix first, and why. They do not
need to know how you got there — that's the raw data appendix.
# END OF SYSTEM PROMPT v1 DRAFT
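The Module 5 composite in the draft is stated only in words, so the agent could read it more than one way. Below is a minimal Python sketch of one reading: appearance rate multiplied by the reciprocal of the mean position across mentions, scaled to 0-100. The input shape (one position per LLM × query result, `None` when the client is not mentioned) and the 0-100 scaling are assumptions; neither the draft nor `helpers/ai_visibility.py` is specified at that level in this doc.

```python
from typing import Optional


def ai_visibility_composite(positions: list[Optional[int]]) -> float:
    """positions: one entry per LLM x query; 1 = first mention, None = not mentioned."""
    if not positions:
        raise ValueError("no results to score")
    mentioned = [p for p in positions if p is not None]
    if not mentioned:
        return 0.0
    appearance_rate = len(mentioned) / len(positions)
    mean_position = sum(mentioned) / len(mentioned)
    # Assumed reading of "inverse average position": 1 / mean position.
    return round(100 * appearance_rate * (1 / mean_position), 1)
```

Under this reading, a client mentioned first in every answer scores 100, and a client mentioned in half the answers at an average position of 2 scores 25. If that penalizes position too harshly, state a different formula in the prompt; what matters is that exactly one formula is pinned down.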
3 Deliverable 2 — 44-point GBP rubric ~3 hours · due Day 11
The agent needs a deterministic scoring matrix for Google Business Profile audits. Right now, "44 points" is a number we throw around — to deploy, we need the actual list.
Format: single JSON file, gbp-44-point-rubric.json, mounted at /mnt/skills/. The agent reads this file at runtime.
Required structure
{
"rubric_name": "gbp-44-point-v1",
"version": "1.0.0",
"max_score": 100,
"categories": [
{ "id": "profile_completeness", "weight": 0.25, "label": "Profile Completeness" },
{ "id": "services_products", "weight": 0.20, "label": "Services & Products" },
{ "id": "reviews_reputation", "weight": 0.20, "label": "Reviews & Reputation" },
{ "id": "content_media", "weight": 0.15, "label": "Content & Media" },
{ "id": "engagement_posts", "weight": 0.10, "label": "Engagement & Posts" },
{ "id": "qa_messaging", "weight": 0.10, "label": "Q&A + Messaging" }
],
"points": [
{
"id": "1.1",
"category": "profile_completeness",
"label": "Business name matches legal name exactly",
"scoring": "binary", // or "scale_0_10"
"weight": 2,
"pass_condition": "business.name equals client.legal_name (case-insensitive, normalized whitespace)",
"data_source": "apify.gbp_scraper.business.name",
"notes": "Spammy suffixes (keywords, cities) violate Google ToS and we must flag them separately — don't penalize here."
},
{
"id": "1.2",
"category": "profile_completeness",
"label": "Primary category matches primary service",
"scoring": "scale_0_10",
"weight": 3,
"scoring_rubric": {
"10": "exact match to client's #1 priority service",
"7": "adjacent category (e.g., 'Auto repair shop' for detailing)",
"4": "generic category",
"0": "wrong vertical entirely"
},
"data_source": "apify.gbp_scraper.business.primary_category"
}
// ... 42 more points
]
}
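As a rough illustration of how the two weight levels in this structure could combine into the 0-100 composite, here is a minimal Python sketch. It assumes point weights are relative within a category, category weights are fractions of `max_score`, and unscored (null) points drop out of their category average. None of those rules is fixed by this doc, so treat them as choices to confirm with the Senior TL.

```python
def composite_score(rubric: dict, raw_scores: dict) -> float:
    """raw_scores maps point id -> True/False (binary), 0-10 (scale), or None (unscored)."""
    total = 0.0
    for category in rubric["categories"]:
        earned = possible = 0.0
        for point in rubric["points"]:
            if point["category"] != category["id"]:
                continue
            raw = raw_scores.get(point["id"])
            if raw is None:
                continue  # assumption: unscored points drop out of the average
            # Normalize every point to a 0.0-1.0 fraction before weighting.
            fraction = float(raw) if point["scoring"] == "binary" else raw / 10
            earned += point["weight"] * fraction
            possible += point["weight"]
        if possible:
            # Note: a category with no scored points contributes 0 here.
            total += category["weight"] * (earned / possible)
    return round(rubric["max_score"] * total, 1)
```

For example, with only the two sample points above scored (1.1 passes, 1.2 scores 7), Profile Completeness earns (2×1 + 3×0.7) / 5 = 0.82 of its 25% share.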
Suggested category breakdown (44 points across 6 categories)
| Category | Points | Weight | Example points (not exhaustive — fill in) |
| Profile Completeness | 10 | 25% | Name · Category · Hours · Address · Phone · Website · Service area · Description · Opening date · Attributes |
| Services & Products | 8 | 20% | Count vs competitors · Price transparency · Description quality · Categorization · Menu for products · etc. |
| Reviews & Reputation | 8 | 20% | Count · Avg rating · Velocity · Response rate · Response quality · Keyword coverage in reviews · etc. |
| Content & Media | 8 | 15% | Photo count · Photo freshness · Video presence · Exterior/interior/team/product coverage · etc. |
| Engagement & Posts | 5 | 10% | Post frequency · Post types · CTAs · Events · Offers · etc. |
| Q&A + Messaging | 5 | 10% | Q&A populated · Owner responses · Messaging enabled · Response time · etc. |
| Total | 44 | 100% | |
⚠️ Senior TL reviews the weights
Weights per category are opinionated. Senior TL should review: they've seen more GBP audits than anyone, and their gut on "what actually matters for rankings" beats default weights. Schedule a 30-minute review call before finalizing.
✅ Definition of Done for the rubric
- All 44 points have: id, category, label, scoring type (binary or scale_0_10), weight, pass_condition (or scoring_rubric for scale points), data_source.
- Weights within each category sum correctly (rubric validation in agents/sentinel.json will check this).
- Senior TL has signed off on category weights.
- At least 3 points reference the Apify GBP scraper output by exact JSON path (so the agent knows where to read).
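Before sending the file to Trung, the mechanical parts of this checklist could be checked with a short script. The sketch below is an assumption about what "valid" means (category weights summing to 1.0, exactly 44 points, required fields present); it is not the validation that agents/sentinel.json performs, which this doc does not describe.

```python
REQUIRED = ("id", "category", "label", "scoring", "weight", "data_source")


def validate_rubric(rubric: dict, expected_points: int = 44) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    category_ids = {c["id"] for c in rubric["categories"]}
    weight_sum = sum(c["weight"] for c in rubric["categories"])
    if abs(weight_sum - 1.0) > 1e-9:
        problems.append(f"category weights sum to {weight_sum}, expected 1.0")
    if len(rubric["points"]) != expected_points:
        problems.append(
            f"found {len(rubric['points'])} points, expected {expected_points}"
        )
    for point in rubric["points"]:
        pid = point.get("id", "<no id>")
        for field in REQUIRED:
            if field not in point:
                problems.append(f"{pid}: missing {field}")
        if point.get("category") not in category_ids:
            problems.append(f"{pid}: unknown category {point.get('category')!r}")
        scoring = point.get("scoring")
        if scoring == "binary" and "pass_condition" not in point:
            problems.append(f"{pid}: binary point needs pass_condition")
        elif scoring == "scale_0_10" and "scoring_rubric" not in point:
            problems.append(f"{pid}: scale point needs scoring_rubric")
        elif scoring not in ("binary", "scale_0_10"):
            problems.append(f"{pid}: scoring must be binary or scale_0_10")
    return problems
```

To run it against the real file, load the file with `json.load` and pass the result in. Note that standard JSON does not allow `//` comments, so the shipped gbp-44-point-rubric.json must omit the ones shown in the structure sketch.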
4 Deliverable 3 — Output report template ~3 hours · You + Senior TL · due Day 12
The exact structure the agent writes to sentinel-audit-{client_id}.md and the JSON schema for sentinel-audit-{client_id}.json. Think of this as the SOP for what an SEO Navigator audit looks like, formalized.
Required sections in the Markdown output
- Executive Summary — 3-5 bullets, fit on one screen. Overall score + biggest issue + second biggest + what's strong.
- Score Dashboard — table: overall + per-module scores with 0-100 values.
- Module 1: GBP — 44-point summary, competitor benchmark table, top 5 gaps.
- Module 2: On-Page — per-page findings tables, site-wide patterns.
- Module 3: Geographic Intelligence — keyword visibility metrics, hot/cold zones, heatmap reference.
- Module 4: Citations — directory matrix, priority fixes with effort estimates.
- Module 5: AI Visibility — per-LLM matrix, composite score, gap narrative.
- Priority Recommendations — P1 (this week), P2 (this month), P3 (someday). Each with: specific action, rationale, effort estimate.
- Appendix — queries used, raw data file paths, methodology notes.
Your job
- Pull 1–2 existing high-quality audit deliverables we've shipped to clients recently. These become the visual template.
- With Senior TL, codify their structure into output-report-template.md. Include: section names, subsection names, expected table columns, tone notes, length guidance per section.
- Write the JSON schema that mirrors the Markdown. Every piece of data in the .md should be structured in the .json so downstream tools can parse it.
- Include 1 fully-worked example — a mock audit for a fictional detailing shop, written out end-to-end. This becomes the agent's "north star" for what "good" looks like.
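When drafting the JSON schema, it can help to write down the checks a downstream tool would run against it. The Python sketch below assumes a top-level shape with `overall_score`, a `modules` object, and `priority_recommendations`; the module key names and the `score` and `effort` field names are hypothetical placeholders until you define the real schema.

```python
MODULES = ("gbp", "on_page", "geographic", "citations", "ai_visibility")  # hypothetical keys
REC_FIELDS = ("priority", "action", "rationale", "effort")  # per the Priority Recommendations spec


def _in_range(value) -> bool:
    return isinstance(value, (int, float)) and 0 <= value <= 100


def check_audit_json(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = []
    if not _in_range(doc.get("overall_score")):
        problems.append("overall_score missing or outside 0-100")
    modules = doc.get("modules", {})
    for name in MODULES:
        if name not in modules:
            problems.append(f"module {name} missing")
            continue
        score = modules[name].get("score")
        # A null score is allowed: the draft prompt flags failed modules as partial.
        if score is not None and not _in_range(score):
            problems.append(f"module {name} score outside 0-100")
    for i, rec in enumerate(doc.get("priority_recommendations", [])):
        missing = [f for f in REC_FIELDS if not rec.get(f)]
        if missing:
            problems.append(f"recommendation {i}: missing {', '.join(missing)}")
        if rec.get("priority") not in ("P1", "P2", "P3"):
            problems.append(f"recommendation {i}: priority must be P1, P2, or P3")
    return problems
```

If a check like this is awkward to write for some part of the report, that is usually a sign the corresponding Markdown section has no structured counterpart in the JSON yet.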
💡 Why the example matters most
Structural rules ("use H2 for module headers") are table stakes — the agent will follow them. What the agent needs to calibrate tone, specificity, and depth is one complete worked example. The difference between a mediocre audit and a great one is mostly in language, not structure. The example shows the language bar.
5 Validation checkpoints — T1 and T2 ~2h each · Day 18 + Day 22
Once Trung has deployed the agent and it runs, you're the final gate on quality.
T1 — Day 18 — Synthetic client run
Trung runs the agent against fixtures/synthetic-client.json (fake detailing shop, safe to break). You compare the agent's output against the reference audit you pulled for Deliverable 3's example.
Pass criteria:
- All 5 modules populated — no empty sections.
- Output structure matches output-report-template.md exactly (section names, table columns).
- Recommendations are specific (name actual things to do, not vague advice).
- No factual errors in what the agent states about the synthetic data (catches hallucination).
- self_check_failed is false (agent didn't give up).
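The purely structural criteria above can be pre-checked by script so your two hours go to the judgment calls. This is a minimal Python sketch that assumes the Markdown report uses `#`-style headings whose text starts with the section names listed for Deliverable 3; the exact heading text is an assumption until output-report-template.md is final.

```python
REQUIRED_SECTIONS = [
    "Executive Summary", "Score Dashboard",
    "Module 1", "Module 2", "Module 3", "Module 4", "Module 5",
    "Priority Recommendations", "Appendix",
]


def t1_structural_check(markdown_text: str, audit: dict) -> list[str]:
    """Check section presence in the .md and the self-check flag in the .json."""
    headings = [
        line.lstrip("#").strip()
        for line in markdown_text.splitlines()
        if line.startswith("#")
    ]
    problems = []
    for section in REQUIRED_SECTIONS:
        if not any(h.startswith(section) for h in headings):
            problems.append(f"missing section: {section}")
    if audit.get("self_check_failed", False):
        problems.append("self_check_failed is true")
    return problems
```

A clean result only covers structure. Whether recommendations are specific and whether the stated facts match the synthetic data still need a human read.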
Your time: ~2h — read the full output, score against criteria, write up issues as bulleted feedback for Trung to feed back to the agent via system prompt refinement.
T2 — Day 22 — Real low-risk client run
Jake picks a real client (likely Shamrock Detailing or Procam Detailing). Trung runs the agent end-to-end. You review the output as if you were about to send it to the client.
Pass criteria: all T1 criteria + the "would I send this to the client after light editing?" test. Light editing = grammar, tone smoothing, maybe adding 1-2 client-specific notes. Not: rewriting sections, adding missing analysis, fixing factual errors.
Your time: ~2h — side-by-side compare with the client's last manual audit (if one exists), score, write a go/no-go with specific issues.
6 Logistics & handoff
| What | Where it lives | Who sees it |
| system-prompt.md | ClickUp task attachment → Trung downloads | You write, Trung deploys, Jake reviews |
| gbp-44-point-rubric.json | Same | You write, Senior TL reviews weights, Trung mounts |
| output-report-template.md | Same | You + Senior TL write, Jake reviews |
| T1 / T2 feedback | ClickUp comment thread on the pilot task | You, Trung, Jake |
Your ClickUp tasks
- SNAV-SENT-001 — System prompt authoring (due Day 10) — assigned to you
- SNAV-SENT-002 — 44-point GBP rubric (due Day 11) — assigned to you, Senior TL reviewer
- SNAV-SENT-003 — Output report template (due Day 12) — assigned to you + Senior TL
- SNAV-SENT-004 — T1 validation (due Day 18) — assigned to you
- SNAV-SENT-005 — T2 validation (due Day 22) — assigned to you
When you hit a blocker
Post in #seo-automation Slack with @jake @trung. Examples of real blockers:
- Scoring a GBP point requires data we can't get from the Apify scraper → we need Trung to extend the scraper or drop the point.
- The draft system prompt references a skill file that doesn't exist yet → flag it, Senior TL writes the skill file or Trung stubs one.
- 44 points doesn't divide cleanly — we naturally have 38 or 52 → rename it and pick the right number; the "44" branding is not load-bearing.
🏁 Success for you, on Day 22
The agent has shipped its first real-client audit. The client received a report of equal or better quality than last month's manual version. You spent ~13 hours over two weeks writing the system prompt + rubric + template, plus ~4 hours validating T1 + T2. Instead of personally running 3-4 hours of manual audit work on every new client, you now review the agent's output in 30-60 minutes and make strategic calls the agent can't.
That's the shape of "SEO Lead" in the AI-native agency. Not less work, different work — higher-leverage judgment over lower-leverage execution.