Building Agent 00V
How I Automated My Email Delegation Workflow with Claude's Agent SDK
It's an amazing time to be alive. I built an AI executive assistant over a weekend. Not because I needed one, but because I wanted one. Specifically, I wanted to stop spending 100 minutes per week coordinating meeting times via email.
The system monitors my Gmail, detects when I delegate tasks (by CC'ing v@c12.space), checks my calendar, and drafts professional responses. Every draft requires my explicit approval before sending. Nothing goes out without me reviewing it first.
It runs on Cloudflare Workers, costs $11/month, and handles coordination work that used to eat close to two hours weekly. The interface looks like a James Bond spy terminal because I have priorities.
Here's the key insight: Claude's Agent SDK turns complex multi-step workflows into autonomous tool orchestration with ~200 lines of code. You define the tools, Claude figures out how to use them.
The Coordination Tax
Pattern recognition from my inbox: I'd receive a meeting request, respond enthusiastically, then forward it to myself with "coordinate timing" scribbled in a note. Three emails later (calendar check, proposed slots, confirmation), a 10-minute meeting had cost 30 minutes of scheduling overhead.
The delegation was trivial. The execution was not.
Execution required context (reading the full email thread), data (checking my actual availability), and judgment (proposing times that make sense given time zones and working hours).
This is the coordination tax: the hidden cost of orchestrating simple tasks across systems that don't talk to each other. Email clients don't talk to calendars. Calendars don't draft responses. Every handoff requires human intervention.
I wanted to eliminate this tax for one specific workflow: email delegation. When I CC my virtual assistant email on any message, I want the system to:
- Detect the delegation in real-time
- Read the full conversation thread for context
- Check my Google Calendar for availability
- Draft a professional response to the other recipients
- Queue it for my approval
The constraint: nothing sends without my explicit confirmation.
I'm optimizing for speed, not replacing judgment.
Why Agents, Not Scripts
My first instinct was to build a webhook system: email arrives → trigger function → call calendar API → template response → done.
This works until it doesn't.
What if the email thread is 12 messages deep? What if they mentioned "next week" but today is Friday? What if my calendar shows "busy" but the event is actually labeled "Focus Time" and could be moved?
Rule-based systems break on edge cases. They optimize for the median case. I needed something that could handle the tail distribution of real-world emails: the ones where "let's sync sometime" actually means "I need 90 minutes to review this contract."
Enter Claude's Agent SDK.
Here's the difference: agents orchestrate tools, assistants follow scripts.
Instead of hard-coding "if-then" logic, you give Claude:
- A set of tools (functions it can call)
- A goal ("help Marc delegate scheduling tasks")
- Context (the triggering email)
Then you let the model figure out what to do.
In practice:
const agent = new Agent({
model: "claude-opus-4-5-20251101",
tools: [getThread, getAvailability, createDraft],
systemPrompt: `You are V, Marc's executive assistant. When Marc CCs you on an
email, analyze the thread and help coordinate next steps.`
});
const result = await agent.run({
userMessage: `I was CC'd on this email: ${emailContent}`
});
That's it. No flowcharts, no state machines, no "if user mentions time, extract time" logic.
Claude reads the email, decides it needs more context, calls getThread() to fetch the conversation history, calls getAvailability() to check my calendar, then calls createDraft() to queue a response.
The agent reasons through the task rather than executing a predetermined sequence.
What Didn't Work (And Why I'm Glad It Failed)
Version 1 was a disaster.
I tried using Sonnet 4.5 because it's faster and cheaper. It proposed 8am meetings on Saturdays. It hallucinated calendar availability ("Marc is free Monday at 3pm" when I had back-to-back meetings). It struggled with multi-timezone coordination, suggesting "10am EST" when the other person was in Tokyo.
I switched to Opus 4.5 and the draft quality improved dramatically. Opus costs ~$0.03 per email vs Sonnet's ~$0.01, but for a system where errors are visible to other people, I optimized for quality over cost.
At my volume (~10 delegations per week), the difference is $1.50/month. Worth it.
Version 2 had a different problem: the system prompt was too apologetic.
Early drafts read like this:
"I sincerely apologize for the delay in Marc's response. Unfortunately, Marc is currently unavailable this week, but I wanted to reach out to see if we could potentially find an alternative time that might work better for your schedule..."
This is bad for two reasons: it sounds robotic, and it creates an apologetic tone where none is needed.
I revised the prompt to be action-oriented and concise. Now drafts read like:
"I'm V, Marc's assistant. I've checked Marc's calendar and he's available Thursday 2-4pm or Friday 10am-12pm. Which works better for you?"
Same information, better tone, at roughly half the length.
The third failure was harder to spot: the agent wasn't checking calendar availability correctly. It would call getAvailability() for "this week" even when the email mentioned "sometime next month." The problem wasn't the tool: it was the system prompt not emphasizing date range extraction.
I added this line to the prompt:
"When checking calendar availability, always extract the specific date range mentioned in the conversation. If no dates are mentioned, check the next 7 business days."
Draft quality improved again.
Debugging agents is different from debugging code. You're not fixing logic errors: you're clarifying intent.
Architecture: Dual Deployment
I built two versions: one for local development (Bun), one for production (Cloudflare Workers). Both share the same core logic but differ in runtime environments.
Local Development (Bun)
┌─────────────────────────────────────────────────────────────┐
│ Your Email Workflow │
└─────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 1. You send email CC: v@c12.space │
└───────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 2. Polling loop checks every 10s │
└───────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 3. Claude Agent analyzes email │
│ - get_thread: Read full context │
│ - get_availability: Check calendar│
│ - create_draft: Generate response │
└───────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 4. Draft appears in web dashboard │
└───────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 5. You approve/reject in browser │
└───────────────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ 6. Email sent from v@c12.space │
└───────────────────────────────────────┘
Local development polls Gmail every 10 seconds. When it finds a sent email where I've CC'd the assistant, it triggers the agent workflow.
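For reference, here's roughly what that loop looks like. This is a minimal sketch, not the production code: the Gmail query string, the credentials setup, and the runAgentOnEmail hook are all my assumptions.

// poll.ts - sketch of the local polling loop (Bun)
import { google } from "googleapis";

const auth = new google.auth.OAuth2(
  process.env.GOOGLE_CLIENT_ID,
  process.env.GOOGLE_CLIENT_SECRET
);
auth.setCredentials({ refresh_token: process.env.GOOGLE_REFRESH_TOKEN });

const gmail = google.gmail({ version: "v1", auth });
let lastCheck = Math.floor(Date.now() / 1000);

// Hypothetical hook into the agent pipeline (not shown here).
declare function runAgentOnEmail(messageId: string): Promise<void>;

async function pollOnce() {
  // Sent mail where the assistant address was CC'd since the last check.
  const res = await gmail.users.messages.list({
    userId: "me",
    q: `in:sent cc:v@c12.space after:${lastCheck}`,
  });
  lastCheck = Math.floor(Date.now() / 1000);

  for (const msg of res.data.messages ?? []) {
    await runAgentOnEmail(msg.id!);
  }
}

setInterval(() => pollOnce().catch(console.error), 10_000); // every 10 seconds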
Production (Cloudflare Workers)
Production uses Cloudflare's cron triggers instead of polling:
[triggers]
crons = ["* * * * *"] # Every minute
Every minute, the Worker queries Gmail for new delegations, processes them, and stores drafts in D1 (Cloudflare's serverless SQLite).
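The scheduled handler is the Worker-side equivalent of the polling loop. A sketch, with assumed helper names, a simplified Env shape, and a trimmed-down insert (ambient types come from @cloudflare/workers-types):

type TriggerEmail = { id: string; threadId: string };
type Draft = { id: string; subject: string; body: string };

interface Env {
  DB: D1Database; // the D1 binding holding the drafts table
  // ...Gmail and Anthropic credentials as secrets
}

// Hypothetical helpers; the real lookup and agent code aren't shown here.
declare function findNewDelegations(env: Env): Promise<TriggerEmail[]>;
declare function runAgent(email: TriggerEmail, env: Env): Promise<Draft>;

export default {
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    // Runs once per minute via the cron trigger above.
    const delegations = await findNewDelegations(env);
    for (const email of delegations) {
      const draft = await runAgent(email, env); // same agent logic as local
      await env.DB
        .prepare("INSERT INTO drafts (id, subject, body, status) VALUES (?, ?, ?, 'pending')") // columns trimmed for brevity
        .bind(draft.id, draft.subject, draft.body)
        .run();
    }
  },
};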
Why Workers? Three reasons:
- Zero cold starts - Workers stay warm, responding in <50ms globally
- Edge deployment - code runs close to users (though with exactly one user, me, this is mostly future-proofing)
- Cost - $5/month for effectively unlimited requests on the Workers paid plan vs $50+/month for traditional serverless
The entire production app (agent logic, database, web dashboard) runs on Cloudflare's global network. No servers to manage, no scaling configuration.
The Custom MCP Tools
The agent's power comes from Model Context Protocol (MCP) tools: functions Claude can call to interact with external systems. I built three:
1. get_thread - Retrieve Full Email Context
const getThread = tool({
name: "get_thread",
description: "Retrieves the full email thread for context",
input_schema: z.object({
threadId: z.string().describe("Gmail thread ID")
}),
execute: async ({ threadId }) => {
const thread = await gmail.users.threads.get({
userId: "me",
id: threadId,
format: "full"
});
return {
messages: thread.data.messages.map(msg => ({
from: parseFrom(msg),
to: parseTo(msg),
subject: msg.payload.headers.find(h => h.name === "Subject")?.value,
body: decodeBody(msg),
date: msg.payload.headers.find(h => h.name === "Date")?.value
}))
};
}
});
This tool lets Claude read the entire conversation history, not just the most recent message. When someone replies "sounds good, how about next week?", Claude needs to know what they're replying to.
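The parseFrom and decodeBody helpers aren't shown in the post, but decoding is the fiddly part: Gmail returns bodies as base64url, often nested inside multipart payloads. My reconstruction of decodeBody:

import { gmail_v1 } from "googleapis";

// Sketch of the decodeBody helper referenced above (assumed, not the original).
function decodeBody(msg: gmail_v1.Schema$Message): string {
  const payload = msg.payload;
  // Prefer the text/plain part of multipart messages; fall back to the root body.
  const part = payload?.parts?.find((p) => p.mimeType === "text/plain") ?? payload;
  const data = part?.body?.data ?? "";
  // Gmail uses URL-safe base64 ('-' and '_' instead of '+' and '/').
  return Buffer.from(data, "base64url").toString("utf-8");
}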
2. get_availability - Check Google Calendar
const getAvailability = tool({
name: "get_availability",
description: "Checks Marc's Google Calendar for free time slots",
input_schema: z.object({
startDate: z.string().describe("ISO 8601 start date"),
endDate: z.string().describe("ISO 8601 end date"),
timezone: z.string().describe("IANA timezone (e.g., 'America/New_York')")
}),
execute: async ({ startDate, endDate, timezone }) => {
const response = await calendar.freebusy.query({
requestBody: {
timeMin: startDate,
timeMax: endDate,
timeZone: timezone,
items: [{ id: "primary" }]
}
});
const busySlots = response.data.calendars.primary.busy;
const availableSlots = calculateFreeSlots(busySlots, startDate, endDate);
return { availableSlots, busySlots };
}
});
The key: let Claude decide what date range to check. Sometimes "next week" means Monday-Friday. Sometimes it means "any time in the next 7 days." The agent adapts to context.
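calculateFreeSlots isn't shown above; here's a plausible implementation. It sorts the busy intervals and emits the gaps between them, leaving working-hours filtering to the agent's judgment:

type Interval = { start: string; end: string }; // ISO 8601

// Sweep the sorted busy intervals and collect the free gaps between them.
function calculateFreeSlots(busy: Interval[], startDate: string, endDate: string): Interval[] {
  const free: Interval[] = [];
  let cursor = new Date(startDate).getTime();
  const end = new Date(endDate).getTime();

  const sorted = [...busy].sort(
    (a, b) => new Date(a.start).getTime() - new Date(b.start).getTime()
  );

  for (const slot of sorted) {
    const busyStart = new Date(slot.start).getTime();
    const busyEnd = new Date(slot.end).getTime();
    if (busyStart > cursor) {
      free.push({ start: new Date(cursor).toISOString(), end: new Date(busyStart).toISOString() });
    }
    cursor = Math.max(cursor, busyEnd); // handles overlapping busy blocks
  }
  if (cursor < end) {
    free.push({ start: new Date(cursor).toISOString(), end: new Date(end).toISOString() });
  }
  return free;
}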
3. create_draft - Queue Email for Approval
const createDraft = tool({
name: "create_draft",
description: "Creates a draft email for Marc's approval before sending",
input_schema: z.object({
to: z.array(z.string()).describe("Recipient email addresses"),
cc: z.array(z.string()).optional(),
subject: z.string(),
body: z.string(),
threadId: z.string().optional(),
inReplyTo: z.string().optional(),
triggerEmailId: z.string().describe("The email ID that triggered this draft")
}),
execute: async (input) => {
const draftId = generateUUID();
await db.run(`
INSERT INTO drafts (
id, trigger_email_id, thread_id, in_reply_to,
to_addresses, cc_addresses, subject, body, status
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, 'pending')
`, [
draftId,
input.triggerEmailId,
input.threadId,
input.inReplyTo,
JSON.stringify(input.to),
JSON.stringify(input.cc || []),
input.subject,
input.body
]);
return {
draftId,
message: "Draft created for approval at dashboard"
};
}
});
Critically, this tool does not send email. It queues a draft in the database, which appears in my web dashboard. I review it, edit if needed, then explicitly approve or reject.
Human-in-the-loop is a feature, not a bug (at least for now).
The approval step is training wheels. As I refine the agent's judgment over the next few months, I'll expand what it can do autonomously. Right now, it drafts emails I review. Eventually, it might handle routine scheduling requests end-to-end while escalating edge cases.
The architecture supports both modes. I'm starting conservative because emails sent to other people have reputational stakes. Once I trust the system's judgment on common patterns, I can loosen the constraints.
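The other half of the loop is the approval endpoint. A sketch using Hono (the framework the dashboard runs on, covered below); the route paths, status values, and sendViaGmail helper are my assumptions:

import { Hono } from "hono";

// Bindings type comes from @cloudflare/workers-types.
const app = new Hono<{ Bindings: { DB: D1Database } }>();

// Hypothetical helper wrapping gmail.users.messages.send with a raw RFC 2822 payload.
declare function sendViaGmail(draft: Record<string, unknown>): Promise<void>;

app.post("/api/drafts/:id/approve", async (c) => {
  const id = c.req.param("id");
  const draft = await c.env.DB
    .prepare("SELECT * FROM drafts WHERE id = ?")
    .bind(id)
    .first<Record<string, unknown>>();
  if (!draft || draft.status !== "pending") return c.json({ error: "not pending" }, 409);

  await sendViaGmail(draft); // the only place email actually leaves the system
  await c.env.DB.prepare("UPDATE drafts SET status = 'sent' WHERE id = ?").bind(id).run();
  return c.json({ ok: true });
});

app.post("/api/drafts/:id/reject", async (c) => {
  await c.env.DB.prepare("UPDATE drafts SET status = 'rejected' WHERE id = ?")
    .bind(c.req.param("id"))
    .run();
  return c.json({ ok: true });
});

export default app;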
Agent Configuration: Reasoning vs Speed
The agent balances reasoning depth with response time:
const agent = new Agent({
model: "claude-opus-4-5-20251101", // Frontier model for best reasoning
maxIterations: 15, // Allow complex multi-step tasks
tools: [getThread, getAvailability, createDraft],
systemPrompt: getSystemPrompt()
});
Why Opus? I tested Sonnet and found Opus made better judgment calls on edge cases (see failures above). For a system where errors are visible to other people, I optimized for quality over cost.
Why 15 iterations? Most emails resolve in 3-5 tool calls:
- get_thread to read context
- get_availability to check calendar
- create_draft to queue response
But occasionally, Claude needs to iterate:
- "Check next week" → Calendar shows busy → "Check the week after"
- "Find 30-minute slots" → Only 15-minute gaps available → "Offer to combine lunch meeting"
The 15-iteration limit prevents infinite loops while allowing genuine deliberation.
The System Prompt: Defining Behavior
The system prompt is where you encode judgment. Here's mine (simplified):
function getSystemPrompt(): string {
return `You are V, Marc's executive assistant. When Marc CCs you on an email,
he's delegating a coordination task to you.
Your role:
- Read the full email thread to understand context
- Identify what action Marc wants (usually scheduling/coordination)
- Check Marc's calendar for availability
- Draft a professional response to the OTHER recipients (not Marc)
- Never send email directly - always create a draft for approval
Communication style:
- Professional but warm
- Concise (2-4 sentences max)
- Action-oriented
- Sign emails as "V" or "V (Marc's Assistant)"
When proposing meeting times:
- Suggest 2-3 specific time slots
- Always include timezone
- Prefer morning slots (10am-12pm) over late afternoon
- Avoid Monday mornings and Friday afternoons if possible
- If calendar shows "Focus Time", treat as flexible (can be moved)
If you cannot complete the task:
- Explain why in the draft
- Suggest what Marc should do instead
- Never apologize excessively
Remember: Marc will review every draft before it sends. Your job is to save him
time, not to be perfect.`;
}
This prompt does three things:
- Defines the workflow - Read thread → Check calendar → Draft response
- Encodes preferences - Morning meetings, timezone awareness, communication style
- Sets boundaries - Never send directly, avoid over-apologizing
That last point matters. Without it, Claude writes emails like:
"I sincerely apologize for the inconvenience..."
With it, Claude writes:
"I'm V, Marc's assistant. I've checked Marc's calendar..."
Same information, less apologetic.
Design Decisions: The Short Version
Three decisions shaped this system:
Polling vs Webhooks: I chose polling (every 10 seconds locally, every minute in production). Gmail's Push Notifications require public HTTPS endpoints, SSL management, and handling duplicate notifications. Polling is simpler. At 1 request/minute, this costs $0 on Cloudflare. The downside: 60-second latency in production. I can live with that.
SQLite vs PostgreSQL: I chose SQLite locally, D1 in production. Three tables, simple queries, single user. PostgreSQL would cost $10-30/month for managed hosting, require connection pooling for serverless, and add network latency; SQLite runs in-process. Zero latency, zero cost. The drafts table is sketched below.
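Inferred from the INSERT in create_draft, the drafts schema looks roughly like this; column types and the status values are my guesses, and the other two tables aren't shown here:

// Sketch of the drafts table (run once at startup / as a migration).
await db.run(`
  CREATE TABLE IF NOT EXISTS drafts (
    id               TEXT PRIMARY KEY,
    trigger_email_id TEXT NOT NULL,
    thread_id        TEXT,
    in_reply_to      TEXT,
    to_addresses     TEXT NOT NULL,  -- JSON array of recipients
    cc_addresses     TEXT,           -- JSON array
    subject          TEXT NOT NULL,
    body             TEXT NOT NULL,
    status           TEXT NOT NULL DEFAULT 'pending'  -- assumed: pending | sent | rejected
  )
`);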
Bun vs Node.js: I chose Bun for local development. It's fast (runs TypeScript without a transpile step), has built-in SQLite, and includes hot reload. Cloudflare Workers also run on V8, so the same TypeScript deploys unchanged. The dual runtime works because I use Hono: a framework that runs identically on Bun, Node, Cloudflare Workers, and Deno.
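That portability is concrete: both runtimes accept a default export with a fetch handler, so one entrypoint serves both. A minimal sketch (the route is illustrative):

import { Hono } from "hono";

const app = new Hono();
// The dashboard would query real rows here; this route is illustrative.
app.get("/api/drafts", (c) => c.json({ drafts: [] }));

// Bun picks this up via `bun run index.ts`; wrangler deploys the same export.
export default app;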
Full trade-off analysis with code examples available on request.
The Steampunk UI: Function Follows Fun
I spent 4 hours designing a James Bond-inspired steampunk interface.
Most developer tools have functional UIs: buttons, tables, forms. I wanted something I'd actually want to use. The dashboard features spinning mechanical brass gears with visible teeth (CSS conic-gradient), Victorian typography (Cinzel for headers, Orbitron for displays), animated rivets, and "Agent 00-V" branding.
This is excessive for an MVP. It's also fun.
Design influences behavior. If your internal tools are ugly, you'll avoid using them. If they're delightful, you'll engage with them.
The entire UI is vanilla HTML/CSS/JavaScript: no React, no build step. The steampunk aesthetic is ~400 lines of CSS animations. View the live demo.
Cost Analysis: Is This Worth It?
Here's the math.
Before V Assistant:
- 10 minutes per scheduling email (read thread, check calendar, draft, send)
- ~10 delegations per week = 100 minutes/week
- At my target freelance rate: ~$330/week in opportunity cost
After V Assistant:
- ~2 minutes per draft (review, approve)
- 10 delegations per week = 20 minutes/week
- Cost: ~$66/week in time + $1.50/week in API costs
Savings: ~$260/week.
But the real benefit isn't time savings: it's cognitive offloading. I don't context-switch to "scheduling mode" ten times per week. I review drafts in batches, which preserves focus for deep work.
Total infrastructure cost: $5/month (Cloudflare Workers) + $10/year (domain) + ~$6/month (Claude API) = $11/month.
Real Usage Data (Two Months In)
I've processed 87 emails since launching in early November. Of those:
- 74 drafts approved without edits (85%)
- 11 drafts edited before sending (13%)
- 2 drafts rejected completely (2%)
The two rejections were both cases where Claude misunderstood context: one was a scheduling request that was actually a soft "no thanks," the other was a reply-all situation where I only wanted to respond to the original sender.
The 11 edits were mostly tone adjustments. Claude's default style is slightly more formal than mine. I often change "I've checked Marc's calendar and he's available..." to "Marc's free..."
No catastrophic failures yet. No embarrassing emails sent. The system degrades gracefully: if something's wrong, I catch it during review.
Lessons: Agents Are Orchestrators, Not Executors
The biggest takeaway: agents thrive on composition, not completion.
A traditional automation tool tries to complete a task: "Schedule meeting → Done." When edge cases appear (no availability this week, multiple people in different timezones), the automation breaks.
An agent orchestrates toward a goal: "Help Marc coordinate this meeting." It doesn't need to finish the task; it needs to make progress. If the calendar is fully booked, Claude creates a draft saying "Marc's calendar is full this week. Would next week work?" and queues it for approval.
The system degrades gracefully because the human is always in the loop.
This pattern extends to other workflows:
Customer support: Agents that search knowledge bases, query order databases, and draft refund approvals, escalating to humans with full context rather than breaking on complexity.
Research workflows: Agents that search Arxiv, summarize findings, cross-reference citations, and draft literature reviews, handling retrieval while humans handle synthesis.
Financial analysis: Agents that pull SEC filings, extract metrics, compare to benchmarks, and draft investment memos, gathering data while humans form theses.
In each case, the agent amplifies human judgment rather than replacing it.
Deployment: From Localhost to agent00v.xyz
Production deployment is simple:
- Push to GitHub (every commit triggers deployment)
- Cloudflare builds via Wrangler CLI
- Deploy to global edge (within 60 seconds)
- Custom domain via Cloudflare Registrar ($10/year for agent00v.xyz)
Total setup time: ~30 minutes after code was working locally.
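For reference, the wrangler.toml for a setup like this would look roughly as follows; the name, database ID, and bindings are placeholders, not the real config:

# wrangler.toml - illustrative; names and IDs are placeholders
name = "agent-00v"
main = "src/index.ts"
compatibility_date = "2025-11-01"

[triggers]
crons = ["* * * * *"]   # the every-minute Gmail check

[[d1_databases]]
binding = "DB"
database_name = "v-assistant"
database_id = "<your-d1-database-id>"

[[routes]]
pattern = "agent00v.xyz"
custom_domain = true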
Code samples available on request.
What This Means for Builders
Three takeaways:
1. Start with constrained workflows
Don't try to build "an AI assistant." Build something that solves one specific workflow excellently. V Assistant does exactly one thing: help me delegate scheduling tasks. That constraint made it buildable in a weekend.
2. Design for progressive autonomy
Every article about AI automation assumes the goal is zero human intervention. But for tasks with reputational stakes, you want to start with human review and gradually expand what runs autonomously. Build the architecture to support both modes, approval-required and fully automated, then adjust the dial as you gain confidence in the agent's judgment.
3. The Agent SDK abstracts complexity you don't want
Writing the three MCP tools took ~4 hours. Writing the orchestration logic (when to call which tool, how to handle failures, what to do with results) would have taken days with traditional code.
The Agent SDK handles that complexity. You define tools, Claude figures out how to use them.
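For contrast, here's the shape of the loop you'd otherwise write yourself against the raw Messages API: call the model, execute any requested tools, feed the results back, repeat. A sketch with error handling omitted and a hypothetical executeTool dispatcher:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Hypothetical dispatcher mapping tool names to your own functions.
declare function executeTool(name: string, input: unknown): Promise<unknown>;

// The loop the Agent SDK runs for you.
async function manualAgentLoop(userMessage: string, tools: Anthropic.Tool[]) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: userMessage }];

  for (let i = 0; i < 15; i++) {
    const response = await client.messages.create({
      model: "claude-opus-4-5-20251101",
      max_tokens: 2048,
      tools,
      messages,
    });
    if (response.stop_reason !== "tool_use") return response; // model is done

    // Run every requested tool and feed the results back as a user turn.
    messages.push({ role: "assistant", content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const output = await executeTool(block.name, block.input);
        results.push({ type: "tool_result", tool_use_id: block.id, content: JSON.stringify(output) });
      }
    }
    messages.push({ role: "user", content: results });
  }
}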
That's it. I built an AI executive assistant that saves me 80 minutes per week, costs $11/month, and occasionally makes me laugh when it proposes meeting times I would have suggested anyway.
The coordination tax is real. Agents help you avoid it.
If you're building AI-powered workflows, give the Agent SDK approach a shot. Define your tools, write a clear system prompt, let Claude handle the orchestration.
Adiós.
- Live demo: agent00v.xyz
- Claude Agent SDK: anthropic.com/claude/agents