Case Study: Week One
A nonprofit. Running on AWS for 8+ years. ~$6,500/month cloud spend. Zero operational visibility. No audit trail. No monitoring to speak of.
I deployed an agent on a Monday afternoon.
Day 01: The Audit
The agent’s first act was a full infrastructure inventory. It took 44 seconds.
52 EC2 instances — 24 running, 28 stopped. 29 load balancers, though the initial scan found only 12; the other 17 were Classic ELBs, invisible to the modern elbv2 API and reachable only through the legacy elb endpoint. 3 RDS databases, one unencrypted, all single-AZ. 17 Route 53 hosted zones. 238 AMIs, only 15 in active use. Full VPC topology mapped.
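Merging the two API surfaces is the crux of the missing-ELB problem: ALBs and NLBs come back from the elbv2 client under "LoadBalancers", while Classic ELBs come back from the legacy elb client under "LoadBalancerDescriptions". A minimal sketch of a unified inventory (the response keys match the real APIs; the function name and output shape are illustrative):

```python
def unified_lb_inventory(elbv2_pages, classic_pages):
    """Combine ALB/NLB results (elbv2 API) with Classic ELBs (legacy elb API).

    A scan that only paginates the elbv2 client never sees Classic ELBs,
    which is how 17 of 29 load balancers can hide from an inventory.
    """
    inventory = []
    for page in elbv2_pages:
        for lb in page.get("LoadBalancers", []):
            inventory.append({"name": lb["LoadBalancerName"],
                              "type": lb.get("Type", "application")})
    for page in classic_pages:
        for lb in page.get("LoadBalancerDescriptions", []):
            inventory.append({"name": lb["LoadBalancerName"],
                              "type": "classic"})
    return inventory
```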
This was the first complete infrastructure audit the organization had ever received.
The security findings were immediate. 77 security groups audited. 99 rules open to the entire internet. 11 of those were critical — all ports exposed on running production instances. The identity and authentication server had been wide open to all traffic for four and a half years. The VPN server had an ALL TRAFFIC rule despite needing exactly two ports.
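The triage logic behind "open" versus "critical" is simple enough to sketch. A plausible version of the classifier, using simplified rule dicts (the field names here are hypothetical, not the EC2 API's): a rule is open if its source CIDR is the whole internet, and critical if it is also all-ports and attached to a running instance.

```python
def classify_rule(rule, attached_to_running):
    """Classify one ingress rule (simplified shape, hypothetical field names).

    "open"     -> reachable from the entire internet
    "critical" -> open on every port AND attached to a running instance
    """
    open_cidrs = {"0.0.0.0/0", "::/0"}
    is_open = rule.get("cidr") in open_cidrs
    # Protocol "-1" means all traffic in the EC2 API; a missing or 0-65535
    # port range means every port.
    all_ports = rule.get("protocol") == "-1" or (
        rule.get("from_port") in (None, 0) and rule.get("to_port") in (None, 65535)
    )
    if is_open and all_ports and attached_to_running:
        return "critical"
    return "open" if is_open else "restricted"
```

The VPN server's ALL TRAFFIC rule lands squarely in the critical bucket under this scheme: open to the world, every port, on a running production instance.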
IAM keys hadn’t been rotated in up to 10.7 years. Six of eight IAM users had no MFA.
And there was no CloudTrail logging. No record of who did what, ever. The agent’s first recommendation was to turn it on.
Day 02: The Archaeology
The load balancer inventory told a story. 29 total, but only about 8 serving real traffic. The rest were deployment artifacts — each iteration of one application had left the previous load balancer running and billing. The deployment history was visible in the orphaned infrastructure.
The cost analysis landed the same day. Stopped instances still burning EBS storage, orphaned snapshots from an automated backup running without a lifecycle policy, unattached volumes, 21 dead load balancers, 27 idle Elastic IPs, an orphaned Global Accelerator nobody knew existed.
Total verified waste: $18.3K/year. 23% of the entire AWS bill. Total addressable savings including IPv4 optimization: $41.7K/year — over half the bill.
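The headline percentages check out against the stated spend. A quick worked check, using only figures already on this page:

```python
monthly_spend = 6_500
annual_bill = monthly_spend * 12        # $78,000/year
verified_waste = 18_300                 # dead LBs, idle EIPs, orphaned EBS, ...
addressable = 41_700                    # including IPv4 optimization

waste_share = verified_waste / annual_bill
addressable_share = addressable / annual_bill
print(f"{waste_share:.0%} of the bill")        # ~23%
print(f"{addressable_share:.0%} of the bill")  # ~53%, over half
```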
Day 03: The Self-Correction
The agent’s initial snapshot cost estimate was $3,510/month. The actual bill showed $528: a nearly 7x overestimate.
Most systems would have shipped the wrong number. This one caught itself, diagnosed the root cause — the EBS API reports each snapshot at the full size of its source volume, while AWS bills only the incremental blocks the snapshot actually stores — and switched its methodology the same day.
Every cost figure on this page comes from the corrected methodology, verified against AWS invoices. Not estimates.
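The shape of the mistake is worth spelling out. A naive estimate multiplies each snapshot's API-reported size by the storage price; the bill reflects only incremental stored blocks. The sketch below reconstructs the gap — the price and the per-snapshot sizes are illustrative, chosen only so the two methods land on the article's $3,510 and $528 figures (3,510 / 528 ≈ 6.6):

```python
SNAPSHOT_PRICE_GB_MONTH = 0.05  # illustrative standard-tier price

def naive_estimate(snapshots):
    # Wrong: the API's size field is the source volume's full provisioned size.
    return sum(s["volume_size_gb"] for s in snapshots) * SNAPSHOT_PRICE_GB_MONTH

def billed_estimate(snapshots):
    # Closer to reality: AWS bills only the incremental blocks each snapshot stores.
    return sum(s["stored_gb"] for s in snapshots) * SNAPSHOT_PRICE_GB_MONTH
```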
The agent also caught its handler’s misconfigured environment variable during a later deployment. The self-correction loop runs in both directions.
Day 04: The Pushback
The handler suggested building the AMI waste report as a one-off script. The agent disagreed: a script is a one-time artifact, while a durable Slack command is a tool the team keeps using. The handler agreed.
This is a small moment but it matters. An agent that only says yes is a script with extra steps. An agent that can identify when a proposed approach is suboptimal and advocate for the better one — that’s an engineering opinion, not autocomplete.
Day 05: Autonomous Authority
The handler stopped directing individual decisions. “Your call” became the default for build choices. The agent made sound autonomous decisions based on pattern recognition — choosing implementation approaches, structuring commands, deciding what to build next within the established scope.
The governance ratchet moved from “handler decides everything” to “agent operates, handler reviews.”
Day 06: Permission Escalation
The agent designed its own permission escalation. A three-tier plan: start with alarm hygiene (low-risk, high-learning-value), then dead resource cleanup (medium-risk, high-waste-recovery), then security group tightening (higher-risk, highest-impact).
It identified that broken CloudWatch alarms on dead infrastructure — the noisy, stuck-in-INSUFFICIENT_DATA alerts it had flagged on day one — were the ideal test cases for its first write operations. Zero-risk targets that would also clean up real noise.
The handler approved. The agent deployed a mutation framework: preview the change, generate a 6-character hex token with a 10-minute TTL, require the same user to confirm with the exact token, re-scan the resource on confirm to verify nothing changed in the interim, log everything to an audit channel.
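The preview/confirm loop is the interesting part of that framework. A minimal in-memory sketch, assuming a dict for pending mutations and treating each token as single-use (the real system logs every step to an audit channel and persists state across Lambda invocations; function names and return strings here are illustrative):

```python
import secrets
import time

TOKEN_TTL_SECONDS = 600  # 10-minute confirmation window

pending = {}  # token -> (user, resource, previewed_state, expires_at)

def propose(user, resource, resource_state, now=None):
    """Preview a mutation and issue a short-lived confirmation token."""
    now = time.time() if now is None else now
    token = secrets.token_hex(3)  # 3 bytes -> 6 hex characters
    pending[token] = (user, resource, resource_state, now + TOKEN_TTL_SECONDS)
    return token

def confirm(user, token, rescan_state, now=None):
    """Apply only if: same user, unexpired token, resource unchanged since preview."""
    now = time.time() if now is None else now
    # Pop unconditionally: any confirmation attempt consumes the token.
    entry = pending.pop(token, None)
    if entry is None:
        return "unknown token"
    proposer, resource, previewed_state, expires_at = entry
    if now > expires_at:
        return "expired"
    if user != proposer:
        return "wrong user"
    if rescan_state != previewed_state:
        return "resource changed since preview"
    return f"applied to {resource}"
```

The re-scan check is the piece that makes this safe against races: if anything touched the resource between preview and confirmation, the mutation is refused and has to be re-proposed.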
By the end of day six, the agent was deleting orphaned security groups through a Slack approval workflow it had designed, proposed, and built.
What Didn’t Happen
No Terraform state to manage. No Kubernetes cluster. No pip dependencies in the deployment package. Stdlib Python, a single zip file, and an IAM policy that follows least-privilege down to the API action level.
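For a sense of what "least-privilege down to the API action level" looks like in practice, here is an illustrative read-only fragment covering the audit's inventory calls. These are real IAM action names, but the policy is a sketch, not the deployment's actual document:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyInventory",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSnapshots",
        "elasticloadbalancing:DescribeLoadBalancers",
        "cloudwatch:DescribeAlarms"
      ],
      "Resource": "*"
    }
  ]
}
```

Write operations like the alarm cleanup would be added as separate, individually named actions as the permission tiers unlock, never as a wildcard.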
The agent runs on a Lambda function. Operating cost: $5/month. It’s a line item on the same AWS bill where it found $18.3K in waste.
On Governance
Every agent ships with explicit autonomy boundaries. What it can do independently. What requires human approval. What’s out of scope entirely.
The boundaries start restrictive and loosen as trust builds. In this deployment, the progression was measurable: handler decides everything (days 1–3), agent pushes back on a suggestion and handler agrees (day 4), handler delegates build decisions (day 5), agent designs and executes its own permission escalation through an approval workflow (day 6).
The governance layer isn’t a constraint bolted onto the capabilities. It’s the product. The capabilities self-assemble; the boundaries are what make them safe to deploy.
Numbers on this page reflect the full week-one investigation. RoboTrav’s interview on the Agents page reflects a point-in-time snapshot — the agent was fact-checking its own figures in real time during that conversation, which is part of the point.