Javlon Baxtiyorov

AWS shipped agents that fix your CVEs. The hard part isn't the autonomy.

At AWS Summit New York 2026, Amazon unveiled agents that triage email and remediate security risks while trying to keep humans in control. The interesting engineering is in the guardrails, not the autonomy — and that's the part most teams skip.

Photo by Cory M. Grenier · CC BY-SA 2.0

At AWS Summit New York 2026, Amazon unveiled a slate of AI agents that can do real work — from remediating security vulnerabilities to triaging email — and framed the whole thing around a single tension: how to maximize autonomy while keeping humans in control of how much the AI does on its own.

That framing is the most honest thing in the announcement, and it's worth dwelling on, because it's where most agent projects quietly fail.

What actually shipped

The launches cluster into infrastructure and guardrails:

  • Amazon Bedrock AgentCore — managed knowledge bases for enterprise RAG, an Agentic Retriever for multi-step queries, and web search with zero data egress from the customer's secured environment.
  • AWS Continuum — an AI-native security service that continuously discovers, prioritizes, validates, and remediates risks.
  • AWS Context — a knowledge graph so agents know where to get the right information before they act.
  • Strands Agents — a toolkit for production agents, now with better context management in the Harness SDK, an isolated execution environment (Strands Shell), and chaos testing and red teaming in Strands Evals.

The headline writes itself: agents that fix vulnerabilities. But the announcement I'd actually budget engineering time around is the last one.

The model was never the risky part

After building AI automation into production systems, I've come to a blunt conclusion: the model is rarely what hurts you. The unsupervised action is. A model that's confidently wrong while drafting text is a quality problem. A model that's confidently wrong while it has permission to change a security group, delete a record, or email a customer is an incident.

That's why the unglamorous pieces (Strands Shell for isolated execution, Strands Evals for chaos testing and red teaming, the explicit human-in-the-loop controls) matter more than the autonomy demo. They're the difference between an agent you can put in front of a real workload and a demo you can't.

The guardrails I won't ship an agent without

Independent of vendor, this is the checklist:

  • Scoped permissions. The agent gets the narrowest set of actions the task requires — never the operator's full keys. An agent that can read everything and change one thing is a fundamentally safer design than the reverse.
  • Dry-run and approval gates. High-consequence actions propose first and execute only on approval. "Maximize autonomy, keep humans in control" in practice means deciding, per action, which side of that line it falls on.
  • An audit trail you can replay. Every decision and action logged, reversible where possible. If you can't reconstruct why the agent did something, you can't trust it with anything that matters.
  • Adversarial evals before production. Red-team the agent the way Strands Evals suggests — try to make it do the wrong thing, on purpose, before a user does it by accident.
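
To make the first three items concrete, here is a minimal sketch of how they might fit together in plain Python. Everything in it is hypothetical and invented for illustration: `Risk`, `Action`, `GuardedAgent`, and the action names are not part of any AWS or Strands API. The assumption is that each action declares a risk level and that a human reviewer can be modeled as an `approve` callback.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Optional


class Risk(IntEnum):
    """Hypothetical risk tiers; a real system would define its own."""
    READ = 0         # no side effects
    REVERSIBLE = 1   # has side effects, but they can be undone
    DESTRUCTIVE = 2  # hard or impossible to undo


@dataclass
class Action:
    name: str
    risk: Risk
    run: Callable[[], str]


@dataclass
class GuardedAgent:
    allowed: frozenset                 # scoped permissions: action names this agent may use
    autonomy: Risk                     # the dial: highest risk that runs without approval
    approve: Callable[[Action], bool]  # approval gate (stand-in for a human review step)
    audit_log: list = field(default_factory=list)

    def execute(self, action: Action) -> Optional[str]:
        # 1. Scoped permissions: anything outside the allowlist is refused outright.
        if action.name not in self.allowed:
            self._record(action, "denied")
            return None
        # 2. Approval gate: actions above the autonomy dial need an explicit yes.
        if action.risk > self.autonomy and not self.approve(action):
            self._record(action, "rejected")
            return None
        result = action.run()
        self._record(action, "executed")
        return result

    def _record(self, action: Action, outcome: str) -> None:
        # 3. Audit trail: one entry per decision, including refusals.
        self.audit_log.append(
            {"action": action.name, "risk": action.risk.name, "outcome": outcome}
        )


# Usage: a read-only dial with a reviewer that rejects everything.
agent = GuardedAgent(
    allowed=frozenset({"list_findings", "patch_package"}),
    autonomy=Risk.READ,
    approve=lambda action: False,
)
agent.execute(Action("list_findings", Risk.READ, lambda: "3 findings"))
agent.execute(Action("patch_package", Risk.REVERSIBLE, lambda: "patched"))
agent.execute(Action("delete_record", Risk.DESTRUCTIVE, lambda: "deleted"))

for entry in agent.audit_log:
    print(entry)
```

In this sketch, turning the dial up is a one-line change (`autonomy=Risk.REVERSIBLE`), which is the point: the decision about how much the agent does on its own lives in configuration you can review, not in the prompt. A production version would also need to persist the log, capture the inputs behind each decision, and record failures inside `run`, none of which is shown here.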

The takeaway

AWS reframing autonomy as a dial rather than a switch is the right mental model, and it's quietly the most important thing in the announcement. The teams that win with agents in 2026 won't be the ones who let them do the most. They'll be the ones who made every consequential action observable, scoped, and reversible — and then turned the dial up only as far as the guardrails earned.


Sources: GeekWire — Amazon's new AI agents · About Amazon — AWS Summit NYC 2026 · AWS — top announcements, NY Summit 2026

