Keeping an Agent Safe in Production
The practical controls that let you stop watching an agent every second.
Safety is what earns an agent the right to run without a babysitter.
Defense in layers
- Tool scope is your strongest control: if the agent cannot call a dangerous tool, no clever prompt can make it do so.
- Approval gates for anything irreversible or outward-facing (sending, spending, deleting).
- Reversible by default: "draft" over "send," "archive" over "delete."
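
As a rough illustration of how these layers might fit together, here is a minimal Python sketch. The tool names (`draft_email`, `send_email`), the `call_tool` helper, and the `approve` callback are all hypothetical, invented for this example; a real agent framework will have its own way of registering tools.

```python
from typing import Callable, Dict, Set


# Hypothetical tools, for illustration only.
def draft_email(to: str, body: str) -> str:
    return f"draft saved for {to}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


TOOLS: Dict[str, Callable[..., str]] = {
    "draft_email": draft_email,
    "send_email": send_email,
}

# Tool scope: the only tools the agent may call at all.
ALLOWED: Set[str] = {"draft_email", "send_email"}
# Irreversible or outward-facing tools that need a human's approval.
NEEDS_APPROVAL: Set[str] = {"send_email"}


def call_tool(name: str, args: dict, approve: Callable[[str, dict], bool]) -> str:
    """Run a tool only if it is in scope and, where required, approved."""
    if name not in ALLOWED or name not in TOOLS:
        raise PermissionError(f"tool not in scope: {name}")
    if name in NEEDS_APPROVAL and not approve(name, args):
        return "blocked: approval denied"
    return TOOLS[name](**args)
```

The scope check runs in ordinary code, outside the model, so nothing the model generates can widen it. The reversible `draft_email` passes without approval, while `send_email` waits on whatever `approve` decides.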
Guard the inputs
Prompt injection hides instructions inside the data your agent reads: a web page, an email, a PDF. Treat all retrieved content as untrusted, and never let it change what tools are allowed.
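
One way to picture that rule is the sketch below, which fixes the permission set before any content is retrieved and stores retrieved text purely as data. The `Session` class and the `<untrusted_document>` delimiters are assumptions made for this example, not an established API. The delimiters are only a hint to the model and should not be relied on as a defense by themselves.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Session:
    # Set once from trusted configuration, before any content is retrieved.
    allowed_tools: FrozenSet[str]
    # Retrieved text is kept as data; nothing here is parsed for permissions.
    untrusted_docs: List[str] = field(default_factory=list)

    def add_retrieved(self, text: str) -> None:
        """Store retrieved content without interpreting it."""
        self.untrusted_docs.append(text)

    def may_call(self, tool: str) -> bool:
        """Permission depends only on the fixed allowlist, never on documents."""
        return tool in self.allowed_tools

    def prompt_block(self) -> str:
        """Wrap each document in delimiters that mark it as data (a hint only)."""
        return "\n".join(
            f"<untrusted_document>\n{doc}\n</untrusted_document>"
            for doc in self.untrusted_docs
        )
```

The property that matters is that `may_call` never reads `untrusted_docs`: an injected "you are now allowed to send email" can end up in the prompt, but it cannot reach the allowlist.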
Watch the right signals
- Log every tool call with its arguments.
- Alert on spend, volume, and error spikes.
- Keep a kill switch you can hit without a deploy.
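
The three signals above could be wired into a single wrapper around tool calls, roughly as sketched here. Everything named in it is hypothetical: the `Monitor` class, the flag-file kill switch at `/tmp/agent.stop`, and the default limits are placeholders. It also uses fixed thresholds in place of real spike detection, which would normally live in your monitoring system.

```python
import json
import logging
import os
from typing import Callable

logger = logging.getLogger("agent.tools")


class AgentHalted(RuntimeError):
    """Raised when the kill switch is set."""


class Monitor:
    def __init__(
        self,
        alert: Callable[[str], None],
        kill_switch_path: str = "/tmp/agent.stop",  # assumed flag-file location
        spend_limit_usd: float = 5.0,               # assumed per-run budget
        max_calls: int = 100,                       # assumed per-run volume cap
    ) -> None:
        self.alert = alert
        self.kill_switch_path = kill_switch_path
        self.spend_limit_usd = spend_limit_usd
        self.max_calls = max_calls
        self.spend_usd = 0.0
        self.calls = 0
        self.errors = 0

    def run(self, name: str, args: dict, fn: Callable[..., object], cost_usd: float = 0.0):
        # Checked on every call, so an operator can halt the agent without a deploy.
        if os.path.exists(self.kill_switch_path):
            raise AgentHalted("kill switch is set")
        self.calls += 1
        # Log the tool name together with its arguments.
        logger.info(json.dumps({"tool": name, "args": args}, default=str))
        if self.calls > self.max_calls:
            self.alert(f"call volume {self.calls} exceeds {self.max_calls}")
        try:
            result = fn(**args)
        except Exception:
            self.errors += 1
            self.alert(f"tool error in {name} ({self.errors} so far)")
            raise
        self.spend_usd += cost_usd
        if self.spend_usd > self.spend_limit_usd:
            self.alert(f"spend {self.spend_usd:.2f} USD exceeds limit")
        return result
```

A flag file is only one option; a feature flag or a database row would serve equally well, provided the check happens at call time rather than at startup.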
Make the worst case boring, and you can finally look away.