Payments are in test mode. Use card 4242 4242 4242 4242 with any future expiry & CVC.
Knowledge hub
Safetyยท6 min read

Guardrails: What Your Agent Must Never Do

Capability without limits is a liability. A practical layering of guardrails that doesn't neuter the agent.

๐•inf@

Guardrails: What Your Agent Must Never Do

An agent you can't trust unsupervised isn't saving you time โ€” it's adding a review job. Guardrails are how you earn the right to look away.

Three layers

  • Prompt-level โ€” explicit "never" rules in the system prompt. Cheap, first line, not sufficient alone.
  • Tool-level โ€” the strongest layer. If the agent can't call delete_customer, no prompt injection can make it. Scope tools tightly.
  • Approval gates โ€” for irreversible or outward-facing actions (sending email, spending money, deleting data), require a human confirm.

Default to reversible

Design tools so mistakes are recoverable. "Draft" instead of "send." "Archive" instead of "delete." Most of safety is making the worst case boring.

Watch the inputs, not just the outputs

Prompt injection hides instructions inside the data your agent reads โ€” a web page, an email, a PDF. Treat retrieved content as untrusted. Never let fetched text silently change what tools the agent is allowed to use.

A good guardrail is invisible when things go right and decisive when they go wrong.

Found this useful? Share it.

๐•inf@