AI agents (and other non-human identities) are quickly becoming first-class actors in enterprise environments: they call APIs, read and write data, trigger workflows, and sometimes even approve or execute changes.
That’s powerful—and it changes the security math.
In a traditional “human user + browser” model, you can often rely on:
- short, interactive sessions
- user presence signals (MFA prompts, step-up)
- device signals (managed endpoint posture)
- relatively constrained token storage (browser cookie jar / OS keychain)
Agents break those assumptions. They run unattended. They get deployed to servers and CI systems. They may execute from ephemeral containers. They may need access continuously. And the most common way teams accidentally turn an agent into a persistent, stealthy foothold for attackers is by giving it long-lived refresh tokens (or equivalents) without lifecycle controls.
A recent security write-up on refresh token abuse and defensive patterns helped resurface the core issue: refresh tokens are essentially durable credentials. If you treat them like “just another token,” you’ll design the wrong controls.
This post is a practical guide for IAM and security teams who need to support AI agents while keeping session and token risk under control.
We’ll cover:
- why refresh tokens are uniquely dangerous for agents
- which controls actually reduce blast radius (and which don’t)
- how to implement “agent-safe” token patterns across common IdPs and cloud stacks
- how to use event-driven session revocation (CAEP) and signal-sharing standards (SSF) to scale response
Along the way, we’ll anchor to a few Learn IAM topics you may want as background:
- OAuth and OIDC fundamentals: https://learn-iam.com/topic/access-management/oauth-oidc
- Session management patterns: https://learn-iam.com/topic/access-management/session-management
- Non-human identity (workload and machine identity): https://learn-iam.com/topic/iga/workload-and-machine-identity-management
- SCIM for identity lifecycle (where it fits and where it doesn’t): https://learn-iam.com/topic/iga/scim
The uncomfortable truth: refresh tokens are durable credentials
In OAuth 2.0 / OIDC, the refresh token exists to let a client obtain new access tokens without re-prompting the user.
For humans, that’s mostly a usability feature.
For non-human identities, refresh tokens often become a design crutch:
- “We need the agent to work 24/7”
- “We don’t want to manage certificates”
- “We just need a token we can store in the app config”
So someone runs an interactive auth flow once, copies a refresh token into a secret store, and calls it done.
From an attacker’s perspective, that’s ideal:
- Access tokens expire quickly (minutes).
- A stolen refresh token can mint new access tokens for days/weeks/months.
- Many detection stacks focus on access token use, not refresh token rotation anomalies.
- Revoking access tokens is often ineffective if the refresh token remains valid, because the holder can mint a replacement immediately.
If you do nothing else after reading this post, do this:
Classify refresh tokens for agent clients as Tier-0 credentials.
They require the same seriousness you apply to:
- privileged API keys
- signing keys
- root cloud credentials
- CI/CD deploy tokens
Agents and humans have different session risk profiles
For humans, your risk is often:
- phishing and prompt bombing
- session hijacking via malware
- token theft from the browser
For agents, risk tends to look like:
- secrets exposure in CI logs or config files
- container image leakage (token baked into image)
- SSRF extracting tokens from metadata endpoints
- lateral movement using the agent’s “always-on” session
- refresh token replay from a different runtime environment
That difference changes which controls matter.
Example: “Require MFA” is meaningful for a human. For an agent, MFA is either impossible (no user present) or gets bypassed via “MFA once, then store refresh forever.”
So the agent control set is less about interactive proof and more about:
- strong client authentication
- secure token storage and access patterns
- rotation and replay detection
- explicit scoping and audience control
- event-driven revocation
The control menu: what you can do about refresh tokens
Here’s a practical comparison of the most common defensive options.
Comparison table: options for controlling refresh token risk
| Control / Pattern | How it helps | What it doesn’t solve | Best fit for AI agents? |
|---|---|---|---|
| Short refresh token lifetime (hours–days) | Reduces exposure window if stolen | Still replayable during lifetime | Yes, but only as a baseline |
| Refresh token rotation (one-time use) | Makes replay detectable; shrinks usefulness of stolen token | Requires good client storage; needs server-side enforcement | Yes, strongly recommended |
| Sender-constrained tokens (mTLS / DPoP) | Binds token to a key so theft alone isn’t enough | Adds operational complexity; key management becomes critical | Often yes (mTLS ideal for workloads) |
| Token binding / certificate-bound sessions | Similar to sender constraint; reduces token replay | Client support varies; may be hard in some SDKs | Sometimes |
| Strong client authentication (private_key_jwt, mTLS) | Makes the client harder to impersonate | Doesn’t prevent token exfil from the client itself | Yes |
| Narrow scopes + audiences | Limits what minted access tokens can do | Doesn’t prevent misuse within the granted scopes | Yes |
| Continuous Access Evaluation (CAEP) | Revokes sessions quickly based on risk events | Requires ecosystem support; not all apps enforce in real time | Yes, for enterprise scale |
| Central secret store (Vault/Key Vault/Secrets Manager) | Keeps tokens out of code and images | Doesn’t fix over-privilege; still needs rotation | Yes, but not sufficient |
| Workload identity federation (no long-lived refresh) | Eliminates refresh tokens and static secrets | Requires cloud/IAM integration and design effort | Ideal when possible |
If you’re building an agent platform, your north star should be: avoid refresh tokens entirely for workloads by using workload identity federation. When you can’t, you need to treat refresh tokens as Tier-0.
Preferred architecture: don’t give agents refresh tokens (use workload identity)
The cleanest answer is to avoid persistent OAuth refresh tokens for agents.
Instead, use workload identity patterns where the runtime proves its identity using platform-native trust, and receives short-lived credentials.
Common examples:
- Kubernetes: service account tokens + OIDC issuer + audience constraints, often paired with a workload identity provider.
- AWS: IAM roles for service accounts (IRSA) or ECS task roles.
- GCP: Workload Identity Federation / GKE Workload Identity.
- Azure: Managed Identities.
- SPIFFE/SPIRE: workload identities via X.509 SVIDs, enabling mTLS and identity-based policy.
In these models, you get:
- no “copy-paste refresh token”
- short-lived credentials by default
- a clearer identity boundary (workload → role/policy)
- easier revocation (disable the workload identity or the trust)
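If you want a structured introduction to the non-human identity space, start here:
- Non-human identity (workload and machine identity): https://learn-iam.com/topic/iga/workload-and-machine-identity-management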
Concrete implementation pattern: agent in Kubernetes calling a protected API
A robust baseline looks like:
- Agent runs in Kubernetes with a dedicated ServiceAccount.
- Cluster has an OIDC issuer enabled (or uses managed provider identity).
- Your API gateway (or OAuth AS) accepts a workload assertion and issues a short-lived access token.
- Access token is:
  - audience-restricted to the API
  - short-lived (5–15 minutes)
  - sender-constrained via mTLS where possible
- Authorization is enforced in the API using scopes/claims and fine-grained policy (e.g., OPA, Cedar, Zanzibar-style checks).
If you must use OAuth client credentials, use strong client auth (mTLS or private_key_jwt), and keep secrets out of the pod filesystem.
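To make the "audience-restricted and short-lived" requirement concrete, here is a minimal sketch of the claim checks an API might run on an incoming access token. It assumes the signature has already been verified by a JWT library and that the claims arrive as a plain dictionary; the function name, the 15-minute cap, and the example issuer and audience values are illustrative assumptions, not part of any specific product.

```python
import time


def validate_claims(claims, *, expected_issuer, expected_audience,
                    now=None, max_lifetime=900):
    """Return a list of problems; an empty list means the claims pass.

    Assumes the token signature was verified before this is called.
    """
    now = time.time() if now is None else now
    problems = []

    if claims.get("iss") != expected_issuer:
        problems.append("unexpected issuer")

    # "aud" may be a single string or a list; require exactly this API.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audiences != [expected_audience]:
        problems.append("audience must be exactly this API")

    exp, iat = claims.get("exp"), claims.get("iat")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token expired or missing exp")
    elif isinstance(iat, (int, float)) and exp - iat > max_lifetime:
        # Reject tokens minted with a longer lifetime than policy allows.
        problems.append("lifetime exceeds policy")

    return problems
```

Rejecting a token whose audience list contains anything beyond this API is deliberately strict: it stops a token minted for several resource servers from being accepted here.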
When you do need refresh tokens: design for rotation + sender constraint
Sometimes you’re integrating with SaaS APIs that require a three-legged OAuth flow and provide refresh tokens (e.g., Google Workspace APIs, Microsoft Graph delegated access in certain models, or a third-party SaaS integration).
In those cases, you can still dramatically reduce risk.
1) Use refresh token rotation (and enforce one-time use)
Refresh token rotation means:
- every refresh call returns a new refresh token
- the old refresh token is invalidated
- replay of an old refresh token triggers revocation or risk handling
Many IdPs support this in some form, though the exact behavior differs.
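The three properties above can be modeled in a few lines. The sketch below is an in-memory toy, not how any particular IdP implements rotation: it groups tokens into a "family" and revokes the whole family when an already-used token is replayed, which is one common way to handle the "revocation or risk handling" step. The class and method names are hypothetical.

```python
import secrets


class RotatingTokenStore:
    """Toy model of one-time-use refresh tokens grouped into families."""

    def __init__(self):
        self._active = {}      # current token -> family id
        self._used = {}        # already-rotated token -> family id
        self._revoked = set()  # revoked family ids

    def issue(self, family=None):
        family = family or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._active[token] = family
        return token

    def refresh(self, token):
        if token in self._used:
            # Replay of an old token: assume theft and revoke the family.
            self.revoke_family(self._used[token])
            raise PermissionError("refresh token replayed; family revoked")
        family = self._active.pop(token, None)
        if family is None or family in self._revoked:
            raise PermissionError("unknown or revoked refresh token")
        self._used[token] = family   # the old token is now invalid
        return self.issue(family)    # every refresh returns a new token

    def revoke_family(self, family):
        self._revoked.add(family)
        for token in [t for t, f in self._active.items() if f == family]:
            del self._active[token]
```

Note the consequence for legitimate clients: after a replay, the newest token stops working too, so the agent has to re-authenticate. That is the intended trade-off, since the server cannot tell which party holds the stolen copy.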
Implementation notes:
- Store the latest refresh token atomically (avoid race conditions with parallel refreshes).
- Treat refresh as a single-writer operation per identity/client.
- Build a “refresh broker” service for agents rather than letting every agent instance refresh independently.
Practical tip:
- If your agent scales horizontally, do not let each replica refresh independently.
- Use a token service that issues short-lived, internal tokens to agents and keeps the real refresh token isolated.
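As a rough illustration of the single-writer idea, the sketch below serializes refreshes behind a lock and caches the resulting access token, so concurrent callers trigger at most one refresh. The `exchange` callable stands in for the real call to the IdP's token endpoint and is an assumption, as are the class name and the 30-second skew. A fuller broker would persist the new refresh token to a secret store and hand agents its own internal tokens instead of the upstream access token.

```python
import threading
import time


class RefreshBroker:
    """Single-writer wrapper around one rotating refresh token."""

    def __init__(self, refresh_token, exchange, skew=30):
        # exchange(refresh_token) -> (access_token, expires_in, new_refresh_token)
        self._refresh_token = refresh_token
        self._exchange = exchange
        self._skew = skew
        self._lock = threading.Lock()
        self._access_token = None
        self._expires_at = 0.0

    def get_access_token(self):
        # The lock makes refresh a single-writer operation: callers that
        # arrive during a refresh wait and then reuse the cached result.
        with self._lock:
            stale = time.monotonic() >= self._expires_at - self._skew
            if self._access_token is None or stale:
                access, expires_in, new_refresh = self._exchange(self._refresh_token)
                # Store the rotated refresh token before handing anything out.
                self._refresh_token = new_refresh
                self._access_token = access
                self._expires_at = time.monotonic() + expires_in
            return self._access_token
```

With rotation enforced, two replicas refreshing the same token in parallel look exactly like a replay to the IdP, which is why the refresh path needs one owner.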
2) Add sender constraint: mTLS or DPoP
Even with rotation, a stolen refresh token can still be used until the next legitimate refresh invalidates it.
Sender constraint binds the token to proof-of-possession:
- mTLS-bound tokens: bind to a client certificate.
- DPoP: bind to a public/private key proof in HTTP requests.
For agent workloads, mTLS is often operationally feasible because you control the runtime environment.
Key considerations:
- Put keys in an HSM-backed store when possible (or cloud KMS).
- Rotate keys on a schedule.
- Ensure your TLS termination doesn’t accidentally strip the client identity (common with misconfigured reverse proxies).
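For mTLS-bound tokens, the resource server's check is small. RFC 8705 binds a token to a certificate by placing the base64url-encoded SHA-256 hash of the DER-encoded certificate in the `cnf` claim under `x5t#S256`. The sketch below assumes the token's claims are already verified and that the client certificate is available as DER bytes (for example, forwarded correctly by your TLS terminator); the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def cert_thumbprint(der_cert: bytes) -> str:
    """base64url (unpadded) SHA-256 of the DER-encoded certificate."""
    digest = hashlib.sha256(der_cert).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def is_bound_to_cert(claims: dict, der_cert: bytes) -> bool:
    """True only if the token's cnf claim matches the presented certificate."""
    cnf = claims.get("cnf")
    if not isinstance(cnf, dict):
        return False
    expected = cnf.get("x5t#S256")
    if not isinstance(expected, str):
        return False
    actual = cert_thumbprint(der_cert)
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(expected.encode(), actual.encode())
```

If TLS terminates at a proxy and the certificate never reaches the API, this check has nothing to compare against, which is the failure mode the last bullet warns about.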
3) Make refresh tokens “hard to exfiltrate”
This is the unglamorous part that stops most real-world incidents.
Do:
- store refresh tokens only in Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager
- restrict read access to a minimal runtime identity
- block token values from logs (redaction)
- scan CI/CD pipelines for secret patterns
Don’t:
- put refresh tokens in .env files in repos
- bake them into container images
- expose them via debug endpoints
- store them in “shared” secrets buckets accessible by many services
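Log redaction is the cheapest item on the "Do" list to put in place. Here is a minimal sketch using Python's standard `logging` module; the pattern covers only a few obvious key names and is an assumption you would extend for your own token formats.

```python
import logging
import re

# Matches key=value, key: value, and JSON-style "key": "value" pairs.
TOKEN_PATTERN = re.compile(
    r'\b(refresh_token|access_token|client_secret)\b("?\s*[=:]\s*"?)([^\s"&,}]+)',
    re.IGNORECASE,
)


class RedactTokens(logging.Filter):
    """Replace token values in log messages before they are emitted."""

    def filter(self, record):
        # Render the message first so values passed as args are covered too.
        message = record.getMessage()
        record.msg = TOKEN_PATTERN.sub(r"\1\2[REDACTED]", message)
        record.args = None
        return True  # keep the record, just with the value masked
```

Attach the filter to your handlers (`handler.addFilter(RedactTokens())`) rather than to a single logger, so records from other loggers that reach the same handler are also covered. Pattern-based redaction is a safety net, not a substitute for keeping tokens out of log calls in the first place.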
Scoping: tokens should be narrow, boring, and purpose-built
Agents often get “admin” scopes because it’s easier.
That’s how you end up with:
- an agent that can read and delete data
- a token that can call unrelated APIs
- a single compromise that becomes a cross-domain breach
Practical scoping rules:
- One agent, one purpose. If it does two unrelated things, split it.
- One client registration per agent class. Don’t reuse a single OAuth client for every integration.
- Audience restrict access tokens to exactly one resource server.
- Prefer fine-grained authorization at the API layer (ABAC/PBAC) over coarse OAuth scopes alone.
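These rules are easy to check automatically when a new agent client is registered. The sketch below lints a hypothetical registration record; the field names and the list of "broad" scope markers are assumptions for illustration, not any IdP's schema.

```python
from dataclasses import dataclass

# Hypothetical markers for scopes that are too broad for an agent.
BROAD_SCOPE_MARKERS = ("admin", "*", "full_access")


@dataclass
class AgentClient:
    client_id: str
    purpose: str
    audiences: list
    scopes: list


def lint_client(client):
    """Return findings for a registration that breaks the scoping rules."""
    findings = []
    if len(client.audiences) != 1:
        findings.append("access tokens should target exactly one resource server")
    broad = [s for s in client.scopes
             if any(marker in s.lower() for marker in BROAD_SCOPE_MARKERS)]
    if broad:
        findings.append(f"overly broad scopes: {', '.join(sorted(broad))}")
    if not client.purpose.strip():
        findings.append("missing purpose")
    return findings
```

Running a check like this in the pipeline that creates client registrations catches the "admin because it's easier" shortcut before it ships.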
This is a good moment to re-read:
- OAuth & OIDC: https://learn-iam.com/topic/access-management/oauth-oidc
- Session Management: https://learn-iam.com/topic/access-management/session-management
Enterprise reality: you need event-driven revocation (CAEP) for agents
Even with great storage and rotation, refresh tokens can leak.
What matters then is: how quickly can you cut off access across your ecosystem?
That’s where Continuous Access Evaluation Profile (CAEP) comes in.
CAEP is an emerging ecosystem approach where:
- the identity provider emits security events (risk, account disable, device compromise)
- relying parties (apps/APIs) react by invalidating sessions or forcing re-auth
For humans, that can mean “step-up MFA now.”
For agents, it usually means:
- revoke the agent’s session
- invalidate refresh tokens
- block access token minting until the runtime identity is re-attested
CAEP is defined as a profile of the Shared Signals Framework (SSF), which provides the standard way to transmit those security events across vendors.
Why this matters for agents:
- agents often touch many systems
- the cost of “waiting for token expiry” can be hours
- revocation must propagate beyond a single app
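Here is a sketch of what reacting to a CAEP event could look like on the receiving side. It assumes the Security Event Token has already been signature-verified and decoded into a dictionary, and that the subject uses an opaque identifier carrying the agent's ID. Depending on the spec version your transmitter implements, the subject may sit at the top level (`sub_id`) or inside the event body (`subject`), so the sketch checks both. `AgentSessions` is a hypothetical stand-in for your broker's state.

```python
SESSION_REVOKED = "https://schemas.openid.net/secevent/caep/event-type/session-revoked"


class AgentSessions:
    """Hypothetical in-memory stand-in for a token broker's state."""

    def __init__(self):
        self.refresh_tokens = {}  # agent id -> stored refresh token
        self.blocked = set()      # agents that must be re-attested

    def cut_off(self, agent_id):
        self.refresh_tokens.pop(agent_id, None)  # invalidate refresh token
        self.blocked.add(agent_id)               # block further minting

    def may_mint(self, agent_id):
        return agent_id not in self.blocked


def handle_security_event(claims, sessions):
    """Apply session-revoked events; return the agent ids that were cut off."""
    cut_off = []
    top_level_subject = claims.get("sub_id")
    for event_type, details in claims.get("events", {}).items():
        if event_type != SESSION_REVOKED:
            continue  # other event types are ignored in this sketch
        subject = top_level_subject or details.get("subject") or {}
        agent_id = subject.get("id")  # assumes an opaque subject identifier
        if agent_id:
            sessions.cut_off(agent_id)
            cut_off.append(agent_id)
    return cut_off
```

The broker from the rotation section is the natural place for this handler: because it already owns the refresh token, one event can stop minting for every replica of the agent at once.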
Practical approach today (even without perfect CAEP coverage)
You can implement a CAEP-like posture with:
- centralized token service / broker
- a kill switch (disable agent identity + rotate secrets)
- SIEM-triggered automation (e.g., Splunk SOAR, Cortex XSOAR, Tines)
- IdP risk signals (Okta Risk, Microsoft Entra ID risk events)
The key is: build revocation into the architecture—don’t bolt it on after the first incident.
Recommended lifetimes (starting points) for agent-related tokens
There’s no universal answer, but you can start with sane defaults.
Comparison table: suggested token lifetimes
| Token type | Human interactive app (baseline) | Agent / non-human identity (baseline) | Notes |
|---|---|---|---|
| Access token | 5–15 minutes | 5–10 minutes | Keep short; rely on refresh or re-attestation |
| Refresh token | 7–30 days (with rotation) | 4–24 hours (with rotation) | Shorter for agents; treat as Tier-0 |
| Client assertion (private_key_jwt) | N/A | 1–5 minutes | Mint on-demand, rotate signing key |
| mTLS cert / key | N/A | 7–30 days | Rotate; store in KMS/HSM when possible |
Two important caveats:
- If you can use workload identity federation, you can often eliminate refresh tokens and keep everything short-lived.
- The more privileged the agent, the shorter the refresh window should be.
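One way to keep the second caveat from eroding over time is to encode it as policy rather than convention. The sketch below caps the refresh lifetime by privilege tier. The tier names and the 12-hour middle value are assumptions; the 4-hour and 24-hour bounds come from the agent column of the table above.

```python
from datetime import timedelta

# Hypothetical privilege tiers mapped to maximum refresh token lifetimes.
REFRESH_LIFETIME_BY_TIER = {
    "low": timedelta(hours=24),
    "medium": timedelta(hours=12),
    "high": timedelta(hours=4),
}


def refresh_lifetime(tier, requested):
    """Grant the requested lifetime, capped by the tier's maximum."""
    cap = REFRESH_LIFETIME_BY_TIER[tier]
    return min(requested, cap)
```

For example, a high-privilege agent asking for a 7-day refresh window would be granted 4 hours, while a low-privilege agent asking for 6 hours keeps its 6 hours.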
How to implement this with common products (enterprise-neutral)
Here’s how the above maps to real stacks. The goal is not to endorse a vendor; it’s to highlight the knobs you should look for.
Identity providers / Authorization servers
- Okta (Customer Identity / Workforce): supports OAuth/OIDC patterns, token policies, and event-driven signals in broader identity ecosystems.
- Microsoft Entra ID: integrates strongly with Conditional Access and risk signals; supports managed identities in Azure for workload scenarios.
- Auth0: configurable token lifetimes and refresh rotation patterns for many app types.
- Ping Identity / ForgeRock: often used for complex enterprise federation and policy-driven access, with flexible token services.
- Amazon Cognito: common for application identity; also consider native AWS IAM roles for workloads to avoid OAuth refresh.
What to ask your IdP team for:
- refresh rotation behavior (and how replay is handled)
- max refresh lifetime and idle timeout options
- sender-constrained token support (mTLS/DPoP)
- APIs for revocation and session termination
- event hooks / security event export (for CAEP-like integration)
Secret storage
- HashiCorp Vault (strong for dynamic secrets + audit)
- AWS Secrets Manager
- Azure Key Vault
- GCP Secret Manager
Baseline controls:
- strict RBAC for secret read
- audit logs for access
- automated rotation when possible
API gateways / enforcement points
- Kong, Apigee, NGINX, Envoy, AWS API Gateway
What matters:
- enforcing audience/issuer checks
- validating sender constraint if used
- rate limiting and anomaly detection for token mint/refresh
A phased adoption plan (what to do Monday vs what to do this quarter)
Most teams can’t rebuild everything at once. Here’s a phased approach that reduces risk quickly.
Phase 0 (this week): stop the bleeding
- Inventory where refresh tokens are used for automation and agents.
- Identify refresh tokens stored outside approved secret stores.
- Reduce refresh token lifetimes for the highest-privilege clients.
- Ensure tokens are never logged (add redaction now).
Deliverable: a list of agent clients + where their tokens live.
Phase 1 (2–4 weeks): rotation + broker
- Enable refresh token rotation where supported.
- Build a “token broker” service so agent instances don’t directly hold refresh tokens.
- Enforce atomic refresh and eliminate parallel refresh races.
- Add alerting on refresh anomalies (unexpected IPs, geos, user-agents, excessive refresh frequency).
Deliverable: rotation enforced + broker in front of refresh.
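The alerting item in Phase 1 does not need a full analytics pipeline to get started. As a minimal sketch, the monitor below flags two of the anomalies listed: refreshes from an unexpected source IP and too many refreshes in a sliding window. The class name, thresholds, and the idea of a per-client IP allowlist are assumptions; in practice you would feed this from your IdP or broker logs.

```python
from collections import defaultdict, deque


class RefreshMonitor:
    """Flags refresh events that look unusual for a given client."""

    def __init__(self, allowed_ips, max_per_window=3, window_seconds=300):
        self.allowed_ips = allowed_ips          # client id -> set of expected IPs
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)       # client id -> recent timestamps

    def observe(self, client_id, source_ip, timestamp):
        alerts = []
        if source_ip not in self.allowed_ips.get(client_id, set()):
            alerts.append("unexpected source IP")

        recent = self._recent[client_id]
        recent.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) > self.max_per_window:
            alerts.append("excessive refresh frequency")
        return alerts
```

Because a broker makes refresh a single-writer operation, the expected pattern is very regular, and that regularity is what makes simple rules like these useful.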
Phase 2 (1–2 quarters): sender constraint + workload identity
- Introduce mTLS or DPoP for agent clients.
- Move eligible workloads to managed identity / workload identity federation.
- Adopt SPIFFE/SPIRE where you need portable workload identity across environments.
- Build “kill switch” automation that revokes sessions and rotates secrets when risk triggers.
Deliverable: agent auth is short-lived by default; refresh tokens only exist for legacy SaaS integrations.
Phase 3 (ongoing): SSF/CAEP-driven response at scale
- Standardize security event flows (IdP → apps/APIs → SIEM/SOAR).
- Where available, integrate CAEP-like session revocation to shrink time-to-revoke.
- Create a non-human identity lifecycle program: ownership, reviews, expiration, and decommissioning.
Deliverable: revocation and lifecycle are systemic, not ad hoc.
Checklist: refresh token hygiene for AI agents
Use this as a policy checklist for platform teams.
Architecture
- Prefer workload identity federation over refresh tokens.
- If refresh tokens are required, isolate them behind a token broker.
- One agent purpose per client; no shared “god client.”
Token controls
- Refresh token rotation enabled (one-time use) where supported.
- Refresh token lifetime and idle timeout configured for agent risk.
- Access tokens are audience-restricted and short-lived.
- Sender constraint (mTLS/DPoP) implemented for high-privilege agents.
Storage and handling
- Refresh tokens stored only in approved secret stores.
- Secret access is least-privilege and audited.
- No tokens in logs, crash dumps, or metrics.
Detection and response
- Alert on refresh anomalies and unusual mint patterns.
- Automated revocation playbook exists (disable identity, revoke sessions, rotate keys).
- Security events are shared across systems (SSF/CAEP where possible).
Governance
- Each agent identity has an owner, purpose, and expiration/review date.
- Privileged agents are reviewed like privileged human accounts.
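The governance items are simple enough to check from an inventory. The sketch below assumes a hypothetical record per agent identity with an owner, purpose, and review date, and reports the ones that need attention; where the inventory actually lives (CMDB, IGA tool, a YAML file) is up to you.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentIdentity:
    name: str
    owner: str
    purpose: str
    review_by: date
    privileged: bool = False


def needs_attention(identities, today):
    """Return (name, reason) pairs for identities that fail governance checks."""
    flagged = []
    for identity in identities:
        if not identity.owner:
            flagged.append((identity.name, "no owner"))
        elif identity.review_by < today:
            flagged.append((identity.name, "review overdue"))
    return flagged
```

A scheduled job that runs this against the inventory and opens a ticket for each flagged identity is usually enough to keep ownerless agents from accumulating.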
Closing: design for the breach you’ll eventually have
Agents are going to be everywhere. The question isn’t whether you’ll have non-human identities—it’s whether you’ll manage them with the same rigor you apply to privileged access.
Refresh tokens are often the quiet failure mode: they feel like a convenience, but they behave like durable credentials.
If you adopt only three patterns, make them these:
- Prefer workload identity federation (no long-lived refresh tokens).
- Rotate refresh tokens and isolate them behind a broker.
- Build event-driven revocation so you can cut off access quickly.
That combination keeps your agent ecosystem flexible while preventing “one leaked refresh token” from becoming a multi-week incident.