Keeping Anthropic's 1-hour prompt cache when you use the Vercel AI Gateway

If you’re using Anthropic’s prompt caching with the 1-hour TTL and you route your requests through the Vercel AI Gateway, you may notice your cache stops lasting an hour. Nothing errors, and nothing in your logs changes. The cache just starts expiring after five minutes, and your bill goes up.

You’re not imagining it. Here’s why it happens and what to do about it.

A quick refresher on the two cache lifetimes

Prompt caching lets you mark a stable part of your prompt — a system prompt, a long document, a tool list — so Anthropic stores it and serves it back at about 10% of the normal input price. You get to pick how long it lives:

  • 5 minutes (the default). Writes cost about 1.25x the base input price, but the cache expires quickly.
  • 1 hour. Writes cost about 2x the base input price, but the cache survives the gaps between requests.
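
To put rough numbers on the two lifetimes, here is a minimal sketch of the cost of the cached prefix over one session. The multipliers (0.1x for reads, 1.25x and 2x for writes) are assumptions based on Anthropic's published pricing at the time of writing, and the function name and the sample session are hypothetical. It also assumes a cold cache on the first request and that every cache hit refreshes the TTL.

```typescript
// Price multipliers relative to the base input-token price.
// Assumption: Anthropic's published pricing; check current pricing before relying on them.
const READ_MULTIPLIER = 0.1;
const WRITE_MULTIPLIER = { '5m': 1.25, '1h': 2 } as const;
const TTL_MINUTES = { '5m': 5, '1h': 60 } as const;

type Ttl = keyof typeof TTL_MINUTES;

/**
 * Cost of the cached prefix over one session, in units of
 * "one uncached pass over the prefix".
 * gapsMinutes[i] is the idle time before request i + 1.
 */
function cachedPrefixCost(gapsMinutes: number[], ttl: Ttl): number {
  let cost: number = WRITE_MULTIPLIER[ttl]; // the first request writes the cache
  for (const gap of gapsMinutes) {
    // A hit refreshes the TTL, so only the gap since the previous request matters.
    cost += gap < TTL_MINUTES[ttl] ? READ_MULTIPLIER : WRITE_MULTIPLIER[ttl];
  }
  return cost;
}

// Hypothetical session: five requests with 2, 8, 12 and 3 minutes between them.
const gaps = [2, 8, 12, 3];
console.log(cachedPrefixCost(gaps, '5m').toFixed(2)); // 3 writes + 2 reads
console.log(cachedPrefixCost(gaps, '1h').toFixed(2)); // 1 write + 4 reads
```

Under these assumptions the 5-minute cache costs 3.95 passes over the prefix and the 1-hour cache costs 2.40, against 5.00 with no caching at all. The 1-hour cache only wins when some gaps exceed five minutes. For back-to-back requests the cheaper 5-minute write comes out ahead.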

For anything a person is interacting with, the 1-hour cache is usually the one you want. Someone who comes back after ten minutes of reading still lands on a warm cache instead of paying full price to reprocess a large prompt. Here’s how you ask for it with the Anthropic provider:

import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

const result = await generateText({
  model: anthropic('claude-opus-4-8'),
  messages: [
    {
      role: 'system',
      content: BIG_SYSTEM_PROMPT,
      providerOptions: {
        anthropic: {
          cacheControl: { type: 'ephemeral', ttl: '1h' }, // '5m' | '1h'
        },
      },
    },
    { role: 'user', content: userQuestion },
  ],
});

That ttl: '1h' relies on an Anthropic beta header, and the SDK sends it for you.

What the Gateway does to it

When you move to the Gateway, you turn caching on like this:

const result = await generateText({
  model: 'anthropic/claude-opus-4-8', // now through the Gateway
  providerOptions: {
    gateway: { caching: 'auto' },
  },
  messages: [/* ... */],
});

The Gateway does cache. caching: 'auto' adds a cache breakpoint to your prompt, and it works. The catch is that it only gives you the default 5-minute cache. There’s no option to set the 1-hour TTL, and the docs don’t mention one. So if you were relying on the 1-hour cache, you’ve quietly lost it.

How to check

You don’t have to take my word for it. The token counts come back in the response metadata:

const { providerMetadata } = await generateText({ /* ... */ });

console.log(providerMetadata?.anthropic);
// { cacheCreationInputTokens, cacheReadInputTokens, ... }

Send a request, wait six minutes (past the 5-minute TTL but well inside the 1-hour one), then send the identical request again and look at the second response. If the cache is still alive, cacheReadInputTokens will be nonzero. If it expired, cacheReadInputTokens will be zero and cacheCreationInputTokens will be nonzero instead, which means you paid to write the cache all over again.
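
If you want to run the check from a script rather than by eye, the comparison can be sketched as a small helper. The field names follow the metadata shape shown above. The CacheMetadata type, the classifyCache function, and the verdict labels are hypothetical names introduced here for illustration.

```typescript
// Shape of the fields read from providerMetadata.anthropic.
// Assumption: either field may be missing or null, so both are optional.
interface CacheMetadata {
  cacheCreationInputTokens?: number | null;
  cacheReadInputTokens?: number | null;
}

type CacheVerdict = 'hit' | 'rewritten' | 'not-cached';

/** Classify the SECOND response of the six-minute check. */
function classifyCache(meta: CacheMetadata | undefined): CacheVerdict {
  const read = meta?.cacheReadInputTokens ?? 0;
  const created = meta?.cacheCreationInputTokens ?? 0;
  if (read > 0) return 'hit';          // the cache survived the wait
  if (created > 0) return 'rewritten'; // the cache expired and was written again
  return 'not-cached';                 // caching never applied to this request
}

console.log(classifyCache({ cacheCreationInputTokens: 0, cacheReadInputTokens: 4200 })); // hit
console.log(classifyCache({ cacheCreationInputTokens: 4200, cacheReadInputTokens: 0 })); // rewritten
```

A 'hit' after six minutes means the 1-hour TTL is in effect, and 'rewritten' means you are on the 5-minute cache. A response can report both counts when only part of the prompt was cached, and this sketch treats any nonzero read as a hit.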

The fix

Don’t make the Gateway your default. If the 1-hour cache matters to your costs, send your requests straight to Anthropic, where the full TTL is available, and only fall back to the Gateway or another provider when Anthropic is having problems.

The ai-fallback package makes this easy. You give it a list of models, and if the first one fails it moves to the next:

import { createFallback } from 'ai-fallback';
import { anthropic } from '@ai-sdk/anthropic';
import { gateway } from '@ai-sdk/gateway';
import { vertexAnthropic } from '@ai-sdk/google-vertex/anthropic';
import { generateText } from 'ai';

const model = createFallback({
  models: [
    anthropic('claude-opus-4-8'),         // primary: full 1-hour cache
    gateway('anthropic/claude-opus-4-8'), // if Anthropic is down
    vertexAnthropic('claude-opus-4-8'),   // last resort
  ],
  onError: (error, modelId) => console.error(`failed over from ${modelId}:`, error),
  modelResetInterval: 60_000,             // go back to the primary after a minute
});

const result = await generateText({
  model,
  messages: [
    {
      role: 'system',
      content: BIG_SYSTEM_PROMPT,
      providerOptions: {
        anthropic: { cacheControl: { type: 'ephemeral', ttl: '1h' } },
      },
    },
    { role: 'user', content: userQuestion },
  ],
});

ai-fallback falls back on the errors you’d expect (rate limits, server errors, and auth failures), and modelResetInterval sends traffic back to the primary once the interval has passed, so a short outage doesn’t keep you on the short-cache path for the rest of the day.
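
To make the failover-and-reset behavior concrete, here is a minimal sketch of the idea. This is not ai-fallback's source code, and the FallbackCursor class and its methods are hypothetical. It assumes the reset is purely time-based, measured from the first error, which is how the modelResetInterval comment above describes it.

```typescript
/** Tracks which model in the list to try next. An illustration only. */
class FallbackCursor {
  private index = 0;
  private firstErrorAt: number | null = null;
  private readonly modelCount: number;
  private readonly resetIntervalMs: number;

  constructor(modelCount: number, resetIntervalMs: number) {
    this.modelCount = modelCount;
    this.resetIntervalMs = resetIntervalMs;
  }

  /** Index of the model to try at time `now` (milliseconds). */
  current(now: number): number {
    if (this.firstErrorAt !== null && now - this.firstErrorAt >= this.resetIntervalMs) {
      this.index = 0; // interval elapsed: try the primary again
      this.firstErrorAt = null;
    }
    return this.index;
  }

  /** Record a failure of the current model and move to the next one, if any. */
  reportFailure(now: number): void {
    if (this.firstErrorAt === null) this.firstErrorAt = now;
    this.index = Math.min(this.index + 1, this.modelCount - 1);
  }
}

const cursor = new FallbackCursor(3, 60_000);
console.log(cursor.current(0));      // 0: primary (Anthropic)
cursor.reportFailure(1_000);         // primary fails one second in
console.log(cursor.current(2_000));  // 1: first fallback (Gateway)
console.log(cursor.current(61_000)); // 0: back to the primary after the interval
```

Because the reset in this sketch is driven by the clock rather than by a health check, the first request after a reset fails over again if the primary is still down. That costs one failed attempt per interval, in exchange for returning to the 1-hour cache as soon as the primary is usable.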

The idea is simple. Your normal requests keep the 1-hour cache, and you only accept a shorter cache when something has already gone wrong and you’ve failed over.

Two things to keep in mind

Neither fallback path keeps the 1-hour cache. The Gateway gives you the 5-minute default, and Claude on Vertex through the AI SDK is limited to the 5-minute cache as well, because Vertex rejects the header the 1-hour cache needs. That’s fine, because those paths are there for when Anthropic is down, not for everyday traffic.

And none of this is officially documented, so it can change. I’m describing how the SDK and Gateway behave today, based on our own usage logs. Run the six-minute check on your own setup before you build around it, and again whenever you update the packages.