Anthropic 取消了 claude code 的全量 1h ttl 上下文缓存

2026-04-13 12:141阅读0评论SEO基础

内容介绍
文章标签
相关推荐

问题描述：

github.com/anthropics/claude-code

Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation

已打开 01:49AM - 12 Apr 26 UTC 已关闭 10:15AM - 12 Apr 26 UTC seanGSISG bug has repro area:cost api:anthropic

# Cache TTL appears to have silently regressed from 1h to 5m around early March …2026, causing significant quota and cost inflation ## Summary Analysis of raw Claude Code session JSONL files spanning Jan 11 – Apr 11, 2026 shows that Anthropic appears to have **silently changed the prompt cache TTL default from 1 hour to 5 minutes sometime in early March 2026**. Prior to this change, Claude Code was receiving 1-hour TTL cache writes — which we believe was the intended default. The reversion to 5-minute TTL has caused a **20–32% increase in cache creation costs** and a measurable spike in quota consumption for subscription users who have never previously hit their limits. This appears directly related to the behavior described in #45756. --- ## Data Session data extracted from `~/.claude/projects/` JSONL files across **two machines** (Linux workstation + Windows laptop, different accounts/sessions), totaling **119,866 API calls** from Jan 11 – Apr 11, 2026. Each assistant message includes a `usage.cache_creation.ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens` breakdown that makes the TTL tier per-call observable. Having two independent machines strengthens the signal — both show the same behavioral shift at the same dates. ### Phase breakdown | Phase | Dates | TTL behavior | Evidence | |-------|-------|--------------|----------| | 1 | Jan 11 – Jan 31 | **5m ONLY** | `ephemeral_1h` absent/zero — likely predates 1h tier availability in the API | | 2 | Feb 1 – Mar 5 | **1h ONLY** | `ephemeral_5m = 0`, `ephemeral_1h > 0` across **33+ consecutive days** on both machines — near-zero exceptions | | 3 | Mar 6–7 | **Transition** | First 5m tokens re-appear, small volumes, 1h still present | | 4 | Mar 8 – Apr 11 | **5m dominant** | 5m tokens surge to majority; 1h becomes minority or disappears entirely | We believe Phase 2 represents Anthropic's **intended default behavior** — 1h TTL was rolled out as the Claude Code standard around Feb 1 and held consistently for over a month across two independent machines on two different accounts. January's all-5m data most likely predates the 1h TTL tier being available in the API. The regression began **around March 6–8, 2026**. No client-side changes were made between phases. The same Claude Code version and usage patterns were in place throughout. The TTL tier is set server-side by Anthropic. ### Day-by-day TTL data showing the regression (combined, both machines) ``` Date | 5m-create | 1h-create | Behavior ------------|------------|------------|---------- 2026-02-01 | 0.00M | 1.70M | 1h ONLY ← 1h default begins 2026-02-09 | 0.00M | 7.95M | 1h ONLY 2026-02-15 | 0.00M | 13.61M | 1h ONLY ← heaviest day, 100% 1h 2026-02-28 | 0.00M | 16.15M | 1h ONLY ← 16M tokens, still 100% 1h 2026-03-01 | 0.00M | 0.12M | 1h ONLY 2026-03-04 | 0.00M | 8.12M | 1h ONLY 2026-03-05 | 0.00M | 6.55M | 1h ONLY ← last clean 1h-only day | | | 2026-03-06 | 0.29M | 0.22M | MIXED ← first 5m tokens reappear 2026-03-07 | 4.56M | 0.50M | MIXED ← 5m surging 2026-03-08 | 16.86M | 3.44M | MIXED ← 5m now dominant (83%) 2026-03-10 | 10.55M | 0.51M | MIXED 2026-03-15 | 19.47M | 1.84M | MIXED 2026-03-21 | 21.37M | 1.70M | MIXED ← 93% 5m 2026-03-22 | 13.48M | 2.85M | MIXED ``` The transition is visible to the day: **March 6 is when 5m tokens first reappear** after 33 days of clean 1h-only behavior. By March 8, 5m tokens outnumber 1h by 5:1. This is consistent with a server-side configuration change being rolled out gradually then completing around March 8. --- ## Cost impact Applying official Anthropic pricing (rates.json, updated 2026-04-09): Combined dataset (119,866 API calls, two machines): **claude-sonnet-4-6** (`cache_write_5m = $3.75/MTok`, `cache_write_1h = $6.00/MTok`, `cache_read = $0.30/MTok`): | Month | Calls | Actual cost | Cost with 1h TTL | Overpaid | % waste | |-------|-------|-------------|-----------------|----------|---------| | Jan 2026 | 2,639 | $78.99 | $37.54 | $41.45 | **52.5%** | | Feb 2026 | 27,220 | $1,120.43 | $1,108.11 | $12.32 | **1.1%** ← nearly 0 on 1h | | Mar 2026 | 68,264 | $2,776.11 | $2,057.01 | $719.09 | **25.9%** | | Apr 2026 | 21,743 | $1,193.01 | $1,016.78 | $176.23 | **14.8%** | | **Total** | **119,866** | **$5,561.17** | **$4,612.09** | **$949.08** | **17.1%** | **claude-opus-4-6** (`cache_write_5m = $6.25/MTok`, `cache_write_1h = $10.00/MTok`, `cache_read = $0.50/MTok`): | Month | Calls | Actual cost | Cost with 1h TTL | Overpaid | % waste | |-------|-------|-------------|-----------------|----------|---------| | Jan 2026 | 2,639 | $131.65 | $62.57 | $69.08 | **52.5%** | | Feb 2026 | 27,220 | $1,867.38 | $1,846.85 | $20.53 | **1.1%** ← nearly 0 on 1h | | Mar 2026 | 68,264 | $4,626.84 | $3,428.36 | $1,198.49 | **25.9%** | | Apr 2026 | 21,743 | $1,988.35 | $1,694.64 | $293.71 | **14.8%** | | **Total** | **119,866** | **$9,268.97** | **$7,687.17** | **$1,581.80** | **17.1%** | February — the month Anthropic was defaulting to 1h TTL — shows only **1.1% waste** (trace 5m activity from one machine on one day). Every other month shows 15–53% overpayment from 5m cache re-creations. The cost difference is explained entirely by TTL tier, not by usage volume. The **percentage waste is identical across model tiers** (17.1%) because it is driven purely by the 5m/1h token split, not by per-token price. ### Why 5m TTL is so expensive in practice With 5m TTL, any pause in a session longer than 5 minutes causes the entire cached context to expire. On the next turn, Claude Code must re-upload that context as a fresh `cache_creation` at the write rate, rather than a `cache_read` at the read rate. The write rate is **12.5× more expensive** than the read rate for Sonnet, and the same ratio holds for Opus. For long coding sessions — which are the primary Claude Code use case — this creates a compounding penalty: the longer and more complex your session, the more context you have cached, and the more expensive each cache expiry becomes. Over the 3-month period analyzed: - **220M tokens** were written to the 5m tier - Those same tokens generated **5.7B cache reads** — meaning they were actively being used - Had those 220M tokens been on the 1h tier, re-accesses within the same hour would be reads (~$0.30–0.50/MTok) instead of re-creations (~$3.75–6.25/MTok) --- ## Quota impact Users on Pro/subscription plans are quota-limited, not just cost-limited. Cache creation tokens count toward quota at full rate; cache reads are significantly cheaper (the exact coefficient is under investigation in #45756). The silent reversion to 5m TTL in March is the most likely explanation for why subscription users began hitting their 5-hour quota limits for the first time — including the author of this issue, who had never hit quota limits before March 2026. --- ## Hypothesis The data strongly suggests that **1h TTL was the intended default for Claude Code** and was in place as of early February 2026. Sometime between Feb 27 and Mar 8, 2026, Anthropic silently changed the default to 5m TTL — either intentionally as a cost-saving measure, or accidentally as an infrastructure regression. Evidence supporting "1h was the intended default": - Phase 2 (1h ONLY) shows *zero* 5m tokens across **14 separate active days** spanning 3+ weeks — this is not noise or partial rollout, it is consistent deliberate behavior - The February cost profile is the only month with 0% overpayment — it represents what users should have been paying all along - The March reversion immediately produced the largest 5m-tier days in the entire dataset (30M tokens on Mar 22 alone), suggesting a sudden configuration flip rather than gradual drift - Subscription users began hitting 5-hour quota limits **for the first time** in March — directly coinciding with the reversion The most likely sequence of events: 1. **~Feb 1 and prior**: Anthropic defaulted to 1h TTL for Claude Code subscription users 2. **~Mar 6**: 5m tokens begin reappearing — gradual rollout of the change or partial infrastructure flip 3. **~Mar 8**: 5m TTL becomes dominant — the regression is fully in effect across both tested machines and accounts 4. **Mar 8+**: Mixed behavior continues, suggesting either incomplete rollout, A/B testing, or regional infrastructure variance The 33-day window of clean 1h-only behavior (Feb 1 – Mar 5) across two independent machines and two separate accounts makes this one of the strongest available signals that **1h TTL was Anthropic's deliberate default**, not a fluke. --- ## Request 1. **Confirm or deny** whether Anthropic made a server-side TTL default change in early February 2026 and reverted it in early March 2026 2. **Clarify the intended TTL behavior** for claude-code sessions — is 5m the intended default, or was 1h intended to be permanent? 3. **Consider restoring 1h TTL as the default** for Claude Code sessions, or exposing it as a user-configurable option. The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage 4. **Disclose quota counting behavior for cache_read tokens** (ref #45756) so users can make informed decisions about their usage patterns --- ## Methodology - Source: raw `~/.claude/projects/**/*.jsonl` session files (Claude Code stores per-message API responses including full `usage` objects) - Extraction: filtered for `type: "assistant"` entries with `message.usage.cache_creation` field - No external tools or proxies involved — this data comes directly from Claude Code's own session logs - Analysis tool: [cnighswonger/claude-code-cache-fix](https://github.com/cnighswonger/claude-code-cache-fix) `quota-analysis --source` mode (added to support this investigation) - Pricing: official Anthropic rates from `rates.json` (updated 2026-04-09)

来自Anthropic员工的回应：
{B18828E3-350A-4098-A597-478CAE40386B}1828×1046 207 KB

我说我的号明明是max5,但是接在sub2api里面却一直缓存过期，原来是被A÷暗改了

网友解答：

--【壹】--：

所以是ClaudeCode缓存爆炸的原因之一么？A\还是太"艺术"了

--【贰】--：

真的假的啊，三四十万的读缓存终于要落幕了吗，太遗憾了

--【叁】--：

QQ_17760043913072004×385 53.3 KB

那很悲了，原来我一直在被anthropic当猪宰

我大概算过，取决于上下文长度，基本一个这种写请求就能干掉2%-5%不等的5h窗口

--【肆】--：

刚刚测试的。问了个你好等了6分钟再发一条信息。是有1h缓存的

--【伍】--：

我前两天抓包的数据里，TTL还是1h啊，难道今天刚改的？

--【陆】--：

image496×863 48.2 KB
我去开个对话试试

--【柒】--：

Anthropic玩不起别玩，和鬼一样，等GPT6出来，我直接上pro，直接爽用，天天搞七搞八，不是封号就是搞小动作

--【捌】--：

我看了一下我本地文件里面的jsonl，请求里面写的也都是创建1h缓存。但是sub2api的请求记录那里看的很清楚，只要请求间隔超过5m就必定会触发缓存重建

--【玖】--：

目前好像看本地jsonl文件，还都是1h缓存，不过这种还是建议交给用户来选呗

--【拾】--：

我是有这么想，但是用oauth的话不太清楚应该怎么抓包验证

--【拾壹】--：

我是直接用没有接反代，佬做个对比实验看看，本地用是否会好

--【拾贰】--：

我本身就给cc打过1h ttl的补丁的。
我自己的号也是max5，按常理来说，如果a÷那边正常，就不应该创建5分钟上下文缓存ttl才对。只能说很奇怪了

--【拾叁】--：

之前5分钟缓存时期的产物，以前真的是盯着5分钟去回复的生怕怠慢它，得供着
image1004×1159 63.3 KB

--【拾肆】--：

应该是一个周期性的变化。

刚让cc自己分析了一下历史对话，3月9日之前确实每个对话中的首轮响应都是1h ttl的，3月10日-3月19日几乎全部切换到了5min ttl，但3月20日至今大部分（90%）的对话再次恢复了1h ttl。这种变化去年年底也出现过，当时有人提issue询问ttl从1h降至5min。

但是首轮响应1h ttl并不代表同一对话能够保持1h ttl，后端实际上会在对话过程中（35k-170k上下文不等）自动回落至5min ttl。

--【拾伍】--：

之前有看到1h ttl的补丁贴，但还没有实操。推测后端有个机制自动在未在请求中提供ttl参数（未打补丁的行为）的情况下动态切换走5min还是1h。

--【拾陆】--：

这个主要是对话的间隔时间。可以理解成cc结束输出内5分钟必须做出回应，不然就会被狠狠扣钱。
我这两天在写一个vr游戏的mod。cc构建好之后，戴头显等游戏启动就要等个半分钟，进去在局内测两下功能又要个一两分钟。剩下几分钟都快不够打字用的

--【拾柒】--：

本质还是读Project里的jsonl文件吧，再加上你超过5分钟后，看他是否去创建缓存了

--【拾捌】--：

应该是从服务端直接把你1h缓存掐了。管你发的1h还是5m的ttl，全都按5m创建。
我是这几天用sub2api发现的，即使给账号开了强制修改为1h缓存请求的选项，缓存照样在5分钟内过期。

--【拾玖】--：

a÷还是太了，这不是故意提高用户用量吗，我是不信一个问题 5 分钟内就可以搞定的

标签：人工智能软件开发 Anthropic ClaudeCode

问题描述：

github.com/anthropics/claude-code

Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation

已打开 01:49AM - 12 Apr 26 UTC 已关闭 10:15AM - 12 Apr 26 UTC seanGSISG bug has repro area:cost api:anthropic

来自Anthropic员工的回应：
{B18828E3-350A-4098-A597-478CAE40386B}1828×1046 207 KB

我说我的号明明是max5,但是接在sub2api里面却一直缓存过期，原来是被A÷暗改了

网友解答：

--【壹】--：

所以是ClaudeCode缓存爆炸的原因之一么？A\还是太"艺术"了

--【贰】--：

真的假的啊，三四十万的读缓存终于要落幕了吗，太遗憾了

--【叁】--：

QQ_17760043913072004×385 53.3 KB

那很悲了，原来我一直在被anthropic当猪宰

我大概算过，取决于上下文长度，基本一个这种写请求就能干掉2%-5%不等的5h窗口

--【肆】--：

刚刚测试的。问了个你好等了6分钟再发一条信息。是有1h缓存的

--【伍】--：

我前两天抓包的数据里，TTL还是1h啊，难道今天刚改的？

--【陆】--：

image496×863 48.2 KB
我去开个对话试试

--【柒】--：

Anthropic玩不起别玩，和鬼一样，等GPT6出来，我直接上pro，直接爽用，天天搞七搞八，不是封号就是搞小动作

--【捌】--：

我看了一下我本地文件里面的jsonl，请求里面写的也都是创建1h缓存。但是sub2api的请求记录那里看的很清楚，只要请求间隔超过5m就必定会触发缓存重建

--【玖】--：

目前好像看本地jsonl文件，还都是1h缓存，不过这种还是建议交给用户来选呗

--【拾】--：

我是有这么想，但是用oauth的话不太清楚应该怎么抓包验证

--【拾壹】--：

我是直接用没有接反代，佬做个对比实验看看，本地用是否会好

--【拾贰】--：

我本身就给cc打过1h ttl的补丁的。
我自己的号也是max5，按常理来说，如果a÷那边正常，就不应该创建5分钟上下文缓存ttl才对。只能说很奇怪了

--【拾叁】--：

之前5分钟缓存时期的产物，以前真的是盯着5分钟去回复的生怕怠慢它，得供着
image1004×1159 63.3 KB

--【拾肆】--：

应该是一个周期性的变化。

但是首轮响应1h ttl并不代表同一对话能够保持1h ttl，后端实际上会在对话过程中（35k-170k上下文不等）自动回落至5min ttl。

--【拾伍】--：

之前有看到1h ttl的补丁贴，但还没有实操。推测后端有个机制自动在未在请求中提供ttl参数（未打补丁的行为）的情况下动态切换走5min还是1h。

--【拾陆】--：

--【拾柒】--：

本质还是读Project里的jsonl文件吧，再加上你超过5分钟后，看他是否去创建缓存了

--【拾捌】--：

--【拾玖】--：

a÷还是太了，这不是故意提高用户用量吗，我是不信一个问题 5 分钟内就可以搞定的

标签：人工智能软件开发 Anthropic ClaudeCode

Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation

相关推荐

Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation

相关推荐