Anthropic has removed the across-the-board 1h TTL context cache for Claude Code

2026-04-13 12:14
Problem description:
github.com/anthropics/claude-code

Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation

Opened 01:49AM - 12 Apr 26 UTC · Closed 10:15AM - 12 Apr 26 UTC · seanGSISG · bug · has repro · area:cost · api:anthropic

# Cache TTL appears to have silently regressed from 1h to 5m around early March 2026, causing significant quota and cost inflation

## Summary

Analysis of raw Claude Code session JSONL files spanning Jan 11 – Apr 11, 2026 shows that Anthropic appears to have **silently changed the prompt cache TTL default from 1 hour to 5 minutes sometime in early March 2026**. Prior to this change, Claude Code was receiving 1-hour TTL cache writes, which we believe was the intended default. The reversion to 5-minute TTL has caused a **20–32% increase in cache creation costs** and a measurable spike in quota consumption for subscription users who have never previously hit their limits.

This appears directly related to the behavior described in #45756.

---

## Data

Session data extracted from `~/.claude/projects/` JSONL files across **two machines** (Linux workstation + Windows laptop, different accounts/sessions), totaling **119,866 API calls** from Jan 11 – Apr 11, 2026. Each assistant message includes a `usage.cache_creation.ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens` breakdown that makes the TTL tier per-call observable. Having two independent machines strengthens the signal: both show the same behavioral shift at the same dates.
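As a rough illustration of how the per-call TTL tier can be read out of a session log, the sketch below tallies the two `cache_creation` fields named above. It is not the tool used in the issue: the `tally_ttl_tokens` helper and the sample records are hypothetical, and the record layout (`type`, `message.usage.cache_creation`) is assumed from the issue's own description.

```python
import json

def tally_ttl_tokens(jsonl_lines):
    """Sum 5m and 1h cache-creation tokens over assistant messages.

    Assumed record layout (taken from the issue text, not verified here):
    {"type": "assistant", "message": {"usage": {"cache_creation": {...}}}}
    """
    totals = {"5m": 0, "1h": 0}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if not isinstance(record, dict) or record.get("type") != "assistant":
            continue
        usage = (record.get("message") or {}).get("usage") or {}
        creation = usage.get("cache_creation") or {}
        totals["5m"] += creation.get("ephemeral_5m_input_tokens") or 0
        totals["1h"] += creation.get("ephemeral_1h_input_tokens") or 0
    return totals

# Hypothetical sample lines standing in for a real session file.
sample = [
    json.dumps({"type": "user", "message": {"content": "hi"}}),
    json.dumps({"type": "assistant", "message": {"usage": {"cache_creation": {
        "ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 1200}}}}),
    json.dumps({"type": "assistant", "message": {"usage": {"cache_creation": {
        "ephemeral_5m_input_tokens": 800, "ephemeral_1h_input_tokens": 0}}}}),
    "not json",
]
print(tally_ttl_tokens(sample))
```

For a real file, the lines could be supplied by opening it and passing the file object directly, since the helper only iterates over lines.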
### Phase breakdown

| Phase | Dates | TTL behavior | Evidence |
|-------|-------|--------------|----------|
| 1 | Jan 11 – Jan 31 | **5m ONLY** | `ephemeral_1h` absent/zero; likely predates 1h tier availability in the API |
| 2 | Feb 1 – Mar 5 | **1h ONLY** | `ephemeral_5m = 0`, `ephemeral_1h > 0` across **33+ consecutive days** on both machines; near-zero exceptions |
| 3 | Mar 6–7 | **Transition** | First 5m tokens re-appear, small volumes, 1h still present |
| 4 | Mar 8 – Apr 11 | **5m dominant** | 5m tokens surge to majority; 1h becomes minority or disappears entirely |

We believe Phase 2 represents Anthropic's **intended default behavior**: 1h TTL was rolled out as the Claude Code standard around Feb 1 and held consistently for over a month across two independent machines on two different accounts. January's all-5m data most likely predates the 1h TTL tier being available in the API.

The regression began **around March 6–8, 2026**. No client-side changes were made between phases. The same Claude Code version and usage patterns were in place throughout. The TTL tier is set server-side by Anthropic.
### Day-by-day TTL data showing the regression (combined, both machines)

```
Date        | 5m-create  | 1h-create  | Behavior
------------|------------|------------|----------
2026-02-01  |   0.00M    |   1.70M    | 1h ONLY   ← 1h default begins
2026-02-09  |   0.00M    |   7.95M    | 1h ONLY
2026-02-15  |   0.00M    |  13.61M    | 1h ONLY   ← heaviest day, 100% 1h
2026-02-28  |   0.00M    |  16.15M    | 1h ONLY   ← 16M tokens, still 100% 1h
2026-03-01  |   0.00M    |   0.12M    | 1h ONLY
2026-03-04  |   0.00M    |   8.12M    | 1h ONLY
2026-03-05  |   0.00M    |   6.55M    | 1h ONLY   ← last clean 1h-only day
            |            |            |
2026-03-06  |   0.29M    |   0.22M    | MIXED     ← first 5m tokens reappear
2026-03-07  |   4.56M    |   0.50M    | MIXED     ← 5m surging
2026-03-08  |  16.86M    |   3.44M    | MIXED     ← 5m now dominant (83%)
2026-03-10  |  10.55M    |   0.51M    | MIXED
2026-03-15  |  19.47M    |   1.84M    | MIXED
2026-03-21  |  21.37M    |   1.70M    | MIXED     ← 93% 5m
2026-03-22  |  13.48M    |   2.85M    | MIXED
```

The transition is visible to the day: **March 6 is when 5m tokens first reappear** after 33 days of clean 1h-only behavior. By March 8, 5m tokens outnumber 1h by 5:1. This is consistent with a server-side configuration change being rolled out gradually then completing around March 8.
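The percentages annotated in the table can be reproduced from its two token columns. The sketch below is only an illustration: the `classify_day` helper is hypothetical, and its labels simply mirror the table's Behavior column.

```python
def classify_day(five_min_mtok, one_hour_mtok):
    """Return (label, share of cache-creation tokens written to the 5m tier)."""
    total = five_min_mtok + one_hour_mtok
    if total == 0:
        return ("NO DATA", 0.0)
    if five_min_mtok == 0:
        label = "1h ONLY"
    elif one_hour_mtok == 0:
        label = "5m ONLY"
    else:
        label = "MIXED"
    return (label, five_min_mtok / total)

# Three rows copied from the table above: (5m-create, 1h-create) in millions of tokens.
days = {
    "2026-03-05": (0.00, 6.55),
    "2026-03-08": (16.86, 3.44),
    "2026-03-21": (21.37, 1.70),
}
for date, (m5, h1) in days.items():
    label, share = classify_day(m5, h1)
    print(f"{date}  {label:8s} 5m share = {share:.0%}")
# 2026-03-05  1h ONLY  5m share = 0%
# 2026-03-08  MIXED    5m share = 83%
# 2026-03-21  MIXED    5m share = 93%
```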
---

## Cost impact

Applying official Anthropic pricing (rates.json, updated 2026-04-09):

Combined dataset (119,866 API calls, two machines):

**claude-sonnet-4-6** (`cache_write_5m = $3.75/MTok`, `cache_write_1h = $6.00/MTok`, `cache_read = $0.30/MTok`):

| Month | Calls | Actual cost | Cost with 1h TTL | Overpaid | % waste |
|-------|-------|-------------|-----------------|----------|---------|
| Jan 2026 | 2,639 | $78.99 | $37.54 | $41.45 | **52.5%** |
| Feb 2026 | 27,220 | $1,120.43 | $1,108.11 | $12.32 | **1.1%** ← nearly 0 on 1h |
| Mar 2026 | 68,264 | $2,776.11 | $2,057.01 | $719.09 | **25.9%** |
| Apr 2026 | 21,743 | $1,193.01 | $1,016.78 | $176.23 | **14.8%** |
| **Total** | **119,866** | **$5,561.17** | **$4,612.09** | **$949.08** | **17.1%** |

**claude-opus-4-6** (`cache_write_5m = $6.25/MTok`, `cache_write_1h = $10.00/MTok`, `cache_read = $0.50/MTok`):

| Month | Calls | Actual cost | Cost with 1h TTL | Overpaid | % waste |
|-------|-------|-------------|-----------------|----------|---------|
| Jan 2026 | 2,639 | $131.65 | $62.57 | $69.08 | **52.5%** |
| Feb 2026 | 27,220 | $1,867.38 | $1,846.85 | $20.53 | **1.1%** ← nearly 0 on 1h |
| Mar 2026 | 68,264 | $4,626.84 | $3,428.36 | $1,198.49 | **25.9%** |
| Apr 2026 | 21,743 | $1,988.35 | $1,694.64 | $293.71 | **14.8%** |
| **Total** | **119,866** | **$9,268.97** | **$7,687.17** | **$1,581.80** | **17.1%** |

February, the month Anthropic was defaulting to 1h TTL, shows only **1.1% waste** (trace 5m activity from one machine on one day). Every other month shows 15–53% overpayment from 5m cache re-creations. The cost difference is explained entirely by TTL tier, not by usage volume.

The **percentage waste is identical across model tiers** (17.1%) because it is driven purely by the 5m/1h token split, not by per-token price.

### Why 5m TTL is so expensive in practice

With 5m TTL, any pause in a session longer than 5 minutes causes the entire cached context to expire.
On the next turn, Claude Code must re-upload that context as a fresh `cache_creation` at the write rate, rather than a `cache_read` at the read rate. The write rate is **12.5× more expensive** than the read rate for Sonnet, and the same ratio holds for Opus.

For long coding sessions, which are the primary Claude Code use case, this creates a compounding penalty: the longer and more complex your session, the more context you have cached, and the more expensive each cache expiry becomes.

Over the 3-month period analyzed:

- **220M tokens** were written to the 5m tier
- Those same tokens generated **5.7B cache reads**, meaning they were actively being used
- Had those 220M tokens been on the 1h tier, re-accesses within the same hour would be reads (~$0.30–0.50/MTok) instead of re-creations (~$3.75–6.25/MTok)

---

## Quota impact

Users on Pro/subscription plans are quota-limited, not just cost-limited. Cache creation tokens count toward quota at full rate; cache reads are significantly cheaper (the exact coefficient is under investigation in #45756). The silent reversion to 5m TTL in March is the most likely explanation for why subscription users began hitting their 5-hour quota limits for the first time, including the author of this issue, who had never hit quota limits before March 2026.

---

## Hypothesis

The data strongly suggests that **1h TTL was the intended default for Claude Code** and was in place as of early February 2026. Sometime between Feb 27 and Mar 8, 2026, Anthropic silently changed the default to 5m TTL, either intentionally as a cost-saving measure, or accidentally as an infrastructure regression.
Evidence supporting "1h was the intended default":

- Phase 2 (1h ONLY) shows *zero* 5m tokens across **14 separate active days** spanning 3+ weeks; this is not noise or partial rollout, it is consistent deliberate behavior
- The February cost profile is the only month with 0% overpayment; it represents what users should have been paying all along
- The March reversion immediately produced the largest 5m-tier days in the entire dataset (30M tokens on Mar 22 alone), suggesting a sudden configuration flip rather than gradual drift
- Subscription users began hitting 5-hour quota limits **for the first time** in March, directly coinciding with the reversion

The most likely sequence of events:

1. **~Feb 1 and prior**: Anthropic defaulted to 1h TTL for Claude Code subscription users
2. **~Mar 6**: 5m tokens begin reappearing: gradual rollout of the change or partial infrastructure flip
3. **~Mar 8**: 5m TTL becomes dominant: the regression is fully in effect across both tested machines and accounts
4. **Mar 8+**: Mixed behavior continues, suggesting either incomplete rollout, A/B testing, or regional infrastructure variance

The 33-day window of clean 1h-only behavior (Feb 1 – Mar 5) across two independent machines and two separate accounts makes this one of the strongest available signals that **1h TTL was Anthropic's deliberate default**, not a fluke.

---

## Request

1. **Confirm or deny** whether Anthropic made a server-side TTL default change in early February 2026 and reverted it in early March 2026
2. **Clarify the intended TTL behavior** for claude-code sessions: is 5m the intended default, or was 1h intended to be permanent?
3. **Consider restoring 1h TTL as the default** for Claude Code sessions, or exposing it as a user-configurable option. The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage
4. **Disclose quota counting behavior for cache_read tokens** (ref #45756) so users can make informed decisions about their usage patterns

---

## Methodology

- Source: raw `~/.claude/projects/**/*.jsonl` session files (Claude Code stores per-message API responses including full `usage` objects)
- Extraction: filtered for `type: "assistant"` entries with `message.usage.cache_creation` field
- No external tools or proxies involved; this data comes directly from Claude Code's own session logs
- Analysis tool: [cnighswonger/claude-code-cache-fix](https://github.com/cnighswonger/claude-code-cache-fix) `quota-analysis --source` mode (added to support this investigation)
- Pricing: official Anthropic rates from `rates.json` (updated 2026-04-09)
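To sanity-check the write-versus-read gap described under "Why 5m TTL is so expensive in practice", the sketch below plugs the per-MTok rates quoted in the cost tables into a deliberately simplified model. The model is an assumption, not the issue's methodology: one session caches a fixed context once, then resumes after a number of pauses that are each longer than 5 minutes but shorter than 1 hour. The `session_cache_cost` helper and the 150k-token, four-pause scenario are hypothetical.

```python
# USD per million tokens, as quoted in the issue's cost tables.
RATES = {
    "claude-sonnet-4-6": {"write_5m": 3.75, "write_1h": 6.00, "read": 0.30},
    "claude-opus-4-6": {"write_5m": 6.25, "write_1h": 10.00, "read": 0.50},
}

def session_cache_cost(model, context_tokens, long_pauses, ttl):
    """Cache cost of one session under the simplified model described above."""
    rates = RATES[model]
    mtok = context_tokens / 1_000_000
    if ttl == "5m":
        # Assumption: every long pause expires the cache, so the whole
        # context is written again at the 5m write rate.
        return mtok * rates["write_5m"] * (1 + long_pauses)
    if ttl == "1h":
        # Assumption: one write at the 1h rate, then each resume is a read.
        return mtok * (rates["write_1h"] + rates["read"] * long_pauses)
    raise ValueError(f"unknown ttl: {ttl!r}")

for model, rates in RATES.items():
    ratio = rates["write_5m"] / rates["read"]
    cost_5m = session_cache_cost(model, 150_000, 4, "5m")
    cost_1h = session_cache_cost(model, 150_000, 4, "1h")
    print(f"{model}: write/read = {ratio:.1f}x, 5m = ${cost_5m:.2f}, 1h = ${cost_1h:.2f}")
```

Under these assumptions the 1h tier costs more when a session has no long pauses (its write rate is higher) and becomes cheaper from the first long pause onward. Real sessions also grow their context between turns, which this sketch ignores.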

Response from an Anthropic employee:
[image]

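I was wondering why my account, which is clearly on Max 5x, kept having its cache expire when connected through sub2api. It turns out Anthropic quietly changed this.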

Replies:
--[1]--:

So is this one of the reasons Claude Code's cache usage has been blowing up? Anthropic is still a bit too "artistic".


--[2]--:

Is this for real? Are the 300-400k cache reads finally coming to an end? What a shame.


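--[3]--:

[image]

That's pretty sad. So Anthropic has been fleecing me all along.

By my rough estimate, depending on context length, a single cache-write request like this can eat anywhere from 2% to 5% of the 5h window.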


--[4]--:


Just tested it. I sent a "hello", waited 6 minutes, then sent another message. The 1h cache was still there.


--[5]--:

In the traffic I captured a couple of days ago, the TTL was still 1h. Was it only changed today?


--[6]--:

[image]
I'll open a conversation and try it.


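--[7]--:

If Anthropic can't play fair, it shouldn't play at all. It's so sneaky. Once GPT6 comes out I'm going straight to Pro and using it freely. Every day it's one thing or another: if it isn't account bans, it's some underhanded move.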


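--[8]--:

I checked the JSONL in my local files, and the requests there all show 1h cache creation as well. But the sub2api request log shows it very clearly: whenever the gap between requests exceeds 5m, a cache rebuild is always triggered.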


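--[9]--:

Judging from the local JSONL files, it all still looks like 1h cache for now. Still, this kind of setting should really be left for users to choose.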


--[10]--:

I've thought about that, but with OAuth I'm not sure how to capture the traffic to verify it.


--[11]--:

I use it directly, with no reverse proxy in between. Could you run a comparison to see whether using it locally behaves better?


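--[12]--:

I had already patched CC for the 1h TTL.
My own account is also Max 5x. Normally, if everything were working properly on Anthropic's side, it shouldn't be creating context cache with a 5-minute TTL at all. All I can say is that it's very strange.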


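--[13]--:

A relic of the earlier 5-minute-cache era. Back then I really did watch the 5-minute clock when replying, afraid of keeping it waiting; it had to be pampered.
[image]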


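--[14]--:

This is probably a periodic change.

I just had CC analyze its own conversation history. Before March 9, the first response in every conversation was indeed on the 1h TTL; from March 10 to March 19, almost all of them switched to the 5min TTL; but from March 20 to now, most conversations (90%) are back on the 1h TTL. The same kind of change also showed up at the end of last year, when someone opened an issue asking why the TTL had dropped from 1h to 5min.

However, a 1h TTL on the first response does not mean the same conversation stays on the 1h TTL. The backend actually falls back to the 5min TTL automatically partway through the conversation (at anywhere from 35k to 170k of context).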


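--[15]--:

I saw the post about the 1h TTL patch earlier but haven't tried it myself yet. My guess is that the backend has a mechanism that, when the request carries no ttl parameter (the unpatched behavior), dynamically decides whether to use 5min or 1h.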


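--[16]--:

This is mainly about the gap between turns. Think of it as having to respond within 5 minutes of CC finishing its output, or you get charged heavily.
I've been writing a mod for a VR game these past couple of days. After CC finishes the build, putting on the headset and waiting for the game to launch takes about half a minute, and testing a feature or two in-game takes another minute or two. The few minutes left are barely enough for typing.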


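--[17]--:

In essence it still comes down to reading the JSONL files under the project directory, and then, after you wait more than 5 minutes, checking whether it creates the cache again.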
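A minimal sketch of that check, assuming each turn has already been reduced to a timestamp plus its cache-creation and cache-read token counts (for example, from the session JSONL). The `Turn` type, the `find_rebuilds_after_gap` helper, the 50% threshold, and the sample numbers are all hypothetical, and measuring the pause between consecutive turn timestamps is only an approximation of when the cache was last touched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Turn:
    timestamp: datetime
    cache_creation_tokens: int
    cache_read_tokens: int

def find_rebuilds_after_gap(turns, ttl=timedelta(minutes=5), rebuild_share=0.5):
    """Return turns that follow a pause longer than `ttl` and mostly re-create the cache."""
    flagged = []
    for prev, cur in zip(turns, turns[1:]):
        gap = cur.timestamp - prev.timestamp
        cached = cur.cache_creation_tokens + cur.cache_read_tokens
        if gap > ttl and cached > 0:
            # A turn dominated by cache creation after a long pause suggests
            # the earlier cache entry had already expired.
            if cur.cache_creation_tokens / cached >= rebuild_share:
                flagged.append(cur)
    return flagged

t0 = datetime(2026, 4, 12, 10, 0)
turns = [
    Turn(t0, 40_000, 0),                               # first turn: cache is written
    Turn(t0 + timedelta(minutes=2), 1_500, 40_000),    # short pause: mostly reads
    Turn(t0 + timedelta(minutes=9), 43_000, 0),        # 7-minute pause: full rebuild
    Turn(t0 + timedelta(minutes=30), 800, 43_000),     # 21-minute pause: still reads
]
for turn in find_rebuilds_after_gap(turns):
    print(turn.timestamp.isoformat(), turn.cache_creation_tokens)
```

In this made-up sample only the third turn is flagged; the fourth turn shows what a cache that survived a pause longer than 5 minutes would look like.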


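--[18]--:

The server side has probably just cut off the 1h cache. Whether you send a 1h or a 5m ttl, everything gets created as 5m.
I noticed this over the past few days while using sub2api: even with the account option that forces requests to the 1h cache enabled, the cache still expires within 5 minutes.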


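--[19]--:

Anthropic is really too much. Isn't this deliberately inflating users' usage? I don't believe any problem can be sorted out within 5 minutes.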