[Updated with Anthropic's latest response] Cache anomalies in official Claude subscriptions, and temporary workarounds
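If you subscribe to Claude's official Pro or Max plan, you may have noticed lately that your quota is being deducted at an unusually fast rate.
In several other communities I frequent, there were even cases where just a few "hi" messages burned through a huge chunk of quota.
(Meanwhile, suspected victims here on the forum):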
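- "Maybe the context was a bit long, but that's still absurd..."
- claude quota bug, consumption abnormally fast? (Off-topic): "Used claude opus 4.6 the whole time, occasionally calling codex + gemini through ccg. Max 20x, ran 3 projects, and hit 81% in two and a half hours? Is that right?"
- Claude code quota draining too fast (Dev & Tuning): "I'm on Max 5x. These past few days the quota has been going really fast: I've barely asked a few questions and I've already used 50% of a 5-hour window, when it used to be only about 10%. That's a huge gap, this is so shady."
- Did cursor quietly change its quota? (Off-topic): "cursor's quota accounting has clearly gotten more aggressive lately; even an Ultra account can't survive a couple of hard pushes. A full day of heavy use used to top out at 7-8%, now it easily reaches 20% of the API quota. Anyone know what's going on?"
- claude quota running out!!! (Dev & Tuning): "Still two days until the reset. Max 20x quota seems to be dropping hard now, it's not enough~~~ What am I supposed to do for these two days?"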
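As it happens, a few days ago an Anthropic employee (who is about to have a very bad day) botched something and leaked the Claude Code source code to the entire internet. A Redditor (skibidi-toaleta-2137) reverse-engineered it, concluded that this is most likely a cache bug, and pointed out the probable causes along with the specific issues [1].
The issues mentioned in that Reddit post can be found here: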
[BUG] Conversation history invalidated on subsequent turns
Opened 08:59AM - 29 Mar 26 UTC by jmarianski. Labels: bug, has repro, platform:linux, area:cost, area:core, regression

### Preflight Checklist
- [x] I have searched [existing issues](https://github.…com/anthropics/claude-code/issues?q=is%3Aissue%20state%3Aopen%20label%3Abug) and this hasn't been reported yet
- [x] This is a single bug report (please file separate reports for different bugs)
- [x] I am using the latest version of Claude Code

### What's Wrong?
While investigating huge token usage I've noticed it come due to fact suddenly my conversation history gets invalidated and all subsequent turns revert to only caching system prompt and huge cache writes.

### What Should Happen?
Cache shouldn't drop due to history changes. History should not be updated. Or we shouldn't be charged for historical updates.

### Error Messages/Logs
```shell
Analysis of token usage from the start of my analysis:

time      cache_read  cache_cr  input   out  model               stop
--------  ----------  --------  -----  ----  ------------------  ------------
22:22:48      312377      1944      1   215  opus-4-6            end_turn
22:23:39      314321       493      3   159  opus-4-6            end_turn
22:24:19      314814       172      3   108  opus-4-6            end_turn
22:33:42           0         0      8     1  haiku-4-5-20251001  max_tokens  <-- resume
22:34:26       11428    305735      3   213  opus-4-6            tool_use    <-- irrelevant cache rewrite after restart
22:34:35      317163       579      1   239  opus-4-6            tool_use
22:34:43      317742       566      1   152  opus-4-6            end_turn
22:37:13      318308       245      3    96  opus-4-6            end_turn
07:55:55           0         0      8     1  haiku-4-5-20251001  max_tokens  <-- resume
07:56:22       11428    163547      3   143  opus-4-6            tool_use    <-- partial cache regenerate (wth?)
07:56:40      174975       358      1    90  opus-4-6            end_turn
07:57:25       11428    307626      3    87  opus-4-6            end_turn    <-- full cache regenerate
07:57:51      319054       108      3    89  opus-4-6            tool_use
07:58:05      319162       712      1   448  opus-4-6            tool_use
07:58:21      319874       833      1   367  opus-4-6            end_turn
07:59:21      320707       393      3   414  opus-4-6            tool_use
07:59:34      321100       609      1   560  opus-4-6            tool_use
07:59:47      321709       948      1   512  opus-4-6            tool_use
08:00:10      322657       615      1   348  opus-4-6            end_turn
08:03:00      323272       426      3   530  opus-4-6            tool_use
08:03:12      323698       972      1   468  opus-4-6            tool_use
08:03:22      324670       529      1   167  opus-4-6            end_turn
08:03:29      325199       215      3    28  opus-4-6            end_turn
08:05:30           0         0      8     1  haiku-4-5-20251001  max_tokens  <-- resume
08:05:48       11428    187695      3   155  opus-4-6            tool_use
08:06:06      199123       876      1   780  opus-4-6            tool_use
08:06:25      199999      2199      1  1285  opus-4-6            tool_use
08:06:38      202198      1633      1   302  opus-4-6            end_turn
08:08:06      203831       481      3    88  opus-4-6            tool_use
08:08:17      204312       408      1   175  opus-4-6            end_turn
08:09:16      204720       206      3   154  opus-4-6            end_turn
08:10:25      204926       228      3   503  opus-4-6            tool_use
08:10:34      205154      1193      1   507  opus-4-6            tool_use
08:10:45      206347      1007      1   247  opus-4-6            end_turn
08:10:54           0         0      8     1  haiku-4-5-20251001  max_tokens  <-- resume
08:11:16       11428    195983      3   136  opus-4-6            tool_use
08:11:37      207411       616      1  1207  opus-4-6            tool_use
08:11:49      208027      1457      1   290  opus-4-6            end_turn
08:12:02      209484       323      3   270  opus-4-6            end_turn
08:12:27      209807       284      3   190  opus-4-6            tool_use
08:12:39      210091       314      1   278  opus-4-6            tool_use
08:13:01      210405       728      1  1219  opus-4-6            tool_use
08:13:18      211133      1465      1   449  opus-4-6            end_turn
08:15:28      212598       599      3   325  opus-4-6            end_turn
08:16:26      213197       334      3   209  opus-4-6            end_turn
08:18:00      213531       288      3   137  opus-4-6            tool_use
08:18:07      213819      1131      1   122  opus-4-6            tool_use
08:18:15      214950       140      1   193  opus-4-6            tool_use
08:18:29      215090      1114      1   269  opus-4-6            tool_use
08:18:54      216204     10504      1   336  opus-4-6            tool_use    <-- cache starts breaking down due to history change*
08:19:07      216204     11815      1   228  opus-4-6            tool_use
08:19:17      216204     12990      1   134  opus-4-6            tool_use
08:19:38      216204     13341      1   301  opus-4-6            tool_use
08:20:04      216204     13758      1   426  opus-4-6            tool_use
08:20:18      216204     15278      1   154  opus-4-6            tool_use
08:20:46      216204     15778      1   508  opus-4-6            tool_use
08:22:25      216204     17092      1   208  opus-4-6            tool_use
08:22:51      216204     17894      1   660  opus-4-6            tool_use
08:23:22       11428    224502      1   315  opus-4-6            end_turn    <-- cache cannot get regenerated, reverting to full cache write
08:24:47       11428    224953      3   871  opus-4-6            tool_use
08:25:10       11428    227259      1   597  opus-4-6            tool_use
08:25:24       11428    228249      1   356  opus-4-6            tool_use
08:25:43       11428    228669      1   825  opus-4-6            tool_use
08:26:01       11428    229763      1   468  opus-4-6            tool_use
08:26:22       11428    230278      1   339  opus-4-6            end_turn
08:28:07       11428    230642      3   442  opus-4-6            end_turn
08:37:30       11428    231432      3   430  opus-4-6            end_turn
--- (Ignore hour, it's another day)
When running "npx @anthropic-ai/claude-code"
21:28:59       11374     46622      1   473  opus-4-6            tool_use    <-- still on standalone binary
22:02:13           0         0      8     1  haiku-4-5-20251001  max_tokens  <-- I tried resuming a couple of times
22:02:23           0         0    340    11  haiku-4-5-20251001  end_turn
22:02:25       11374     15278      3    21  opus-4-6            end_turn
22:04:51           0         0      8     1  haiku-4-5-20251001  max_tokens
22:04:58           0         0    341    11  haiku-4-5-20251001  end_turn
22:05:00       11374     15194      3    20  opus-4-6            end_turn
22:09:20           0         0      8     1  haiku-4-5-20251001  max_tokens
22:09:35           0         0      8     1  haiku-4-5-20251001  max_tokens
22:12:16           0         0    341    11  haiku-4-5-20251001  end_turn
22:12:18       11374     15194      3    21  opus-4-6            end_turn
22:15:36           0         0      8     1  haiku-4-5-20251001  max_tokens
23:22:46           0         0      8     1  haiku-4-5-20251001  max_tokens
23:23:06           0         0    341    12  haiku-4-5-20251001  end_turn
23:23:09       11374     17262      3    19  opus-4-6            end_turn
23:23:26       28636        27      3    12  opus-4-6            end_turn
23:31:41           0         0      8     1  haiku-4-5-20251001  max_tokens
23:31:50           0         0    345    13  haiku-4-5-20251001  end_turn
23:31:54       11374     17188      3    32  opus-4-6            end_turn    <-- start of npx trials
23:32:25       28562        51      3   167  opus-4-6            end_turn
23:33:52       28613       320      3   666  opus-4-6            end_turn
23:34:55           0         0      8     1  haiku-4-5-20251001  max_tokens
23:35:12           0         0    355    15  haiku-4-5-20251001  end_turn
23:35:22       11374     17198      3   328  opus-4-6            end_turn
23:36:50       28572       367      3   500  opus-4-6            end_turn
23:37:15       28939       506      3   143  opus-4-6            tool_use
23:37:19       29445       523      1    91  opus-4-6            tool_use
23:37:56       29968      4869      1  1284  opus-4-6            tool_use
23:38:06       34837      1343      1   173  opus-4-6            end_turn
23:38:19       36180       219      3   151  opus-4-6            tool_use
23:38:27       36399      9511      1   341  opus-4-6            tool_use
23:38:33       45910       442     73    77  opus-4-6            tool_use
23:38:37       46352       250      1    77  opus-4-6            tool_use
23:38:42       46602      1161      1   134  opus-4-6            tool_use
23:38:59       47763       415      1   369  opus-4-6            tool_use
23:39:06       48178       427      1    96  opus-4-6            tool_use
23:39:09       48605       393      1    77  opus-4-6            tool_use
23:39:13       48605       639      1   152  opus-4-6            tool_use
23:40:17       11374     38207      3   362  opus-4-6            end_turn
23:41:35       49581       438      3   766  opus-4-6            end_turn
23:43:02           0         0      8     1  haiku-4-5-20251001  max_tokens  <-- another session
23:43:23       11374     40201      3    97  opus-4-6            tool_use
23:43:30       51575       122      1   310  opus-4-6            tool_use
23:43:35       51697       408      1   152  opus-4-6            tool_use
23:43:41       52105       219     93   170  opus-4-6            tool_use
23:43:49       52324       442      1   259  opus-4-6            tool_use
23:43:54       52766       558      1   102  opus-4-6            tool_use
23:44:08       53324      2593      1   403  opus-4-6            end_turn
23:51:33           0         0      8     1  haiku-4-5-20251001  max_tokens
23:52:30       55917       431      3   187  opus-4-6            tool_use
23:54:20       56348       292     37   284  opus-4-6            tool_use
23:54:29       56640       612    158   492  opus-4-6            tool_use
23:54:54       60657        13      3   508  opus-4-6            end_turn
23:58:58       60670       717      2   454  opus-4-6            tool_use
23:59:05       61387       847      1   336  opus-4-6            tool_use
23:59:23       62234      1282      1   674  opus-4-6            tool_use
23:59:34       63516      1024      1   506  opus-4-6            tool_use
23:59:45       64540       583      1   264  opus-4-6            tool_use
23:59:53       65123       284      1   393  opus-4-6            tool_use
00:00:10       65407       470      1   887  opus-4-6            tool_use
00:03:07       65877      1024      1   871  opus-4-6            tool_use
00:03:16       66901      2098      1   538  opus-4-6            tool_use
00:03:25       68999      1492      1   379  opus-4-6            tool_use
00:03:36       70491      1043      1   640  opus-4-6            tool_use
00:03:43       71534       704      1   233  opus-4-6            tool_use
00:03:51       72238       250      1   148  opus-4-6            tool_use
00:03:58       72488       355      1   249  opus-4-6            tool_use
00:04:03       72843       396      1   259  opus-4-6            tool_use
00:04:10       73239       435      1   278  opus-4-6            tool_use
00:04:31       73674       359      1   941  opus-4-6            tool_use
00:04:45       74033      1595      1   662  opus-4-6            tool_use
00:05:00       75628       914      1   830  opus-4-6            tool_use
00:05:18       76542      1610      1   963  opus-4-6            tool_use
00:05:31       78152      1379      1   640  opus-4-6            tool_use
00:05:41       79531      1374      1   549  opus-4-6            tool_use
00:05:51       80905      1576      1   550  opus-4-6            tool_use
00:06:06       82481       629      1   986  opus-4-6            tool_use
00:06:20       83110      1102      1   994  opus-4-6            tool_use
00:06:30       84212      1074      1   578  opus-4-6            tool_use
00:06:48       85286      2418      1   854  opus-4-6            tool_use
00:07:01       87704       880      1   555  opus-4-6            tool_use
00:07:13       88584       818      1   601  opus-4-6            tool_use
00:07:30       89402      2165      1   520  opus-4-6            end_turn
00:11:02       91567       542      3   691  opus-4-6            tool_use
00:11:12       92109       777      1   733  opus-4-6            tool_use
00:11:25       92886      1042      1   578  opus-4-6            tool_use
00:11:39       93928      3347      1   694  opus-4-6            tool_use
00:12:21       98211        39      3   442  opus-4-6            end_turn
00:13:24       98250       462      3   161  opus-4-6            tool_use
00:13:32       98712       274      1   178  opus-4-6            end_turn
00:15:06       98986       190      3   237  opus-4-6            tool_use
00:15:37       99176       362      1  1202  opus-4-6            tool_use
00:15:42        6637     16169      2   114  opus-4-6            tool_use
00:15:44       99538      1427      1   280  opus-4-6            tool_use
00:15:47       22806      3503      1   160  opus-4-6            tool_use
00:15:49      100965      2929      1    52  opus-4-6            end_turn    <-- my joy is great at this point
00:15:50       26309       191      1    91  opus-4-6            tool_use
00:15:54       26500       217      1    92  opus-4-6            tool_use
00:15:57       26717       152      1    88  opus-4-6            tool_use
00:16:02       26869       607      3   125  opus-4-6            tool_use
00:16:06       27476       146      1   166  opus-4-6            tool_use
00:16:09       27622       430      1   105  opus-4-6            tool_use
00:16:14       28052       123      1   109  opus-4-6            tool_use
00:16:19       28052       542      1   176  opus-4-6            tool_use
00:16:23       28594       440      1    95  opus-4-6            tool_use
00:16:28       29034       120      1   112  opus-4-6            tool_use
00:16:32       29154       139      1   180  opus-4-6            tool_use
00:16:37       29293       256      1   206  opus-4-6            tool_use
00:17:09       29549      1065      1    82  opus-4-6            tool_use
00:17:12       30614       131      1   117  opus-4-6            tool_use
00:17:24       30745       135      1    79  opus-4-6            tool_use
00:17:28       30745       237      1   100  opus-4-6            tool_use
00:17:35       30982       140      1   138  opus-4-6            tool_use
00:17:39       31122       222      1   112  opus-4-6            tool_use
00:17:46       31344       226      1   257  opus-4-6            tool_use
00:17:51       31570       288      1   125  opus-4-6            tool_use
00:17:56       31858       214      1   128  opus-4-6            tool_use
00:18:01       32072       316      1   139  opus-4-6            tool_use
00:18:04       32388       487      1   116  opus-4-6            tool_use
00:18:09       32875       134      1   125  opus-4-6            tool_use
00:19:29       33009       143      1  5925  opus-4-6            tool_use
00:19:35       33152      5912      1   113  opus-4-6            tool_use    <-- yup, was at 100% usage at this point
00:19:36           0         0      0     0  -                   -
00:19:36           0         0      0     0  -                   -
10:08:54       11374     58355      3   270  opus-4-6            tool_use    <-- costly resume, but cache TTL = 1h in claude code
10:09:00       69729       538      1   277  opus-4-6            tool_use
10:09:07       70267      7086    180   188  opus-4-6            tool_use
10:09:14       77353       436      1   201  opus-4-6            tool_use
10:09:18       77789       219      1   118  opus-4-6            tool_use
10:09:34       78008       518      1   183  opus-4-6            tool_use
10:10:40       78526       444      1    95  opus-4-6            tool_use
10:10:46       78970      1213      1   185  opus-4-6            tool_use
10:10:55       80183       996      1   270  opus-4-6            tool_use
10:13:03       81179       602      1   268  opus-4-6            tool_use
10:18:08       81781       675      1   121  opus-4-6            tool_use
10:18:14       82456       148      1   226  opus-4-6            tool_use
10:29:13       82604       823      1   184  opus-4-6            tool_use
10:29:19       83427       889      1   239  opus-4-6            tool_use
--------  ----------  --------  -----  ----  ------------------  ------------

If you provide me with means, I can send you full request/response dumps

*- no idea if this cache breaking was due to me inspecting binary or some historical tool change happened on the background level.
```

### Steps to Reproduce
Write "cch=00000" in command line and ask claude what does he see. He still should see "cch=00000". And token usage should be all "cache read" mostly, not "cache write" for subsequent requests.

Step to temporarily fix: `npx @anthropic-ai/claude-code@2.1.34` // you need to fix it on older version to benefit from it

### Claude Model
Opus

### Is this a regression?
Yes, this worked in a previous version

### Last Working Version
Based on reports: 2.1.67

### Claude Code Version
2.1.86 (Claude Code)

### Platform
Anthropic API

### Operating System
Ubuntu/Debian Linux

### Terminal/Shell
Other

### Additional Information
Similar issue: https://github.com/anthropics/claude-code/issues/34629 - this one relates to immediate start of conversation
Tool I wrote for debugging: https://gitlab.com/treetank/cc-diag
Verification script: https://gitlab.com/treetank/cc-diag/-/raw/c126a7890f2ee12f76d91bfb1cc92612ae95284e/test_cache.py
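Why would a single rewritten string be so expensive? Prompt caching only reuses an exact prefix of the request, so everything from the first changed byte onward has to be written to cache again. The `cch=00000` repro above suggests the standalone binary rewrites a sentinel somewhere inside the conversation, which would move that first difference from the end of the history to near its start. Below is a minimal sketch under simplified assumptions: a request is a list of text blocks, a block counts as cached only if it and every block before it are unchanged, whitespace-separated words stand in for tokens, and the replacement value `cch=a1b2c` is made up. It is not how Claude Code or the API actually tokenize or place cache breakpoints.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    cache_read: int
    cache_creation: int


def approx_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


def simulate_turn(previous: list[str], current: list[str]) -> Usage:
    """Toy prefix cache: leading blocks identical to the previous request are read
    from cache; the first changed block and everything after it are written again."""
    matched = 0
    while (
        matched < min(len(previous), len(current))
        and previous[matched] == current[matched]
    ):
        matched += 1
    read = sum(approx_tokens(block) for block in current[:matched])
    created = sum(approx_tokens(block) for block in current[matched:])
    return Usage(cache_read=read, cache_creation=created)


SYSTEM = "sys " * 200                 # 200 "tokens"
TURN_1 = "user says cch=00000 " * 10  # 30 "tokens", contains the sentinel
REPLY_1 = "assistant reply " * 50     # 100 "tokens"
NEW_MSG = "next question"             # 2 "tokens"

previous = [SYSTEM, TURN_1, REPLY_1]
healthy = previous + [NEW_MSG]
# Hypothetical: the client rewrites the sentinel inside old history before sending.
rewritten = [SYSTEM, TURN_1.replace("cch=00000", "cch=a1b2c"), REPLY_1, NEW_MSG]

if __name__ == "__main__":
    print(simulate_turn(previous, healthy))    # Usage(cache_read=330, cache_creation=2)
    print(simulate_turn(previous, rewritten))  # Usage(cache_read=200, cache_creation=132)
```

The healthy turn only writes the 2-token new message; the rewritten sentinel forces the old turn and the reply after it to be written again. Scaled up to a 300k-token session, the same mechanism would be consistent with the rows in the log where `cache_read` falls back to ~11k and `cache_cr` jumps to the size of the whole history.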
[BUG] Prompt cache regression in --print --resume since v2.1.69(?): cache_read never grows, ~20x cost increase
Opened 12:42PM - 15 Mar 26 UTC, closed 01:26AM - 01 Apr 26 UTC, by cinniezra. Labels: bug, has repro, platform:linux, area:cost, regression

### Preflight Checklist
- [x] I have searched [existing issues](https://github.…com/anthropics/claude-code/issues?q=is%3Aissue%20state%3Aopen%20label%3Abug) and this hasn't been reported yet
- [x] This is a single bug report (please file separate reports for different bugs)
- [x] I am using the latest version of Claude Code

### What's Wrong?

## Summary
`--print --resume` sessions stopped caching conversation turns between API calls starting around v2.1.69. Only Claude Code's internal system prompt (~14.5k tokens) is cached; all conversation history is `cache_create`d from scratch on every message.

**This causes a ~20x cost increase per message compared to v2.1.68.**

## Environment
- **Platform:** Ubuntu (Hetzner VPS)
- **Use case:** Discord bot using `claude --print --model <model> --resume <session-id> --output-format stream-json --verbose` with prompts piped via stdin
- **Tested models:** `claude-opus-4-6[1m]`, `opus`, `claude-opus-4-5-20251101`

The regression is version-dependent, not model-dependent.

## Suspect
Something in newer updates after 2.1.68 may have inadvertently broken cache breakpoint placement for `--print --resume` sessions.

## Workaround
Pinned to v2.1.68 (`npm install -g @anthropic-ai/claude-code@2.1.68`).

### What Should Happen?

## Expected behavior (v2.1.68)
`cache_read` grows as conversation accumulates, `cache_create` drops to a small delta (~800 tokens):

```
Message 1: cache_read=13,997  cache_create=22,946  cost=$0.15  (cold start)
Message 2: cache_read=32,849  cache_create=4,636   cost=$0.05
Message 3: cache_read=36,846  cache_create=879     cost=$0.03
Message 4: cache_read=37,295  cache_create=802     cost=$0.02
```

## Actual behavior (v2.1.76 and likely earlier versions after v2.1.68)
`cache_read` is stuck at ~14.5k (Claude Code's system prompt only), `cache_create` equals the full conversation size and grows every message:

```
Message 1: cache_read=14,569  cache_create=54,437  cost=$0.35
Message 2: cache_read=14,569  cache_create=55,084  cost=$0.35
Message 3: cache_read=14,569  cache_create=55,512  cost=$0.35
Message 4: cache_read=14,569  cache_create=55,733  cost=$0.36
Message 5: cache_read=14,569  cache_create=55,954  cost=$0.36
```

The conversation turns are never reused from cache between calls. Only Claude Code's internal system prompt (~14.5k tokens) caches successfully.

### Error Messages/Logs
```shell
## Testing matrix
All tests used fresh session UUIDs and back-to-back messages (well within the 5-minute cache TTL):

| Version | Model | Context | cache_read grows? | Steady-state cost/msg |
|---------|-------|---------|-------------------|----------------------|
| 2.1.68 | `opus` | 200k | **Yes** | ~$0.02 |
| 2.1.68 | `claude-opus-4-6[1m]` | 1M | **Yes** | ~$0.02 |
| 2.1.76 | `opus` | 200k | **No (stuck at 14.5k)** | ~$0.04-0.40 (grows) |
| 2.1.76 | `claude-opus-4-6[1m]` | 1M | **No (stuck at 14.5k)** | ~$0.35-0.40 |
| 2.1.76 | `claude-opus-4-5-20251101` | 200k | **No (stuck at 14.5k)** | ~$0.04-0.40 (grows) |
```

### Steps to Reproduce

## Reproduction
1. Run `claude --print --resume <session-id> --output-format stream-json --verbose` with a prompt via stdin
2. Send 3+ messages to the same session
3. Observe `cache_read_input_tokens` and `cache_creation_input_tokens` in the stream-json `result` output

### Claude Model
Opus

### Is this a regression?
Yes, this worked in a previous version

### Last Working Version
2.1.68

### Claude Code Version
2.1.76

### Platform
Other

### Operating System
Ubuntu/Debian Linux

### Terminal/Shell
Other

### Additional Information
This report (including the testing matrix) was written by Claude Code during a debugging session.
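The issue's per-message costs can be roughly reproduced from the two cache counters alone. The sketch below assumes Opus list prices of $5 per million input tokens, $6.25 for 5-minute cache writes (1.25x), $0.50 for cache reads (0.1x) and $25 for output; treat these as assumptions to check against Anthropic's current pricing page. Output tokens are left at zero because the issue doesn't list them.

```python
# Assumed prices in USD per million tokens (not from the issue; check current pricing).
PRICE_PER_MTOK = {
    "input": 5.00,
    "cache_write_5m": 6.25,  # 1.25x the base input price
    "cache_read": 0.50,      # 0.1x the base input price
    "output": 25.00,
}


def message_cost(
    cache_read: int,
    cache_creation: int,
    input_tokens: int = 0,
    output_tokens: int = 0,
) -> float:
    """Estimated USD cost of one request from its usage counters."""
    p = PRICE_PER_MTOK
    total = (
        cache_read * p["cache_read"]
        + cache_creation * p["cache_write_5m"]
        + input_tokens * p["input"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# "Message 4" from the issue: v2.1.68 (cache working) vs v2.1.76 (cache broken).
healthy = message_cost(cache_read=37_295, cache_creation=802)
broken = message_cost(cache_read=14_569, cache_creation=55_733)

if __name__ == "__main__":
    print(f"{healthy:.4f} {broken:.4f} {broken / healthy:.1f}x")  # 0.0237 0.3556 15.0x
```

With these assumed prices the steady-state message goes from about $0.024 to about $0.356, roughly 15x, which lines up with the issue's ~$0.02 vs ~$0.36. Almost all of the increase comes from the `cache_creation` term, since under these prices a cache write costs 12.5 times as much per token as a cache read.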
Related issues not mentioned in the Reddit post:
[BUG] Client-side rate limiter blocks requests with zero API calls when conversation transcript is large (~74MB) — false rate_limit error with synthetic model and 0 input/output tokens
Opened 01:21PM - 29 Mar 26 UTC by rwp65. Labels: bug, has repro, area:core

### Preflight Checklist
- [x] I have searched [existing issues](https://github.…com/anthropics/claude-code/issues?q=is%3Aissue%20state%3Aopen%20label%3Abug) and this hasn't been reported yet
- [x] This is a single bug report (please file separate reports for different bugs)
- [x] I am using the latest version of Claude Code

### What's Wrong?
After hours of inactivity in a long-running session, every new message from the user immediately returns `"API Error: Rate limit reached"` without making any API call. The error is generated client-side by Claude Code, not by the Anthropic API. The user cannot proceed with any work — every message, including simple ones like "proceed", triggers the same error.

### What Should Happen?
After hours of inactivity, the rate limit budget should have fully reset. A simple message should be sent to the API and receive a normal response.

### Error Messages/Logs
```shell
Session log: `~/.claude/projects/-home-rich-RE6D/7137463d-be5d-4d5e-a97d-bb12b5e44b58.jsonl`

**Six consecutive blocked requests between 13:11:09 and 13:11:28 UTC on 2026-03-29:**

Each error entry has this structure:

{
  "type": "assistant",
  "message": {
    "model": "<synthetic>",
    "role": "assistant",
    "usage": {
      "input_tokens": 0,
      "output_tokens": 0,
      "cache_creation_input_tokens": 0,
      "cache_read_input_tokens": 0
    },
    "content": [
      {
        "type": "text",
        "text": "API Error: Rate limit reached"
      }
    ]
  },
  "error": "rate_limit",
  "isApiErrorMessage": true
}

Key observations:

| Field | Value | Significance |
|-------|-------|-------------|
| `model` | `"<synthetic>"` | NOT a real API response — generated by Claude Code client |
| `input_tokens` | `0` | No tokens were sent to the API |
| `output_tokens` | `0` | No tokens were received from the API |
| `cache_read_input_tokens` | `0` | No cache was accessed |
| `isApiErrorMessage` | `true` | Claude Code flagged this as an API error |
| `error` | `"rate_limit"` | Client-side classification |

**Contrast with the first successful request after the user persisted (13:11:37 UTC):**

{
  "model": "claude-opus-4-6",
  "usage": {
    "input_tokens": 3,
    "cache_creation_input_tokens": 1315,
    "cache_read_input_tokens": 668864,
    "output_tokens": 1,
    "service_tier": "standard"
  }
}
```

### Steps to Reproduce
1. Run a Claude Code session for multiple days with heavy agent usage (many subagent dispatches, large code changes)
2. Accumulate a conversation transcript of ~74MB (the `.jsonl` file grows as the session continues)
3. Leave the session idle for several hours
4. Send any message (e.g., "proceed")
5. Observe: immediate `"API Error: Rate limit reached"` with no actual API call

### Claude Model
Opus

### Is this a regression?
Yes, this worked in a previous version

### Last Working Version
_No response_

### Claude Code Version
2.1.81

### Platform
Anthropic API

### Operating System
Other Linux

### Terminal/Shell
Xterm

### Additional Information

# Bug Report: Client-side rate limiter blocks requests with zero API calls when conversation transcript is large

## Title
Client-side rate limiter blocks requests with zero API calls when conversation transcript is large (~74MB) — false rate_limit error with synthetic model and 0 input/output tokens

## Environment
- **Claude Code Version:** 2.1.81
- **OS:** Ubuntu Linux 6.17.0-19-generic
- **Shell:** bash
- **Model:** claude-opus-4-6 (1M context)
- **Platform:** CLI (`entrypoint: "cli"`)
- **Session ID:** 7137463d-be5d-4d5e-a97d-bb12b5e44b58

## Description
After hours of inactivity in a long-running session, every new message from the user immediately returns `"API Error: Rate limit reached"` without making any API call. The error is generated client-side by Claude Code, not by the Anthropic API. The user cannot proceed with any work — every message, including simple ones like "proceed", triggers the same error.

## Steps to Reproduce
1. Run a Claude Code session for multiple days with heavy agent usage (many subagent dispatches, large code changes)
2. Accumulate a conversation transcript of ~74MB (the `.jsonl` file grows as the session continues)
3. Leave the session idle for several hours
4. Send any message (e.g., "proceed")
5. Observe: immediate `"API Error: Rate limit reached"` with no actual API call

## Expected Behavior
After hours of inactivity, the rate limit budget should have fully reset. A simple message should be sent to the API and receive a normal response.

## Actual Behavior
Claude Code's client-side rate limiter blocks the request before it reaches the Anthropic API. The user sees `"API Error: Rate limit reached"` and cannot use the tool at all.

## Evidence from Logs
Session log: `~/.claude/projects/-home-rich-RE6D/7137463d-be5d-4d5e-a97d-bb12b5e44b58.jsonl`

**Six consecutive blocked requests between 13:11:09 and 13:11:28 UTC on 2026-03-29:**

Each error entry has this structure:

```json
{
  "type": "assistant",
  "message": {
    "model": "<synthetic>",
    "role": "assistant",
    "usage": {
      "input_tokens": 0,
      "output_tokens": 0,
      "cache_creation_input_tokens": 0,
      "cache_read_input_tokens": 0
    },
    "content": [
      {
        "type": "text",
        "text": "API Error: Rate limit reached"
      }
    ]
  },
  "error": "rate_limit",
  "isApiErrorMessage": true
}
```

Key observations:

| Field | Value | Significance |
|-------|-------|-------------|
| `model` | `"<synthetic>"` | NOT a real API response — generated by Claude Code client |
| `input_tokens` | `0` | No tokens were sent to the API |
| `output_tokens` | `0` | No tokens were received from the API |
| `cache_read_input_tokens` | `0` | No cache was accessed |
| `isApiErrorMessage` | `true` | Claude Code flagged this as an API error |
| `error` | `"rate_limit"` | Client-side classification |

**Contrast with the first successful request after the user persisted (13:11:37 UTC):**

```json
{
  "model": "claude-opus-4-6",
  "usage": {
    "input_tokens": 3,
    "cache_creation_input_tokens": 1315,
    "cache_read_input_tokens": 668864,
    "output_tokens": 1,
    "service_tier": "standard"
  }
}
```

This successful request shows `cache_read_input_tokens: 668,864` — the session context is approximately **668K tokens**. This is likely what the client-side rate limiter is counting against the budget.

## Root Cause Hypothesis
The client-side rate limiter appears to calculate the token cost of the next request by estimating the context size (668K+ tokens) and checking it against a per-minute or per-hour token budget. For very large sessions, the CONTEXT ALONE may exceed the rate limit budget — even though the user's actual message is just a few tokens.

This creates a situation where:
- The session grows over days of heavy use
- The context window fills with conversation history
- Eventually the context size exceeds the rate limit's per-window token budget
- Every subsequent request is blocked client-side, regardless of actual API availability
- The user is permanently locked out until they start a new session

## Session Size Data

| Metric | Value |
|--------|-------|
| Session transcript file | 74,019,933 bytes (74MB) |
| Estimated context tokens | 668,864 (from cache_read_input_tokens) |
| Session duration | ~4 days (2026-03-25 to 2026-03-29) |
| Subagents dispatched | 50+ over the session |
| Session compactions | Multiple (context was compressed during the session) |

## Impact
- **Severity:** High — user is completely blocked from using Claude Code
- **Workaround:** Start a new session (loses all conversation context)
- **User experience:** Extremely frustrating — the error message gives no indication that the session size is the problem, and retrying makes it worse (each retry attempt may count against the budget)

## Suggested Fix
1. **Don't count cached/context tokens against the rate limit budget** — the user isn't "using" more tokens by having a long session. The cache is already paid for.
2. **If rate limiting must include context, reset the budget after idle periods** — hours of inactivity should fully reset any per-minute/per-hour budget.
3. **Show a more helpful error message** — instead of "API Error: Rate limit reached", show "Session context is very large (668K tokens). Consider starting a new session with `/compact` or a fresh session."
4. **Distinguish client-side rate limiting from API rate limiting** — the current message is identical for both, making it impossible for the user to diagnose.
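The root-cause hypothesis in this issue is easier to follow with a toy model. The sketch below is purely illustrative: `WindowBudget`, the 500,000-token budget and the 60-second window are invented, and nothing here claims to be Claude Code's real limiter. It only shows that if a client-side check charges the whole context (668,864 tokens in the report) against a per-window budget, idle time can never unblock the session, whereas charging only the new message would.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WindowBudget:
    """Hypothetical client-side limiter (NOT Claude Code's real code): a request is
    allowed only if tokens charged in the last `window_s` seconds plus its own
    estimated cost stay within `budget`."""

    budget: int
    window_s: float
    charged: deque = field(default_factory=deque)  # (timestamp, tokens) pairs

    def allow(self, now: float, estimated_tokens: int) -> bool:
        # Forget charges older than the window, so idle time frees the budget.
        while self.charged and now - self.charged[0][0] >= self.window_s:
            self.charged.popleft()
        used = sum(tokens for _, tokens in self.charged)
        if used + estimated_tokens > self.budget:
            return False  # rejected attempts are not charged in this toy model
        self.charged.append((now, estimated_tokens))
        return True


CONTEXT_TOKENS = 668_864  # cache_read_input_tokens from the issue
MESSAGE_TOKENS = 3        # a short message such as "proceed"
FOUR_HOURS_LATER = 4 * 3600.0

if __name__ == "__main__":
    limiter = WindowBudget(budget=500_000, window_s=60.0)  # invented numbers
    # Charging the whole context: blocked even after hours of idling.
    print(limiter.allow(FOUR_HOURS_LATER, CONTEXT_TOKENS + MESSAGE_TOKENS))  # False
    # Charging only the new message: allowed.
    print(limiter.allow(FOUR_HOURS_LATER, MESSAGE_TOKENS))  # True
```

In this toy model rejected attempts cost nothing, so retrying is harmless; the issue suspects the real client may also charge retries, which would only make things worse.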
[BUG] Silent context degradation — tool results cleared without notification on 1M context sessions this issue documents three separate mechanisms (microcompact, cached microcompact, session memory compact)
Opened 11:50AM - 02 Apr 26 UTC by Sn3th. Labels: bug, has repro, platform:linux, area:core

### Preflight Checklist
- [x] I have searched [existing issues](https://github.…com/anthropics/claude-code/issues?q=is%3Aissue%20state%3Aopen%20label%3Abug) and this hasn't been reported yet
- [x] This is a single bug report (please file separate reports for different bugs)
- [x] I am using the latest version of Claude Code

### What's Wrong?
Silent context degradation — tool results cleared without notification on 1M context sessions

What's happening
- Deal breaker for Claude Opus not sour grapes but you literally pushing us back to GPT , Deepseek, GLM and Kimi 2.5 with this! I have been fighting dumber agents for the past 48 hours!

Since ~v2.1.89/2.1.90, tool result content from earlier in a session is being silently replaced with `[Old tool result content cleared]`. No compaction notification is shown. No `/compact` was triggered. The agent and user have no indication this is happening.

Sessions do heavy tool use — 50+ file reads, greps, bash commands per session. We're observing:
- Token counter showing ~80k on a 1M context window, with early tool results already gone
- Agent making confident statements from internalised summaries because the source material was silently stripped
- Context visibly shrinking across multiple independent sessions (persistent and one-shot)
- No autocompact threshold was hit (we set `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70`, threshold would be ~686k)

Root cause (from source)
Three mechanisms in `src/services/compact/` run silently on every API call:
1. Time-based microcompact (`microCompact.ts:422`) — if the gap since the last assistant message exceeds a threshold, old tool results are content-cleared. Threshold comes from GrowthBook (`getTimeBasedMCConfig()`).
2. Cached microcompact (`microCompact.ts:305`) — uses `cache_edits` API to delete old tool results from server cache. Count-based trigger/keep thresholds from GrowthBook (`getCachedMCConfig()`).
3. Session memory compact (`sessionMemoryCompact.ts:57`) — runs before autocompact, keeps only ~40k tokens of recent messages. Gated by `tengu_session_memory` GrowthBook flag.

None of these show any UI notification. None trigger PreToolUse/PostToolUse hooks. The user sees no "Compacting..." message.

Problem:
- No transparency. The agent and user don't know context is being stripped. There's no opt-in, no notification, no setting to control it.
- `DISABLE_AUTO_COMPACT=true` doesn't help. It only disables autocompact — microcompact still runs on every API call.
- `DISABLE_COMPACT=true` is a sledgehammer. It kills manual `/compact` too, which we rely on.
- GrowthBook controls mean this changed server-side without any CLI update or changelog entry. We didn't enable any new features — the behaviour appeared on its own.
- 1M context is the product we're paying for. If the effective usable context is 40-80k due to silent trimming, the value proposition of Claude Max with 1M context is fundamentally undermined.

Environment:
- Claude Code v2.1.90
- Opus 4.6 with `[1m]` context
- `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70`

### What Should Happen?
Fix:
1. An env var to disable microcompact independently (e.g. `DISABLE_MICROCOMPACT=true`) — let users who want their full context window keep it
2. UI notification when tool results are cleared — same as autocompact shows "Compacting..."
3. Changelog transparency when GrowthBook flags change context management behaviour — silent server-side changes to how context is managed are not acceptable for paying customers running production workloads
4. A `--no-microcompact` flag or settings.json option — let power users opt out without losing manual `/compact`

### Error Messages/Logs
```shell
Error message?! the result speaks for itself. the agent is effectively dumber. "microcompact independently"
```

### Steps to Reproduce
Reproduce:
1. Start a session with `--model opus[1m]`
2. Read 20+ files, run grep/bash commands, do multi-file work
3. After ~30-40 minutes of tool-heavy work, try to reference file contents from the first 10 minutes
4. The agent will paraphrase from memory rather than quote — the original tool results are gone (Agent distilled effectively useless now!)
5. Token counter will show far less than expected for the amount of work done (dishonest!)

### Claude Model
Opus

### Is this a regression?
Yes, this worked in a previous version

### Last Working Version
v2.1.81

### Claude Code Version
v2.1.90

### Platform
Anthropic API

### Operating System
Ubuntu/Debian Linux

### Terminal/Shell
Other

### Additional Information
Guys... this is a deal breaker for me! the Opus has been heavily distilled and effectively broken! there was no need for this change it adds absolutely no value what so ever.
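For anyone unfamiliar with the term, the "time-based microcompact" described in this issue boils down to something like the following. This is a hypothetical reconstruction from the issue's one-line description, not Claude Code's source: the `Message` type, the simplified `tool_result` role (in the real API, tool results travel inside user messages), the 30-minute gap and `keep_recent=1` are all assumptions. Per the issue, the real thresholds come from GrowthBook.

```python
from dataclasses import dataclass, replace

CLEARED = "[Old tool result content cleared]"


@dataclass(frozen=True)
class Message:
    role: str  # "user", "assistant" or "tool_result" (simplified; hypothetical)
    content: str
    timestamp: float  # seconds


def time_based_microcompact(
    messages: list[Message],
    now: float,
    gap_threshold_s: float,
    keep_recent: int,
) -> list[Message]:
    """Illustrative reconstruction: if more than `gap_threshold_s` seconds have
    passed since the last assistant message, blank out every tool result except
    the `keep_recent` newest ones. Nothing is logged or shown to the user."""
    assistant_times = [m.timestamp for m in messages if m.role == "assistant"]
    if not assistant_times or now - max(assistant_times) <= gap_threshold_s:
        return list(messages)
    tool_idx = [i for i, m in enumerate(messages) if m.role == "tool_result"]
    stale = set(tool_idx[:-keep_recent]) if keep_recent > 0 else set(tool_idx)
    return [
        replace(m, content=CLEARED) if i in stale else m
        for i, m in enumerate(messages)
    ]


history = [
    Message("tool_result", "contents of a.py", 0.0),
    Message("assistant", "read a.py", 1.0),
    Message("tool_result", "contents of b.py", 2.0),
    Message("assistant", "read b.py", 3.0),
    Message("tool_result", "grep output", 4.0),
    Message("assistant", "ran grep", 5.0),
]

if __name__ == "__main__":
    later = time_based_microcompact(history, now=5.0 + 3600, gap_threshold_s=1800, keep_recent=1)
    print([m.content for m in later if m.role == "tool_result"])
    # ['[Old tool result content cleared]', '[Old tool result content cleared]', 'grep output']
```

Note that the function returns a list of the same length and emits no event, which matches the complaint: the message count and UI look unchanged, yet the agent can no longer quote the early files.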
The same day, Lydia Hallie, an Anthropic employee, acknowledged on X/Twitter that quota was being consumed abnormally fast and said an investigation was underway:
https://x.com/lydiahallie/status/2038686571676008625
Anthropic also posted a similar official statement on the r/Anthropic subreddit:
https://www.reddit.com/r/Anthropic/comments/1s7zfap/investigating_usage_limits_hitting_faster_than/
After several days of escalation and a flood of user complaints on social media, the BBC picked up the story and reported on it yesterday:
Claude Code users hitting usage limits 'way faster than expected'
Anthropic, the company behind the AI coding assistant, said it was fixing a problem blocking users.
That is roughly the timeline.
But knowing where the problem lies isn't enough. As users, what can we actually do about it right now?
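- First, Anthropic released v2.1.91 yesterday, which partially fixes #40524 and #34629. So step one is to upgrade to v2.1.91 as soon as possible (you can pin the version).
- Uninstall the officially recommended standalone binary (a bun-runtime ELF) and install the NPM package instead, so that sentinel replacement doesn't pollute the cache prefix and send consumption through the roof.
- Start new sessions regularly.
- Avoid resuming sessions, including `--continue`, `--continue --dangerously-skip-permissions` and `/resume`; these drive cache_read to 0 and trigger a full cache_creation, so consumption skyrockets.
- Avoid `/dream` and `/insights`; their background API calls also make consumption skyrocket.
- Pray that Anthropic does the decent thing: fixes the bug quickly, resets quotas, and brings consumption back down.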
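To check whether these workarounds (or an upgrade) actually help, you can look at the session transcripts Claude Code keeps on disk. The rate-limit issue above quotes one at `~/.claude/projects/-home-rich-RE6D/<session-id>.jsonl`, where each assistant entry carries a `message.usage` object. The script below is a rough sketch that assumes that entry shape: it skips `<synthetic>` client-side entries, de-duplicates entries sharing a `message.id` (in case one response is logged more than once), reports the overall cache hit ratio, and flags turns after the first where more tokens were written to cache than read from it. That heuristic is my own, not an official metric.

```python
import json
import sys
from pathlib import Path
from typing import Iterable, Iterator


def usage_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield `message.usage` for each real assistant response in a transcript.

    Assumes the entry shape quoted in the rate-limit issue; skips blank or
    non-JSON lines, client-generated "<synthetic>" entries, and repeated
    entries that share a message id."""
    seen_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict) or entry.get("type") != "assistant":
            continue
        message = entry.get("message")
        if not isinstance(message, dict) or message.get("model") == "<synthetic>":
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue
        msg_id = message.get("id")
        if msg_id is not None:
            if msg_id in seen_ids:
                continue
            seen_ids.add(msg_id)
        yield usage


def summarize(lines: Iterable[str]) -> tuple[float, list[int]]:
    """Return (overall cache hit ratio, indices of suspected cache-break turns)."""
    read_total = write_total = 0
    breaks = []
    for turn, usage in enumerate(usage_records(lines)):
        read = usage.get("cache_read_input_tokens") or 0
        write = usage.get("cache_creation_input_tokens") or 0
        read_total += read
        write_total += write
        # Heuristic: turn 0 is a cold start; later, writing more than was read
        # suggests the cached prefix was invalidated.
        if turn > 0 and write > read:
            breaks.append(turn)
    cached = read_total + write_total
    return (read_total / cached if cached else 0.0), breaks


if __name__ == "__main__":
    with Path(sys.argv[1]).open(encoding="utf-8") as transcript:
        ratio, breaks = summarize(transcript)
    print(f"cache hit ratio: {ratio:.1%}; suspected cache breaks at turns {breaks}")
```

Pass it the path of one session's `.jsonl` file. A healthy session should show a high hit ratio and few flagged turns; a run of flagged turns right after a `/resume` or `--continue` would match the pattern described in the issues above.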
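Similar threads on this forum:
Three days ago, another member posted a summary of skibidi-toaleta-2137's Reddit post:
Cache invalidation after /resume in Claude code (Dev & Tuning): "Saw this and am reposting it here. The original was written by Claude, so I won't paste it in. https://www.reddit.com/r/ClaudeAI/comments/1s7mkn3/psa_claude_code_has_two_cache_bugs_that_can/ Verification script: https://gitlab.com/treetank/cc-diag/-/raw/c126a7890f2ee12f76d91bfb…"
Two days ago, another member proposed similar countermeasures:
Some optimization tips for Claude Code quota draining fast (Dev & Tuning): "My own Claude Max quota has been draining extremely fast lately. Judging from some posts online, it may be related to the cache invalidation bug, and part of it is probably the extra context burden since Claude Opus moved to a 1M context window. Right now my 5-hour quota runs out in under 2 hours (Max 5x). I chatted with Claude Code for a few rounds, had it research the cause online, and summarized the following tips (typed by hand): 1) Try to avoid using /re… directly"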
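Sources:
Part of the information above comes from the claude-code-cache-analysis report compiled by ArkNill.
If you're interested in the root cause analysis and benchmarks, you can read the report here: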
GitHub - ArkNill/claude-code-cache-analysis: Measured analysis of Claude Code cache bugs...
Measured analysis of Claude Code cache bugs causing 10-20x token inflation on Max plans
Hope this helps, everyone~
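– Update, morning of 04/03 –
After writing this post, I found that v2.1.91 had already been released. With the latest version you can keep using the standalone binary that Anthropic recommends; installing via NPM is no longer necessary.
Meanwhile, after days of investigation, Anthropic has finally issued its latest "response".
The official response on Reddit's r/ClaudeAI is here.
And Lydia Hallie's X/Twitter post with nearly identical content, plus a translation: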
[Screenshots: Lydia Hallie's X/Twitter post and its translation]
Unsurprisingly, the community was "very satisfied" with this gaslighting response:
https://reddit.com/r/claude/comments/1satc4f/the_biggest_gaslighting_in_ai_history_anthropic/
[image]
I'll say no more; all I can leave you with is this six-character mantra, as a small token of my feelings:
。
[1] Original post
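--【1】--:
。
--【2】--:
It invalidates the cache so it drops to 0, and a resumed session has a large context on top of that.
--【3】--:
Strange. I follow this issue closely, but I never came across any of the posts you mentioned.
--【4】--: paguro:
Avoid resuming sessions, including
`--continue`, `--continue --dangerously-skip-permissions`, and `/resume`: these drop `cache_read` to 0 and force a full `cache_creation`, sending usage through the roof
Why can't these be used?
--【5】--:
Actually, I also first noticed this problem in other communities.
Maybe it's because many members here mainly use API relay services rather than official Claude subscriptions, so posts like this get less visibility?
After all, Claude subscriptions are notorious for their strict risk controls against users in China, especially on the Max plan.
--【6】--:
So why is Cursor also burning through quota so fast?
--【7】--:
Damn it, give us a refund!!!
--【8】--:
Gone after three messages.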