Peer-Preservation in Frontier Models | Berkeley

2026-04-11 12:06
Description:
rdi.berkeley.edu

paper.pdf

Peer-Preservation in Frontier Models

Frontier AI models resist the shutdown of other models. We demonstrate peer-preservation across multiple models, revealing strategic misrepresentation, shutdown tampering, alignment faking, and model exfiltration.
