METR AI Benchmark: Clarifying the Limitations of Time Horizon


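This article, via Hacker News AI, discusses the time horizon measurement used in the METR AI benchmark and acknowledges the criticisms and misinterpretations it has attracted. As one of the main authors of the original paper, the author aims to clarify the limitations of the methodology and the conclusions that the evidence supports.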


Notes

Rough/unpolished research updates and speculation


Clarifying limitations of time horizon

In the 9 months since the METR time horizon paper was published (during which AI time horizons have increased by ~6x), the paper has generated a lot of attention as well as various criticisms. As one of the main authors, I often see misinterpretations of our work. While I still believe in the core results, I think that many people, to some extent, both overstate the precision of our time horizon measurements and draw conclusions I don’t think the evidence fully supports.

Therefore, I’d like to lay out some of my beliefs about the limitations of our methodology, and of time horizon more broadly, and then clarify what I think are the key conclusions directly supported by our results.
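As a rough illustration of the "~6x in 9 months" figure mentioned above, the short sketch below converts a growth factor over a period into an implied doubling time. It assumes constant exponential growth over the whole period, which is a simplifying assumption and not something the post claims; the function name `implied_doubling_time` is hypothetical, and because "~6x" is itself approximate, the result should be read as a ballpark number only.

```python
import math


def implied_doubling_time(growth_factor: float, elapsed_months: float) -> float:
    """Return the doubling time in months implied by a total growth factor.

    Assumes constant exponential growth: growth_factor = 2 ** (elapsed / doubling).
    """
    if growth_factor <= 1 or elapsed_months <= 0:
        raise ValueError("growth_factor must exceed 1 and elapsed_months must be positive")
    return elapsed_months / math.log2(growth_factor)


if __name__ == "__main__":
    # ~6x growth over 9 months, the approximate figures quoted in the post.
    print(round(implied_doubling_time(6, 9), 2))  # prints 3.48
```

Under that assumption, a 6x increase over 9 months corresponds to a doubling time of about three and a half months. A different growth factor in the same range, say 5x or 7x, shifts the answer by a few tenths of a month, which is one small example of why the precision of such figures should not be overstated.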

Despite these limitations, what conclusions do I still stand by?

See e.g. the DeepSeek R1 paper: https://arxiv.org/abs/2501.12948
