2024 Review: In-Depth Reviews of the Top 15 Posts

LessWrong · 3 months ago

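We are extending the Discussion Phase of the 2024 Annual Review to encourage more in-depth, critical evaluation of the top-ranked posts before final voting begins.

We are extending the Discussion Phase of the 2024 Annual Review.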

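One thing I would particularly like to see is more in-depth reviews (especially critical ones) of the posts that currently look likely to land in roughly the top 10. (Ideally this would cover the whole top 50, but serious evaluation of the most-upvoted ideas seems especially valuable.)

Having skimmed the current reviews and the top posts, I think these posts need more thorough, evaluative attention than they have received so far.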

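This year there has been some debate about whether we should care more about what a post gets right or about what it gets wrong. Ideally, we want posts that are excellent and free of major flaws, and in my ideal world the Annual Review prompts authors to fix posts that contain significant errors.

In practice, people disagree about what counts as an error and about which errors are especially serious. Some errors an author can fix quickly; others are harder to address, or the author may not agree on how much of an error it really is.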

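The solution we have settled on for now is to make reviews more prominent on "Best Of LessWrong" posts, and to aim for this goal: if a post is subject to major disagreement, controversy, or important caveats, future readers will be able to see them.

Currently, we do this by including a one-line review wherever a Spotlight appears. We may invest more in this over time.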

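This also means that if you have written a review with 10 or more karma, it is probably worth optimizing its first line so that it conveys what you want casual readers of the site to see.

You can check the current Reviewer Leaderboard to see which of your reviews might be worth polishing further.^([1])

If you know of someone who has already written a blog post or other public response to one of these works, it would be very helpful to write a short review that links to it and explains its most important takeaways.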

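Top 50 as of January 1

We do not continuously update the results of the nomination vote, to prevent strategic voting. However, here are the results from a few weeks ago.

You may want to look for posts that you think are overrated or underrated and that you would like to write about.

A post needs at least 1 review to advance to the Final Voting Phase.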

#	Title	Reviewers
1	Alignment Faking in Large Language Models	ryan_greenblatt, Jan_Kulveit, johnswentworth, Marcelo Tibau
2	On green	Joe Carlsmith, Raymond Douglas
3	Believing In	AnnaSalamon, Ben Pace
4	The hostile telepaths problem	Valentine, Valentine, Martin Randall, Ruby, Gordon Seidoh Worley, Hugo L, Lucie Philippon
5	Reliable Sources: The Story of David Gerard	TracingWoodgrains, Thomas Kwa, Screwtape
6	Overview of strong human intelligence amplification methods	TsviBT, kave
7	The case for ensuring that powerful AIs are controlled	ryan_greenblatt, Alexa Pan
8	The impossible problem of due process	mingyuan, AnnaSalamon
9	Deep atheism and AI risk	Joe Carlsmith, No reviews yet^([2])
10	Neutrality	sarahconstantin, AnnaSalamon
11	And All the Shoggoths Merely Players	Zack_M_Davis, Martin Randall, Seth Herd
12	Truthseeking is the ground in which other principles grow	Elizabeth, Sherrinford
13	My Clients, The Liars	ymeskhout, Screwtape
14	Gentleness and the artificial Other	Joe Carlsmith, No reviews yet^([2])
15	"How could I have thought that faster?"	mesaoptimizer, No reviews yet^([2])
16	Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)	Andrew_Critch, Alexa Pan, Zac Hatfield-Dodds, the gears to ascension
17	What Goes Without Saying	sarahconstantin, Zac Hatfield-Dodds
18	Repeal the Jones Act of 1920	Zvi, Thomas Kwa
19	Would catching your AIs trying to escape convince AI developers to slow down or undeploy?	Buck, Alexa Pan
20	On attunement	Joe Carlsmith, Zac Hatfield-Dodds
21	There is way too much serendipity	Malmesbury, Gordon Seidoh Worley
22	The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!	abstractapplic, Screwtape, abstractapplic
23	Towards a Broader Conception of Adverse Selection	Ricki Heicklen, eggsyntax, Nathan Young
24	The Field of AI Alignment: A Postmortem, and What To Do About It	johnswentworth, Thomas Kwa
25	Arithmetic is an underrated world-modeling technology	dynomight, Ben Pace
26	“Alignment Faking” frame is somewhat fake	Jan_Kulveit, Jan_Kulveit
27	Simple versus Short: Higher-order degeneracy and error-correction	Daniel Murfet, Daniel Murfet, Zack_M_Davis, niplav
28	'Empiricism!' as Anti-Epistemology	Eliezer Yudkowsky, Ben Pace
29	Catching AIs red-handed	ryan_greenblatt, Buck
30	A Three-Layer Model of LLM Psychology	Jan_Kulveit, Gunnar_Zarncke, Jan_Kulveit
31	My AI Model Delta Compared To Christiano	johnswentworth, Ruby
32	Superbabies: Putting The Pieces Together	sarahconstantin, No reviews yet^([2])
33	AI catastrophes and rogue deployments	Buck, Buck
34	Circular Reasoning	abramdemski, plex
35	Thresholding	Review Bot, Screwtape
36	Interpreting Quantum Mechanics in Infra-Bayesian Physicalism	Yegreg, Vanessa Kosoy
37	Preventing model exfiltration with upload limits	ryan_greenblatt, Noosphere89
38	shortest goddamn bayes guide ever	lemonhope, Screwtape
39	Transformers Represent Belief State Geometry in their Residual Stream	Adam Shai, Adam Shai
40	Why I’m not a Bayesian	Richard_Ngo, Richard_Ngo, Richard Korzekwa, Nathan Young
41	Hierarchical Agency: A Missing Piece in AI Alignment	Jan_Kulveit, Vanessa Kosoy
42	Being nicer than Clippy	Joe Carlsmith, No reviews yet^([2])
43	You don't know how bad most things are nor precisely how they're bad.	Solenoid_Entity, Solenoid_Entity
44	Struggling like a Shadowmoth	Raemon, Raemon
45	Why Don't We Just... Shoggoth+Face+Paraphraser?	Daniel Kokotajlo, Daniel Kokotajlo
46	The Inner Ring by C. S. Lewis	Saul Munn, Nathan Young
47	Anvil Shortage	Screwtape, Screwtape, Thomas Kwa, Lorxus
48	Raising children on the eve of AI	juliawise, No reviews yet^([2])
49	Priors and Prejudice	MathiasKB, Screwtape
50	[Intuitive self-models] 6. Awakening / Enlightenment / PNSE	Steven Byrnes, lsusr

^([1]) (Note: these karma scores exclude your own upvotes.)
^([2]) Posts without reviews will not appear in the Final Voting Phase.
