
Anthropic has to keep revising its technical interview test so you can’t cheat on it with Claude
Since 2024, Anthropic’s performance optimization team has given job applicants a take-home test to make sure they know their stuff. But as AI coding tools have gotten better, the test has had to change a lot to stay ahead of AI-assisted cheating.
Team lead Tristan Hume described the history of the challenge in a blog post on Wednesday. “Each new Claude model has forced us to redesign the test,” Hume writes. “When given the same time limit, Claude Opus 4 outperformed most human applicants. That still allowed us to distinguish the strongest candidates — but then, Claude Opus 4.5 matched even those.”
The result is a serious candidate-assessment problem. Without in-person proctoring, there’s no way to ensure someone isn’t using AI to cheat on the test — and if they do, they’ll quickly rise to the top. “Under the constraints of the take-home test, we no longer had a way to distinguish between the output of our top candidates and our most capable model,” Hume writes.
The issue of AI cheating is already wreaking havoc at schools and universities around the world, so it's ironic that AI labs now have to deal with it too. Then again, Anthropic is uniquely well-equipped to tackle the problem.
In the end, Hume designed a new test that had less to do with optimizing hardware, making it novel enough to stump contemporary AI tools. But as part of the post, he shared the original test to see if any reader could come up with a better solution.
“If you can best Opus 4.5,” the post reads, “we’d love to hear from you.”
© 2025 TechCrunch Media LLC.