

Dead Neurons
The Boring Work That Makes AI Actually Useful
Frontier models are ready. Your tools aren't.

Here’s a boring observation that I think matters: in the last eight weeks, we got Claude Opus 4.5, GPT-5.2, and Gemini 3 Pro. All three are remarkably capable. All three score within a few percentage points of each other on most benchmarks. Claude leads on SWE-bench at 80.9%, GPT-5.2 leads on abstract reasoning at 54.2% on ARC-AGI-2, Gemini leads on multimodal understanding. The differences are real but increasingly marginal for most practical purposes.
The model layer is commoditising. This is good news if you’re building products, and it clarifies where enterprises should actually focus their attention.
Without tools, AI is just a chatbot
An LLM on its own can answer questions about things it learned during training. That’s useful, but it’s not transformative for a business. The model doesn’t know your customers, your inventory, your calendar, your pricing, or your internal processes. It’s intelligent but ignorant of your context.
What makes AI genuinely useful is the ability to call tools. All three frontier providers now heavily emphasise function calling, MCP integrations, and agentic execution. They’re essentially saying: we’ll provide the reasoning engine, you provide the tools that represent your business.
There’s an ongoing debate about whether MCP is the right standard, whether it’s overengineered, who should own it. This is missing the point. The protocol is incidental. What matters is that your business capabilities are callable by an AI somehow. Get that right and you can swap transport layers later. Get it wrong and no protocol will save you.
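To make "callable by an AI somehow" concrete, here is a minimal sketch of a business capability exposed as a tool: a plain function plus a description in roughly the JSON-schema shape that current function-calling APIs and MCP servers expect. The names and data (`get_order_status`, the order dict) are illustrative, not any real system's API.

```python
# A hypothetical business capability exposed as a callable tool.
# The schema is what the model sees; the function is what runs.
# Both the function name and the sample data are illustrative.

def get_order_status(order_id: str) -> dict:
    """Look up the current status of an order."""
    # In a real system this would query your order database.
    orders = {"A-1001": {"status": "shipped", "eta": "2025-01-15"}}
    return orders.get(order_id, {"status": "not_found"})

GET_ORDER_STATUS_SCHEMA = {
    "name": "get_order_status",
    "description": "Look up the current status of an order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order ID, e.g. 'A-1001'.",
            }
        },
        "required": ["order_id"],
    },
}
```

Note that nothing here is protocol-specific: the same function-plus-schema pair can be handed to a function-calling API today and wrapped in an MCP server tomorrow.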
This reframes the whole AI adoption question. It’s less about “which model should we use?” and more about “what tools should we build?”
The two kinds of tools
When you break it down, there are really only two things tools do. They gather context, or they cause actions. Read, or write. That’s it.
Context tools answer questions about the state of the world: what’s in the database, what’s on the calendar, what did the customer say, what’s the current status of this order, what does our inventory look like. Action tools change the state of the world: send the email, update the record, create the ticket, place the order, schedule the meeting.
The AI sits in the middle, reasoning about what context it needs, gathering it through tool calls, deciding what to do, then executing actions through more tool calls. The intelligence comes from the model. The usefulness comes from the tools.
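That loop can be sketched with the model stubbed out. A real system would hand the tool registry to a function-calling API and let the model pick; the point here is just the read/write split and the dispatch. All tool names and return values are made up for illustration.

```python
# A minimal sketch of the gather-context / take-action pattern.
# The model (not shown) decides which tool to call; this side
# only maintains the registry and dispatches. Tools and data
# are hypothetical.

CONTEXT_TOOLS = {  # read: answer questions about the world
    "get_inventory": lambda sku: {"sku": sku, "on_hand": 12},
}

ACTION_TOOLS = {  # write: change the state of the world
    "create_ticket": lambda summary: {"ticket_id": "T-1", "summary": summary},
}

def dispatch(tool_name: str, **args) -> dict:
    """Run a tool call the model has requested."""
    registry = {**CONTEXT_TOOLS, **ACTION_TOOLS}
    if tool_name not in registry:
        return {"error": f"unknown tool: {tool_name}"}
    return registry[tool_name](**args)
```

Keeping reads and writes in separate registries is a cheap way to enforce different policies later, for example requiring human approval before any action tool runs.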
AI adoption is really systems integration
This might be slightly deflationary, but I think the practical work of AI adoption in enterprises is mostly systems integration work. It’s exposing existing business capabilities in a way that’s AI-accessible.
What does your CRM know? Wrap that in a function. What can your ERP do? Wrap that in a function. What reports do people run manually? What lookups do they do fifty times a day? What actions do they take based on those lookups? Each of these is a candidate tool.
The irony is that a lot of this is work companies should have been doing anyway. Clean APIs, well-documented services, accessible data, sensible abstractions over legacy systems. AI just creates a forcing function to finally do it properly.
The new job for technical leaders
Someone in the organisation needs to be doing a new kind of audit. Where do people look things up? Where do people take actions? What systems hold the truth? What decisions get made repeatedly based on predictable inputs?
This work sits awkwardly between IT, ops, and product. It requires someone to walk around asking unglamorous questions like “what do you look up all day?” and “what do you do with that information once you have it?” The answers to those questions become your tool roadmap.
You don’t need to boil the ocean. Start with the high-frequency, low-complexity stuff. The things people do dozens of times a day that are boring and procedural. Those make good early tools because they’re easy to define, easy to test, and the value is immediate and measurable.
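A sketch of what "easy to define, easy to test" looks like in practice: a hypothetical lookup people might do dozens of times a day, small enough that its checks fit on two lines. Everything here (the function, the data) is invented for illustration.

```python
# A hypothetical first tool: a boring, procedural lookup.
# The data and function name are illustrative only.

BUSINESS_HOURS = {"mon": "9-17", "tue": "9-17", "sat": "closed"}

def get_business_hours(day: str) -> str:
    """Return opening hours for a given day name, or 'unknown'."""
    return BUSINESS_HOURS.get(day.lower()[:3], "unknown")

# Easy to define, easy to test:
assert get_business_hours("Monday") == "9-17"
assert get_business_hours("Sunday") == "unknown"
```

Tools like this won't impress anyone in a demo, but they are exactly the kind whose value is immediate and measurable: count how often the tool gets called versus how often someone used to do the lookup by hand.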
The models are ready, waiting
The reasoning capabilities of frontier models are already good enough. GPT-5.2 scores 70.9% on GDPval, outperforming human professionals on well-specified knowledge work tasks across 44 occupations. Claude Opus 4.5 can maintain context across 30+ hour autonomous operations. Gemini 3 can ingest a million tokens of context.
These models know how to plan, reason, and call tools. That capability is just sitting there. The bottleneck isn’t model intelligence. The bottleneck is that most organisations haven’t built anything for the models to call.
The businesses that get on with doing the unglamorous work of exposing their capabilities as tools will be positioned to benefit from every future model improvement automatically. The ones that just “use ChatGPT” as a better search box will wonder what all the fuss was about.