
Will memory fail the AI boom?
How the memory crunch is reshaping the AI infrastructure market and pushing costs downstream

When Google recently announced plans to build three new data centers in Texas to support its expanding AI workloads, more attention was paid to the eye-watering $40 billion price tag than to the broader impact on the components supply chain.
Google is only one of many companies worldwide that have recently committed to building or extending data center operations, so this is not simply a hyperscaler problem. However, the sheer scale and pace of these investments are beginning to expose pressure points that were easy to overlook when AI infrastructure was growing more gradually.
One of those pressure points is memory chips. As AI systems become more memory-intensive, demand for both conventional server dynamic random access memory (DRAM) and specialist high-bandwidth memory (HBM) is rising faster than suppliers can comfortably respond.
The upshot is that not all demand can be met at once, and suppliers are being forced to prioritize where limited memory capacity goes. As Dave Nicholson, chief research officer at Futurum Group, put it, “this is fundamentally an allocation problem rather than a total supply collapse.”
This is reflected in how memory suppliers are behaving. Leading manufacturers have made clear that tight conditions could persist well into the second half of the decade, as demand continues to outstrip the pace at which new capacity can be added. New fabs take years to bring online, and even when they do, much of the output is already directed toward AI-grade memory destined for accelerators and large-scale data centers, rather than the more conventional DRAM and non-volatile NAND flash memory used across the wider market.
For buyers, this means any easing is likely to be uneven. Announcements about new manufacturing capacity may suggest progress, but in practice, much of that capacity is reserved for specific products and customers. As Mark Vena, chief executive and principal analyst at SmartTech Research, puts it, the more reliable guide is not what vendors promise, but where production capacity is actually being allocated.
“I watch how aggressively suppliers keep prioritizing HBM over commodity DRAM and how long contract lead times and allocation language stay tight even after price spikes,” Vena said. “I also watch capex discipline and the pace of node transitions, because cautious spending and complex ramps usually mean the new bits arrive slower than headlines imply. If OEMs keep warning customers about higher memory and SSD (solid-state drive) costs rolling into 2026, that is the market telling you normalization is not around the corner.”

Part of the reason the pressure is proving so persistent is the way hyperscalers now shape the entire memory market. Large cloud providers sit at the center of AI infrastructure and their purchasing decisions increasingly determine how memory capacity is distributed across the industry.
Jin Kim, chief executive and co-founder of memory platform provider Xcena, says the knock-on effects extend far beyond specialist AI systems.
“Hyperscalers such as Google and Microsoft are dependent on advanced processors and accelerators from companies like Nvidia, which rely on a vast network of component suppliers,” Kim explained. “Many parts of that supply chain are struggling to keep up, and bottlenecks are emerging in components used not just in AI infrastructure, but also in mainstream products such as smartphones.”
That reallocation is already reshaping vendor priorities. Memory that might once have served a broad mix of markets is being pulled into a narrower set of high-value deployments.
“We’re also seeing shifts such as Micron’s exit from its Crucial retail business,” Kim added. “Not because consumer demand is disappearing, but because every wafer is being redirected to higher-value enterprise and AI products.”
The result, he argues, is added pressure on smaller OEMs, system builders, and anyone relying on traditional upgrade cycles, as supply tightens and prices rise across the board.
Vena noted that smaller system builders and many OEMs feel the most pain “because they lack scale, get pushed to the back of allocation lines, and struggle to absorb sudden cost swings.” He added that hyperscalers “still take a hit,” but economies of scale mean that they can “often negotiate longer commitments and spend through it, which shifts the burden downstream to everyone else who cannot.”
The consequences are becoming increasingly visible in the server market itself. While high-bandwidth memory attracts most attention because of its role in AI accelerators, the same platforms also depend on large volumes of conventional server DRAM, often at much higher densities than before. As suppliers divert capacity toward AI-grade memory, the shared pool of server DRAM tightens.
Allan Kaye, director and co-founder of data center infrastructure designer Vespertec, said this trade-off is easy to underestimate.
“Manufacturers are shifting production capacity toward HBM3 to meet demand from AI GPUs,” he said. “But those same AI platforms also rely on large amounts of RDIMM (registered dual in-line memory module) DRAM, often at 96 GB [gigabytes], 128 GB, or 256 GB densities. As HBM is prioritized, RDIMM supply suffers as well.”
Unlike specialist AI components, RDIMM underpins almost every server deployment. “That’s why the effects are being felt across so many industries,” Kaye added. “Demand isn’t slowing, new factories don’t appear overnight, and supply challenges are likely to get worse before they get better.”
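To make that coupling concrete, here is a minimal back-of-the-envelope sketch in Python. The node shape (eight accelerators, 24 DIMM slots) and the per-accelerator HBM figure are illustrative assumptions rather than vendor specifications; the RDIMM density is one of those Kaye cites.

```python
# Back-of-the-envelope memory demand for one AI server node.
# All figures are illustrative assumptions, not vendor specifications.

GPUS_PER_NODE = 8           # accelerators per node (assumed)
HBM_PER_GPU_GB = 141        # HBM per accelerator, H200-class (assumed)
RDIMM_SLOTS = 24            # DIMM slots on a dual-socket host (assumed)
RDIMM_DENSITY_GB = 96       # one of the densities Kaye cites

hbm_gb = GPUS_PER_NODE * HBM_PER_GPU_GB          # 1,128 GB of HBM
rdimm_gb = RDIMM_SLOTS * RDIMM_DENSITY_GB        # 2,304 GB of server DRAM

print(f"HBM per node:   {hbm_gb:,} GB")
print(f"RDIMM per node: {rdimm_gb:,} GB")
# Every node bought for its HBM also pulls roughly 2 TB of conventional
# server DRAM out of the same shared supply pool.
```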
For organizations caught in the middle, the result is an awkward set of choices. Memory pricing is rising and lead times are stretching, but committing too early risks locking in high costs just as conditions begin to ease. Waiting, however, carries its own risks if supply tightens further or capacity is simply unavailable when projects need to move.
Kaye argued that neither extreme is sensible.
“It could take years for production to normalize, so waiting out the price rises is risky and may cost firms in the long term,” he says. “But panic-buying now, just as the scale of the shortages is becoming apparent, could be equally damaging.”
The more pragmatic response, he suggests, is careful prioritization: deciding which workloads genuinely need new hardware in the next 12 to 24 months, and working closely with channel partners to secure supply accordingly.
That emphasis on planning reflects a change in how memory is being treated. Once a relatively predictable line item in server configurations, it is now becoming a strategic constraint that shapes system design, upgrade cycles, and budgets in ways many IT teams are not used to managing.
The strain is also visible at the very top of the AI hardware stack. Nvidia sits at the center of the AI infrastructure boom, yet reports that the company may unbundle memory from future GPUs – leaving board partners to source video RAM (VRAM) independently – point to how tight conditions have become. Whether or not such changes are formalized, the fact they are being discussed at all suggests long-standing assumptions about bundled supply are under pressure.

Timing is also important. Advanced memory products such as HBM are tightly coupled to accelerator roadmaps, and any delays in ramping new generations can ripple quickly through the ecosystem. For cloud providers and system builders alike, this adds another layer of uncertainty, reinforcing the shift from spot purchasing toward longer-term commitments and negotiated allocation.
The picture that emerges is not one of shortage in the absolute sense, but of constrained choice, echoing Nicholson's earlier point about allocation. Even the most powerful buyers are operating within tighter boundaries, while those further down the supply chain feel the effects first and most acutely.
Focusing solely on wafer capacity also risks missing other constraints. As SmartTech Research's Vena points out, some of the least visible bottlenecks sit further downstream.
“The less obvious pressure point is packaging and integration, not just wafers,” he says.
Advanced packaging is essential for HBM and modern accelerators, and limits here can cap output even when memory dies themselves are available.
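Put another way, finished output is governed by the scarcest stage in the chain, not by wafer starts alone. A toy model, with invented capacity figures:

```python
# Toy pipeline model: shippable HBM is capped by the scarcest stage.
# Capacity figures are invented for illustration only.
wafer_capacity = 120_000      # good memory dies available for stacking, per month
packaging_capacity = 80_000   # advanced-packaging slots (stacking, TSVs), per month

shippable = min(wafer_capacity, packaging_capacity)
print(f"Shippable HBM stacks per month: {shippable:,}")
# Here packaging, not wafer supply, is the binding constraint.
```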
Demand-side factors add to the complexity. Faster-than-expected uptake of AI servers, AI-enabled PCs, and high-capacity SSDs can pull memory consumption forward, preventing inventories from rebuilding. At the same time, architectural trends toward larger memory footprints per system, such as taller HBM stacks and denser memory configurations, mean that unit volumes alone no longer tell the full story.
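A quick worked example shows why unit counts mislead. With invented figures, total bit demand can grow sharply even when system shipments are flat:

```python
# Toy illustration: flat unit volume, rising bit demand.
# All numbers are invented for the arithmetic only.
units_shipped = {"2024": 1_000_000, "2025": 1_000_000}  # systems per year
avg_gb_per_system = {"2024": 512, "2025": 768}          # denser configurations

for year in units_shipped:
    petabytes = units_shipped[year] * avg_gb_per_system[year] / 1_000_000
    print(f"{year}: {units_shipped[year]:,} systems -> {petabytes:,.0f} PB of DRAM")
# Bit demand grows 50% while unit shipments do not grow at all.
```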
Not everyone believes the only answer lies in building more fabs, either. Val Cook, chief software architect at compute platform provider Blaize, said the industry has become accustomed to treating memory bandwidth as something to be maximized everywhere.
“The fact that memory providers are narrowing their product focus underlines the scale of AI demand,” he said, “but it also points to the need for a more practical approach to how AI systems are built.”
Cook argues for hybrid architectures, in which different classes of devices work together. By this, he means applying appropriate levels of compute and memory to each workload.
“That kind of discipline improves efficiency and reduces unnecessary pressure on scarce memory resources,” he said, offering a more sustainable path as constraints tighten.
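Cook does not describe a specific implementation, but the discipline he is pointing at can be sketched as a simple placement policy: route each workload to the cheapest device class whose memory capacity and bandwidth suffice, so scarce HBM is reserved for the jobs that genuinely need it. The device tiers, their figures, and the Workload fields below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    working_set_gb: float      # memory footprint of the job
    bandwidth_gbps: float      # sustained memory bandwidth it needs

# Hypothetical device tiers, ordered cheapest first. Figures are illustrative.
DEVICE_TIERS = [
    {"name": "CPU + DDR5 RDIMM", "capacity_gb": 1024, "bandwidth_gbps": 400},
    {"name": "Mid-range accelerator + GDDR", "capacity_gb": 48, "bandwidth_gbps": 900},
    {"name": "HBM accelerator", "capacity_gb": 141, "bandwidth_gbps": 4800},
]

def place(workload: Workload) -> str:
    """Pick the cheapest tier that satisfies both capacity and bandwidth."""
    for tier in DEVICE_TIERS:
        if (workload.working_set_gb <= tier["capacity_gb"]
                and workload.bandwidth_gbps <= tier["bandwidth_gbps"]):
            return tier["name"]
    return "no single device fits: shard the job or queue for HBM capacity"

jobs = [
    Workload("batch embedding lookup", working_set_gb=200, bandwidth_gbps=100),
    Workload("small-model inference", working_set_gb=30, bandwidth_gbps=700),
    Workload("frontier-model training shard", working_set_gb=120, bandwidth_gbps=3000),
]
for job in jobs:
    print(f"{job.name} -> {place(job)}")
```

In this sketch only the training shard lands on HBM; the large but bandwidth-light lookup job stays on conventional DRAM, which is the kind of pressure relief Cook has in mind.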
Such approaches will not eliminate shortages, but they may soften their impact, particularly for organizations that do not need frontier-scale AI performance for every task.
So what does this mean for 2026?
Memory may not derail the AI boom, but it is increasingly likely to shape how fast it grows, who benefits first, and at what cost. As Vena suggests, any market easing is likely to arrive unevenly, shaped by who can secure allocation rather than by headline capacity announcements.