LiveKit Raises $100M Series C to Build Infrastructure for the Voice AI Era

LiveKit has announced a $100M Series C at a $1 billion valuation. The round, led by Index Ventures, will support LiveKit in building infrastructure for the fast-growing voice AI space.

Series C: Towards the voice-driven era of computing

Today, I get the privilege of announcing LiveKit’s Series C. With this funding round, we’ve reached an important milestone: a $1 billion valuation. Index Ventures is leading the $100M investment, joined by Salesforce Ventures, Hanabi Capital, and our longtime supporters Altimeter and Redpoint Ventures.

To our customers, users, OSS contributors, investors, and core team: thank you, we wouldn’t be here without you.

Voice AI: The coming wave

Voice is the most natural interface we have—it’s the one we use with each other every day. And for the first time in history, we can interact with computers in the same way.

When we announced our Series B in April 2025, voice AI had gone from a feature inside ChatGPT to thousands of applications across financial services, healthcare, retail, customer support, education, and robotics. Startups were building voice agents that could perform tasks like processing claims, tutoring students, triaging patients, supporting customers, and interviewing candidates.

Today, large enterprises are evaluating and building voice agents to automate workflows, improve customer experiences, and unlock new revenue. While many of these use cases are still in the proof-of-concept stage, some are moving into production and operating at real scale: Agentforce voice agents run customer support for the world's top brands and Tesla uses voice AI for sales, support, insurance, and roadside assistance.

We anticipate that 2026 will be the year voice AI is broadly deployed across thousands of use cases around the world.

But there's still a lot to build to support this new paradigm of computing.

A new kind of application

Voice AI applications are not like web applications. The protocol underlying every web application is HTTP, which was designed for reliably moving text data between computers. Every HTTP request is independent and stateless, meaning a web backend has no memory of previous requests by default. When state is needed, a web application loads it from a database, which adds latency.

For an application you can talk to like a person, traditional web infrastructure breaks.

Voice AI applications are realtime and stateful. A conversation with a voice agent might last a few minutes or a few hours, and the agent is continuously listening, thinking, and responding while maintaining context across the entire session.
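To make the contrast concrete, here is a minimal sketch of a stateful session object that carries conversational context across turns, unlike a stateless HTTP handler that starts fresh on every request. The `VoiceSession` class and its methods are illustrative only, not LiveKit's actual API:

```python
import time
from dataclasses import dataclass, field


@dataclass
class VoiceSession:
    """Holds conversational state for the lifetime of one call."""
    history: list = field(default_factory=list)
    started_at: float = field(default_factory=time.monotonic)

    def on_user_turn(self, transcript: str) -> str:
        # Unlike a stateless HTTP handler, every turn sees the full history.
        self.history.append(("user", transcript))
        reply = f"(reply with {len(self.history)} messages in context)"
        self.history.append(("agent", reply))
        return reply


session = VoiceSession()
session.on_user_turn("Hi, I'd like to refill a prescription.")
session.on_user_turn("The same one as last month.")
# The second turn is answered with the first turn still in context.
```

The session object lives as long as the call does; nothing needs to be reloaded from a database between turns.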

That shift at the application layer, from using a keyboard and mouse to speaking with a voice agent, changes everything underneath. You can’t build your application the same way. You can’t test it the same way. You can’t deploy and run it the same way. You can’t monitor it the same way. The whole stack has to be rebuilt for realtime, stateful applications with human-native interfaces.

We’re building that stack. Every piece of it, designed to work together seamlessly.

The agent development lifecycle

Agents are still applications, and like all applications, they go through a set of stages from design to production:

Build

Like web applications, voice AI applications have a frontend and a backend. To build the former, LiveKit offers client SDKs across every single platform.

On the backend, LiveKit Agents—modeled on our work powering ChatGPT Voice Mode and downloaded over 1M times a month—gives you full programmatic control over agent orchestration, access to hundreds of AI model integrations, and automatic handling of conversational dynamics like turn detection and interruptions.

Not everyone wants to start in code though. Sometimes it’s preferable to quickly start from a template, tune your prompts or sketch out a workflow, and share a link with friends or colleagues—for that vibe, we recently launched Agent Builder.

Test and Evaluate

AI models are stochastic: given the same input, they won't necessarily produce the same output, so you can't write simple assertions against non-deterministic code. You have to test these systems statistically, the same way we evaluate human performance through exams and interviews.

LiveKit Agents now includes support for writing unit tests against your agent code, and you can wire up traces to OpenTelemetry for deeper analysis.
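The statistical approach can be sketched as follows. The `flaky_agent` stand-in and the 0.9 pass-rate threshold are hypothetical, not part of LiveKit Agents; the point is asserting a threshold over many trials rather than exact equality on one:

```python
import random


def flaky_agent(prompt: str) -> str:
    """Stand-in for a stochastic model: usually correct, occasionally not."""
    return "Paris" if random.random() < 0.98 else "Lyon"


def eval_pass_rate(agent, prompt: str, expected: str, trials: int = 500) -> float:
    """Run many trials and measure the fraction that meet the expectation."""
    passes = sum(agent(prompt) == expected for _ in range(trials))
    return passes / trials


random.seed(0)  # fixed seed so the evaluation run is reproducible
rate = eval_pass_rate(flaky_agent, "Capital of France?", "Paris")
# Assert a statistical threshold rather than exact output equality.
assert rate >= 0.9
```

In practice the threshold, trial count, and pass criterion (exact match, LLM-as-judge score, tool-call correctness) are all tuning decisions per test case.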

For simulation (the AI version of an integration test), where a voice agent calls another voice agent and runs through thousands of conversations permuting prompt, language, and voice attributes to build statistical confidence in agent behavior, we've partnered with Bluejay, Hamming, and Roark. In parallel, we’re also exploring how to integrate simulation more directly into LiveKit’s platform.
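The permutation space a simulation run covers grows multiplicatively with each attribute. A toy sketch of enumerating it (the attribute grids below are made up; simulation vendors manage this for you at much larger scale):

```python
import itertools

# Hypothetical attribute grids for simulated conversations.
prompts = ["book an appointment", "cancel my order"]
languages = ["en-US", "es-MX", "ja-JP"]
voices = ["calm", "fast-talker", "noisy-background"]

# Cartesian product: every combination becomes one simulated conversation.
scenarios = list(itertools.product(prompts, languages, voices))
print(len(scenarios))  # 2 * 3 * 3 = 18 scenario configurations
```

Run each scenario many times and you quickly reach the thousands of conversations needed for statistical confidence.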

Deploy and Run

While deploying a web application and voice agent both involve pushing code to the cloud, the similarities end there. Your user might speak with an agent for an indeterminate amount of time and there may be unplanned spikes in demand across your user base. This requires a different approach for capacity, connection, and change management, load balancing, and failover. Earlier this year, we released serverless agents to make agent deployment turnkey for builders everywhere.

Voice AI applications also require new network infrastructure, purpose-built for transporting voice data with as low latency as possible between your agent and wherever your users are located. We’ve built out a global network of data centers that act as a unified fabric optimized for routing voice and video data. Today, our network handles billions of calls a year between voice agents and users, across web and mobile applications and phone calls.

Recently, we’ve partnered with telephony carriers around the world to link LiveKit’s network directly to the PSTN. This enables us to deliver the lowest latency experience when speaking to a voice agent over the phone.

Once your voice agent is on a live call, there are usually multiple models strung together for every conversational turn: speech-to-text, turn and interruption detection, LLM, and text-to-speech. These models may be running on the same server or data center as your agent code, but most of the time they're accessed by your agent via cloud API from a model provider. Model providers host their models in different regions around the world, potentially far away from your agent. Sometimes a provider's service gets backlogged and inference requests queue up; other times, a provider's service goes down entirely. These orchestration challenges can negatively affect end-to-end latency, making agent conversations choppy and unreliable.

LiveKit Inference abstracts away much of this complexity. The same system we built to monitor and route voice in real time can also route inference between model providers. We’ve also started to host our models across our data centers so they’re colocated with the agents you deploy to LiveKit Cloud.
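One simple routing policy for this kind of system is latency-aware failover: prefer the healthy provider with the lowest recent latency. This is a toy sketch of that idea; the provider names and the policy itself are illustrative, not LiveKit Inference's actual algorithm:

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    p50_latency_ms: float   # recent median time-to-first-token
    healthy: bool = True


def pick_provider(providers: list) -> Provider:
    """Route to the healthy provider with the lowest recent latency."""
    candidates = [p for p in providers if p.healthy]
    if not candidates:
        raise RuntimeError("all model providers are down")
    return min(candidates, key=lambda p: p.p50_latency_ms)


providers = [
    Provider("provider-a", 120.0),
    Provider("provider-b", 85.0),
    Provider("provider-c", 60.0, healthy=False),  # backlogged / failing
]
print(pick_provider(providers).name)  # provider-b: fastest healthy option
```

A production router would also weigh queue depth, regional proximity, and cost, and would update latency estimates continuously from live traffic.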

Observe

Until we launched Agent Observability, there was no Datadog-equivalent built for voice agents—no way to truly understand how an agent and user are interacting on a live call. How long did it take for the agent to answer the call? What did the agent “hear” when the user specified the prescription drug they’ve been taking? Did the user press “0” to speak to a human operator? What was the average turn latency across the call? Did the agent invoke the correct tool to schedule an appointment for the user?

Answering these questions through session replays, traces, time-aligned transcripts of conversations, and error logs is the critical final phase of the lifecycle. As a developer, armed with these learnings and insights, you can then go back to the Build step, modify your agent code, and run through another iteration.
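A metric like average turn latency falls out of time-aligned trace events. The event schema below is hypothetical (Agent Observability's actual traces are richer), but it shows the core computation: pair each end-of-user-turn with the next agent speech onset.

```python
# Hypothetical trace events: (timestamp_seconds, event_type)
events = [
    (0.00, "user_turn_end"),
    (0.42, "agent_speech_start"),   # turn latency: 0.42 s
    (5.10, "user_turn_end"),
    (5.80, "agent_speech_start"),   # turn latency: 0.70 s
]


def turn_latencies(events):
    """Pair each end-of-user-turn with the next agent speech onset."""
    latencies, pending = [], None
    for ts, kind in events:
        if kind == "user_turn_end":
            pending = ts
        elif kind == "agent_speech_start" and pending is not None:
            latencies.append(ts - pending)
            pending = None
    return latencies


lats = turn_latencies(events)
avg = sum(lats) / len(lats)
print(f"avg turn latency: {avg * 1000:.0f} ms")
```

The same event stream supports the other questions above: time-to-answer, DTMF key presses, and tool invocations are all just more event types on the timeline.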

Building the voice AI stack

Voice is one of the biggest paradigm shifts in computing. It’s still early—and it starts where voice is already the interface: phone calls, cars, and smart speakers.

Over the next few years, as new form factors emerge and models get better at turn-taking, tool use, and reliability, voice-native applications will move from novelty to default. Software will feel less like something you navigate, and more like something you delegate to.

LiveKit is building the development stack and runtime between foundation models and end-user applications. Our goal is simple: make building and scaling voice AI as easy as building and scaling on the web.

This round helps us move faster. We’re excited to build the voice-driven era of computing with our customers, community, and partners.
