NVIDIA 透過 DGX Spark 與 Reachy Mini 賦能 AI 代理

Huggingface·4 個月前

NVIDIA 在 CES 2026 上發布了包括 DGX Spark 和 Reachy Mini 在內的新開放模型，旨在讓使用者能夠創建可在線上和實體世界運作的個人化 AI 代理。

NVIDIA brings agents to life with DGX Spark and Reachy Mini

Today at CES 2026, NVIDIA unveiled a world of new open models to enable the future of agents, online and in the real world. From the recently released NVIDIA Nemotron reasoning LLMs to the new NVIDIA Isaac GR00T N1.6 open reasoning VLA and NVIDIA Cosmos world foundation models, all the building blocks are here today for AI Builders to build their own agents.

But what if you could bring your own agent to life, right at your desk? An AI buddy that can be useful to you and process your data privately?

In the CES keynote today, Jensen Huang showed us how we can do exactly that, using the processing power of NVIDIA DGX Spark with Reachy Mini to create your own little office R2D2 you can talk to and collaborate with.

This blog post provides a step-by-step guide to replicate this amazing experience at home using a DGX Spark and Reachy Mini.

Let’s dive in!

Ingredients

If you want to start cooking right away, here’s the source code of the demo.

We’ll be using the following:

Feel free to adapt the recipe and make it your own - you have many ways to integrate the models into your application:

Giving agentic powers to Reachy

Turning an AI agent from a simple chat interface into something you can interact with naturally makes conversations feel more real. When an AI agent can see through a camera, speak out loud, and perform actions, the experience becomes more engaging. That’s what Reachy Mini makes possible.

Reachy Mini is designed to be customizable. With access to sensors, actuators, and APIs, you can easily wire it into your existing agent stack, by simulation or real hardware controlled directly from Python.

This post focuses on composing existing building blocks rather than reinventing them. We combine open models for reasoning and vision, an agent framework for orchestration, and tool handlers for actions. Each component is loosely coupled, making it easy to swap models, change routing logic, or add new behaviors.

Unlike closed personal assistants, this setup stays fully open. You control the models, the prompts, the tools, and the robot’s actions. Reachy Mini simply becomes the physical endpoint of your agent where perception, reasoning, and action come together.

Building the agent

In this example, we use the NVIDIA NeMo Agent Toolkit, a flexible, lightweight, framework-agnostic open source library, to connect all the components of the agent together. It works seamlessly with other agentic frameworks, like LangChain, LangGraph, CrewAI, handling how models interact, routing inputs and outputs between them, and making it easy to experiment with different configurations or add new capabilities without rewriting core logic. The toolkit also provides built-in profiling and optimization features, letting you track token usage efficiency and latency across tools and agents, identify bottlenecks, and automatically tune hyperparameters to maximize accuracy while reducing cost and latency.

Step 0: Set up and get access to models and services

First, clone the repository that contains all the code you’ll need to follow along:

To access your intelligence layer, powered by the NVIDIA Nemotron models, you can either deploy them using NVIDIA NIM or vLLM, or connect to them through remote endpoints available at build.nvidia.com.

The following instructions assume you are accessing the Nemotron models via endpoints. Create a .env file in the main directory with your API keys. For local deployments, you do not need to specify API keys and can skip this step.

Step 1: Build a chat interface

Start by getting a basic LLM chat workflow running through NeMo Agent Toolkit’s API server. NeMo Agent Toolkit supports running workflows via nat serve and providing a config file. The config file passed here contains all the necessary setup information for the agent, which includes the models used for chat, image understanding, as well as the router model used by the agent. The NeMo Agent Toolkit UI can connect over HTTP/WebSocket so you can chat with your workflow like a standard chat product. In this implementation, the NeMo Agent Toolkit server is launched on port 8001 (so your bot can call it, and the UI can too):

Next, verify that you can send a plain text prompt through a separate terminal to ensure everything is setup correctly:

Reviewing the agent configuration, you’ll notice it defines far more capabilities than a simple chat completion. The next steps will walk through those details.

Step 2: Add NeMo Agent Toolkit’s built-in ReAct agent for tool calling

Tool calling is an essential part of AI agents. NeMo Agent Toolkit includes a built-in ReAct agent that can reason between tool calls and use multiple tools before answering. We route “action requests” to a ReAct agent that’s allowed to call tools (for example, tools that trigger robot behaviors or fetch current robot state).

Some practical notes to keep in mind:

Take a look at this portion of the config it defines the tools (like Wikipedia search) and specifies the ReAct agent pattern used to manage them.

Step 3: Add a router to direct queries to different models

The key idea: don’t use one model for everything. Instead, route based on intent:

You can implement routing a few ways (heuristics, a lightweight classifier, or a dedicated routing service). If you want the “production” version of this idea, the NVIDIA LLM Router developer example is the full reference implementation and includes evaluation and monitoring patterns.

A basic routing policy might work like this:

These sections of the config define the routing topology and specify the router model.

NOTE: If you want to reduce latency/cost or run offline, you can self-host one of the routed models (typically the “fast text” model) and keep the VLM remote. One common approach is serving via NVIDIA NIM or vLLM and pointing NeMo Agent Toolkit to an OpenAI-compatible endpoint.

Step 4: Add a Pipecat bot for real-time voice + vision

Now we go real time. Pipecat is a framework designed for low-latency voice/multimodal agents: it orchestrates audio/video streams, AI services, and transports so you can build natural conversations. In this repo, the bot service is responsible for:

You will find all the pipecat bot code in the reachy-personal-assistant/bot folder.

Step 5: Hook everything up to Reachy (hardware or simulation)

Reachy Mini exposes a daemon that the rest of your system connects to. The repo runs the daemon in simulation by default (--sim). If you have access to a real Reachy you can remove this flag and the same code will control your robot.

Run the full system

You will need three terminals to run the entire system:

If you are using the physical hardware, remember to omit the --sim flag from the command.

If the NeMo Agent Toolkit service is not already running from Step 1, start it now in Terminal 3.

Once all the terminals are set up, there are two main windows to keep track of:

Reachy Sim – This window appears automatically when you start the simulator daemon in Terminal 1. This is applicable if you’re running Reachy mini simulation in place of the physical device.

Pipecat Playground – This is the client-side UI where you can connect to the agent, enable microphone and camera inputs, and view live transcripts. In Terminal 2, open the URL exposed by the bot service: http://localhost:7860/. Click “CONNECT” in your browser. It may take a few seconds to initialize, and you’ll be prompted to grant microphone (and optionally camera) access.

Once both windows are up and running:

At this point, you can start interacting with your agent!

Try these example prompts

Here are a few simple prompts to help you test your personal assistant. You can start with these and then experiment by adding your own to see how the agent responds!

Text-only prompts (routes to the fast text model)

Vision prompts (routes to the VLM)

Where to go next

Instead of a "black-box" assistant, this builds a foundation for a private, hackable system where you can control both the intelligence and the hardware. You can inspect, extend, and run it locally, with full visibility into data flow, tool permissions, and how the robot perceives and acts.

Depending on your goals, here are a few directions to explore next:

Want to try it right away? Deploy the full environment here. One click and you're running.

你的個人知識庫