Open Responses: What we need to know

Hugging Face

Hugging Face introduces the Open Responses initiative, which aims to replace the existing Chat Completions format and establish a stronger open standard for agentic AI workflows. The effort builds on OpenAI's Responses API to foster broader interoperability and collaboration.


The era of the chatbot is long gone, and agents now dominate inference workloads. Developers are shifting toward autonomous systems that reason, plan, and act over long time horizons. Despite this shift, much of the ecosystem still uses the Chat Completions format, which was designed for turn-based conversations and falls short for agentic use cases. The Responses format was designed to address these limitations, but it is closed and not as widely adopted; Chat Completions remains the de facto standard despite the alternatives.

This mismatch between agentic workflow requirements and entrenched interfaces motivates the need for an open inference standard. Over the coming months, we will collaborate with the community and inference providers to develop Open Responses into a shared format that is practically capable of replacing Chat Completions.

Open Responses builds on the direction OpenAI set with its Responses API, launched in March 2025, which superseded the existing Completions and Assistants APIs with a single, consistent interface.

What is Open Responses?

Open Responses extends and open-sources the Responses API, making it more accessible for builders and routing providers to interoperate and collaborate on shared interests.

Some of the key points are:

What do we need to know to build with Open Responses?

We’ll briefly explore the core changes that affect most community members. For a deep dive into the specification, check out the Open Responses documentation.

Client Requests to Open Responses

Client requests to Open Responses are similar to those for the existing Responses API. Requests target a proxy endpoint that routes to Inference Providers using the Open Responses API schema.
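As a sketch, here is such a request built in Python using only the standard library. The endpoint URL, model name, and field values are assumptions based on the Responses API shape, not the authoritative Open Responses schema:

```python
import json

# Hypothetical proxy endpoint; the real URL and auth scheme may differ.
OPEN_RESPONSES_URL = "https://router.huggingface.co/v1/responses"

# Request body based on the Responses API shape ("model" and "input"
# are the core fields); the exact Open Responses fields may differ.
payload = {
    "model": "openai/gpt-oss-120b",
    "input": "Briefly explain what an agentic loop is.",
    "stream": False,
}

body = json.dumps(payload)
print(body)

# To actually send it (requires a valid HF_TOKEN in the environment):
# import os, urllib.request
# req = urllib.request.Request(
#     OPEN_RESPONSES_URL,
#     data=body.encode(),
#     headers={
#         "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
#         "Content-Type": "application/json",
#     },
# )
# print(urllib.request.urlopen(req).read().decode())
```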

Changes for Inference Clients and Providers

Clients that already support the Responses API can migrate to Open Responses with relatively little effort. The main changes are:

For Model Providers, implementing the changes for Open Responses should be straightforward if they already adhere to the Responses API specification. For Routers, there is now the opportunity to standardize on a consistent endpoint and support configuration options for customization where needed.

Over time, as Providers continue to innovate, certain features will become standardized in the base specification.

In summary, migrating to Open Responses will make the inference experience more consistent and improve quality, as the undocumented extensions, interpretations, and workarounds of the legacy Chat Completions API are normalized in Open Responses.

You can see how to stream reasoning chunks below.

Here’s the difference between Open Responses and the Responses API for reasoning deltas:
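A minimal sketch of consuming streamed reasoning chunks, assuming server-sent events whose `data:` payloads carry typed deltas. The event type names below (`response.reasoning_text.delta` for raw reasoning text, `response.output_text.delta` for the answer) are assumptions modeled on the Responses API event style; consult the specification for the authoritative names:

```python
import json

# Illustrative SSE stream fragments, as a client would receive them.
stream = [
    'data: {"type": "response.reasoning_text.delta", "delta": "Thinking about"}',
    'data: {"type": "response.reasoning_text.delta", "delta": " the question..."}',
    'data: {"type": "response.output_text.delta", "delta": "Here is the answer."}',
]

reasoning, output = [], []
for line in stream:
    if not line.startswith("data: "):
        continue  # skip comments, event names, and keep-alives
    event = json.loads(line[len("data: "):])
    if event["type"] == "response.reasoning_text.delta":
        reasoning.append(event["delta"])   # accumulate reasoning text
    elif event["type"] == "response.output_text.delta":
        output.append(event["delta"])      # accumulate the final answer

print("".join(reasoning))
print("".join(output))
```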

Open Responses for Routing

Open Responses distinguishes between “Model Providers”, who serve inference, and “Routers”, intermediaries that orchestrate requests across multiple providers.

Clients can now specify a Provider along with provider-specific API options when making requests, allowing intermediary Routers to orchestrate requests between upstream providers.
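As an illustration, a request that pins a specific upstream provider might look like the following. The `provider` and `provider_options` field names, and the option values, are hypothetical; check the Open Responses specification for the actual routing fields:

```python
import json

# Hypothetical routing fields: "provider" names the upstream Model
# Provider and "provider_options" carries provider-specific settings.
payload = {
    "model": "openai/gpt-oss-120b",
    "input": "Summarize this repository's README.",
    "provider": "groq",
    "provider_options": {"service_tier": "on_demand"},
}
print(json.dumps(payload, indent=2))
```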

Tools

Open Responses natively supports two categories of tools: internal and external. Externally hosted tools are implemented outside the model provider’s system; for example, client-side functions executed by the developer, or MCP servers. Internally hosted tools live within the model provider’s system; for example, OpenAI’s file search or Google Drive integration. For internal tools, the model calls, executes, and retrieves results entirely within the provider’s infrastructure, requiring no developer intervention.
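A sketch of declaring one tool of each category in a request. The function-tool shape follows the Responses API convention of flattened `name`/`parameters` fields, and `web_search` stands in for a provider-hosted internal tool; exact supported tool types vary by provider:

```python
import json

tools = [
    {   # external: a client-side function the developer executes
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {   # internal: executed entirely inside the provider's infrastructure
        "type": "web_search",
    },
]
print(json.dumps(tools, indent=2))
```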

Sub Agent Loops

Open Responses formalizes the agentic loop: a repeating cycle of reasoning, tool invocation, and response generation that enables models to autonomously complete multi-step tasks.

[Figure: the Open Responses agentic loop. Image source: openresponses.org]

The loop operates as follows: the model reasons about the task, invokes a tool when it needs external information or actions, receives the tool’s result, and repeats the cycle until it can produce a final response.

For internally hosted tools, the provider manages the entire loop: executing tools, returning results to the model, and streaming output. This means that a multi-step workflow like "search documents, summarize findings, then draft an email" can complete in a single request.
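For externally hosted tools, by contrast, the client drives the loop itself. The sketch below stubs out the model call so the control flow runs locally; `call_model`, the item shapes, and the `get_weather` tool are all illustrative stand-ins, not the authoritative schema:

```python
import json

def call_model(input_items):
    # Stub for a POST to the Open Responses endpoint: the first call
    # requests a tool, the second call produces the final answer.
    if not any(i.get("type") == "function_call_output" for i in input_items):
        return [{"type": "function_call", "call_id": "call_1",
                 "name": "get_weather", "arguments": '{"city": "Paris"}'}]
    return [{"type": "output_text", "text": "It is sunny in Paris."}]

def get_weather(city):
    return f"sunny in {city}"  # stand-in for a real lookup

input_items = [{"type": "message", "role": "user",
                "content": "What's the weather in Paris?"}]
for _ in range(5):  # cap iterations, in the spirit of max_tool_calls
    output = call_model(input_items)
    calls = [o for o in output if o["type"] == "function_call"]
    if not calls:        # no tool call -> the model is done
        final = output
        break
    for call in calls:   # execute each requested tool, feed results back
        args = json.loads(call["arguments"])
        result = get_weather(**args)
        input_items += [call, {"type": "function_call_output",
                               "call_id": call["call_id"], "output": result}]

print(final[0]["text"])
```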

Clients control loop behavior via `max_tool_calls`, which caps iterations, and `tool_choice`, which constrains the tools the model may invoke.
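An illustrative request body combining both controls; the values and the `search_docs` tool name are hypothetical:

```python
import json

payload = {
    "model": "openai/gpt-oss-120b",
    "input": "Search the docs and summarize the findings.",
    "max_tool_calls": 3,       # cap the number of tool invocations
    "tool_choice": {           # force a specific, named tool
        "type": "function",
        "name": "search_docs",
    },
}
print(json.dumps(payload, indent=2))
```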

The response contains all intermediate items: tool calls, results, reasoning.

Next Steps

Open Responses extends and improves the Responses API, providing richer and more detailed content definitions, compatibility, and deployment options. It also provides a standard way to execute sub-agent loops during primary inference calls, opening up powerful capabilities for AI applications. We look forward to working with the Open Responses team and the broader community on the future development of the specification.

![acceptance test](https://huggingface.co/huggingface/documentation-images/resolve/main/openresponses/image2.png)

You can try Open Responses with Hugging Face Inference Providers today. An early-access version is available on Hugging Face Spaces; try it with your client and the Open Responses Compliance tool!
