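Introducing WP-Bench: A WordPress AI Benchmark

Hacker News · 3 months ago

The WordPress AI Team has introduced WP-Bench, an official benchmark for evaluating how well language models understand WordPress development, covering everything from core APIs to plugin architecture and security practices.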

Introducing WP-Bench: A WordPress AI Benchmark

How well do language models actually understand WordPress? To answer this, we’re introducing WP-Bench – the official WordPress AI benchmark.

WP-Bench evaluates how well AI models understand WordPress development, from core APIs and coding standards to plugin architecture and security best practices.

Why WP-Bench Matters

WordPress powers over 40% of the web, yet AI models are typically evaluated on general programming tasks. WP-Bench fills this gap by measuring WordPress-specific capabilities.

Understanding today’s models. Whether you’re building AI-powered plugins or using coding assistants, knowing which models excel at WordPress helps you make better tooling decisions.

Shaping tomorrow’s models. We want WP-Bench to become a standard benchmark that AI labs use when developing new models. When providers like OpenAI, Anthropic, and Google run pre-release evaluations, we want WordPress performance on their radar, not treated as an afterthought. This gives them an incentive to optimize for the millions of developers and site owners who depend on WordPress.

Building an open source leaderboard. We’re working toward a public leaderboard tracking model performance on WordPress tasks. This will provide transparent results for the community, inform how the WordPress project engages with AI providers, and help developers choose the right tools for their projects.

How It Works

WP-Bench measures AI capabilities across two dimensions:

The benchmark uses WordPress itself as the grader, running generated code in a sandboxed environment. This ensures we measure both theoretical understanding and practical ability to produce working, standards-compliant code.
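To make the idea of execution-based grading concrete, here is a minimal Python sketch. It is not WP-Bench's actual harness: the function name, the `GradeResult` type, and the pass criterion (exact match on standard output) are all assumptions made for illustration. The runner command is left configurable; `wp eval-file` from WP-CLI is one plausible choice for executing PHP inside a loaded WordPress install, while the demo runner simply reuses the Python interpreter so the sketch runs without WordPress. A subprocess with a timeout is also not a real sandbox, so the isolation the post describes is not reproduced here.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Assumption: WP-CLI is installed and pointed at a disposable WordPress site.
WP_CLI_RUNNER = ["wp", "eval-file"]
# Stand-in runtime so the sketch can be exercised without WordPress.
DEMO_RUNNER = [sys.executable]


@dataclass
class GradeResult:
    passed: bool
    detail: str


def grade_generated_code(code, expected_output, runner_cmd, timeout_s=10.0):
    """Run model-generated code in an external runtime and compare its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write the candidate to a throwaway file that the runner will execute.
        script = Path(tmp) / "candidate.php"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [*runner_cmd, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Runaway code counts as a failure rather than hanging the run.
            return GradeResult(False, "timeout")
    if proc.returncode != 0:
        return GradeResult(False, f"runtime error (exit {proc.returncode})")
    if proc.stdout.strip() != expected_output.strip():
        return GradeResult(False, "wrong output")
    return GradeResult(True, "ok")
```

With a WordPress install available, a call such as `grade_generated_code(php_code, "expected text", WP_CLI_RUNNER)` would let the WordPress runtime decide whether the generated code works, which is the core of the approach described above.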

Current State & Known Limitations

WP-Bench is an early release, and we’re being transparent about where it needs work:

These limitations are exactly why we’re releasing now rather than waiting. We know that the WordPress community is uniquely positioned to help build a robust, representative benchmark.

Quick Start

Configure your model provider API keys in a .env file, and results are written to output/results.json. The harness supports running multiple models in a single pass for easy comparison.
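Before a multi-model run, it can help to confirm which provider keys are actually present in the .env file. The Python sketch below is a minimal illustration, not part of WP-Bench: the variable names in `ASSUMED_PROVIDER_KEYS` are common conventions and may differ from what the harness expects, and the parser handles only simple `KEY=value` lines.

```python
# Assumed variable names; check the WP-Bench repository for the real ones.
ASSUMED_PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def configured_providers(env):
    """Return the providers whose assumed key variable has a non-empty value."""
    return sorted(
        name for name, var in ASSUMED_PROVIDER_KEYS.items() if env.get(var)
    )
```

Reading the file with `pathlib.Path(".env").read_text()` and passing the text to `parse_env` gives a dictionary that `configured_providers` can check, so a missing or empty key shows up before any model calls are made.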

Supporting the AI Building Blocks

WP-Bench complements the other AI Building Blocks for WordPress by measuring how well AI models work with WordPress. As we build out the Abilities API, MCP Adapter, and other infrastructure, a standardized benchmark helps ensure these tools integrate with the best available models.

Get Involved

WP-Bench needs your help. The benchmark is only as good as its test cases, and the WordPress community has decades of collective knowledge about what makes WordPress development challenging.

Ways to contribute:

If you work at an AI lab, we’d love to collaborate on integrating WP-Bench into your evaluation pipeline.

Resources:

Our goal is for WP-Bench to become the standard evaluation AI providers use when releasing new models – creating a virtuous cycle where WordPress performance improves with each generation. Join us in #core-ai to discuss, share results, and help shape the future of AI in WordPress.

Props to @jason_the_adams for leading development on WP-Bench.

#ai-building-blocks, #core-ai
