
Creating an Agentic Skill Library for Robotics, Computer Vision, and Physical AI

This article examines the current state and future potential of Physical AI, highlights the gap between demo performance and real industrial application, and proposes the concept of an "Agentic Skill Library" as a way to close it.

Physical AI is exploding.
The field is being pushed forward by several converging trends: the rising promise of Vision Language Action (VLA) models, Robotics Foundation Models, and Reinforcement Learning (RL) policies trained on large-scale, GPU-parallelized simulation platforms. Together, these advances suggest a future where robots can adapt to real-world variation instead of breaking the moment something changes.
That promise is exciting, but as hardcore roboticists, we found ourselves asking some natural questions:
- Is this the start of a new robotics revolution, or the beginning of a bubble?
- Will VLAs deliver true general-purpose robot intelligence?
- Can RL alone replace traditional robot programming?
Every week seems to bring a new demo and a new claim about autonomous robots (especially humanoids) solving everything from manufacturing to home assistance.
So we stepped away from the demos and went directly to factories.
We tried to solve some of the hardest industrial challenges we could find. One example was wire soldering: more than 700 variants of one-millimeter wires, all different colors, that needed to be inserted into small relay holes with a tolerance of 0.25 millimeters. Another was high-mix manufacturing, where production parts change constantly and automation systems must adapt every day.
The question became very concrete: can current VLAs handle real industrial and manufacturing problems?
Our tests revealed: not yet.
What we found led us to an opinionated perspective on the growth of Physical AI. Just like with Large Language Models (LLMs), the model by itself will not be enough. Over time, Robotics Foundation Models (RFMs) will continue to improve and become easier to access. Different VLAs will be nearly interchangeable. Policy networks trained with RL will be replaceable. AI models for robotics will be commoditized.
Our thesis: A Large-Scale Skill Library Powered by Physical AI Agents
Instead of betting on a single massive end-to-end policy, we believe the future of Physical AI will be built on reusable Skills, often known as robotic movement primitives.
A Skill is a modular unit of behavior that combines robot perception, motion planning, control, and decision logic to accomplish a specific physical task. Detect an object in a point cloud. Plan a grasp. Align a component. Execute a trajectory with force feedback. A Skill could be a RL policy, a VLA model, a classical visuo-motor policy or even a logic block. The key is that each Skill has a clear, consistent interface and can be reused across applications.
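To make that contract concrete, here is a minimal Python sketch of such an interface. This is our own illustration, not the actual Telekinesis API; every class and field name below is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SkillResult:
    """Uniform return type: every Skill reports success plus named outputs."""
    success: bool
    outputs: dict[str, Any] = field(default_factory=dict)


class Skill(ABC):
    """A self-contained unit of behavior with one consistent entry point.

    What sits behind execute() (an RL policy, a VLA model, a classical
    visuo-motor pipeline, or plain decision logic) is an implementation
    detail the caller never sees.
    """

    @abstractmethod
    def execute(self, **inputs: Any) -> SkillResult:
        ...


class DetectObject(Skill):
    """Toy detector: uses the point-cloud centroid as a stand-in object pose."""

    def execute(self, **inputs: Any) -> SkillResult:
        cloud: list[tuple[float, float, float]] = inputs["point_cloud"]
        if not cloud:
            return SkillResult(success=False)
        centroid = tuple(sum(axis) / len(cloud) for axis in zip(*cloud))
        return SkillResult(success=True, outputs={"object_pose": centroid})
```

Because every Skill returns the same result shape, a planner can treat detection, grasping, and trajectory execution interchangeably.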
On top of this low-level Skill Library sits a Physical AI Agentic layer for long-horizon reasoning.
These Physical AI Agents, typically powered by Large Language Models (LLMs) or Vision Language Models (VLMs), are responsible for interpreting high-level goals and turning them into real-time task plans. Rather than trying to control the robot directly, these agents plan sequences of composable Skills and adapt based on feedback from the environment. This separation lets the AI handle strategic planning while the Skills provide reliable and safe execution.
Diving Deeper into our Skill Library and Physical AI Agents
If you’ve ever built a real robotics system, you know the feeling: you’re not just “building a robot”. You’re stitching together a patchwork ecosystem.
A perception library here. A motion planner there. A dataset tool somewhere else. A deep learning model that expects a format nothing else uses. A sensor driver written in a different decade. A control stack that assumes a different world model than your planner.
Robotics software is fragmented by nature. Different teams, different conventions, different APIs, different languages — and every integration point becomes a potential source of delay, bugs, and failure.
That’s where Telekinesis takes a different approach. Simple to describe, hard to execute, but game-changing when it works:
1. Make robotics capabilities modular, reusable, and consistent
In Telekinesis, the foundation is a Skill: a reusable, self-contained operation that performs a specific task in a Physical AI application. A Skill can span perception, motion planning, control, and decision logic.
Under the hood, a Skill can use classical algorithms, deep learning, or foundation models, but the interface stays predictable so you can compose systems without rewriting integration glue.
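Continuing the hypothetical sketch from earlier (again, illustrative names only, not the SDK's real API), chaining Skills then needs no per-pair adapters, because named outputs feed named inputs through a shared context:

```python
class GraspAtPose(Skill):
    """Toy grasp Skill: pretends to close a gripper at the given pose."""

    def execute(self, **inputs: Any) -> SkillResult:
        return SkillResult(success=True, outputs={"grasped_at": inputs["pose"]})


def run_pipeline(steps: list[tuple[Skill, dict[str, str]]],
                 context: dict[str, Any]) -> bool:
    """Run Skills in order, wiring each one's inputs from a shared context.

    Each wiring dict maps a Skill's input names to keys already present in
    the context, so no bespoke adapter code sits between any two Skills.
    """
    for skill, wiring in steps:
        result = skill.execute(**{arg: context[key] for arg, key in wiring.items()})
        if not result.success:
            return False
        context.update(result.outputs)  # outputs become inputs downstream
    return True


# Detect an object, then grasp at the detected pose.
ok = run_pipeline(
    [(DetectObject(), {"point_cloud": "point_cloud"}),
     (GraspAtPose(), {"pose": "object_pose"})],
    context={"point_cloud": [(0.1, 0.2, 0.3), (0.3, 0.2, 0.3)]},
)
print(ok)  # True: the grasp pose flowed from detection with no glue code
```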
2. Let AI plan across those Skills
On top of the Skill Library sits a Physical AI Agent, typically an LLM or VLM, that does the reasoning and task planning. These agents don't control the robot directly. Instead, they interpret high-level goals, plan sequences of composable Skills, and adapt those plans based on feedback from the environment.
In short: Agents don’t “do everything.” They orchestrate capabilities you can trust. The magic happens when AI and Skills work together, each doing what it does best.
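Here is a minimal sketch of that division of labor, building on the hypothetical Skill interface above; the plan() function is a stub standing in for a real LLM or VLM planner:

```python
def plan(goal: str, feedback: str | None = None) -> list[tuple[Skill, dict[str, str]]]:
    """Stand-in for an LLM/VLM planner.

    A real agent would generate this sequence from the goal and the Skill
    Library's descriptions; here it is hardcoded for illustration. Given
    failure feedback, it re-detects before retrying the grasp.
    """
    steps = [(GraspAtPose(), {"pose": "object_pose"})]
    if feedback is None or "detect" in feedback:
        steps.insert(0, (DetectObject(), {"point_cloud": "point_cloud"}))
    return steps


def run_agent(goal: str, context: dict[str, Any], max_attempts: int = 3) -> bool:
    """Plan, execute, observe, replan.

    The agent never commands motors directly: it only sequences Skills it
    can trust, and adapts the plan when the environment pushes back.
    """
    feedback = None
    for _ in range(max_attempts):
        if run_pipeline(plan(goal, feedback), context):
            return True
        feedback = "a step failed, detect again"  # environment feedback
    return False
```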
3. Reduce the fear factor: simulate, debug, deploy
One reason robotics remains “expert-only” is simple: failure is expensive. Robots crash, production cells go down, hardware gets damaged, safety rules get complicated.
That’s why we invest heavily in tooling that makes iteration safe and fast. Simulation, real-time digital twins, and testing workflows let teams validate behavior before deploying on actual robots. We also focus on accessibility: workflows that translate intent into executable behavior make it possible to create robust behaviors without writing mountains of custom code.
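One generic pattern for this, sketched below with hypothetical names rather than Brainwave's actual mechanism: give Skills a pluggable robot backend, so a behavior must pass in simulation before it ever touches hardware.

```python
from abc import ABC, abstractmethod


class RobotBackend(ABC):
    """Minimal hardware abstraction so Skills don't care where they run."""

    @abstractmethod
    def move_to(self, pose: tuple[float, float, float]) -> bool:
        ...


class SimulatedRobot(RobotBackend):
    """Kinematic stand-in: checks workspace limits instead of moving motors."""

    def move_to(self, pose):
        return all(-1.0 <= coord <= 1.0 for coord in pose)  # toy reachability check


class RealRobot(RobotBackend):
    """Would wrap an actual robot driver; shown here only as a placeholder."""

    def move_to(self, pose):
        raise NotImplementedError("connect a real robot driver here")


def validate_then_deploy(poses, sim: RobotBackend, real: RobotBackend) -> bool:
    """Run the whole motion in simulation; touch hardware only if it passes."""
    if not all(sim.move_to(p) for p in poses):
        return False  # caught in simulation, no hardware at risk
    return all(real.move_to(p) for p in poses)
```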
The Telekinesis Developer SDK: The Skill and Physical AI Agents Library
Pursuing our thesis, we are developing a large-scale Skill Library known as the Telekinesis Developer SDK. The documentation for the Skill Library is available at the following link:
Telekinesis Docs (documentation of Telekinesis products): docs.telekinesis.ai
The Telekinesis Developer SDK is a large-scale skill library for Physical AI: a unified, composable set of algorithms for robotics and computer vision, spanning perception and reasoning alike.
The SDK is designed so that roboticists, computer vision engineers, and hobbyists exploring Physical AI can access these capabilities without spending time integrating fragmented libraries, letting you focus on building solutions instead of patching together incompatible components.
The SDK is organized into modules. Each module contains Skills relevant to a particular domain of Physical AI. The modules are named after parts of the brain (for reasoning) and eyes (for perception). Here is a quick promo video of the Vitreous module:
The full list of current Skill modules is available in the documentation. It can be a lot to take in at first, but don't worry: we will dive into each module in its own blog post, so you can gradually build a complete overview of the Telekinesis universe for Physical AI.
The Telekinesis Physical AI Platform: Brainwave
On top of the SDK sits Brainwave: our Physical AI Cloud Platform.
In the brochure below we describe Brainwave, which brings a user interface to our Skill Library and Physical AI Agents, combining a real-time digital twin, an AI-driven workflow for generating robot programs, no-code Skill sequencing, and a plug-and-play asset library.
Brainwave brochure: t2xgx.share-eu1.hsforms.com
The Telekinesis Community
If you zoom out, the Telekinesis Developer SDK and Brainwave are just the beginning.
Our bigger vision is to build a vibrant community of contributors who help grow the Physical AI Skill ecosystem.
We want you to join us. Maybe you’re a researcher who just published a paper and built some code you’re proud of. Maybe you’re a hobbyist tinkering with robots in your garage. Maybe you’re an engineer tackling tough automation challenges every day. Whatever your background, if you have a Skill, whether it’s a perception module, a motion planner, or a clever robot controller, we want to see it.
The idea is simple: release your Skill, let others use it, improve it, and see it deployed in real-world systems. Your work could go from a lab or workshop into factories, helping robots do things that were previously too dangerous, repetitive, or precise for humans.
Our vision is about building something practical, reusable, and meaningful. Together, we can make robotics software accessible, scalable, and trustworthy. And we’d love for you to be part of it.
If you’re curious and want to explore, start with the Telekinesis Docs at docs.telekinesis.ai or visit https://telekinesis.ai/.
Who are we?
We’re a team of passionate robotics and computer vision experts who care about the details. Industry veterans who know the frustration of systems that just don’t work. All of us asking the same question: why is robotics still so hard to use?
Telekinesis began as a spin-off from the Intelligent Autonomous Systems Lab at TU Darmstadt, led by Prof. Jan Peters, and is supported by people with years of experience at KUKA and Universal Robots.
If you’re curious, want to experiment, or have a Skill to share, join us. Together we can make robotics more accessible and unlock what humans and robots can build together.
Written by Telekinesis AI
Any robot. Any task. One Physical AI platform. Web: https://telekinesis.ai/ LinkedIn: https://www.linkedin.com/company/telekinesis-ai/