Show HN: Z80-μLM, a 'conversational AI' that fits in 40 KB

Hacker News

A new project, Z80-μLM, presents a 2-bit quantized language model small enough to run on an 8-bit Z80 processor, bringing conversational AI to retro computers.


Z80-μLM is a 2-bit quantized language model small enough to run on an 8-bit Z80 processor. Train conversational models in Python, export them as CP/M .COM binaries, and chat with your vintage computer.


HarryR/z80ai


Z80-μLM: A Retrocomputing Micro Language Model

Z80-μLM is a 'conversational AI' that generates short, character-by-character responses, using quantization-aware training (QAT) so it can run on a Z80 processor with 64 KB of RAM.

The question at the root of this project: how small can we go while still having personality, can the model be trained or fine-tuned easily, and can it be distributed via simple self-hosting?

The answer is yes: a 40 KB .COM binary (including inference, weights, and a chat-style UI) running on a 4 MHz processor from 1976.

It won't pass the Turing test, but it might make you smile at the green screen.

For insight on how to best train your own model, see TRAINING.md.

Examples

Two pre-built examples are included:

tinychat

A conversational chatbot trained on casual Q&A pairs. Responds to greetings, questions about itself, and general banter with terse personality-driven answers.

guess

A 20 Questions game where the model knows a secret topic and answers YES/NO/MAYBE to your questions. Guess correctly to WIN.

Includes tools for generating training data with LLMs (Ollama or Claude API) and balancing class distributions.
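The repo's data-generation tools aren't reproduced here; as a rough illustration of the class-balancing step, a minimal sketch (function name and downsampling strategy are assumptions, not the repo's actual tooling) might look like:

```python
import random
from collections import defaultdict

def balance_classes(pairs, seed=0):
    """Downsample every class to the size of the smallest class.

    `pairs` is a list of (text, label) tuples. Grouping by label and
    downsampling to the minority class are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in pairs:
        by_label[label].append((text, label))
    n = min(len(items) for items in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    return balanced
```

For a YES/NO/MAYBE game like `guess`, this prevents the model from learning to always emit the majority answer.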

Features

Interaction Style

The model doesn't understand you. But somehow, it gets you.

Your input is hashed into 128 buckets via trigram encoding - an abstract "tag cloud" representation. The model responds to the shape of your input, not the exact words:

This is semantically powerful for short inputs, but there's a limit: longer or order-dependent sentences blur together as concepts compete for the same buckets. "Open the door and turn on the lights" will likely be too close to distinguish from "turn on the door and open the lights."
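The trigram bucketing can be sketched in a few lines of Python. The actual hash function and any weighting Z80-μLM uses are not shown in this excerpt, so the rolling hash below is an assumption; only the shape (character trigrams hashed into 128 buckets, order mostly discarded) follows the description above:

```python
def trigram_buckets(text, n_buckets=128):
    """Hash each character trigram of `text` into one of `n_buckets`,
    producing a binary 'tag cloud' vector.

    The specific hash (a 16-bit rolling hash) is an illustrative
    assumption; the 128-bucket size comes from the text above.
    """
    text = text.lower()
    vec = [0] * n_buckets
    for i in range(len(text) - 2):
        h = 0
        for ch in text[i:i + 3]:
            h = (h * 31 + ord(ch)) & 0xFFFF  # assumed rolling hash
        vec[h % n_buckets] = 1
    return vec
```

Because the vector only records which trigrams occurred, two sentences built from the same words land on nearly identical bucket sets regardless of word order, which is exactly the failure mode described above.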

Small Responses, Big Meaning

A 1-2 word response can convey surprising nuance:

This isn't necessarily a limitation - it's a different mode of interaction. The terse responses force you to infer meaning from context, or to probe with direct yes/no questions to see whether it understands (e.g. 'are you a bot', 'are you human', 'am i human' yield logically consistent memorized answers).

What It's Good At

What It's Not

It's small, but functional. And sometimes that's exactly what you need.

Architecture

Quantization Constraints

The Z80 is an 8-bit CPU, but we use its 16-bit register pairs (HL, DE, BC) for activations and accumulators. Weights are packed 4-per-byte (2-bit each) and unpacked into 8-bit signed values for the multiply-accumulate.

The 16-bit accumulator gives us numerical stability (summing 256 inputs without overflow), but the model's expressiveness is still bottlenecked by the 2-bit weights, and naive training may overflow or act 'weirdly' without QAT.

Z80 Inner Loops

The core of inference is a tight multiply-accumulate loop. Weights are packed 4-per-byte:
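The original README shows the Z80 assembly here; a Python model of the 4-per-byte packing scheme (the bit order within each byte is an assumption) might look like:

```python
def pack_weights(codes):
    """Pack 2-bit weight codes (0..3) four to a byte, first code in
    the low bits. The low-bits-first ordering is an assumption.
    """
    assert len(codes) % 4 == 0
    out = bytearray()
    for i in range(0, len(codes), 4):
        b = 0
        for j, code in enumerate(codes[i:i + 4]):
            b |= (code & 0b11) << (2 * j)
        out.append(b)
    return bytes(out)

def unpack_byte(b):
    """Recover the four 2-bit codes from one packed byte."""
    return [(b >> (2 * j)) & 0b11 for j in range(4)]
```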

The multiply-accumulate handles the 4 possible weight values:
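A Python equivalent of one MAC step, assuming the 2-bit codes are interpreted as signed two's complement (giving weights in {-2, -1, 0, +1}; the actual mapping used by Z80-μLM is not shown in this excerpt):

```python
def decode(code):
    """Interpret a 2-bit code as signed two's complement: {-2..+1}.
    This mapping is an assumption for illustration.
    """
    return code - 4 if code >= 2 else code

def mac(acc, code, activation):
    """One multiply-accumulate into a 16-bit signed accumulator.

    With only four weight values, the 'multiply' reduces to skip,
    add, subtract, or subtract-twice - cheap on a CPU with no
    hardware multiplier.
    """
    acc = (acc + decode(code) * activation) & 0xFFFF  # 16-bit wrap
    return acc - 0x10000 if acc >= 0x8000 else acc    # reinterpret as signed
```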

After each layer, arithmetic right-shift by 2 to prevent overflow:
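In Python this step is a one-liner, since `>>` on negative ints is already an arithmetic shift (matching the Z80's sign-preserving shift):

```python
def rescale(acc):
    """Arithmetic right shift by 2 after each layer, rounding toward
    negative infinity, as Python's >> does on signed ints.
    """
    return acc >> 2
```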

That's the entire neural network: unpack weight, multiply-accumulate, shift. Repeat ~100K times per character generated.

License: MIT or Apache-2.0 as you see fit.
