
Local Cognitive Memory Systems – On-Device AI Memory
This piece argues that today's AI agents lack real state management and routinely forget or misremember information. It proposes treating AI memory as a cognitive operating system rather than enhanced context retrieval, and explores what that implies for on-device AI.

Local Cognitive Memory Systems

If you’ve ever tried to build a truly stateful assistant, you’ve felt the gap immediately.
Your agent is smart in the moment...and then it forgets. Or worse, it “remembers” the wrong thing, permanently, because it formed a belief too quickly. It treats a one-off as identity. It turns a stray sentence into a durable trait.
That’s the real problem hiding underneath the hype: agents don’t just need context; they need managed state.
And managed state isn’t an additional feature; it’s infrastructure.
Real Memory isn’t just more context. It’s a cognitive operating system.
For too long, the industry has used “memory” to mean one of two things:
That framing is fine when you’re building chatbots.
But agents are different: they loop. They plan, act, observe the results, and update their beliefs. This means the right architecture for AI memory isn’t just vector-database search and knowledge-graph traversal.
It’s closer to an operating system:
Once you accept this new framework, the question arises:
Where should that lifecycle run?
The current AI memory landscape
Most “memory” stacks today converge on the same pipeline:
capture → summarize/extract → embed → store (vectors + metadata) → retrieve → inject into the prompt → write-backs
That loop isn’t inherently wrong, but it’s only a baseline skeleton for context retrieval; it doesn’t capture what stateful agents actually need.
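The standard pipeline above can be sketched in a few lines. This is a minimal, self-contained illustration, not any particular product’s implementation: the `embed` and `summarize` functions are toy placeholders standing in for a real embedding model and summarizer.

```python
import hashlib

# Toy stand-ins for real components; a production system would call a local
# embedding model and a summarization model, and store vectors in a vector DB.
def embed(text: str) -> list[float]:
    """Placeholder embedding: hash bytes scaled to floats."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def summarize(text: str) -> str:
    """Placeholder summarizer: truncate (a real system would call a model)."""
    return text[:120]

store: list[dict] = []  # stands in for "vectors + metadata"

def capture(interaction: str) -> None:
    """capture -> summarize -> embed -> store."""
    summary = summarize(interaction)
    store.append({"summary": summary, "vector": embed(summary), "raw": interaction})

def retrieve(query: str, k: int = 3) -> list[str]:
    """retrieve: rank stored records by similarity to the query vector."""
    qv = embed(query)
    def score(rec: dict) -> float:
        return sum(a * b for a, b in zip(qv, rec["vector"]))
    return [r["summary"] for r in sorted(store, key=score, reverse=True)[:k]]

def build_prompt(query: str) -> str:
    """inject into the prompt: the last step of the loop."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nUser: {query}"
```

Note that nothing in this skeleton decides what is worth remembering, for how long, or at what trust level; that is exactly the gap the rest of this piece is about.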
Another problem is that in many deployments, the most sensitive parts of the loop run outside the user’s device. Raw interactions get shipped to the cloud early, before you’ve minimized or redacted anything, and before you’ve decided what should become durable.
That “where the loop runs” decision changes everything.
The risks of cloud-dependent memory
When memory goes cloud-first, the security model becomes harder to reason about, not because anyone is malicious, but because memory creates durable, searchable identity.
- Memory multiplies across systems
A single interaction doesn’t stay as “one record.” It usually becomes multiple artifacts: embeddings, metadata, retrieval traces, and summaries.
Even if each piece is “safe enough” alone, the combined system can reconstruct a user’s or team’s private history.
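The fan-out is easy to picture as a data structure. This is an illustrative sketch only; the field names and the placeholder derivations are assumptions, not a real schema.

```python
from dataclasses import dataclass, field

# Illustration of artifact fan-out: one interaction becomes several derived
# records, each of which may end up in a different system or service.
@dataclass
class InteractionArtifacts:
    raw: str                          # the original interaction text
    summary: str                      # model-generated distillation
    embedding: list[float]            # vector-store representation
    metadata: dict                    # timestamps, tags, identifiers
    retrieval_trace: list[str] = field(default_factory=list)  # queries that later matched

def fan_out(raw: str) -> InteractionArtifacts:
    # Placeholder derivations; real systems call a summarizer and an embedder.
    return InteractionArtifacts(
        raw=raw,
        summary=raw[:80],
        embedding=[float(len(raw))],
        metadata={"length": len(raw)},
    )
```

Each field alone may look harmless; the point is that all of them together can reconstruct the original interaction, and each copy has its own access-control story.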
- Persistent memory increases the blast radius of prompt injection
Prompt injection isn’t hypothetical anymore; it’s a known class of attacks on production systems. Researchers have documented a real-world, zero-click prompt-injection vulnerability (“EchoLeak”) used to exfiltrate data from Microsoft’s own Copilot. The effects can be disastrous because indirect prompt injection can poison long-term memory, meaning malicious instructions resurface again and again later. (EchoLeak)
This is the subtle but critical point:
When your agent treats retrieved memories as “trusted context,” retrieval becomes a trust boundary.
- Centralized memory becomes a honeypot
A centralized, searchable memory store is an unusually high-value target. It isn’t just user data, but also user intent, preferences, relationships, and meaningful history organized for retrieval.
Centralization is undoubtedly useful for engineering, but it’s also what attackers want...
- Cloud introduces latency coupling
Every cloud round-trip adds jitter. Even when the cloud is “fast enough,” it’s still coupled to network quality, service uptime, rate limits, and request queues.
Microsoft’s guidance puts it plainly: by running models locally, you can reduce latency because your data doesn’t need to be sent over a network. (Microsoft Learn)
The edge case is the real case
If an agent is going to remember you and your workflows, it’s going to hold on to something personal, whether or not you handed it “PII.” That’s what long-term state is: a model of you and your projects over time.
Our position is simple:
If memory is identity, then it simply shouldn’t leave the device by default.
Ultrathink: a local-first memory engine
At Mycelic, we treat memory as on-device infrastructure.
That means the entire memory loop, from ingestion to consolidation to retrieval, runs locally on the user’s own machine.
Rather than calling a frontier-model API to summarize or embed every interaction, we run these memory-maintenance tasks on the edge using small language models from the Qwen family plus local vector search.
But the deeper difference isn’t just “local vs cloud.” It’s that we don’t treat memory as a flat pile of stored text.
We treat memory like a cognitive operating system.
Why real memory needs cognition, not just storage
The naive pattern looks like this: the user says one thing once → the system stores a trait forever.
That’s not how stable memory works.
What we use instead is an intermediate layer: a staging area where patterns accumulate evidence before they become durable facts.
In recent research, you can see this idea formalized as multi-tier memory architectures: short-term memory, mid-term memory, long-term memory, plus explicit update and eviction logic.
Memory OS of AI Agents
This lifecycle gets much harder when raw interactions are shipped off-device, because your strongest controls can only run after the data has already left. When memory is local, you can gate, sanitize, and selectively promote information before it becomes durable (or ever leaves the environment).
Do we really need big models for memory?
“Sure, local is nice, but don’t you need frontier models to understand what matters?”
We don’t think so. Memory work is mostly constrained:
That’s a very different job than “write me something from scratch.” Memory is closer to careful bookkeeping: distill what happened, attach the right tags/time, dedupe it, and decide whether it’s worth keeping.
In other words, the models don’t need to be brilliant; they need to be consistent, and that’s exactly what on-device memory pipelines demand. (MobileLLM)
So instead of defaulting to “call the biggest model for everything,” we take the opposite approach:
This isn’t just a cost play; it’s an architectural decision to separate cognition from maintenance.
Why RAM & Disk matter for a true cognitive memory system
If you want an agent that persists across sessions, survives restarts, and carries state over weeks, that state must be durable.
Context windows are transient. RAM is transient. Even if you keep a long context, it’s still not stable storage.
In practice, “persistent” means disk-backed (or some other durable store). Not because disk is magical, but because that’s how you get:
This is exactly why the “memory OS” metaphor is useful: it forces you to think in terms of durable storage and governed lifecycle, not just retrieval tricks.
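A minimal disk-backed store makes the point concrete. The sketch below uses SQLite purely as an example of a durable, transactional, on-device store; the schema and tier names are illustrative assumptions, not a real product schema.

```python
import sqlite3

# Minimal disk-backed memory store: survives restarts, supports transactional
# write-back, and lives entirely on-device. Schema is illustrative only.
def open_store(path: str = "memory.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "  id INTEGER PRIMARY KEY,"
        "  claim TEXT UNIQUE,"
        "  tier TEXT CHECK(tier IN ('short', 'mid', 'long')),"
        "  updated_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def promote(conn: sqlite3.Connection, claim: str, tier: str) -> None:
    # Transaction: either the promotion lands on disk or it doesn't.
    with conn:
        conn.execute(
            "INSERT INTO memories (claim, tier) VALUES (?, ?) "
            "ON CONFLICT(claim) DO UPDATE SET tier = excluded.tier, "
            "updated_at = CURRENT_TIMESTAMP",
            (claim, tier),
        )
```

Because the store is a local file, backup, deletion, and audit are ordinary file operations rather than requests to a third-party service.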
The edge isn’t a constraint. It’s the point.
Local-first memory gives you three things cloud-first memory struggles to implement in practice:
Once you build AI memory with real infrastructure (tiers, consolidation, and intentional forgetting), you stop treating memory as “extra context” and start treating it as what it really is: the system that gives an agent a temporal existence.
We don’t need a supercomputer to remember a user’s preferences and history.
We need a memory layer that’s consistent, local, structured, and governable: a cognitive operating system that lives where the user lives.
That’s the direction we’re building at Mycelic.
Written by Ali Novruzov