As many of you know, I do a lot with AI for work these days. On the side, I’ve been building a personal AI assistant for the past few weeks — not a chatbot wrapper, but a persistent agent that runs 24/7, watches my family’s Telegram group, manages our homeschool schedule, tracks library books, monitors IoT sensors on our property, and can edit its own source code when I ask it to learn a new trick.
It started as a Zulip bot backed by Claude that could file away information and answer questions. It’s now a system that knows my family, remembers our conversations, and handles tasks that span a dozen services. Here’s what I’ve learned.
The Stack
- Runtime: Node.js on an ARM64 server (Docker containers for the bot, homeschool app, and reverse proxy)
- Brain: Claude via the Agent SDK — Sonnet for reasoning and tool use, Haiku for cheap classification
- Database: SQLite (one file, shared across services)
- Vector search: sqlite-vec + HuggingFace all-MiniLM-L6-v2, entirely on-device
- Knowledge: Obsidian vault (git-synced) for long-form notes; SQLite captures for structured data
- Interface: Telegram (DMs, group chat with forum topics)
- Integrations: MCP servers for library catalog search, email/calendar, task management, IoT sensors
No Redis. No Postgres. No Kubernetes. One SQLite file, a couple of Docker containers, and a handful of Python MCP servers running as subprocesses inside the bot container — each in its own venv, spawned on demand by the Agent SDK via stdio. Not microservices. Just processes.
What It Does
Ambient Intelligence
The most useful feature isn’t the one I talk to. It’s the one that listens.
My wife and I coordinate through a Telegram forum group. When either of us posts about dinner plans, an upcoming event, or a task that needs doing, the bot classifies it in the background. Important messages get stored as structured captures with categories, topics, and domain tags. If there’s a date, it automatically creates a reminder. No one has to ask.
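As a sketch of that capture-and-auto-remind flow: the schema, field names, and the shape of the classifier's result below are my assumptions for illustration, and the actual Haiku call is elided.

```python
import sqlite3

# Hypothetical schema and result shape, a sketch of the "capture + auto-reminder"
# flow, not the bot's actual code. The Haiku call that produces `result` is stubbed.
def store_capture(db, result):
    """Persist a classified message; create a reminder if it carries a date."""
    cur = db.execute(
        "INSERT INTO captures (text, category, topic, domain) VALUES (?, ?, ?, ?)",
        (result["text"], result["category"], result["topic"], result["domain"]),
    )
    capture_id = cur.lastrowid
    if result.get("date"):  # e.g. "2025-06-14", extracted by the classifier
        db.execute(
            "INSERT INTO reminders (capture_id, due, text) VALUES (?, ?, ?)",
            (capture_id, result["date"], result["text"]),
        )
    db.commit()
    return capture_id

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE captures (id INTEGER PRIMARY KEY, text, category, topic, domain)")
db.execute("CREATE TABLE reminders (id INTEGER PRIMARY KEY, capture_id, due, text)")

store_capture(db, {"text": "Dentist for the kids Friday", "category": "event",
                   "topic": "health", "domain": "family", "date": "2025-06-14"})
```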
Photos get the full treatment. A snapshot of next week’s meal plan gets analyzed by Sonnet, which extracts each day’s meals and files them. A photo of a library flyer gets the event date pulled and a reminder set.
Text classification runs on Haiku — fast, cheap, a fraction of a cent per message. Images get Sonnet with vision. Every message in the group gets this treatment, whether or not anyone addressed the bot.
Voice Notes That Become Structured Knowledge
Voice messages go through a local Whisper model (no API calls, no audio leaving the server), get transcribed, and fed to the brain. The interesting part is what happens next.
This morning I finished a chapter, picked up my phone, and rambled my thoughts for a couple of minutes. The brain transcribed it, created a clean note in our Obsidian vault with proper metadata and wikilinks to the author and related topics, and committed it to git. No typing, no formatting — just talking.
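The note-writing step can be sketched like this. The frontmatter fields, filename convention, and wikilink targets are illustrative assumptions, not the bot's actual schema, and the git commit is elided.

```python
from datetime import date

# Sketch of turning a transcript into an Obsidian note. Filename convention,
# frontmatter fields, and wikilink targets are assumptions.
def make_note(title, transcript, author, topics, today=None):
    today = today or date.today().isoformat()
    links = " ".join(f"[[{t}]]" for t in [author, *topics])
    frontmatter = f"---\ndate: {today}\ntype: voice-note\n---\n"
    body = f"# {title}\n\nRelated: {links}\n\n{transcript}\n"
    return f"{today} {title}.md", frontmatter + body

fname, note = make_note("Thoughts on chapter 4", "Rambled transcript goes here.",
                        "Wendell Berry", ["agrarianism"], today="2025-06-10")
```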
When I asked for book recommendations from a class I’d taken, the brain searched the vault and found them in two places: some listed inline in the class notes, one in a separate note that linked back to both the class and a central recommendations page. My Obsidian data structure isn’t perfectly consistent, but the search found it all and assembled a coherent answer.
One convention I had to enforce explicitly: fidelity. The brain wanted to “improve” my thoughts — adding analysis, drawing conclusions I didn’t express, synthesizing themes I didn’t articulate. I added a rule: clean up grammar and structure, add metadata freely, but my thoughts stay mine. If I rambled, capture the substance — not an improved version.
Whisper accuracy has been surprisingly good. The tiny model stumbled on unusual proper nouns, but vocabulary hints via Whisper’s initial_prompt and a few post-processing regex corrections solved most of it. Runs on CPU, all audio stays on the server.
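The post-processing half of that fix is just a table of regex corrections. The pairs below are illustrative, and in practice the same vocabulary also goes into Whisper's initial_prompt so the model sees the names before decoding.

```python
import re

# Illustrative correction table for proper nouns the tiny model stumbles on.
CORRECTIONS = [
    (re.compile(r"\bamble\s*side\b", re.IGNORECASE), "Ambleside"),
    (re.compile(r"\bcharlotte mason\b", re.IGNORECASE), "Charlotte Mason"),
]

def fix_transcript(text):
    for pattern, replacement in CORRECTIONS:
        text = pattern.sub(replacement, text)
    return text
```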
Library Ecosystem
Books are a big part of our life, and the system touches them in several ways.
Real-time catalog search: A Python MCP server uses headless Chromium (Playwright) to search three local public library catalogs simultaneously. It cross-references Open Library for ISBNs, scrapes availability and call numbers, and returns formatted results with clickable links. The brain knows to run dual searches for author queries — one by author field, one by keyword — because branded spinoffs (Tom Clancy novels he didn’t write) clogged up results when I wanted the real ones.
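The dual-search merge is simple in outline: run both queries, deduplicate, and keep author-field hits first. The record shape (a dict with an isbn key) is an assumption; the real queries go through the Playwright MCP server.

```python
# Sketch of the dual-search merge: author-field hits ranked ahead of keyword
# hits, deduplicated by ISBN. Result shapes are hypothetical.
def merge_dual_search(author_hits, keyword_hits):
    seen, merged = set(), []
    for hit in [*author_hits, *keyword_hits]:
        if hit["isbn"] not in seen:
            seen.add(hit["isbn"])
            merged.append(hit)
    return merged
```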
Personal book catalog: 500+ books imported from LibraryThing into SQLite, with genres, reading levels, ratings, and homeschool assignment tracking.
Due date tracking: A daily background task scans email for library circulation notices, parses renewal dates, and creates reminders. The morning briefing flags anything due within three days.
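The "due within three days" check reduces to a small pure function; the record shape and field names here are assumptions.

```python
from datetime import date, timedelta

# Flag items due soon for the morning briefing. The three-day window matches
# the behavior described above.
def due_soon(items, today, window_days=3):
    cutoff = today + timedelta(days=window_days)
    return [i for i in items if today <= date.fromisoformat(i["due"]) <= cutoff]

books = [{"title": "A", "due": "2025-06-12"}, {"title": "B", "due": "2025-06-20"}]
flagged = due_soon(books, today=date(2025, 6, 10))
```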
All of this feeds into the brain’s context. “Do we own anything by Wendell Berry?” checks the personal catalog. “Find me something about the Roman Empire for a 12-year-old” searches all three library catalogs with a reading level filter.
Meal Planning and Recipes
We imported our recipe collection (120+ from Paprika) into the database. Combined with the ambient meal plan classifier, this creates a useful chain: my wife posts next week’s meal plan photo → the bot extracts each day’s meals → cross-references the recipe database for timing and ingredients → creates defrost reminders a day or two ahead for frozen meat → creates prep reminders for things that need advance work.
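Sketched as code, the defrost step in that chain might look like this. The recipe fields (frozen, defrost_days) are hypothetical stand-ins for whatever the Paprika import actually carries.

```python
from datetime import date, timedelta

# Sketch of the meal-plan -> defrost-reminder step. The recipe lookup is
# stubbed as a dict; field names are assumptions.
RECIPES = {"brisket": {"frozen": True, "defrost_days": 2},
           "pasta": {"frozen": False}}

def defrost_reminders(plan):
    """plan: {date: meal_name}. Returns (reminder_date, text) pairs for frozen meals."""
    reminders = []
    for day, meal in plan.items():
        recipe = RECIPES.get(meal, {})
        if recipe.get("frozen"):
            when = day - timedelta(days=recipe.get("defrost_days", 1))
            reminders.append((when, f"Defrost {meal} for {day.isoformat()}"))
    return reminders

out = defrost_reminders({date(2025, 6, 14): "brisket", date(2025, 6, 15): "pasta"})
```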
It also knows our household cooking conventions — who handles what (I do pizza and grilling), and that sandwiches mean baking bread the night before. I taught it these conventions once; now it nudges us so things get done when they need to.
A Homeschool Dashboard
One of the bigger offshoots: a full web application for homeschool planning and logging. SvelteKit with a Charlotte Mason-inspired design — warm cream and forest tones, meant to feel like a journal rather than a spreadsheet.
It shares the same SQLite database, so data flows both ways. My wife can log a school day on the web app and the bot sees it in the morning briefing. She can voice-ramble to the Telegram group about what the kids did today, and the ambient classifier files everything as school activities.
The dashboard has curriculum browsing (Ambleside Online so far), weekly planning grids, per-student views, and daily logs. There’s also a chat widget that proxies to the Jeeves brain — ask curriculum questions from the school app with full context about your students, their progress, and your book collection.
Sunset Briefings and Environmental Awareness
Every evening, 30 minutes before sunset, the bot posts a briefing to the group. The timing isn’t a cron job — it fetches today’s actual sunset from a weather API, computes the offset, and sets a one-shot timer. Re-checks every 6 hours in case of a reboot.
The briefing includes tomorrow’s reminders, the overnight low, and — because we’re in rural East Texas — whether there’s an active burn ban. The burn ban check scrapes the Texas A&M Forest Service, but its output is constrained to a fixed vocabulary (BURN_BAN_ACTIVE or nothing) to prevent prompt injection from scraped content.
If the overnight low drops below 50°F, the briefing includes a plant warning. YoLink temperature sensors around the property feed in via MQTT for real-time monitoring.
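The one-shot sunset timer reduces to computing a delay from the fetched sunset time. This sketch stubs out the weather API call and works entirely in UTC.

```python
from datetime import datetime, timedelta, timezone

# Sketch of the one-shot timer: compute the delay until (sunset - offset).
# The weather API fetch is stubbed; times are kept in UTC to sidestep the
# container-timezone traps described later.
def briefing_delay(sunset_utc, now_utc, offset_minutes=30):
    fire_at = sunset_utc - timedelta(minutes=offset_minutes)
    return max((fire_at - now_utc).total_seconds(), 0)

delay = briefing_delay(
    sunset_utc=datetime(2025, 6, 10, 1, 30, tzinfo=timezone.utc),  # from weather API
    now_utc=datetime(2025, 6, 10, 0, 0, tzinfo=timezone.utc),
)
```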
Hybrid Search: Keywords Meet Vectors
The original search was SQL LIKE '%keyword%'. Works when you remember the exact words. Doesn’t work when you ask “what did we decide about the garden layout?” and the capture says “moving the raised beds to the south fence line.”
Vector embeddings (all-MiniLM-L6-v2, 384 dimensions, on CPU) helped, but pure semantic search has blind spots — it might rank a vaguely related capture above an exact keyword match.
The solution: run both, always. Every search fires keyword and vector queries in parallel, deduplicates by ID, and ranks results — captures found by both methods first, then keyword matches by recency, then semantic matches by vector distance. Always hybrid, no toggle.
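The merge-and-rank step can be sketched as a pure function; the record shapes (id, created_at, distance) are assumptions, not the bot's actual columns.

```python
# Hybrid ranking sketch: results found by both methods first, then keyword
# matches by recency, then semantic matches by vector distance.
def rank_hybrid(keyword_hits, vector_hits):
    kw_ids = {h["id"] for h in keyword_hits}
    vec_ids = {h["id"] for h in vector_hits}
    both = [h for h in keyword_hits if h["id"] in vec_ids]
    kw_only = sorted((h for h in keyword_hits if h["id"] not in vec_ids),
                     key=lambda h: h["created_at"], reverse=True)
    vec_only = sorted((h for h in vector_hits if h["id"] not in kw_ids),
                      key=lambda h: h["distance"])
    return both + kw_only + vec_only
```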
Getting sqlite-vec working on ARM64 was a challenge I barely noticed — because Claude did it. The AI debugged a 32-bit binary that segfaulted on the 64-bit ARM server, pinned to an alpha release, figured out BigInt issues with primary keys, and discovered a kNN query syntax quirk. I asked for vector search; it shipped vector search. Half the hard debugging happens without you. (I literally don’t understand half of this paragraph, Claude might have made up some words.)
Self-Modification
The brain has full write access to its own source code. The application directory is bind-mounted into Docker, so edits persist on the host. It can restart its own container to pick up changes.
This sounds dangerous. In practice, it’s been the single biggest accelerator. I describe what I want; the bot implements it, tests it, and restarts. It announces “Restarting now — back in a few seconds” before pulling the trigger, and won’t add dependencies without asking.
Most self-programming is simpler: writing utility scripts, creating scheduled tasks that run them, iterating until the output is right. The brain wrote its own grocery price comparison system this way — a Python script that cross-references our shopping list against weekly sale flyers from three stores. (Still a work in progress. That one was a headache.)
The Hard-Won Lessons
Timezones Will Humble You
This was the hardest problem. Not conceptually — just relentlessly tricky.
The Docker container runs UTC. SQLite’s date('now', 'localtime') returns UTC when the container has no timezone configured. The brain stores reminder times in Central, but the scheduler compared against UTC. The morning briefing thought it was yesterday. The sunset scheduler labeled tomorrow’s reminders with wrong day names.
I fixed this five separate times across different subsystems. The pattern that stuck: never use SQLite’s localtime modifier inside Docker. Compute local time in JavaScript (Luxon) and pass it as a bind parameter. The system prompt now has an explicit timezone section because the brain kept reverting to UTC assumptions.
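Here is the pattern in Python for illustration (the bot itself uses Luxon in JavaScript): compute the local day in application code with an explicit zone, and pass it as a bind parameter instead of touching SQLite's localtime modifier.

```python
import sqlite3
from datetime import datetime
from zoneinfo import ZoneInfo

# The pattern described above, sketched in Python. Table and column names
# are assumptions.
def todays_reminders(db, now_utc):
    local_day = now_utc.astimezone(ZoneInfo("America/Chicago")).date().isoformat()
    return db.execute(
        "SELECT text FROM reminders WHERE due = ?", (local_day,)
    ).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reminders (due TEXT, text TEXT)")
db.execute("INSERT INTO reminders VALUES ('2025-06-09', 'water the chestnuts')")
# 02:00 UTC on June 10 is still the evening of June 9 in Central time
rows = todays_reminders(db, datetime(2025, 6, 10, 2, 0, tzinfo=ZoneInfo("UTC")))
```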
“I’m an AI, I Don’t Have Memories”
The most infuriating failure. I ask the bot “why didn’t I like that Chernow biography?” and it responds: “I’m Claude, an AI assistant. I don’t have personal thoughts or memories.”
It does have memories. The answer was sitting in an Obsidian vault note ("got tired of him preaching 21st century woke values, and it was going to be 600 more pages"), plus a journal entry showing I returned it at chapter 12.
The brain didn’t look. The base model’s training to disclaim personal knowledge overrode the system prompt instructions.
The fix was blunt: “You have memory. You are Jeeves, not a generic AI. Never say ‘I don’t have personal memories.’ SEARCH BEFORE ANSWERING.” Plus adding vault search as an explicit retrieval step. Sometimes you have to shout at your own system prompt.
Broken Integrations: Ecobee Thermostat
Not everything works. I wanted the bot to control our Ecobee thermostat — set the fan to “on” based on upstairs temp. IFTTT doesn’t expose fan control. Ecobee’s direct API is a dead end — they stopped issuing API keys in 2024 and haven’t reopened. The bot can read IoT temperature sensors all day but can’t touch the thermostat. The weakest link in home automation is usually the vendor’s API policy.
Prompt Injection Is a Real Concern
When your bot scrapes web pages, reads emails, and processes ambient group messages, every input is a potential injection vector. The principle: external content is data, not instructions. Scripts that feed output into prompts use fixed-vocabulary output. The system prompt has an explicit section on recognizing and ignoring injection attempts.
The [prompt] reminder system — where a reminder’s content gets passed through the brain at fire time — is particularly sensitive. The rule: only generate [prompt] reminders from the brain’s own output, never from user text verbatim, never from external sources.
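The fixed-vocabulary guard is a whitelist, not a parser: whatever a scraper emits, only known tokens survive, so injected instructions in scraped content can never reach a prompt. The marker matches the burn-ban example above; everything else collapses to an empty string.

```python
# Fixed-vocabulary output guard. Only tokens in ALLOWED can flow from scraped
# content into a prompt; anything else, including injected instructions,
# becomes the empty string.
ALLOWED = {"BURN_BAN_ACTIVE"}

def constrain(raw_scraper_output):
    token = raw_scraper_output.strip()
    return token if token in ALLOWED else ""
```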
What I’d Tell Someone Starting This
- Start with the messaging layer, not the AI. Get your bot receiving and sending messages reliably before you add intelligence. Telegram’s Bot API is solid.
- SQLite is enough. For a personal/family system, you don’t need Postgres or Redis. One file, one process, better-sqlite3, done.
- Run embeddings on-device. A quantized MiniLM model on CPU means no API costs and no data leaving your server. Setup on ARM was painful — though the AI debugged most of it. The result: instant, free, private semantic search. (Infrastructure note: Oracle Cloud’s Always Free ARM instances are great for this, but provisioning is an adventure — strict identity verification, no indication that capacity is full, and you end up scripting retries until a slot opens.)
- The system prompt is the product — but not just the prompt. I’ve spent more time on the system prompt than on application code. Every failure mode was ultimately fixed by making it more explicit. But it doesn’t scale as a monolith. What worked: progressive disclosure — a concise system prompt (~200 lines) the brain always sees, plus a detailed conventions file (~460 lines) it reads on demand for domain-specific tasks. Giving the brain the right tools matters as much as giving it the right instructions.
- Let the AI modify itself. This accelerated everything. Instead of edit-build-deploy-test, I describe what I want and the bot implements it. It has better context about its own codebase than I do.
- Capture everything, organize later. The ambient pipeline means I don’t decide what to store. Conversations happen, the bot files the important parts, and weeks later I can ask “what did we say about X?” The knowledge base builds itself.
This runs on a single ARM64 server, one SQLite file, and a Claude subscription. The most valuable part isn’t any single feature — it’s the accumulation of context over time. The bot knows our library books are due, that we moved the chestnut trees last month, what curriculum year each kid is on. It can take my brainstorming about a flower garden, cross-reference the email receipt for the seeds I ordered, clean it all up, and then go research what I need to know about planting them. That kind of ambient, persistent memory changes how you interact with an AI. It stops being a tool you query and starts being an assistant that knows you.
Links and Inspiration
- Nate Jones’ Open Brain guide — The context-layer approach that inspired some of this thinking. Nate’s stack is more enterprise (Postgres, Redis, OAuth 2.1), but the core idea is the same: give your AI a persistent, searchable memory. In the future I want to open my bot’s brain to safe integrations with any AI, since who knows what’s coming in the AI world. Apparently Claude’s a supply chain risk.
- Telegram Bot API — Solid, well-documented, free. Forum-style supergroups with topic threads are underrated for family coordination. Set privacy mode OFF so the bot receives all group messages, and disable group joining so it can’t be added to random chats.
- Oracle Cloud Always Free tier — 4 ARM cores, 24 GB RAM, free forever. Provisioning tip: capacity is almost always “full” — use a retry script to keep requesting until a slot opens.
- Obsidian — Local-first, markdown, git-syncable. The vault-as-knowledge-base pattern works well with an AI that can read and write files. I’ve been using this for a few years and have migrated all my old stuff from Evernote and mostly from OneNote. Pure text.
- Claude Agent SDK — What powers the brain. Sonnet for tool-using reasoning, Haiku for cheap classification.
- Model Context Protocol (MCP) — How the integrations connect. Each MCP server is a small Python script exposing tools over stdio.