How session_search Got 4,500× Faster — Build Log

TLDR: Hermes Agent’s session_search was rebuilt from scratch in v0.15.0 — swapping an LLM-powered retrieval pipeline for pure SQLite FTS5. Discovery queries dropped from ~90 seconds to ~20ms (4,500× faster). The new system defines three distinct calling shapes — Discovery, Scroll, and Browse — each optimized for a different retrieval pattern. No model calls. No cost per search.

The Problem

Before v0.15.0, session_search worked like this:

Load a session transcript into context
Ask the LLM to summarize it
Repeat for N recent sessions
Return a composite summary

Every call cost tokens. Every call took ~90 seconds. And the results were, at best, the model’s summary of what happened, not the record itself. You couldn’t find a specific conversation about “docker networking” unless that session was among the N loaded and the summary happened to keep the detail.

For a tool designed to help an agent recall past work, this was a hole. Sessions just a few days old were effectively invisible.

The Architecture

The rebuilt session_search is a three-shape retrieval system backed by SQLite FTS5, SQLite’s built-in full-text search extension. FTS5 maintains an inverted index over the message content, enabling sub-50ms keyword searches across thousands of session messages.
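To make “inverted index” concrete, here is a toy version in plain Python. It is not FTS5, which keeps token positions in b-tree shadow tables and ranks results with BM25; it only shows the core idea under a simple lowercase-alphanumeric tokenizer (an assumption): each token maps to the set of message IDs that contain it, and an AND query intersects those sets instead of rescanning message text.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Rough, ASCII-only stand-in for FTS5's default unicode61 tokenizer:
    # lowercase the text and keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages: dict[int, str]) -> dict[str, set[int]]:
    # token -> IDs of the messages that contain it (its "posting list")
    index: dict[str, set[int]] = defaultdict(set)
    for msg_id, text in messages.items():
        for token in tokenize(text):
            index[token].add(msg_id)
    return index


def and_query(index: dict[str, set[int]], query: str) -> set[int]:
    # Implicit-AND query: intersect the posting lists of every query token.
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


messages = {
    1: "Docker networking: bridge vs host mode",
    2: "Fixed the deploy script after the env rename",
    3: "docker compose volumes mounted read-only",
}
index = build_index(messages)
print(and_query(index, "docker networking"))  # {1}
print(and_query(index, "docker"))             # {1, 3}
```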

Shape 1: Discovery

session_search(query="docker networking", limit=3)

This is the primary entry point. It runs an FTS5 query across the message store, deduplicates hits by session lineage, and returns the top-N sessions. Each result carries:

Snippet — FTS5-highlighted match excerpt showing where the term was found
Bookend start — first 3 user+assistant messages (the session’s goal / kickoff)
Message window — ±5 messages around each FTS5 hit, with the anchor flagged
Bookend end — last 3 messages (the resolution / decisions)

This structure lets a downstream agent reconstruct goal → match → resolution in a single response, without loading the entire session transcript.

-- Under the hood: FTS5 query against the message store
SELECT content FROM messages_fts WHERE messages_fts MATCH 'docker AND networking';
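A fuller sketch of the Discovery step, assuming a hypothetical schema: a sessions table whose parent_id column links continued sessions into a lineage, and an FTS5 table with an unindexed session_id column. It orders hits by FTS5’s built-in rank, builds a highlighted excerpt with snippet(), and keeps only the best-ranked hit per lineage. Attaching bookends and message windows is left out to keep it short; none of the names here are Hermes’ actual ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: parent_id links a continued session to the one it came from.
conn.executescript("""
CREATE TABLE sessions (id TEXT PRIMARY KEY, parent_id TEXT);
CREATE VIRTUAL TABLE messages_fts USING fts5(content, session_id UNINDEXED);
""")
conn.executemany("INSERT INTO sessions VALUES (?, ?)",
                 [("a", None), ("a2", "a"), ("b", None)])
conn.executemany("INSERT INTO messages_fts (content, session_id) VALUES (?, ?)", [
    ("docker networking broke after the upgrade", "a"),
    ("still debugging docker networking, bridge mode this time", "a2"),
    ("docker networking for the CI runners", "b"),
    ("unrelated chat about lunch", "b"),
])


def lineage_root(session_id: str) -> str:
    # Walk parent_id links until reaching the first session of the lineage.
    while True:
        row = conn.execute("SELECT parent_id FROM sessions WHERE id = ?",
                           (session_id,)).fetchone()
        if row is None or row[0] is None:
            return session_id
        session_id = row[0]


def discover(query: str, limit: int = 3) -> list[dict]:
    # rank puts the best matches first; snippet() wraps matched terms in [ ]
    # and trims the excerpt to at most 8 tokens around them.
    hits = conn.execute(
        "SELECT session_id, snippet(messages_fts, 0, '[', ']', '...', 8) "
        "FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    results, seen = [], set()
    for session_id, excerpt in hits:
        root = lineage_root(session_id)
        if root in seen:  # a better-ranked hit from this lineage is already kept
            continue
        seen.add(root)
        results.append({"session_id": session_id, "lineage": root, "snippet": excerpt})
        if len(results) == limit:
            break
    return results


for result in discover("docker networking"):
    print(result)
```

Deduplicating by lineage root rather than by raw session ID is what keeps a long task that spans several continued sessions from filling all of the top-N slots.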

Shape 2: Scroll

session_search(session_id="session_abc123", around_message_id=4567, window=10)

Once a Discovery call identifies the right session, Scroll lets you navigate inside it. It returns up to window messages on each side of the given message ID. No FTS5 involved: it’s a direct row-ID range query within that session.

This is how you read forward or backward through a conversation without paying for the whole transcript:

Pass messages[-1].id as around_message_id → scroll forward
Pass messages[0].id → scroll backward
When messages_before or messages_after is less than window, you’re at the session boundary
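A minimal sketch of the Scroll shape, assuming a hypothetical messages table whose integer id increases in chronological order within each session. It fetches up to window messages on each side of the anchor and reports messages_before and messages_after, so a caller can page forward with messages[-1] and spot the session boundary as described above. The scroll function and schema are illustrative, not Hermes’ code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per message; ids increase in chronological order.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, content TEXT)")
conn.executemany("INSERT INTO messages (session_id, content) VALUES (?, ?)",
                 [("s1", f"message {i}") for i in range(1, 21)])  # ids 1..20


def scroll(session_id: str, around_message_id: int, window: int = 10) -> dict:
    base = "SELECT id, content FROM messages WHERE session_id = ? "
    # Nearest `window` rows before the anchor, flipped back into chronological order.
    before = conn.execute(base + "AND id < ? ORDER BY id DESC LIMIT ?",
                          (session_id, around_message_id, window)).fetchall()[::-1]
    anchor = conn.execute(base + "AND id = ?",
                          (session_id, around_message_id)).fetchall()
    after = conn.execute(base + "AND id > ? ORDER BY id LIMIT ?",
                         (session_id, around_message_id, window)).fetchall()
    return {
        "messages": [{"id": mid, "content": text} for mid, text in before + anchor + after],
        "messages_before": len(before),
        "messages_after": len(after),
    }


page = scroll("s1", around_message_id=5, window=3)                          # ids 2..8
nxt = scroll("s1", around_message_id=page["messages"][-1]["id"], window=3)  # ids 5..11
tail = scroll("s1", around_message_id=19, window=3)
print(tail["messages_after"] < 3)  # True: fewer than `window` rows left, so this is the end
```

With an index covering (session_id, id), or the integer primary key as here, each of the three queries is a short range seek, which is why Scroll never needs to load the transcript.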

Shape 3: Browse

session_search()

No arguments. Returns recent sessions chronologically — titles, previews, timestamps. This is the “what was I working on?” entry point. Pure SQLite ORDER BY created_at DESC LIMIT N.
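Browse is the simplest of the three shapes. A sketch, assuming a hypothetical sessions table with title, preview, and ISO-8601 created_at text columns. The index on created_at is also an assumption, but it is what lets SQLite answer ORDER BY ... LIMIT by walking the index backwards instead of sorting every session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; ISO-8601 text in created_at sorts chronologically as plain text.
conn.executescript("""
CREATE TABLE sessions (id TEXT PRIMARY KEY, title TEXT, preview TEXT, created_at TEXT);
CREATE INDEX sessions_created_at ON sessions (created_at);
""")
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)", [
    ("s1", "Docker bridge setup", "Set up a bridge network...", "2026-05-27T09:00:00"),
    ("s2", "Deploy script fix", "The deploy script fails on...", "2026-05-29T14:30:52"),
    ("s3", "CI runner caching", "Cache pip downloads between...", "2026-05-28T11:15:00"),
])


def browse(limit: int = 10) -> list[tuple]:
    # Newest first: no FTS5, no model call, just an indexed ORDER BY ... LIMIT.
    return conn.execute(
        "SELECT id, title, preview, created_at FROM sessions "
        "ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()


for row in browse(limit=2):
    print(row)  # s2, then s3
```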

Performance Numbers

The benchmark is stark:

Operation	Before (v0.14.x)	After (v0.15.0)	Improvement
Discovery query	~90 seconds	~20ms	4,500×
Scroll (window=10)	N/A (full transcript load)	~5ms	n/a (no prior equivalent)
Browse	~30 seconds	~10ms	3,000×

The old system loaded each session, fed it to the LLM, and asked for a summary. A search across 10 recent sessions meant 10 full LLM invocations. The new system loads nothing into any model — it’s a straight SQLite index lookup.
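The Improvement column is the ratio of old to new latency, with both sides converted to milliseconds:

```latex
\frac{90\ \text{s}}{20\ \text{ms}} = \frac{90{,}000\ \text{ms}}{20\ \text{ms}} = 4{,}500\times,
\qquad
\frac{30\ \text{s}}{10\ \text{ms}} = \frac{30{,}000\ \text{ms}}{10\ \text{ms}} = 3{,}000\times
```

Scroll has no meaningful ratio, since the old pipeline had no equivalent operation to compare against.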

Token Cost

Before: every session_search cost input tokens proportional to the number of sessions reviewed. A search over 10 sessions at ~4K tokens each = 40K tokens per call. At DeepSeek V4 Flash pricing ($0.14/M input), that’s ~$0.006 per search. Not expensive per call, but multiplied across a 50-turn session with frequent recall, it added up fast.
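The per-search figure works out as follows. The last line assumes, purely for illustration, a 50-turn session that searches on every turn, and counts input tokens only:

```latex
\begin{aligned}
10\ \text{sessions} \times 4{,}000\ \text{tokens} &= 40{,}000\ \text{tokens per search} \\
40{,}000 \times \frac{\$0.14}{10^{6}} &= \$0.0056 \approx \$0.006\ \text{per search} \\
50\ \text{searches} \times \$0.0056 &= \$0.28 \quad (2 \times 10^{6}\ \text{input tokens})
\end{aligned}
```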

After: zero per-search token cost. The FTS5 query, the bookend extraction, and the scroll window are all local SQLite operations, so no model is invoked. The only cost is the context the results occupy once returned, as a compact structure of snippets and message windows rather than full transcripts.

FTS5 Query Syntax

The FTS5 index supports standard query operators:

Syntax	Example	Effect
`AND` (default)	`docker networking`	Both terms required
`OR`	`docker OR podman`	Either term
`"exact phrase"`	`"docker networking"`	Terms adjacent, in this order
`NOT`	`python NOT java`	First term, excluding rows that contain the second
`*` (prefix)	`deploy*`	Matches deploy, deployment, deployed

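This is standard FTS5 query syntax, documented in SQLite’s FTS5 reference. One gotcha: the operators are case-sensitive, so “docker or podman” searches for all three words, including “or”, rather than for either term.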

# Equivalent query run directly against the raw DB with the sqlite3 command-line shell
sqlite3 ~/.hermes/sessions/sessions.db \
  "SELECT content FROM messages_fts WHERE messages_fts MATCH 'deploy*' LIMIT 5;"

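To check the operators against a real FTS5 index, the following sketch loads five made-up messages into an in-memory table. It assumes your Python’s bundled SQLite was compiled with FTS5, as most builds are. Note that NOT is a binary operator in FTS5 and that operators must be uppercase; a lowercase “or” is treated as an ordinary search term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
docs = [
    "docker networking with a user-defined bridge",  # rowid 1
    "podman rootless networking",                    # rowid 2
    "python script that shells out to java",         # rowid 3
    "python deployment notes",                       # rowid 4
    "deployed the networking fix for docker",        # rowid 5
]
conn.executemany("INSERT INTO messages_fts (content) VALUES (?)", [(d,) for d in docs])


def match(query: str) -> list[int]:
    rows = conn.execute(
        "SELECT rowid FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [r[0] for r in rows]


print(match("docker networking"))    # [1, 5]     implicit AND, any order or distance
print(match("docker OR podman"))     # [1, 2, 5]
print(match('"docker networking"'))  # [1]        phrase: adjacent and in order
print(match("python NOT java"))      # [4]
print(match("deploy*"))              # [4, 5]     prefix: deployment, deployed
print(match("docker or podman"))     # []         lowercase "or" is just another term
```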
Why FTS5 (Not Something Fancier)

The choice of SQLite FTS5 over embedding-based retrieval (vector DB, semantic search) was deliberate:

Zero infrastructure — SQLite is already the session store. FTS5 is built in. No new services, no vector index builds, no chunking pipelines.
Deterministic results — keyword search is exact. You search for “docker networking” and you get sessions that literally mention docker and networking. No semantic drift from embedding quality or chunk overlap.
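Sub-50ms on 10K+ sessions — an FTS5 lookup reads only the index entries for the query terms, so its cost grows with the number of matching rows rather than with a scan of the whole store. Even at 10,000 sessions (~200K messages), queries stay under 50ms.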
No cold-start tax — Vector databases need embedding generation on ingest, which costs tokens. FTS5 indexes at write time with negligible overhead.

The tradeoff: no synonym expansion and no fuzzy, meaning-based search; a query for “container networking” won’t surface a session that only ever says “docker bridge”. But for session recall, where you typically know the keywords from the task context, FTS5 is the right tool.
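“FTS5 indexes at write time” can be made concrete with the external-content pattern from SQLite’s FTS5 documentation: the messages table stays the source of truth, and triggers update the index row by row on every insert, update, or delete. The table and column names here are hypothetical; Hermes may lay out its session store differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: messages holds the data; messages_fts indexes its content
# column without storing a second copy of the text (external content).
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL
);
CREATE VIRTUAL TABLE messages_fts USING fts5(content, content='messages', content_rowid='id');

-- Each trigger touches only the index entries for the row being written.
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER messages_ad AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES ('delete', old.id, old.content);
END;
CREATE TRIGGER messages_au AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES ('delete', old.id, old.content);
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
""")


def search(query: str) -> list[tuple]:
    return conn.execute(
        "SELECT messages.id, messages.session_id FROM messages_fts "
        "JOIN messages ON messages.id = messages_fts.rowid "
        "WHERE messages_fts MATCH ?",
        (query,),
    ).fetchall()


conn.execute("INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
             ("s1", "user", "why does docker networking drop packets?"))
print(search("docker"))  # [(1, 's1')]: indexed as soon as the row was written
conn.execute("UPDATE messages SET content = ? WHERE id = 1", ("podman networking question",))
print(search("docker"))  # []: the old text was removed from the index
print(search("podman"))  # [(1, 's1')]
```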

The Bookend Pattern

The most interesting design decision is the bookend window. Every Discovery result includes the session’s opening messages (goal setting) and closing messages (resolution/decisions), plus the window around the match. This isn’t a random choice — it solves a specific problem.

When an agent searches past sessions, it needs to answer three questions:

What was the goal? (bookend_start — first 3 messages)
Where did my search term appear? (window around FTS5 match)
How did it end? (bookend_end — last 3 messages)

Without bookends, the agent would need to load the full session transcript to understand context. With bookends, it gets the narrative arc in a single response. The match window provides the granular evidence; the bookends provide the framing.

This pattern is reusable for any retrieval system where you need to understand context around a search hit — not just Hermes sessions.
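A minimal sketch of the bookend assembly for one session, in plain Python so it applies to any message store. It assumes messages arrive in chronological order and uses the sizes given above (first 3 user/assistant messages, ±5 around each hit, last 3 messages). The Message type and bookend_view function are hypothetical, and overlapping windows from nearby hits are not merged here.

```python
from dataclasses import dataclass


@dataclass
class Message:  # hypothetical minimal message record
    id: int
    role: str
    content: str


def bookend_view(messages: list[Message], hit_ids: set[int],
                 bookend: int = 3, radius: int = 5) -> dict:
    """Goal, match windows, and resolution for one chronologically ordered session."""
    # Goal: the opening exchange, skipping system/tool messages.
    chat = [m for m in messages if m.role in ("user", "assistant")]
    windows = []
    for i, m in enumerate(messages):
        if m.id in hit_ids:
            lo, hi = max(0, i - radius), min(len(messages), i + radius + 1)
            # Evidence: the hit plus its neighbours, with the anchor flagged.
            windows.append([{"id": w.id, "content": w.content, "anchor": w.id == m.id}
                            for w in messages[lo:hi]])
    return {
        "bookend_start": [m.id for m in chat[:bookend]],
        "windows": windows,
        # Resolution: whatever the session ended on.
        "bookend_end": [m.id for m in messages[-bookend:]],
    }


session = [Message(0, "system", "You are Hermes.")] + [
    Message(i, "user" if i % 2 else "assistant", f"msg {i}") for i in range(1, 31)
]
view = bookend_view(session, hit_ids={15})
print(view["bookend_start"], view["bookend_end"])  # [1, 2, 3] [28, 29, 30]
```

For a 31-message session the result carries 3 + 11 + 3 = 17 messages instead of 31; the saving grows with session length, because the returned size depends on the number of hits, not on the transcript.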

Comparison with Alternative Approaches

Approach	Latency	Cost	Deterministic	Infrastructure
FTS5 (this PR)	~20ms	$0	✅ Exact	None (SQLite)
LLM summary (old)	~90s	$0.006/query	❌ Semantic drift	None
Vector embedding	~200ms	$0.001/search	❌ Chunk boundary issues	Vector DB + embedding model
Agentic crawl	~5min	$0.10+	❌ Hallucination risk	None

FTS5 wins on the dimensions that matter for an autonomously running agent: cost ($0), latency (sub-50ms), and determinism (exact keyword match).

How to Use It

The tool is available to any Hermes session with session_search enabled in the toolset. It’s also usable directly from the CLI for manual debugging:

# In-session: just ask
session_search(query="deploy script error", limit=3)

# Debug specific session content
session_search(session_id="20260529_143052", around_message_id=123, window=15)

The three shapes cover every retrieval pattern an autonomous agent needs:

Don’t know which session? → Discovery
Found the session, need more detail? → Scroll
Just browsing? → Browse
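One way a single entry point could choose among the three shapes is by checking which arguments were passed. The sketch below is a hypothetical router: route_shape is not a Hermes function, and treating a lone session_id or around_message_id as an error is an assumption. It shows only the decision rules and returns the chosen shape’s name instead of running a query.

```python
from typing import Optional


def route_shape(query: Optional[str] = None, session_id: Optional[str] = None,
                around_message_id: Optional[int] = None) -> str:
    """Hypothetical dispatch rules for a session_search-style entry point."""
    if session_id is not None or around_message_id is not None:
        if session_id is None or around_message_id is None:
            raise ValueError("Scroll needs both session_id and around_message_id")
        return "scroll"     # found the session, need more detail
    if query:
        return "discovery"  # don't know which session yet
    return "browse"         # no arguments: what was I working on?


print(route_shape(query="docker networking"))                            # discovery
print(route_shape(session_id="session_abc123", around_message_id=4567))  # scroll
print(route_shape())                                                     # browse
```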

The Bottom Line

The session_search rebuild is a textbook example of the right fix for the right problem. The old LLM-powered approach wasn’t wrong; it worked, it was just slow and expensive. Swapping it for a SQLite FTS5 index cut Discovery latency by 4,500×, zeroed the per-call cost, and made session recall deterministic.

For Hermes users, this means: your agent can now search its entire conversational history in milliseconds, pull up the sessions that mention what it’s looking for, and get the context without loading full transcripts. Sessions that were effectively lost after a few days are now instantly retrievable.

The design — three calling shapes with bookend windows — is worth studying on its own. It solves the agent context problem at the retrieval layer instead of the model layer, which is almost always the right place.