Post 1 of the Nexus series: an explainer walking through the Delos corpus.
What Nexus Is
So Nexus is the knowledge system I’ve been building to make my work stick, and to give the AI agents I work with something durable to lean on across sessions. It indexes code, prose, PDFs, and decision records into three tiers of storage, all tied together by a catalog of typed links between documents. Nothing revolutionary about the shape. What makes it earn its keep is what I decided to put in the catalog and how I reach into it.
This post is the first of a series, and it starts at the bottom of the stack: storage, catalog, retrieval. Everything else rests on that.
The three tiers are:
- T1: session scratch, shared across an agent tree while a conversation is alive;
- T2: durable SQLite for metadata, plans, and the like;
- T3: the permanent-knowledge tier, a vector database with ChromaDB and Voyage AI embeddings under the hood.
T3 holds anything I’ve actually indexed. It’s the Core Storage tier in Nexus. It’s designed for cloud storage but can run locally as well. T2 is purely local, and T1 is ephemeral and bound to AI sessions/conversations.
Every document I index also lands in the catalog alongside storage. The catalog tracks typed links between documents as first-class data, which is where Nexus really pulls ahead of a flat vector store, but that’s the subject of the next post. Both storage and catalog sit behind a command-line client and an MCP server, so I can hit the same substrate from a shell, from Claude Code, or from a subagent, and get the same answers and functionality.
That’s the data plane. On top of it sits a Claude Code plugin: skills, subagents, hooks, and a structured decision-record practice I’ve built around RDRs and beads. This is really how I use Nexus day-to-day.
A session-start hook wires the T1 scratch across a tree of subagents so they share context instead of each one starting cold. A storage-boundary hook auto-links new documents into the catalog the moment they’re stored, so I don’t have to remember to link them manually. And RDRs and beads are themselves indexed, which is pretty satisfying when you’re three sessions deep and need to remember why you picked a thing, or which of your changes might have caused the current nightmare.
The plugin gets its own post at the end of the series. This one is just the foundation.
If you want to follow along: `uv tool install conexus` installs the `nx` command-line client, `nx doctor` verifies the setup, and `nx index repo .` inside any local git repository checkout kicks off a first index. You can find the full README and source at github.com/Hellblazer/nexus.
Nexus was made for integration with Claude, and while it can be used purely from the command line, much of it only comes into play when running in Claude, where the LLM can leverage the MCP server, skills, and agents. To install the Nexus plugin, run the following from within Claude:

- `/plugin marketplace add Hellblazer/nexus`
- `/plugin install nx@nexus-plugins`
- `/reload-plugins`
The running example throughout is Delos, which is a public git repo I maintain. It’s an open-source distributed-systems framework for replicated state machine composition, mostly Java, and it’s a good demo target because it carries code, prose docs, its own decision records (RDRs, naturally), and a bibliography of foundational papers all in one bundle.
That whole bundle is what we refer to as its corpus, here and throughout the series. Real enough to ask real questions of, small enough to walk through in one post. Each command I run is also an entry point into one of four architectural layers, and naming those as they show up is what this post is actually about.
Corpus
I’ve got four Delos collections in T3. A collection in Nexus is just a named set of chunks in a vector database (ChromaDB, under the hood): documents sliced up and stored next to their embeddings, ready for similarity matching against a query. The four of them together are what I’m calling the Delos corpus:
| Collection | Chunks |
|---|---|
| code__Delos-5af9bfe0 | 17,893 |
| docs__Delos-5af9bfe0 | 4,564 |
| knowledge__delos | 2,007 |
| rdr__Delos-5af9bfe0 | 109 |
When I point Nexus at a git-backed project, it derives the collection name from a hash of the repo’s canonical path on disk (stable across worktrees), and records the current HEAD commit as metadata in Nexus’s per-repo registry so staleness checks have something to compare against. Working-tree walks respect .gitignore, and re-indexes are incremental: unchanged files get skipped by content hash, and only new or edited files re-embed.
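The incremental re-index described above can be sketched in a few lines. This is a minimal illustration, not Nexus’s actual code; the helper names and the path-to-digest map are mine:

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Content hash used to decide whether a file needs re-embedding."""
    return hashlib.sha256(data).hexdigest()


def files_to_reembed(tree: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """Return paths whose content changed since the last index run.

    `tree` maps path -> current bytes; `seen` maps path -> digest recorded
    at the previous index. Files whose digest matches are skipped, so only
    new or edited files re-embed.
    """
    changed = []
    for path, data in tree.items():
        if seen.get(path) != file_digest(data):
            changed.append(path)
    return changed
```

The point of hashing content rather than comparing timestamps is that a `git checkout` that touches mtimes but not bytes still costs nothing at re-index time.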
The -5af9bfe0 suffix on three of the four collection names is the first eight characters of the SHA-256 of that canonical repo path. It’s deterministic on purpose: the collection identity ties to this repo on this machine, not to any particular commit. As I advance main, re-indexes write incrementally into the same collection; the collection name only changes if the repo itself moves on disk. knowledge__delos doesn’t have a suffix because it’s not backed by a git repo. I built it from a directory of cited PDFs, independent of any checkout. Embedding models are voyage-code-3 for code and voyage-context-3 for prose and papers.
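The naming scheme is simple enough to sketch. A hedged illustration of the derivation as the post describes it (first eight hex characters of SHA-256 over the canonical repo path); the function name and exact composition are my assumptions:

```python
import hashlib


def collection_name(kind: str, repo_name: str, canonical_path: str) -> str:
    """Derive a collection name like 'code__Delos-5af9bfe0'.

    The suffix hashes the repo's canonical path on disk, so it is stable
    across commits and worktrees but changes if the repo moves on disk.
    Sketch only; naming details beyond the suffix rule are assumed.
    """
    suffix = hashlib.sha256(canonical_path.encode()).hexdigest()[:8]
    return f"{kind}__{repo_name}-{suffix}"
```

Because the input is the path rather than a commit, advancing `main` and re-indexing writes into the same collection.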
Every indexed document also lands in the catalog: a metadata layer alongside ChromaDB that tracks title, author, source path, collection membership, and the typed links between documents. The next post is where I get into the catalog properly; for now, just treat it as a second index that sees documents, topics and knowledge graphs instead of simply collections of chunks.
Here’s the breakdown:
code__Delos-5af9bfe0 is the Delos repo itself: roughly 1,800 source files, mostly Java.
docs__Delos-5af9bfe0 is the framework’s prose documentation: READMEs, design notes, architectural overviews.
rdr__Delos-5af9bfe0 is Delos’s own Research/Design Review records: the decision history of the project.
knowledge__delos is fifteen foundational papers cited by Delos:
- Fireflies
- DHR’s self-stabilizing Byzantine overlay
- Rapid
- HEX-BLOOM
- Aleph-BFT
- Bessani’s BFT-to-SMR transformation
- Vukolić’s From Byzantine Replication to Blockchain
- Zanzibar
- PRDTs
- Aging Bloom Filter
- Distributed Bloom Filter
- Lightweight SMR
- MFAz
- pBeeGees
- Permission Systems at Scale
Semantic Search
The first thing I usually reach for is nx search. It does semantic search: find me passages that are about this thing, don’t care whether the exact words line up. The query string gets embedded, ChromaDB returns the chunks whose embeddings sit nearest in vector space, and back comes a ranked list.
nx search "view change safety" --corpus knowledge__delos -m 6
Produces:
━━ Rapid Cluster Membership (2 matches)
[0.500] rapid-atc18.pdf p5
Joins and removals can be combined: If a set of processes F as above fails, and a set of processes J joins,
then is eventually notified of the changes. • View-Change: Any view-change notification in C is by consensus…
[0.523] rapid-atc18.pdf p4
Joins and removals can be combined: If a set of processes F as above fails, and a set of processes J joins,
then is eventually notified of the changes. • View-Change: Any view-change notification in C is by consensus…
━━ PBFT View-Change Commits (3 matches)
[0.513] pBeeGees.pdf p14
…where must be a descendant of B. Proof. If conflicts with B, the leader of must have equivocated…
[0.526] pBeeGees.pdf p15
…affect the liveness of the algorithm; therefore, the following will only provide a safety proof…
[0.529] pBeeGees.pdf p14
…to Lemma 4 and Lemma 5, if collects a , it must be a descendant of B. The theorem is thus proven…
━━ Epoch Member Reconfiguration (1 match)
[0.527] self-stabilizing-bft-overlay.pdf p6
…to connect to each other in the presence of joining and leaving members. ## 4.3 The Epoch Protocol…
Eight seconds wall-clock, which sounds slow (and it is!) but almost all of it is CLI startup and the client connecting to remote ChromaDB. The retrieval itself is well under a second.
Internally, the flow is:
- The query embeds through voyage-context-3 (the same model that embedded the papers), so query and chunks live in the same 1024-dimensional space.
- ChromaDB returns the nearest chunks in knowledge__delos. The --corpus flag acts as a pre-filter inside ChromaDB rather than a post-filter applied afterward, which matters when a database holds dozens of collections.
- A 0.65 cosine-distance threshold filters noise; anything beyond it is treated as a miss.
- Surviving chunks group by topic cluster, with a small same-topic boost (−0.1 distance) and a small cross-topic penalty. That’s why the output reads as concepts rather than a flat ranked list.
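The threshold-then-group step can be sketched concretely. This is a simplified stand-in, assuming distances where lower is better and applying only the same-topic boost (the real scoring also carries a cross-topic penalty, which I omit):

```python
def rank_with_topics(hits, threshold=0.65, boost=0.1):
    """Filter hits by cosine distance and group survivors by topic.

    `hits` is a list of (distance, topic, text) tuples. Chunks beyond the
    threshold are dropped as misses; within a topic, every chunk after the
    first gets a small same-topic boost (distance reduced by `boost`),
    which is why output clusters into concepts rather than a flat list.
    """
    survivors = [h for h in hits if h[0] <= threshold]
    by_topic: dict[str, list] = {}
    for dist, topic, text in sorted(survivors):
        bucket = by_topic.setdefault(topic, [])
        adjusted = dist - boost if bucket else dist  # same-topic boost
        bucket.append((adjusted, text))
    return by_topic
```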
Those topic labels come from two passes at index time: HDBSCAN (clustering) and c-TF-IDF (labeling). HDBSCAN groups the chunk embeddings by similarity; c-TF-IDF then picks tokens that distinguish each cluster from its neighbors, which is how I get names like Rapid Cluster Membership and PBFT View-Change Commits instead of “cluster 17”. New chunks added after the initial discovery are assigned to existing clusters by nearest-centroid lookup against a taxonomy__centroids collection, so adding content doesn’t trigger a full re-cluster.
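The nearest-centroid assignment for late-arriving chunks is the cheap part of that pipeline, and easy to sketch. A plain-Python illustration under the assumption that centroids live in the same embedding space as the chunks; function names are mine:

```python
import math


def assign_topic(embedding: list[float], centroids: dict[str, list[float]]) -> str:
    """Assign a new chunk to the nearest existing topic centroid.

    `centroids` maps topic label -> centroid vector (in Nexus, persisted in
    a taxonomy__centroids collection). New chunks added after the initial
    HDBSCAN pass are placed this way instead of triggering a full re-cluster.
    """
    def cosine_distance(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / (na * nb)

    return min(centroids, key=lambda t: cosine_distance(embedding, centroids[t]))
```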
Synthesized Answers
Ranking is one retrieval mode; synthesis is another. Instead of handing me a list of passages to read through myself, synthesis asks Nexus to read them and hand back a structured answer. Still grounded in the indexed text, but composed by an LLM pipeline that Nexus drives on my behalf. nx_answer is the entry point, exposed as an MCP tool so agents can call it the same way I do:
nx_answer(
    question="What different approaches to membership churn are described across the Delos papers?",
    scope="knowledge__delos",
    dimensions={"verb": "research"},
)
Produces:
# Membership Churn Approaches Across Delos-Adjacent Papers
Four distinct stances on reconciling membership change with correctness.
## 1. Rapid — batched view-change over an expander monitoring overlay
K-regular expander for monitoring; cut-detection aggregates REMOVE/JOIN
alerts into a single multi-process proposal; Fast Paxos 3/4-quorum
one-step path when CD inputs agree, classic Paxos fallback.
## 2. PRDTs — reconfigurable Paxos as a compositional ConfigurationRound
Two Paxos instances per round (membership set + inner consensus) wrapped
in an Epoch PRDT. Elegant; hard one-reconfig-per-round limit.
## 3. Fireflies / S-Fireflies — gossip with per-member views, no consensus
Transient lists absorb churn without agreement. Self-stabilizing crash
detection in 3Δ. Steady-state view divergence is the explicit cost of
avoiding consensus.
## 4. SMaRtChain — reconfiguration as BFT-ordered transactions
join/leave/remove are special transactions through the same BFT ordering
protocol as application requests. Deterministic transitions; reconfigs
serialize behind application workload.
## Cross-Cutting Comparison
| Axis | Rapid | PRDTs | Fireflies | SMaRtChain |
|---|---|---|---|---|
| Agreement? | per-config VC | Paxos | no | BFT-ordered |
| Batching | batched per-C | one/round | continuous | per-txn |
| Failure model | crash + L-of-K obs | Paxos | Byzantine+ | Byzantine |
| Guarantee | a.e. whp | det. | eventual | det. |
| Cost per change | 1 VC round | 2 Paxos | 2Δ–3Δ gossip | 1 BFT txn |
## Where the Fault Lines Run
- Consensus vs. eventual convergence: Rapid, PRDTs, and SMaRtChain commit to agreement; the Fireflies family explicitly rejects it.
- Separate reconfig engine vs. reuse of the ordering layer: Rapid and PRDTs isolate reconfig from application consensus; SMaRtChain collapses them.
- Batching vs. serialization: Rapid is aggressive; PRDTs sit at the other extreme by construction.
The full reply came back at about 4.7 KB. The answer body is what’s shown above; the remaining 40 citations point into five of the indexed papers (rapid-atc18, prdts, self-stabilizing-bft-overlay, fireflies-tocs, lightweight-smr). What each citation actually is (a content-addressed pointer back to exact chunk text) comes up below.
About two and a half minutes wall-clock, almost none of it retrieval (sad!). Where the time goes:
- The question is classified along a few axes: verb (research), scope (knowledge__delos), and dimension signals (cross-document, analytical).
- Those dimensions are the key for a plan_match lookup against the plan library: a T2-SQLite-persisted set of reusable plan templates, each a DAG of operator calls with parameterized inputs.
- On a confident match, the matched plan runs. On a miss (as in this run), an inline dynamic planner writes a fresh plan against registered operator schemas.
- The plan executes as six steps: search, traverse, extract, and rank, ending with two generate steps.
- Each operator is dispatched as a fresh claude -p subprocess. That’s where the two and a half minutes live. (A focus for improvement, mind.)
- Claims in the answer resolve back to chunks via chash: spans, the content-addressed references to exact chunk text. If the text is re-chunked later, the span either still resolves or fails loudly. No dangling references.
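The resolve-or-fail-loudly property of those citations is worth making concrete. A hedged sketch: the post only says citations are content-addressed chash: spans over exact chunk text, so the reference format below (truncated digest plus character span) is my guess, not Nexus’s actual wire format:

```python
import hashlib


def make_span_ref(chunk_text: str, start: int, end: int) -> str:
    """Build a content-addressed span reference (format assumed for
    illustration: chash:<truncated sha256>:<start>-<end>)."""
    digest = hashlib.sha256(chunk_text.encode()).hexdigest()[:16]
    return f"chash:{digest}:{start}-{end}"


def resolve_span(ref: str, chunks_by_hash: dict[str, str]) -> str:
    """Resolve a span ref against chunks keyed by content hash.

    Either a chunk with that exact content still exists and the span
    resolves, or the lookup fails loudly: no dangling references.
    """
    _, digest, span = ref.split(":")
    start, end = (int(x) for x in span.split("-"))
    if digest not in chunks_by_hash:
        raise KeyError(f"span {ref} no longer resolves (text re-chunked?)")
    return chunks_by_hash[digest][start:end]
```

Keying on content rather than position is what makes re-chunking safe: an unchanged chunk keeps its hash and its spans keep resolving, while a changed chunk invalidates its citations instead of silently pointing at different text.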
Four Layers
This one nx_answer run exercised all four layers of the system in a single command. The inline planner writing a fresh plan on a plan_match miss lives in the Planning layer. The six-step DAG the plan then ran through lives in the Execution layer. The chunks those steps retrieved, plus the chash: spans that grounded the answer, live in the Knowledge Representation layer. And the MCP invocation that kicked the whole thing off is a tool the Application layer exposes.

In general terms:
- Application: the CLI and MCP surface, including nx search and nx_answer.
- Planning: the plan_match gate, the plan library, the inline dynamic planner.
- Execution: plan_run sequencing operators, with T1 scratch caching intermediate results.
- Knowledge Representation: the three storage tiers, the catalog, the typed links, the topic clusters.
That layering is Nexus’s own architecture, but the four-layer shape is cribbed from AgenticScholar (Lan et al., 2026), a reference architecture for agentic scholarly data management. It’s a good shape and I saw no reason to invent a different one. Nexus diverges from AgenticScholar where constraints from implementation choices push back: a Xanadu-inspired catalog rather than a property graph; HDBSCAN + c-TF-IDF for taxonomy rather than LLM-driven taxonomy construction; a plan library with empirical promotion rather than pure LLM rerank. Those divergences are what posts 2 through 5 are about.
What the rest of the series covers
The rest of the series takes each layer named above and unpacks it, one post at a time. The progression is bottom-up: from the catalog and typed links that sit under search, to the plan library above retrieval, to the plugin surface agents actually touch.
- Typed links and the catalog. Tumblers, content-addressed spans, the typed link graph. Traversal demos on Delos.
- Decisions as indexed data. RDRs and beads as first-class graph nodes, walked on Nexus’s own RDR corpus.
- Plans, not replanning. The plan library, promotion discipline, match signal sources.
- Operators as building blocks. operator_*, claude_dispatch, plan DAGs, composition.
- The nx plugin. Skills, subagents, session-start and storage-boundary hooks, and the session-scoped plumbing that carries context across an agent tree.
By the end of the series, the whole substrate above should be something you can walk through on your own. That’s the plan, anyway.

