Plans, not replanning

Post 4 of the Nexus series: why a runtime library of saved query plans beats cold-start LLM replanning on every query.


The plan you already wrote

Point of view is worth 80 IQ points. (Alan Kay)

The previous post, Decisions as indexed data, was about refusing to let design decisions evaporate into chat history. This post is about refusing to let query plans evaporate as well. We cache them, persist them, reach for them again rather than re-deriving from scratch every time. At least, that’s the theory; ideally the practice too.

A common pattern in LLM agentic retrieval is to cold-start on every query. The agent reads the question, reasons about what it means, picks a strategy, executes the strategy, stitches the results into an answer, and then throws the plan away the moment it returns. The next query does it all over again. Same shape of problem, fresh derivation, possibly different answer. The system isn’t learning anything across queries; the agent meets every analytical question as if for the first time. Every trace of what worked (or didn’t) the last time has been discarded along with the plan.

That’s the problem the AgenticScholar paper addresses, and it’s the reason plan-centric retrieval was in Nexus from the start. I read the paper early on while working out what shape Nexus should take, felt that the framing was excellent, and built the loop in.

The AgenticScholar paper

AgenticScholar: Agentic Data Management with Pipeline Orchestration for Scholarly Corpora (arXiv:2603.13774) describes a four-layer architecture for analytical querying over a scholarly corpus: a taxonomy-anchored knowledge graph, an LLM-driven query planner that compiles natural-language questions into operator DAGs, a composable operator library of about fifteen typed operators, and structured ingestion. Recall the four-layer architecture from our first post.

Downstream of the planner sits a predefined plans pool: a library of saved plans matched to incoming queries by confidence threshold. If a stored plan clears the threshold, it runs. Otherwise the query is treated as ad-hoc and handed back to the planner.

Their benchmarks run that architecture against Elicit, Gemini Deep Research, SmolAgent, and naive RAG. On analytical queries, AgenticScholar scored roughly +47% NDCG@3 over RAG, and their ablation attributes a distinct share of that lift to the predefined-plan selection specifically. That’s enough evidence to take the pattern seriously.

Nexus takes two ideas from the paper directly: the operator-DAG model for analytical retrieval, and the predefined-plans-pool loop. The rest of this post unpacks where Nexus’s implementation diverges:

  • Storage lives in T2 as structured SQLite + FTS5 records (shape, DAG, outcome, tags), sitting alongside memory and the catalog taxonomy. Plans participate in the same query and search machinery as everything else in the project. Which is nice and uniform.
  • Matching stacks three signals where the paper uses one: dimension filter for structural matches against plan shape, embedding-based semantic rerank (cached and low latency), and FTS5 keyword fallback when the shape metadata is sparse. The mixed corpus makes any single signal too brittle on its own.
  • Promotion is empirical, not (only) author-curated: every plan carries match_count, success signals, and cost-and-latency profiles, and those numbers decide what stays in the library and what gets retired.
  • Corpus is mixed-kind, not single-kind: code, prose, RDRs, and papers all sit in one substrate, so plan dimensions can route queries across content types rather than within a single kind.

What a plan carries

A Nexus plan follows the shape the paper lays out. The first time you encounter a class of query, you pick the operators, arrange the DAG, tune the dimensions on which the plan should match. That arrangement is saved. Every subsequent invocation runs the saved arrangement instead of re-deriving it.

Each plan carries three kinds of information:

  • Shape is the set of dimensions that identify when the plan applies: question type, corpus scope, intent, whatever the matcher should key on.
  • Procedure is the DAG of operators to run: search, traverse, extract, rank, generate, and so on.
  • Metrics are the empirical record: match_count, success signals, latency, cost. How this plan has actually performed in the wild.
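Those three kinds of information map naturally onto a single record. Here is a minimal sketch of what such a table could look like, assuming a SQLite store along the lines of T2; the column names are illustrative, not Nexus’s actual schema:

```python
import json
import sqlite3

# Illustrative layout only: shape as JSON dimensions the matcher keys on,
# procedure as a serialized operator DAG, metrics as accumulating usage
# evidence. An FTS5 virtual table over plan descriptions would sit
# alongside this for the keyword fallback.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE plans (
        name             TEXT PRIMARY KEY,
        dimensions       TEXT NOT NULL,      -- shape
        dag              TEXT NOT NULL,      -- procedure
        match_count      INTEGER DEFAULT 0,  -- metrics, from here down
        success_count    INTEGER DEFAULT 0,
        total_latency_ms REAL    DEFAULT 0
    )
""")

conn.execute(
    "INSERT INTO plans (name, dimensions, dag) VALUES (?, ?, ?)",
    ("decision-retrieval",
     json.dumps({"question_type": "decision-history",
                 "intent": "explanation", "corpus": "rdr"}),
     json.dumps(["search", "traverse", "extract", "rank", "generate"])),
)

row = conn.execute("SELECT name, match_count FROM plans").fetchone()
print(row)  # ('decision-retrieval', 0)
```

The point of the single-record shape is that a plan’s matching metadata, its executable procedure, and its track record travel together: one row answers “when do I apply this, what do I run, and has it been working?”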

Five plans are seeded at nx catalog setup. Your own plans land in the same library as you author them. The library grows by capture rather than by re-derivation: each plan is a specific shape of question you’ve worked through and chosen to keep.

How a query finds a plan

Plan-first retrieval: match incoming queries against saved plans; fall through to dynamic planning on a miss.

plan_match is a short, direct lookup with three signal sources layered on top of each other. It filters the library by dimensions (structural match), re-ranks the filtered set semantically against the incoming query’s embedding, and falls back to FTS5 keyword search when dimension metadata is sparse. If a plan clears the threshold, it runs. If nothing does, the agent falls through to dynamic planning. Plan-first means try the library first, not library or nothing.

The three layers each catch a kind of miss the others don’t:

  • Dimension filter is fast and precise, but only as good as the plan’s shape metadata.
  • Semantic rerank handles the cases where dimensions underspecify, matching on concept similarity when the vocabulary has shifted.
  • FTS5 fallback catches the tail where neither dimensions nor semantics land a clean hit but the keyword surface exists. Typically when a plan’s description names the specific tools or corpus it works on.

It’s the same defense-in-depth shape Decisions as indexed data described for the cumulative-design corpus. No single mode is authoritative; they stack.
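The layering can be sketched in a few lines. This is a toy illustration of the stacking, not Nexus’s `plan_match`: the bag-of-words cosine stands in for the cached embedding rerank, a token-overlap check stands in for FTS5, and the threshold is arbitrary.

```python
from collections import Counter
from math import sqrt

def _cosine(a: Counter, b: Counter) -> float:
    # Toy stand-in for the cached embedding rerank: bag-of-words cosine.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def plan_match(query_dims, query_text, plans, threshold=0.2):
    # Layer 1: structural filter on shape dimensions.
    candidates = [p for p in plans
                  if all(p["dimensions"].get(k) == v for k, v in query_dims.items())]
    # Layer 3: keyword fallback when dimensions were too sparse to filter
    # (FTS5 over plan descriptions in the real store).
    if not candidates:
        q = set(query_text.lower().split())
        candidates = [p for p in plans
                      if q & set(p["description"].lower().split())]
    # Layer 2: semantic rerank of whatever survived.
    qv = Counter(query_text.lower().split())
    scored = sorted(((p, _cosine(qv, Counter(p["description"].lower().split())))
                     for p in candidates), key=lambda s: s[1], reverse=True)
    if scored and scored[0][1] >= threshold:
        return scored[0][0]
    return None  # miss: fall through to dynamic planning

plans = [
    {"name": "decision-retrieval",
     "dimensions": {"question_type": "decision-history", "intent": "explanation"},
     "description": "explain what we decided about a design question, citing rdr reasoning"},
    {"name": "luciferase-deep-dive",
     "dimensions": {"question_type": "code-internals"},
     "description": "trace luciferase code paths into delos and gpu-support"},
]
hit = plan_match({"question_type": "decision-history"},
                 "what did we decide about plan matching and why", plans)
print(hit["name"])  # decision-retrieval
```

Returning `None` rather than the best weak candidate is the load-bearing choice: a below-threshold match hands the query back to the planner instead of forcing a bad plan to run.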

A walk through the decision-retrieval plan

Anatomy of the decision-retrieval plan from the worked example: shape, DAG, and metrics in one card.

The worked example from Decisions as indexed data was nx_answer("what did we decide about plan matching and why?"). That query matches a saved decision-retrieval plan. Its dimensions key on question-type decision-history, intent explanation, corpus scope rdr. Its DAG: search rdr__* collections, traverse the implements and cites links from the top results in parallel, rank the merged set, summarize the reasoning recorded in each linked RDR, cite back to chash:<sha256> spans. Six operators, arranged once and stored once, running again every time a similar question arrives.
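Spelled out as data, that plan card might look like the following. The field and operator-argument names here are illustrative guesses at a serialization, not Nexus’s actual format; the shape, the six operators, and the metric fields come from the description above.

```python
decision_retrieval = {
    "name": "decision-retrieval",
    # Shape: the dimensions plan_match keys on.
    "dimensions": {"question_type": "decision-history",
                   "intent": "explanation",
                   "corpus": "rdr"},
    # Procedure: six operators, arranged once and stored once.
    "dag": [
        {"op": "search",    "collections": "rdr__*"},
        {"op": "traverse",  "link": "implements", "parallel": True},
        {"op": "traverse",  "link": "cites",      "parallel": True},
        {"op": "rank",      "input": "merged"},
        {"op": "summarize", "field": "reasoning"},
        {"op": "cite",      "span": "chash:<sha256>"},
    ],
    # Metrics: the empirical record that accrues per run.
    "metrics": {"match_count": 0, "success_count": 0, "avg_latency_ms": None},
}
print(len(decision_retrieval["dag"]))  # 6
```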

When plan_match picked the plan for that demo query, its match_count ticked up. If a question arrives tomorrow asking why did we go with four storage tiers?, the same plan fires again and the counter ticks again. That usage history is what later tells us whether this plan is pulling its weight. Signal accrues where a one-shot derivation leaves no trace.

Scoping and bridging across corpora

The decision-retrieval plan is the simple case: one question-type, one corpus (RDRs). The more interesting case is where plans route across corpora that depend on each other, and stay out of corpora that don’t.

Four projects, four corpora. Delos is a distributed-systems framework. gpu-support is a shared GPU-compute library. Luciferase depends on both Delos (for membership, partitioning, and consensus primitives) and gpu-support (for compute). ART depends on gpu-support as well, but has no relationship to Delos or Luciferase.

Ask a question about Luciferase internals: a naive retrieval hits code__luciferase and stops. But Luciferase traces into Delos for membership, partitioning, and consensus, and into gpu-support for compute, so the right answer bridges into both. A Delos-only question, on the other hand, should not pull from ART or Luciferase. And a gpu-support question might helpfully bridge into both of its consumers (ART and Luciferase) to show real usage patterns.

Plans encode that routing as a first-class dimension:

  • luciferase-deep-dive bridges code__luciferase, code__delos, knowledge__delos, and code__gpu-support, following both of Luciferase’s dependencies.
  • art-investigation bridges code__art and code__gpu-support, following ART’s single dependency.
  • delos-internals scopes to code__delos and knowledge__delos, isolated from unrelated projects.
  • gpu-support-usage scopes to code__gpu-support and bridges into code__luciferase and code__art, for auditing usage patterns across consumers.

plan_match routes an incoming query by detecting which project the question is about (from dimensions, semantics, or keywords), and the matched plan’s corpus scope decides which collections get searched. Without plans, queries either fan out too broadly and return noisy, irrelevant hits, or stay too narrow and miss transitive dependencies into foundational libraries. Plans capture the cross-project topology once and apply it every time.
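As a sketch, the routing reduces to a lookup from plan to corpus scope. The collection names follow the post’s code__<project> / knowledge__<project> convention; the substring-based project detection is a crude stand-in for plan_match’s three layers:

```python
# Corpus scopes from the four example plans.
PLAN_SCOPES = {
    "luciferase-deep-dive": ["code__luciferase", "code__delos",
                             "knowledge__delos", "code__gpu-support"],
    "art-investigation":    ["code__art", "code__gpu-support"],
    "delos-internals":      ["code__delos", "knowledge__delos"],
    "gpu-support-usage":    ["code__gpu-support", "code__luciferase", "code__art"],
}

PROJECT_TO_PLAN = {  # which plan fires for a question about each project
    "luciferase": "luciferase-deep-dive",
    "art": "art-investigation",
    "delos": "delos-internals",
    "gpu-support": "gpu-support-usage",
}

def collections_for(query: str) -> list[str]:
    # Crude project detection standing in for dimension/semantic/keyword matching.
    for project, plan in PROJECT_TO_PLAN.items():
        if project in query.lower():
            return PLAN_SCOPES[plan]
    return []  # no match: the dynamic planner decides the scope

print(collections_for("why does Luciferase rebalance partitions?"))
# ['code__luciferase', 'code__delos', 'knowledge__delos', 'code__gpu-support']
```

Note the asymmetry the table encodes: a Luciferase question fans out into both dependencies, while a Delos question stays isolated; the topology is captured once in the scope lists rather than re-derived per query.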

Promotion discipline

This is Nexus’s own minor addition to the pattern. The paper’s plans pool is curated by its authors; Nexus attaches empirical evidence to every plan and uses it to decide what stays.

Every plan accumulates match_count (how often it fires), success signals (whether the answer landed), and cost-and-latency profiles per run. Plans that match often and perform well stay. Plans that never match, or match poorly, get retired.

This is closer to the discipline RDRs use than to a pure cache. What makes a plan worth keeping isn’t that it’s stored; it’s that it solves a recurring query-shape well. The match counts are the evidence. Promotion is a judgment call informed by the numbers, the same way RDR acceptance is a judgment call informed by the research findings.

The corollary is that adding a plan isn’t a no-cost action. A plan that overlaps poorly with an existing one, matching queries the older plan already handles or handling them worse, makes the library noisier. A small library of plans that each earn their keep is more useful than an ever-growing pile that slowly dilutes the matcher’s precision.
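One way the numbers could inform that judgment call, as a sketch: the thresholds below are invented for illustration, and the post frames retirement as a review prompt, not an automatic purge.

```python
def should_retire(plan, min_matches=5, min_success_rate=0.6):
    # Flag a plan for retirement review based on its accumulated metrics.
    if plan["match_count"] < min_matches:
        return True  # rarely fires: dead weight diluting the matcher
    rate = plan["success_count"] / plan["match_count"]
    return rate < min_success_rate  # fires, but the answers don't land

plans = [
    {"name": "decision-retrieval", "match_count": 42, "success_count": 39},
    {"name": "stale-experiment",   "match_count": 2,  "success_count": 1},
]
print([p["name"] for p in plans if should_retire(p)])  # ['stale-experiment']
```

A real check would also want a grace period so freshly captured plans aren’t flagged before they have had a chance to accumulate matches.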

When replanning is still the right call

Plan-first isn’t plan-only. The ad-hoc fallback comes directly from the paper: when plan_match returns nothing above threshold, the agent falls through to dynamic planning. That’s the right behavior for novel query shapes, for one-off questions, for cases where the library genuinely doesn’t have the right tool.

The point isn’t to replace the planner. It’s to route the common shapes away from the planner, so replanning is reserved for the questions that genuinely need it. Most analytical queries in a long-lived project tend to be variations on a handful of shapes. Those are where saved plans earn their keep, and where semantic matching across surface variations does the work of recognizing the shape. The long tail still gets the full LLM planner. And when a tail query turns out to be the first instance of a new recurring shape, it’s often worth capturing the resulting plan for next time. That’s how the library grows: from the moments where the planner worked from scratch and the result turned out to be worth saving.
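Put together, the whole loop fits in a dozen lines. This is a sketch of the control flow only; all five callables are hypothetical stand-ins for the machinery described above, not Nexus’s actual API.

```python
def answer(query, library, match, dynamic_plan, execute, worth_saving):
    # Plan-first: try the library, fall through to the planner on a miss,
    # and capture a fresh plan when it looks like a new recurring shape.
    plan = match(query, library)
    if plan is not None:
        plan["match_count"] += 1           # usage evidence accrues
        return execute(plan["dag"], query)
    fresh = dynamic_plan(query)            # cold-start only on a miss
    result = execute(fresh["dag"], query)
    if worth_saving(fresh, result):        # first instance of a new shape?
        library.append(fresh)              # capture for next time
    return result

# Toy stand-ins to show the loop's effect across two similar queries.
library = []
match = lambda q, lib: next((p for p in lib if p["shape"] in q), None)
dynamic_plan = lambda q: {"shape": "decide", "dag": ["search", "rank"], "match_count": 0}
execute = lambda dag, q: f"ran {len(dag)} operators"
worth_saving = lambda plan, result: True

answer("what did we decide about matching?", library, match, dynamic_plan, execute, worth_saving)
answer("why did we decide on four tiers?", library, match, dynamic_plan, execute, worth_saving)
print(len(library), library[0]["match_count"])  # 1 1
```

The first query cold-starts and captures its plan; the second hits the library and increments the counter instead of re-deriving. That asymmetry is the whole argument of the post in miniature.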

What’s next

Up next: Operators as building blocks. Plans are DAGs; DAGs are composed of operators. That post unpacks what an operator actually is, why they’re kept small and single-purpose, and what composition unlocks when you start stacking them.

Follow along with the same setup as Nexus by Example: uv tool install conexus, nx doctor, nx index repo . inside any git checkout. Then nx catalog setup to seed the builtin plans, and nx plan list to see what’s in the library.


