
Practice — day 12.

7 questions · ~12 min · diagrams · recall · synthesis · gaps
3 / 7
diagram · learn by seeing
cluster · agentic loops · last reviewed 6d · est. gain +2.1 pts
question 03 · read the diagram. write the answer.

In this loop, why does the soft-commit boundary make rollback unnecessary?

↳ diagram · agentic loops drawn by supadense · from your 11 fragments
[diagram: Planner (turn t) reads memory → Tool call (search · code · fetch) writes to MEMORY → Planner (turn t+1) reads memory. The write arrow, the read arrow, and the soft-commit boundary are marked "?"; the committed (visible) region is labelled.]
↳ generated from frag 0421 · 0612 · 0843
why this · you've captured this shape three times: in postgres MVCC, in speculative decoding, and last week in DeepSeek's paper. supadense believes you can name the pattern but hasn't seen you write it down. name it once and you'll never lose it.
answered today · 2 · ↑ retention +1.6 pts
94
"What's the core difference between read-write loops and classical RAG?" — agentic loops
↑ d30 +3.2
2 min
62
"Why does speculative decoding hurt at batch > 8?" — long-context
↑ d30 +0.8
5 min

Gaps in your learning.

14 open · across 6 clusters · 4 marked severe
14
open · −2 this week
severe · you've answered wrong twice in a row · 4 gaps · est. +6.8 pts
The soft-commit boundary in agentic loops — when does the rollback window close?
You've described it twice as "write then commit" — but the boundary is the read-share moment, not the write. Past-you noted this in frag 0843 in June.
agentic loops · last surfaced 2d · 3 connected fragments · wrong 2/2
est. gain +2.1 pts
Why does RAG memory need both BM25 and dense recall — what fails if you drop one?
You've conflated lexical drop-out with embedding miss in three conversations. The failure mode is different — rare nouns vs. paraphrase. You wrote about it in May; field has since moved.
RAG & embeddings · last surfaced 5d · 6 connected fragments · wrong 2/2
est. gain +1.9 pts
Speculative decoding — why the draft model's size matters more than its accuracy.
You've twice optimised the wrong knob. The draft model is throughput-bound, not quality-bound. Frag 0612 spells out the math; you haven't opened it since Apr.
agentic loops · last surfaced today · 4 connected fragments · wrong 2/2
est. gain +1.5 pts
MVCC vs. soft-commit — same abstraction or different?
You've described them as siblings. Past-you, in postgres MVCC, treated them as siblings too — and then read a paper that said otherwise. The note never updated.
postgres internals · last surfaced yesterday · 8 connected fragments · wrong 2/2
est. gain +1.3 pts
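The BM25-plus-dense gap above separates two failure modes: lexical search misses paraphrases, dense search misses rare nouns. A minimal sketch of why keeping both helps, assuming reciprocal rank fusion as the merge step (the cards do not say which fusion method your notes use). The document ids and both rankings are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; a higher position contributes a larger score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score, breaking ties by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for one query.
# Lexical search finds the exact rare term but misses the paraphrase.
lexical_ranking = ["doc_rare_noun", "doc_both"]
# Dense search finds the paraphrase but misses the rare term.
dense_ranking = ["doc_paraphrase", "doc_both"]

fused = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
lexical_only = reciprocal_rank_fusion([lexical_ranking])
```

With both lists, `fused` contains all three documents and ranks the one both retrievers agree on first. Drop the dense list and `doc_paraphrase` disappears entirely; drop the lexical list and `doc_rare_noun` does.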
drift · forgetting, not yet wrong · 7 gaps · est. +4.4 pts
When does prompt caching stop being free?
You captured this in March. You haven't surfaced it in 41 days. Forgetting curve says ~38% recall now.
prompt cache · last seen 41d · 2 connected fragments · recall 38%
est. gain +0.8 pts
The two-pointer pattern for streaming joins — what's the invariant?
You shipped this in your own code in Feb. Six months without surfacing. The pattern is in frag 0421.
rust async · last seen 6mo · 1 connected fragment · recall 28%
est. gain +0.7 pts
Long-context attention — when do you need NTK scaling vs. YaRN?
You've read three papers on this since Jan. Each one nudged the threshold. Your note doesn't reflect the latest.
long-context · last seen 19d · 5 connected fragments · recall 52%
est. gain +0.6 pts
Why does HNSW beat IVF on small datasets — what's the crossover?
You hand-wave the answer. Past-you wrote it precisely. Recall says you'll forget the precise version in ~9 days.
RAG & embeddings · last seen 28d · 3 connected fragments · recall 48%
est. gain +0.6 pts
Postgres HOT updates — when does the heap-only optimisation fall back to index update?
You almost got this in your own reply on a HN thread last week. Almost.
postgres internals · last seen 7d · 4 connected fragments · recall 61%
est. gain +0.6 pts
RAG re-ranking — when is cross-encoder overkill?
You captured a paper claiming bi-encoder is enough above a certain k. You couldn't quote the k yesterday.
RAG & embeddings · last seen 22d · 2 connected fragments · recall 44%
est. gain +0.5 pts
Tool-call loops — bound by depth or by token budget?
You've used both bounds in production. Your note picks one and never explains why.
agentic loops · last seen 14d · 2 connected fragments · recall 55%
est. gain +0.5 pts
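The recall figures in the drift cards come from a forgetting curve. A minimal sketch of how a figure like "~38% after 41 days" could be produced, assuming a plain exponential curve R(t) = exp(-t / S) with a per-item stability S. The actual model behind these cards is not stated, so both function names and the curve shape are assumptions.

```python
import math

def recall(days_since_seen, stability_days):
    """Predicted recall under an assumed exponential forgetting curve."""
    return math.exp(-days_since_seen / stability_days)

def stability_from_observation(days_since_seen, observed_recall):
    """Invert the curve: which stability would give this recall after this many days?"""
    return -days_since_seen / math.log(observed_recall)

# The prompt-caching card: 41 days unseen, about 38% recall.
s = stability_from_observation(41, 0.38)

# Under the same assumed curve, recall for that item one week further on.
recall_in_a_week = recall(41 + 7, s)
```

Under this assumption the prompt-caching item has a stability of roughly 42 days. The other cards imply different stabilities, which is consistent with stability being tracked per item.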
stale · the field moved, your note didn't · 3 gaps · refresh recommended
Your long-context note predates DeepSeek-R2's RoPE variant.
Note last touched 47 days ago. Two new papers since. arxiv 2510.08812 directly contradicts a claim in your summary.
stale 47d · long-context · note 03 · 2 papers since
2 new fragments waiting
io_uring — your note treats it as Linux-only.
Capability shipped on macOS via libdispatch wrapper in March. Your saved fragment is from a 2023 blog post.
stale 84d · rust async · note 01 · 1 paper since
1 new fragment waiting
Your embeddings cost curve is pre-text-embedding-3-small.
You wrote it when ada-002 was the default. The cost-quality math is meaningfully different now.
stale 6mo · RAG & embeddings · note 02 · 3 papers since
3 new fragments waiting

Today's queue.

7 surfacings · 12 minutes · adapted from a staff veteran cadence.
tue · 9:14 am · ist
fragments
847
+ 12 this week
retention · d30
82.4%
↑ 4.1 pts vs last month
connections
3.2×
edges per fragment
streak
12d
longest yet

Your queue · 7 items

est. 12 min
Past-you wondered about RAG memory three times this month — never landed a decision.
review · 4 connected fragments · arxiv 2509.11240
2h
Your LLM scaling note is outdated — a new approach landed last week.
stale · arxiv 2510.08812 · deepseek-r2
tue
Connected your agentic RAG note to 3 things you already know.
graph · auto · 3 edges drawn at 04:12
12d
Quiz ready — 3 questions, 2 minutes. Mixture of experts.
retention · spaced · score so far: 78%
12d
New paper in your stack: "speculative decoding at scale."
stack · arxiv · matched 6 saved notes
fri

Activity

last 20 days
12d streak
your longest yet · break it tomorrow

Recent captures

+ 12 this week
Speculative decoding without rollbacks
arxiv · 2511.14201 · 4 connections
io_uring vs epoll: a 2026 update
blog · simonw · 2 connections
Postgres MERGE vs UPSERT
docs · postgres · stale (q3)
Vector store benchmarks: pinecone v3
github · readme · 1 connection
[knowledge graph · clusters: retrieval (4 pages), embeddings (4 pages), long-context (3 pages), agentic loops (5 pages), postgres internals (4 pages), prompt cache (2 pages), vector dbs (3 pages) · other nodes: RAG memory, tool-use]
legend · active (last 7 days) · retained · stale (field moved) · gap (asked, no answer)
dense knowledge · this graph · edges per fragment
fragments 847
density 3.2×
↑ 30d +0.8
RAG memory · 3.8
Agentic loops · 3.4
Embeddings · 2.7
Postgres · 2.1
Long-context · 1.4
io_uring · 0.5
selected · fragment 0421

RAG memory

A arxiv 2509.11240 · captured 12d ago · 4 reviews
The persistent-context problem. "Past-you" wondered three separate times whether to store retrieval state per-session or per-user. Each time you read a paper. Each time you wrote half a Notion page. Each time it went into the graveyard.
7 connections drawn auto
Vector store benchmarks · pinecone v3
your note · jul 11 · 2025
Embeddings: bi-encoder vs cross-encoder
arxiv · 2401.06732 · captured q1
Long-context windows past 1M
deepseek · 2026 · stale ↻
Agentic RAG: read-write loops
your project · open thread
Prompt cache for repeated queries
blog · simonw · captured 4d
arxiv · 2511.14201 · live capture

Speculative decoding without rollbacks.

Chen, Park, et al. · DeepSeek labs · 14d ago

Speculative decoding accelerates large language model inference by drafting candidate tokens with a smaller model and verifying them with the target model. Existing approaches suffer from a fundamental tension: aggressive drafting produces high speedup but requires expensive rollbacks when speculation fails, while conservative drafting wastes the parallel verification budget.

We introduce a rollback-free variant that treats every draft as a soft commit — verification reshuffles the speculation tree in place without discarding context. Across LLaMA-3 70B, DeepSeek-V3, and Qwen-72B, we observe 2.1–3.4× speedup over vanilla speculative decoding, with no degradation in output quality.

1.1 The rollback problem

Prior work on speculative decoding assumes a strict accept-reject regime: every draft token is either accepted whole or fully rejected. This over-commits to draft-model uncertainty. In production traffic, we measure rollback rates of 18–34% for non-trivial prompts, meaning that roughly a fifth to a third of draft work is thrown away.

The rollback cost compounds with model size. For 70B+ models the verifier KV cache must be rewound across multiple layers, which dominates inference cost beyond batch size 8.
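To make the accept-reject regime concrete, here is a toy sketch of the classical baseline the excerpt criticises, not the paper's rollback-free method. It assumes greedy token matching (real speculative sampling accepts probabilistically) and stands in fixed token lists for both models. `verify_draft` and the example tokens are hypothetical.

```python
def verify_draft(draft_tokens, target_tokens):
    """Keep the longest draft prefix the target agrees with; discard the rest.

    target_tokens holds what the target model would emit at each position.
    Returns (tokens kept after verification, number of draft tokens discarded).
    """
    n_match = 0
    for drafted, wanted in zip(draft_tokens, target_tokens):
        if drafted != wanted:
            break
        n_match += 1
    discarded = len(draft_tokens) - n_match
    output = list(draft_tokens[:n_match])
    # The verifier supplies one token of its own: a correction after a
    # mismatch, or a bonus token when the whole draft was accepted.
    if n_match < len(target_tokens):
        output.append(target_tokens[n_match])
    return output, discarded

draft = ["the", "cat", "sat", "on"]
target = ["the", "cat", "lay", "down", "now"]

output, discarded = verify_draft(draft, target)
rollback_fraction = discarded / len(draft)
```

Here the draft diverges at the third token, so two of four drafted tokens are thrown away. That discarded tail, and the cache state built for it, is the rollback cost the excerpt describes.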

capturing · live · 3 fragments

From this paper

supadense is parsing as you read
→ frag 0843 just now
"Aggressive drafting produces high speedup but requires expensive rollbacks when speculation fails."
→ frag 0844 4s ago
"Treats every draft as a soft commit — verification reshuffles the speculation tree in place."
→ frag 0845 12s ago
"In production traffic, we measure rollback rates of 18–34% for non-trivial prompts."
read time · 04:12
review scheduled · in 3 days
est. retention · d30 · 88%
synthesised · 11 fragments · 3 days ago

Read-write loops in agentic systems.

How agents remember between turns — and where the abstraction leaks.
diagram · generated 3d ago
[diagram: three turns (t–1, t · now, t+1). In each turn the Planner reads memory and a Tool call (search · code · fetch) writes a soft commit back to MEMORY (fragments + soft commits). The next plan reads from the same store with no rollback, only a reshuffle. Prior turns are retained; future turns are not yet written.] supadense · generated by gpt-vis
read it as · every tool output is a soft commit: the next planner reads from the same store, and the loop doesn't need to roll back to stay coherent.

01 What it is

An agentic system built on read-write loops treats its memory as a first-class data structure, not an afterthought. Every tool call writes back into the same memory store the planner reads from on the next turn → frag 0421, closing the gap between "what I just learned" and "what I think about next."

This differs from purely read-only retrieval (classical RAG), where the model fetches from a static corpus and discards what it generated → frag 0312. Read-write loops make the agent's own outputs part of the retrievable surface, which is what makes long-horizon tasks viable past ~10 turns.
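A minimal sketch of the loop described above, assuming a naive substring match in place of real retrieval. `Memory`, `run_loop`, and `toy_tool` are hypothetical names for illustration, not part of any captured paper or library.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """One store that the planner reads from and tool calls write to."""
    fragments: list = field(default_factory=list)

    def read(self, query):
        # Naive substring match standing in for real retrieval.
        return [f for f in self.fragments if query.lower() in f.lower()]

    def write(self, fragment):
        self.fragments.append(fragment)

def run_loop(memory, query, tool, turns):
    """Each turn: read memory, call the tool, write the result back."""
    for turn in range(turns):
        context = memory.read(query)
        result = tool(turn, context)
        memory.write(result)  # the agent's own output becomes retrievable
    return memory

def toy_tool(turn, context):
    # Hypothetical tool: reports how much earlier output the planner could see.
    return f"rag memory finding {turn} (saw {len(context)} earlier notes)"

memory = run_loop(Memory(), "rag memory", toy_tool, turns=3)
```

Each turn sees one more note than the last, because the previous tool output is already in the store. In a read-only setup the `memory.write` call would be absent and every turn would see the same static corpus.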

key insight
The bottleneck on long agentic tasks isn't model size or context length — it's memory plasticity. Most architectures freeze writes to keep retrieval predictable. The recent shift is to make the write path cheap enough to do every turn.

02 What surprised past-you

You started reading this from a different angle — "how do agents avoid retracing their own steps?" The answer turned out to be the same as the memory question, just shaped differently.

"Treats every draft as a soft commit — verification reshuffles the speculation tree in place."
— DeepSeek labs · arxiv 2511.14201 · captured 14d ago

The same primitive — → frag 0843 soft-commit memory — appears in three places you weren't expecting:

  • Speculative decoding: draft tokens are commits, verification reshuffles. No rollback needed.
  • Agentic planning: tool outputs are commits, the next plan reshuffles. Same shape.
  • Postgres MVCC: every write is a soft commit until VACUUM. Old idea, new context. → frag 0612
↳ speculative decoding
draft → verify → reshuffle
draft tokens are commits. verification keeps the good ones, no rewind.
↳ agentic planning
tool outputs land in memory; the next plan re-orders without erasing.
↳ postgres MVCC
[timeline: t0 → vacuum → now]
every write is a new version. reads see snapshots. only VACUUM forgets.
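The shared shape in the three cases above (writes add, reads pick, nothing is rewound) can be sketched as a toy versioned store. This illustrates the snapshot idea only and is not how Postgres implements MVCC. `VersionedStore` and its methods are hypothetical.

```python
class VersionedStore:
    """Toy multi-version store: writes append, reads see a snapshot."""

    def __init__(self):
        self.clock = 0
        self.versions = {}  # key -> list of (txid, value), oldest first

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        return self.clock

    def read(self, key, snapshot):
        # A reader sees the newest version written at or before its snapshot.
        visible = [v for txid, v in self.versions.get(key, []) if txid <= snapshot]
        return visible[-1] if visible else None

    def vacuum(self, oldest_snapshot):
        # Drop versions that no snapshot at or after oldest_snapshot can see.
        for key, vs in self.versions.items():
            keep_from = 0
            for i, (txid, _) in enumerate(vs):
                if txid <= oldest_snapshot:
                    keep_from = i
            self.versions[key] = vs[keep_from:]

store = VersionedStore()
store.write("plan", "v1")
old_snapshot = store.snapshot()
store.write("plan", "v2")

seen_by_old_reader = store.read("plan", old_snapshot)
seen_by_new_reader = store.read("plan", store.snapshot())
store.vacuum(oldest_snapshot=store.snapshot())
```

The second write never disturbs the first reader, so there is nothing to roll back. Only `vacuum` removes history, and only once no reader can still need it.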
past-you · jun 14
"I bet the same trick works for embedding-time updates. Worth a re-read of MVCC." — and the trick did work, six months later. Supadense surfaced this annotation when you read the DeepSeek paper.

03 Open threads

  • How do you bound the write rate? Pinecone v3 made per-user writes 4× cheaper, but the unit economics still flip at ~50 writes/turn. → frag 0721
  • What does eviction look like? No paper you've captured talks about which memories to forget. Likely an unanswered gap.
  • Does your prompt-cache note apply here? simonw's piece argued caching solves hot-context. That's half of read-write. Worth merging into this page.
you
what did i decide about persistent rag memory last time i looked at this?
past-you
You looked at this three times — June, August, and last week. Each time you stopped before deciding. The most recent thinking, from your RAG memory note:
"Per-user is right for personalization, but the storage cost is real. I think the answer is a hybrid — per-session for hot context, per-user for retrieval state, with TTLs that age based on retrieval recency."
Three things shifted since you wrote that:
· DeepSeek-V3 dropped a paper on the same hybrid pattern at scale (2511.08812).
· Your prompt-cache note (4d ago, simonw blog) overlaps directly — caching solves the hot-context half.
· Pinecone v3 made per-user 4× cheaper. Your benchmarks note is now stale.
sources · RAG memory (your note) · arxiv 2511.08812 · simonw, prompt cache · pinecone v3 benchmarks
you
surface the deepseek paper next time i open today
past-you
Pinned to tomorrow's queue. I'll also re-rank the prompt-cache note above your stale pinecone benchmarks — the field has moved, the recommendation should follow.
✓ scheduled · wed 9:14 am
answers cite your fragments · never the open web