Practice — day 12.

7 questions · ~12 min · diagrams · recall · synthesis · gaps

3 / 7

diagram · learn by seeing

cluster agentic loops · last reviewed 6d · est. gain +2.1 pts

question 03 · read the diagram. write the answer.

In this loop, why does the soft-commit boundary make rollback unnecessary?

↳ diagram · agentic loops drawn by supadense · from your 11 fragments

[diagram: agentic loop, with a region labelled "committed (visible)"] ↳ generated from frag 0421 · 0612 · 0843


why this · you've captured this shape three times: in postgres MVCC, in speculative decoding, and last week in DeepSeek's paper. supadense believes you can name the pattern but hasn't seen you write it down. name it once and you'll never lose it.


answered today · 2 ↑ retention +1.6 pts

94

"What's the core difference between read-write loops and classical RAG?" — agentic loops

↑ d30 +3.2

2 min

62

"Why does speculative decoding hurt at batch > 8?" — long-context

↑ d30 +0.8

5 min

retention forecast d30 · all clusters

82.4% ↑ 4.1 pts · 30d

today's questions add ≈ +2.8 pts

up next · 4 ≈ 8 min

04 "What did past-you decide about per-user vs per-session retrieval state?" recall

05 Label the missing block in your RAG retrieval pipeline diagram. diagram

06 Your io_uring note is from q1. What changed after DeepSeek-V3 shipped? stale

07 Sketch your current mental model for read-write eviction. gap

cadence steady · 12 min/day

streak 12 days

next break in 3 q · ≈ 6 min

Gaps in your learning.

14 open · across 6 clusters · 4 marked severe

14

open · −2 this week


severe · you've answered wrong twice in a row 4 gaps · est. +6.8 pts

The soft-commit boundary in agentic loops — when does the rollback window close?

You've described it twice as "write then commit" — but the boundary is the read-share moment, not the write. Past-you noted this in frag 0843 in June.

agentic loops last surfaced · 2d 3 connected fragments wrong · 2/2

est. gain +2.1 pts

Why does RAG memory need both BM25 and dense recall — what fails if you drop one?

You've conflated lexical drop-out with embedding miss in three conversations. The failure mode is different: rare nouns vs. paraphrase. You wrote about it in May; the field has since moved.

RAG & embeddings last surfaced · 5d 6 connected fragments wrong · 2/2

est. gain +1.9 pts

Speculative decoding — why the draft model's size matters more than its accuracy.

You've twice optimised the wrong knob. The draft model is throughput-bound, not quality-bound. Frag 0612 spells out the math; you haven't opened it since Apr.

agentic loops last surfaced · today 4 connected fragments wrong · 2/2

est. gain +1.5 pts

MVCC vs. soft-commit — same abstraction or different?

You've described them as siblings. Past-you, in postgres MVCC, treated them as siblings too — and then read a paper that said otherwise. The note never updated.

postgres internals last surfaced · yesterday 8 connected fragments wrong · 2/2

est. gain +1.3 pts

drift · forgetting, not yet wrong 7 gaps · est. +4.4 pts

When does prompt caching stop being free?

You captured this in March. You haven't surfaced it in 41 days. Forgetting curve says ~38% recall now.

prompt cache last seen · 41d 2 connected fragments recall · 38%

est. gain +0.8 pts

The two-pointer pattern for streaming joins — what's the invariant?

You shipped this in your own code in Feb. Six months without surfacing. The pattern is in frag 0421.

rust async last seen · 6mo 1 connected fragment recall · 28%

est. gain +0.7 pts

Long-context attention — when do you need NTK scaling vs. YaRN?

You've read three papers on this since Jan. Each one nudged the threshold. Your note doesn't reflect the latest.

long-context last seen · 19d 5 connected fragments recall · 52%

est. gain +0.6 pts

Why does HNSW beat IVF on small datasets — what's the crossover?

You hand-wave the answer. Past-you wrote it precisely. Recall says you'll forget the precise version in ~9 days.

RAG & embeddings last seen · 28d 3 connected fragments recall · 48%

est. gain +0.6 pts

Postgres HOT updates — when does the heap-only optimisation fall back to index update?

You almost got this in your own reply on a HN thread last week. Almost.

postgres internals last seen · 7d 4 connected fragments recall · 61%

est. gain +0.6 pts

RAG re-ranking — when is cross-encoder overkill?

You captured a paper claiming bi-encoder is enough above a certain k. You couldn't quote the k yesterday.

RAG & embeddings last seen · 22d 2 connected fragments recall · 44%

est. gain +0.5 pts

Tool-call loops — bound by depth or by token budget?

You've used both bounds in production. Your note picks one and never explains why.

agentic loops last seen · 14d 2 connected fragments recall · 55%

est. gain +0.5 pts

stale · the field moved, your note didn't 3 gaps · refresh recommended

Your long-context note predates DeepSeek-R2's RoPE variant.

Note last touched 47 days ago. Two new papers since. arxiv 2510.08812 directly contradicts a claim in your summary.

stale · 47d long-context · note 03 2 papers since

2 new fragments waiting

io_uring — your note treats it as Linux-only.

Capability shipped on macOS via libdispatch wrapper in March. Your saved fragment is from a 2023 blog post.

stale · 84d rust async · note 01 1 paper since

1 new fragment waiting

Your embeddings cost curve is pre-text-embedding-3-small.

You wrote it when ada-002 was the default. The cost-quality math is meaningfully different now.

stale · 6mo RAG & embeddings · note 02 3 papers since

3 new fragments waiting

if you practice all severe +6.8 pts ≈ 22 minutes. Closes 4 high-confidence model errors.

if you practice all drift +4.4 pts ≈ 31 minutes. Holds 7 concepts above 70% recall.

if you refresh stale notes 6 new frags ≈ 8 minutes of light review. No quiz; just read & merge.

Today's queue.

7 surfacings · 12 minutes · adapted from a staff veteran cadence.

tue · 9:14 am · ist

fragments

847

+ 12 this week

retention · d30

82.4%

↑ 4.1 pts vs last month

connections

3.2×

edges per fragment

streak

12d

longest yet

Your queue · 7 items

est. 12 min

Past-you wondered about RAG memory three times this month — never landed a decision.

review · 4 connected fragments · arxiv 2509.11240

2h

Your LLM scaling note is outdated — a new approach landed last week.

stale · arxiv 2510.08812 · deepseek-r2

tue

Connected your agentic RAG note to 3 things you already know.

graph · auto · 3 edges drawn at 04:12

12d

Quiz ready — 3 questions, 2 minutes. Mixture of experts.

retention · spaced · score so far: 78%

12d

New paper in your stack: "speculative decoding at scale."

stack · arxiv · matched 6 saved notes

fri

Activity

last 20 days

12d streak

your longest yet · break it tomorrow

Recent captures

+ 12 this week

Speculative decoding without rollbacks

arxiv · 2511.14201 · 4 connections

io_uring vs epoll: a 2026 update

blog · simonw · 2 connections

Postgres MERGE vs UPSERT

docs · postgres · stale (q3)

Vector store benchmarks: pinecone v3

github · readme · 1 connection

legend: active (last 7 days) · retained · stale (field moved) · gap (asked, no answer)

dense knowledge · this graph edges per fragment

fragments 847

density 3.2×

↑ 30d +0.8

RAG memory 3.8

Agentic loops 3.4

Embeddings 2.7

Postgres 2.1

Long-context 1.4

io_uring 0.5

selected · fragment 0421

RAG memory

arxiv 2509.11240 · captured 12d ago · 4 reviews

The persistent-context problem. "Past-you" wondered three separate times whether to store retrieval state per-session or per-user. Each time you read a paper. Each time you wrote half a Notion page. Each time it went into the graveyard.

7 connections drawn auto

Vector store benchmarks · pinecone v3

your note · jul 11 · 2025

Embeddings: bi-encoder vs cross-encoder

arxiv · 2401.06732 · captured q1

Long-context windows past 1M

deepseek · 2026 · stale ↻

Agentic RAG: read-write loops

your project · open thread

Prompt cache for repeated queries

blog · simonw · captured 4d

arxiv · 2511.14201 · live capture

Speculative decoding without rollbacks.

Chen, Park, et al. · DeepSeek labs · 14d ago

Speculative decoding accelerates large language model inference by drafting candidate tokens with a smaller model and verifying them with the target model. Existing approaches suffer from a fundamental tension: aggressive drafting produces high speedup but requires expensive rollbacks when speculation fails, while conservative drafting wastes the parallel verification budget.

We introduce a rollback-free variant that treats every draft as a soft commit — verification reshuffles the speculation tree in place without discarding context. Across LLaMA-3 70B, DeepSeek-V3, and Qwen-72B, we observe 2.1–3.4× speedup over vanilla speculative decoding, with no degradation in output quality.

1.1 The rollback problem

Prior work on speculative decoding assumes a strict accept-reject regime: every draft token is either accepted whole or fully rejected. This over-commits to draft-model uncertainty. In production traffic, we measure rollback rates of 18–34% for non-trivial prompts, meaning that between roughly a fifth and a third of draft work is thrown away.

The rollback cost compounds with model size. For 70B+ models the verifier KV cache must be rewound across multiple layers, which dominates inference cost beyond batch size 8.
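To make the accept-reject regime and the rollback rate concrete, here is a minimal sketch of classical greedy speculative decoding, not the paper's rollback-free variant. The names `speculative_step`, `generate`, `toy_draft`, and `toy_target` are hypothetical, and the "models" are toy next-token functions chosen so the result is easy to check by hand. A real verifier would score all draft positions in one batched forward pass rather than in a Python loop.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # hypothetical greedy next-token function


def speculative_step(
    prefix: List[Token], draft: Model, target: Model, k: int
) -> Tuple[List[Token], int]:
    """One draft-and-verify round. Returns (tokens to emit, draft tokens discarded)."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposal in order.
    emitted: List[Token] = []
    ctx = list(prefix)
    for i, tok in enumerate(proposed):
        expected = target(ctx)
        if tok != expected:
            # First mismatch: emit the target's token and discard this draft
            # token plus every later one. This is the rollback.
            emitted.append(expected)
            return emitted, k - i
        emitted.append(tok)
        ctx.append(tok)

    # 3. Every draft token was accepted; the target adds one bonus token.
    emitted.append(target(ctx))
    return emitted, 0


def generate(
    prefix: List[Token], draft: Model, target: Model, n_new: int, k: int = 4
) -> Tuple[List[Token], float]:
    """Generate n_new tokens; also return the fraction of draft tokens discarded."""
    out: List[Token] = []
    drafted = discarded = 0
    while len(out) < n_new:
        tokens, lost = speculative_step(prefix + out, draft, target, k)
        out.extend(tokens)
        drafted += k
        discarded += lost
    return out[:n_new], discarded / drafted


# Toy models (assumptions for illustration): the target counts upward mod 10,
# and the draft agrees except that it guesses wrong right after a 3.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


tokens, rollback_rate = generate([0], toy_draft, toy_target, n_new=8)
```

The output always matches what the target alone would have produced; the rollback rate only measures how much draft work was wasted along the way.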

capturing · live · 3 fragments

From this paper

supadense is parsing as you read

→ frag 0843 just now

"Aggressive drafting produces high speedup but requires expensive rollbacks when speculation fails."

+ new concept RAG memory · 2

→ frag 0844 4s ago

"Treats every draft as a soft commit — verification reshuffles the speculation tree in place."

+ 1 connection speculative decoding · q3

→ frag 0845 12s ago

"In production traffic, we measure rollback rates of 18–34% for non-trivial prompts."

+ 3 connections LLM inference prod metrics

read time 04:12

review scheduled in 3 days

est. retention · d30 88%

synthesised · 11 fragments · 3 days ago ↻ regenerate

Read-write loops in agentic systems.

How agents remember between turns — and where the abstraction leaks.

diagram · generated 3d ago ↻ regenerate

read it as: every tool output is a soft commit; the next planner reads from the same store, and the loop doesn't need to roll back to stay coherent.

01 What it is

An agentic system built on read-write loops treats its memory as a first-class data structure, not an afterthought. Every tool call writes back into the same memory store the planner reads from on the next turn → frag 0421, closing the gap between "what I just learned" and "what I think about next."

This differs from purely read-only retrieval (classical RAG), where the model fetches from a static corpus and discards what it generated → frag 0312. Read-write loops make the agent's own outputs part of the retrievable surface, which is what makes long-horizon tasks viable past ~10 turns.

key insight

The bottleneck on long agentic tasks isn't model size or context length — it's memory plasticity. Most architectures freeze writes to keep retrieval predictable. The recent shift is to make the write path cheap enough to do every turn.
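As a rough illustration of the difference between a read-write loop and read-only retrieval, the sketch below runs the same toy loop in both modes. Everything here is an assumption made for the example: `MemoryStore`, `run_loop`, the keyword-overlap ranking, and the stand-in "tool" that just reports how many memories the planner saw. A real system would use an actual retriever and real tool calls.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryStore:
    """Hypothetical memory store: append-only writes, keyword-overlap reads."""

    entries: List[str] = field(default_factory=list)

    def write(self, text: str) -> None:
        self.entries.append(text)

    def read(self, query: str, k: int = 3) -> List[str]:
        words = set(query.lower().split())
        # Score each entry by shared words; break ties in favour of newer entries.
        scored = [
            (len(words & set(entry.lower().split())), index, entry)
            for index, entry in enumerate(self.entries)
        ]
        hits = [item for item in scored if item[0] > 0]
        hits.sort(key=lambda item: (item[0], item[1]), reverse=True)
        return [entry for _, _, entry in hits[:k]]


def run_loop(store: MemoryStore, task: str, turns: int, write_back: bool = True) -> List[str]:
    """Run a toy planner/tool loop and return the tool output of each turn."""
    trace: List[str] = []
    for turn in range(turns):
        context = store.read(task)  # the planner reads from memory
        # Stand-in for a tool call: report how much context the planner had.
        output = f"turn {turn}: {task} -> used {len(context)} memories"
        trace.append(output)
        if write_back:
            store.write(output)  # read-write: the output becomes retrievable
    return trace


read_write = MemoryStore(["index the docs corpus"])
rw_trace = run_loop(read_write, "index docs", turns=3)

read_only = MemoryStore(["index the docs corpus"])
ro_trace = run_loop(read_only, "index docs", turns=3, write_back=False)
```

In read-write mode the planner's context grows each turn because earlier outputs are retrievable; in read-only mode it sees the same single seed entry every time.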

02 What surprised past-you

You started reading this from a different angle — "how do agents avoid retracing their own steps?" The answer turned out to be the same as the memory question, just shaped differently.

"Treats every draft as a soft commit — verification reshuffles the speculation tree in place."

— DeepSeek labs · arxiv 2511.14201 · captured 14d ago

The same primitive — → frag 0843 soft-commit memory — appears in three places you weren't expecting:

· Speculative decoding: draft tokens are commits, verification reshuffles. No rollback needed.
· Agentic planning: tool outputs are commits, the next plan reshuffles. Same shape.
· Postgres MVCC: every write is a soft commit until VACUUM. Old idea, new context. → frag 0612

↳ speculative decoding

draft tokens are commits. verification keeps the good ones, no rewind.

↳ agentic planning

tool outputs land in memory; the next plan re-orders without erasing.

↳ postgres MVCC

every write is a new version. reads see snapshots. only VACUUM forgets.
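The "new version per write, snapshot per read, only VACUUM forgets" shape can be sketched with a toy versioned store. This is a simplified model of the idea as these notes describe it, not of Postgres internals: `VersionedStore` and its methods are hypothetical, transaction IDs are a plain counter, and there is no concurrency, abort handling, or visibility map.

```python
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Toy multi-version store: writes append versions, reads use a snapshot."""

    def __init__(self) -> None:
        # key -> list of (txid, value), in ascending txid order
        self._versions: Dict[str, List[Tuple[int, str]]] = {}
        self._next_txid = 1

    def write(self, key: str, value: str) -> int:
        """Append a new version instead of overwriting; return its txid."""
        txid = self._next_txid
        self._next_txid += 1
        self._versions.setdefault(key, []).append((txid, value))
        return txid

    def snapshot(self) -> int:
        """A snapshot is the highest txid committed so far."""
        return self._next_txid - 1

    def read(self, key: str, snapshot: int) -> Optional[str]:
        """Return the newest version written at or before the snapshot."""
        visible = [value for txid, value in self._versions.get(key, []) if txid <= snapshot]
        return visible[-1] if visible else None

    def vacuum(self, oldest_snapshot: int) -> int:
        """Drop versions that no snapshot >= oldest_snapshot can still see."""
        removed = 0
        for key, versions in self._versions.items():
            keep_from = 0
            for index, (txid, _) in enumerate(versions):
                if txid <= oldest_snapshot:
                    keep_from = index  # newest version visible to the oldest reader
            removed += keep_from
            self._versions[key] = versions[keep_from:]
        return removed


store = VersionedStore()
store.write("plan", "a")
old_snapshot = store.snapshot()
store.write("plan", "b")

before_old = store.read("plan", old_snapshot)      # old reader still sees "a"
before_new = store.read("plan", store.snapshot())  # new reader sees "b"
removed = store.vacuum(oldest_snapshot=store.snapshot())
after_old = store.read("plan", old_snapshot)       # the old version is gone
```

Nothing is rolled back at write time; an old version disappears only when `vacuum` decides no reader can need it.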

past-you · jun 14

"I bet the same trick works for embedding-time updates. Worth a re-read of MVCC." — and the trick did work, six months later. Supadense surfaced this annotation when you read the DeepSeek paper.

03 Open threads

· How do you bound the write rate? Pinecone v3 made per-user writes 4× cheaper, but the unit economics still flip at ~50 writes/turn. → frag 0721
· What does eviction look like? No paper you've captured talks about which memories to forget. Likely an unanswered gap. ? unanswered
· Does your prompt-cache note apply here? simonw's piece argued caching solves hot-context. That's half of read-write. Worth merging into this page.

connected · this page 11 frags · 7 edges

edges out · read-write architectures

source fragments 11

→ frag 0421 · rag memory · jun 14

"Per-session for hot context, per-user for retrieval state, with TTLs that age based on retrieval recency."

your note · 2 backlinks

→ frag 0843 · deepseek · 14d

"Treats every draft as a soft commit — verification reshuffles the speculation tree in place."

arxiv 2511.14201 · 3 backlinks

→ frag 0612 · postgres · q1

"MVCC: every write is a soft commit until VACUUM. Reads see snapshots, not state."

docs · postgres · 1 backlink

→ frag 0721 · pinecone · 4d

v3 cuts per-user write cost 4× — unit economics still flip at ~50 writes/turn.

readme · pinecone · fresh

→ frag 0312 · classical rag · q3

"Static corpus, read-only retrieval, no write path. The simplest agent dies at turn ten."

your note · stale ↻

you

what did i decide about persistent rag memory last time i looked at this?

past-you

You looked at this three times — June, August, and last week. Each time you stopped before deciding. The most recent thinking, from your RAG memory note:

"Per-user is right for personalization, but the storage cost is real. I think the answer is a hybrid — per-session for hot context, per-user for retrieval state, with TTLs that age based on retrieval recency."

Three things shifted since you wrote that:

· DeepSeek-V3 dropped a paper on the same hybrid pattern at scale (2511.08812).

· Your prompt-cache note (4d ago, simonw blog) overlaps directly — caching solves the hot-context half.

· Pinecone v3 made per-user 4× cheaper. Your benchmarks note is now stale.

→ RAG memory · your note → arxiv 2511.08812 → simonw · prompt cache → pinecone v3 benchmarks
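One way to picture the hybrid quoted above (per-session hot context, per-user retrieval state, TTLs that age on retrieval recency) is the sketch below. It is an assumption-laden illustration, not a decision: `RecencyTTLStore` and `HybridMemory` are hypothetical names, time is passed in explicitly as `now` to keep the example deterministic, and "aging on retrieval recency" is interpreted here as "each successful read restarts the TTL clock".

```python
from typing import Dict, Optional, Tuple


class RecencyTTLStore:
    """Per-user store whose entries expire ttl seconds after their last retrieval."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._items: Dict[str, Tuple[str, float]] = {}  # key -> (value, last touched)

    def put(self, key: str, value: str, now: float) -> None:
        self._items[key] = (value, now)

    def get(self, key: str, now: float) -> Optional[str]:
        item = self._items.get(key)
        if item is None:
            return None
        value, last_touched = item
        if now - last_touched > self.ttl:
            del self._items[key]  # aged out: not retrieved recently enough
            return None
        self._items[key] = (value, now)  # a retrieval refreshes the TTL
        return value


class HybridMemory:
    """Hot context lives per session; retrieval state lives per user with a TTL."""

    def __init__(self, user_ttl: float) -> None:
        self.session: Dict[str, str] = {}
        self.user = RecencyTTLStore(user_ttl)

    def end_session(self) -> None:
        self.session.clear()  # hot context does not outlive the session


memory = HybridMemory(user_ttl=10.0)
memory.session["draft"] = "current working notes"
memory.user.put("retriever", "dense", now=0.0)

first = memory.user.get("retriever", now=8.0)    # within TTL, refreshes the clock
second = memory.user.get("retriever", now=16.0)  # 8s since last retrieval, still alive
memory.end_session()
third = memory.user.get("retriever", now=30.0)   # 14s since last retrieval, expired
```

Frequently retrieved state stays alive indefinitely, while state nobody asks for ages out, which keeps the per-user storage cost tied to actual use.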

you

surface the deepseek paper next time i open today

past-you

Pinned to tomorrow's queue. I'll also re-rank the prompt-cache note above your stale pinecone benchmarks — the field has moved, the recommendation should follow.

✓ scheduled · wed 9:14 am

answers cite your fragments · never the open web
