Commit df16cf0: remove comments (1 parent: 5087800)

1 file changed: +0 −106 lines

website/blog/posts/2026-04-08-building-a-collaborative-ai-editor.md
@@ -14,29 +14,19 @@ post: true
published: true
---

I came to sync engines through collaborative editing. Since AI agents became part of my daily workflow, I've had an earworm: what would it look like to integrate an AI agent into a Yjs rich text editing flow — not as a sidebar that dumps text, but as a real participant with its own cursor, presence, and streaming edits?

This post walks through how I built a [Collaborative AI Editor](https://collaborative-ai-editor.examples.electric-sql.com) demo — a [TanStack Start](https://tanstack.com/start) app with a [ProseMirror](https://prosemirror.net)/Yjs editor and an AI chat sidebar. It uses [Durable&nbsp;Streams](https://durablestreams.com) as the single transport layer for both [Yjs](https://yjs.dev) document collaboration and [TanStack&nbsp;AI](https://tanstack.com/ai) chat sessions. Two integrations, one primitive, and the AI becomes a genuine CRDT peer.

> [!Warning] Collaborative AI Editor demo
> Try the [live demo](https://collaborative-ai-editor.examples.electric-sql.com) and browse the [source code](https://github.com/electric-sql/collaborative-ai-editor).

<div style="border: 1px solid #555; border-radius: 8px; padding: 48px 24px; text-align: center; color: #888; margin: 24px 0;">
  <p style="font-size: 48px; margin: 0;">&#9654;</p>
  <p style="font-size: 16px; margin: 8px 0 4px;"><strong>TODO: YouTube video embed</strong></p>
  <p style="font-size: 13px; margin: 0;">Full demo walkthrough — editor + chat sidebar + Electra editing in real-time</p>
</div>

## The natural intersection

AI-assisted writing and editing is everywhere. ChatGPT Canvas, Cursor, and Notion AI all let an AI modify a document alongside you. Software engineers see this more clearly than most: we already have agents editing our code files daily, working alongside us in the same codebase.
@@ -45,11 +35,6 @@ At the same time, real-time collaboration is table stakes for productivity tools
The overlap is clear. We already have the tools for multiple people to edit a document together in real-time. If we treat an AI agent as just another peer in that system, we get collaborative AI editing without reinventing the wheel. The agent joins the same document, gets its own cursor, and edits alongside you using the same CRDT infrastructure that already handles conflict resolution between humans.

## But the integration is painful

The technology for building collaborative editors has evolved. Earlier editors like Google Docs used operational transforms (OT), which require a central server to resolve conflicts. CRDTs removed that constraint, and [Yjs](https://yjs.dev) has become the dominant toolkit for building CRDT-based editors today. But building a collaborative AI editor on top of Yjs means integrating several separate real-time systems: CRDT sync for the document, token streaming for the AI, and presence/awareness, each with its own transport, connection lifecycle, persistence, and failure handling.
@@ -60,22 +45,10 @@ On top of that, most approaches rely on client-side tool calls to edit the docum
Put it all together and you're looking at one protocol for Yjs, another for AI streaming, custom persistence for chat history, and separate reconnection logic for each layer. Every system fails independently. It's a lot of moving parts for something that should feel simple.

Having worked on Durable&nbsp;Streams, then built the [Yjs integration](https://durablestreams.com/yjs) and the [TanStack&nbsp;AI integration](https://durablestreams.com/tanstack-ai), I kept coming back to one question: what if these two integrations shared the same infrastructure?

## Three streams, one primitive

[Durable&nbsp;Streams](https://durablestreams.com) is a persistent, addressable HTTP streaming protocol. Writers append data to a stream, subscribers consume it, and any client can catch up from its last offset at any time. The protocol handles reconnection and delivery automatically.

The Collaborative AI Editor uses three Durable&nbsp;Streams per document:
@@ -90,52 +63,10 @@ The [`@durable-streams/tanstack-ai-transport`](https://durablestreams.com/tansta
Both integrations share the same underlying protocol. No WebSocket servers to manage, no separate persistence layer to build, no custom reconnection logic. For local development, `@durable-streams/server` runs with file-backed storage. In production, you can use [Durable&nbsp;Streams Cloud](https://durablestreams.com) or self-host.
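These stream semantics (append-only writes, offset-addressed reads, catch-up from any offset) are the whole contract. Here is a toy in-memory model that illustrates the semantics; it is not the real protocol or any actual `@durable-streams` API:

```ts
// Toy in-memory model of durable stream semantics. Illustration only:
// the real protocol is HTTP-based, persistent, and supports live tailing.
type Message = { offset: number; data: string }

class ToyDurableStream {
  private log: Message[] = []

  // Writers append; each message gets a monotonically increasing offset.
  append(data: string): number {
    const offset = this.log.length
    this.log.push({ offset, data })
    return offset
  }

  // Subscribers read from any offset, so a client that reconnects
  // catches up from the last offset it processed.
  readFrom(offset: number): Message[] {
    return this.log.slice(offset)
  }
}
```

A reconnecting subscriber only has to remember the offset of the last message it processed; everything after that gets replayed, which is what makes reconnection "automatic" for both the Yjs and chat integrations.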

<img src="/img/blog/building-a-collaborative-ai-editor/architecture.svg" alt="Architecture diagram showing the browser client and server agent both connecting to three Durable Streams: a Yjs document stream, a Yjs awareness stream, and a TanStack AI chat stream" />

## The AI as a CRDT peer

The key architectural choice: the AI agent is a server-side Yjs peer, not a client-side bolt-on. On the server, the agent opens its own Yjs document and connects to the same Durable Stream as the human editors. It's just another participant in the room. In the demo, the agent is called "Electra".

From the human user's perspective, Electra looks like any other collaborator:
@@ -146,19 +77,13 @@ From the human user's perspective, Electra looks like any other collaborator:

The agent doesn't manipulate the Yjs document directly. It works through tool calls: the AI model decides what to do, a runtime on the server translates those tool calls into Yjs operations, and the CRDT sync propagates the changes to all connected clients. Because the agent is a server-side peer, it can edit the document whether or not any browser has it open.
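That translation step can be pictured as a small dispatcher. The sketch below is hypothetical: the tool names and the plain string standing in for the Yjs document are illustrative assumptions, not the demo's actual runtime.

```ts
// Hypothetical sketch of the server-side runtime that turns the model's
// tool calls into document operations. The real demo applies them as
// Yjs transactions; here a plain string stands in for the document,
// and the tool names are invented for illustration.
type ToolCall =
  | { name: 'insert_text'; args: { position: number; text: string } }
  | { name: 'delete_range'; args: { from: number; to: number } }

function applyToolCall(doc: string, call: ToolCall): string {
  switch (call.name) {
    case 'insert_text': {
      const { position, text } = call.args
      return doc.slice(0, position) + text + doc.slice(position)
    }
    case 'delete_range': {
      const { from, to } = call.args
      return doc.slice(0, from) + doc.slice(to)
    }
  }
}
```

The important property is that the model never holds a reference to the document; it emits intents, and the runtime decides how they become CRDT operations.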

<figure>
  <video class="w-full" autoplay loop muted playsinline preload="metadata">
    <source src="/videos/blog/building-a-collaborative-ai-editor/multiple-cursors.mp4" type="video/mp4" />
  </video>
  <figcaption>Electra's cursor and presence in the editor&nbsp;alongside&nbsp;a&nbsp;human&nbsp;user.</figcaption>
</figure>

```ts
async function createServerAgentSession(docKey: string, sessionId: string) {
  const ydoc = new Doc()
@@ -196,11 +121,6 @@ Because the agent is a server-side peer, you can start an edit, close your lapto
## Streaming edits into a live document

The hard problem here is that the AI generates markdown text, but the document is a rich text CRDT. It's a structured data type, not a string. You need to convert streaming markdown tokens into rich text nodes and insert them at positions that stay valid while other users are editing concurrently.

### Document tools
@@ -223,10 +143,6 @@ The tool surface is deliberately constrained. The agent operates through the sam

### Routing streaming text into the document

Tool calls don't support async streaming. You call a tool with arguments and get a result back. But we want the model to stream prose into the document token by token.

The solution is a routing trick. `start_streaming_edit` is a tool that flips a switch: after it's called, the model's next text output gets intercepted and redirected into the document instead of streaming into the chat as an assistant message. The model just generates text naturally. The infrastructure decides where it goes.
@@ -239,9 +155,6 @@ The solution is a routing trick. `start_streaming_edit` is a tool that flips a s

When the model finishes generating or calls another tool, the routing automatically stops. The model can also explicitly call `stop_streaming_edit` to switch back to chat output mid-turn. The system prompt explains this convention so the model knows to call `start_streaming_edit` before generating document prose, and to switch back when it wants to respond in the chat.
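The switch itself can be sketched in a few lines of routing state. The names and structure here are illustrative assumptions, not the demo's code:

```ts
// Sketch of the routing trick: after `start_streaming_edit` is called,
// text deltas from the model go to the document instead of the chat
// transcript. Any other tool call flips the switch back.
type Sink = (text: string) => void

class StreamRouter {
  private editing = false

  constructor(private toChat: Sink, private toDocument: Sink) {}

  onToolCall(name: string) {
    // `start_streaming_edit` enables document routing; any other tool
    // call (including `stop_streaming_edit`) restores chat routing.
    this.editing = name === 'start_streaming_edit'
  }

  onTextDelta(text: string) {
    if (this.editing) this.toDocument(text)
    else this.toChat(text)
  }
}
```

The model never sees this machinery. It generates text as usual, and the infrastructure decides whether each delta lands in the transcript or in the document.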

```ts
const searchTextDef = toolDefinition({
  name: 'search_text',
@@ -277,16 +190,10 @@ const startStreamingEditDef = toolDefinition({

### Relative position anchors

Absolute positions in a document shift when other users type, making them useless for concurrent editing. [Yjs](https://docs.yjs.dev/api/relative-positions) solves this with relative positions, which are anchored to CRDT items rather than offsets. They stay correct regardless of what other users do to the document.

When the user sends a message, the client encodes its current cursor and selection as relative anchors and sends them to the server as part of the chat context. This gives the agent awareness of what the user is doing and where they're looking in the document. The agent doesn't have to use this context, but when the user says "rewrite this paragraph" or "insert something here", it can target the exact semantic position the user intended.
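A toy model makes the failure mode concrete: an integer offset breaks as soon as a concurrent edit lands before it, while an anchor that names a CRDT item keeps resolving correctly. This is only an illustration of the idea; Yjs relative positions are more sophisticated than this.

```ts
// Toy model: each character is an item with a stable id. An absolute
// position is an index; a relative anchor names the item it points at.
// (Illustrative only, not how Yjs implements relative positions.)
type Item = { id: string; ch: string }

function anchorAt(doc: Item[], index: number): string {
  return doc[index].id // anchor to the item, not the offset
}

function resolveAnchor(doc: Item[], id: string): number {
  return doc.findIndex((item) => item.id === id)
}
```

After a concurrent user inserts text earlier in the document, the old index points at the wrong character, but resolving the anchor finds the same item at its new position.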

```ts
// Client: encode the cursor position as a Yjs relative position
const rel = absolutePositionToRelativePosition(cursorPos, fragment, mapping)
@@ -308,9 +215,6 @@ The edits stream into the document in real-time. The user sees text appear as th

AI models have been trained on markdown as a native format. They naturally use it to express formatting and emphasis. Rather than fight against this by inventing a custom format or requiring tool calls for every formatting operation, we lean into markdown as an intermediate representation. A streaming pipeline incrementally parses the token stream and converts it into native Yjs document nodes as they arrive — `**bold**` becomes a bold mark, `## heading` becomes a heading node, `- item` becomes a list item. The model writes in the format it's best at, and the document receives properly structured rich text.
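As a drastically simplified sketch of that mapping, classifying whole lines rather than incrementally parsing a token stream as the demo does:

```ts
// Greatly simplified sketch: map complete markdown lines to the
// rich-text node types they become. The real pipeline parses the token
// stream incrementally and emits Yjs nodes; this is illustration only.
type RichNode =
  | { type: 'heading'; level: number; text: string }
  | { type: 'listItem'; text: string }
  | { type: 'paragraph'; text: string }

function lineToNode(line: string): RichNode {
  const heading = line.match(/^(#{1,6}) (.*)$/)
  if (heading) return { type: 'heading', level: heading[1].length, text: heading[2] }
  const item = line.match(/^- (.*)$/)
  if (item) return { type: 'listItem', text: item[1] }
  return { type: 'paragraph', text: line }
}
```

The real pipeline also handles inline marks like `**bold**`, which need to be emitted before the closing delimiter has necessarily arrived; that is where the incremental parsing earns its keep.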

<figure>
  <video class="w-full" autoplay loop muted playsinline preload="metadata">
    <source src="/videos/blog/building-a-collaborative-ai-editor/markdown.mp4" type="video/mp4" />
@@ -320,19 +224,12 @@ AI models have been trained on markdown as a native format. They naturally use i

## Durable chat

The chat sidebar uses [TanStack&nbsp;AI](https://tanstack.com/ai)'s `useChat` hook with no custom component code. The only difference from a standard chat setup is the connection adapter: instead of the default request/response model, a durable connection routes messages through a Durable&nbsp;Stream. Messages POST to `/api/chat`, and the client subscribes to `/api/chat-stream`.

Because the chat session lives on a Durable&nbsp;Stream, it's resilient by default. Refresh the page mid-generation and the chat picks up where it left off. Close the tab entirely, come back later, and the full conversation history is there, including any generations that completed while you were away.

The chat and document streams are linked but independent. You can have the chat open without the editor, or vice versa.

```ts
import { durableStreamConnection } from '@durable-streams/tanstack-ai-transport'
import { useChat } from '@tanstack/ai-react'
@@ -354,9 +251,6 @@ When the agent calls a document tool, the edit is applied to the Yjs document an

Cancellation is clean: stopping a generation tears down both the chat stream and any in-flight document edits. You can see this flow end-to-end in the [demo source code](https://github.com/electric-sql/collaborative-ai-editor).

<figure>
  <video class="w-full" autoplay loop muted playsinline preload="metadata">
    <source src="/videos/blog/building-a-collaborative-ai-editor/tool-calls.mp4" type="video/mp4" />
