website/blog/posts/2026-04-08-building-a-collaborative-ai-editor.md
post: true
published: true
---
I came to sync engines through collaborative editing. Since AI agents became part of my daily workflow, I've had an earworm: what would it look like to integrate an AI agent into a Yjs rich text editing flow — not as a sidebar that dumps text, but as a real participant with its own cursor, presence, and streaming edits?
This post walks through how I built a [Collaborative AI Editor](https://collaborative-ai-editor.examples.electric-sql.com) demo — a [TanStack Start](https://tanstack.com/start) app with a [ProseMirror](https://prosemirror.net)/Yjs editor and an AI chat sidebar. It uses [Durable Streams](https://durablestreams.com) as the single transport layer for both [Yjs](https://yjs.dev) document collaboration and [TanStack AI](https://tanstack.com/ai) chat sessions. Two integrations, one primitive, and the AI becomes a genuine CRDT peer.
> [!Warning] Collaborative AI Editor demo
> Try the [live demo](https://collaborative-ai-editor.examples.electric-sql.com) and browse the [source code](https://github.com/electric-sql/collaborative-ai-editor).
## The natural intersection
AI-assisted writing and editing is everywhere. ChatGPT Canvas, Cursor, and Notion AI all let an AI modify a document alongside you. Software engineers see this more clearly than most: we already have agents editing our code files daily, working with us in the same codebase.
At the same time, real-time collaboration is table stakes for productivity tools.
The overlap is clear. We already have the tools for multiple people to edit a document together in real-time. If we treat an AI agent as just another peer in that system, we get collaborative AI editing without reinventing the wheel. The agent joins the same document, gets its own cursor, and edits alongside you using the same CRDT infrastructure that already handles conflict resolution between humans.
## But the integration is painful
The technology for building collaborative editors has evolved. Earlier editors like Google Docs used operational transforms (OT), which require a central server to resolve conflicts. CRDTs removed that constraint, and [Yjs](https://yjs.dev) has become the dominant toolkit for building CRDT-based editors today. But building a collaborative AI editor on top of Yjs means integrating several separate real-time systems: CRDT sync for the document, token streaming for the AI, and presence/awareness, each with its own transport, connection lifecycle, persistence, and failure handling.
On top of that, most approaches rely on client-side tool calls to edit the document.
Put it all together and you're looking at one protocol for Yjs, another for AI streaming, custom persistence for chat history, and separate reconnection logic for each layer. Every system fails independently. It's a lot of moving parts for something that should feel simple.
Having worked on Durable Streams, then built the [Yjs integration](https://durablestreams.com/yjs), then the [TanStack AI integration](https://durablestreams.com/tanstack-ai), the natural question was: what if these two integrations shared the same infrastructure?
## Three streams, one primitive
[Durable Streams](https://durablestreams.com) is a persistent, addressable HTTP streaming protocol. Writers append data to a stream, subscribers consume it, and any client can catch up from its last offset at any time. The protocol handles reconnection and delivery automatically.
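The offset-based catch-up is the core idea. Here is a minimal in-memory sketch of those semantics — append-only chunks plus read-from-offset. It models the protocol's behavior only; the real system works over HTTP with persistence and live subscriptions, and this is not the actual client API:

```ts
// Toy model of Durable Streams semantics: writers append, readers
// resume from the last offset they saw. Illustrative only.
class StreamSketch {
  private chunks: string[] = []

  // Append a chunk; returns the new end-of-stream offset.
  append(data: string): number {
    this.chunks.push(data)
    return this.chunks.length
  }

  // A reader that last saw `offset` catches up on everything after it.
  readFrom(offset: number): string[] {
    return this.chunks.slice(offset)
  }
}

const stream = new StreamSketch()
stream.append('update-1')
const offset = stream.append('update-2') // reader disconnects here
stream.append('update-3')

// On reconnect, the reader replays only what it missed.
console.log(stream.readFrom(offset)) // → ['update-3']
```

Because every consumer tracks its own offset, reconnection is just "read from where I left off" — no per-client server state required.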
The Collaborative AI Editor uses three Durable Streams per document:

- a Yjs document stream for CRDT document updates
- a Yjs awareness stream for presence (cursors and selections)
- a TanStack AI chat stream for the conversation

The [`@durable-streams/tanstack-ai-transport`](https://durablestreams.com/tanstack-ai) adapter connects the chat session to the third stream.
Both integrations share the same underlying protocol. No WebSocket servers to manage, no separate persistence layer to build, no custom reconnection logic. For local development, `@durable-streams/server` runs with file-backed storage. In production, you can use [Durable Streams Cloud](https://durablestreams.com) or self-host.
<img src="/img/blog/building-a-collaborative-ai-editor/architecture.svg" alt="Architecture diagram showing the browser client and server agent both connecting to three Durable Streams: a Yjs document stream, a Yjs awareness stream, and a TanStack AI chat stream" />
The key architectural choice: the AI agent is a server-side Yjs peer, not a client-side bolt-on. On the server, the agent opens its own Yjs document and connects to the same Durable Stream as the human editors. It's just another participant in the room. In the demo, the agent is called "Electra".
From the human user's perspective, Electra looks like any other collaborator: it has its own cursor and presence in the document, and its edits stream in live.
The agent doesn't manipulate the Yjs document directly. It works through tool calls: the AI model decides what to do, a runtime on the server translates those tool calls into Yjs operations, and the CRDT sync propagates the changes to all connected clients. Because the agent is a server-side peer, it can edit the document whether or not any browser has it open.
Because the agent is a server-side peer, you can start an edit, close your laptop, and come back later to a finished edit.
## Streaming edits into a live document
The hard problem here is that the AI generates markdown text, but the document is a rich text CRDT. It's a structured data type, not a string. You need to convert streaming markdown tokens into rich text nodes and insert them at positions that stay valid while other users are editing concurrently.
### Document tools
The tool surface is deliberately constrained. The agent operates through the same operations available to human editors.
### Routing streaming text into the document
Tool calls don't support async streaming. You call a tool with arguments and get a result back. But we want the model to stream prose into the document token by token.
The solution is a routing trick. `start_streaming_edit` is a tool that flips a switch: after it's called, the model's next text output gets intercepted and redirected into the document instead of streaming into the chat as an assistant message. The model just generates text naturally. The infrastructure decides where it goes.
When the model finishes generating or calls another tool, the routing automatically stops. The model can also explicitly call `stop_streaming_edit` to switch back to chat output mid-turn. The system prompt explains this convention so the model knows to call `start_streaming_edit` before generating document prose, and to switch back when it wants to respond in the chat.
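A minimal sketch of that routing switch, assuming a simple delta-handler design (the class and method names are illustrative, not the demo's internals):

```ts
// Routes the model's text output either into the chat transcript or
// into the document, depending on whether a streaming edit is active.
class OutputRouter {
  private streamingToDoc = false
  chat: string[] = []
  docText = ''

  onToolCall(name: string): void {
    // `start_streaming_edit` flips the switch; any other tool call
    // (including `stop_streaming_edit`) flips it back.
    this.streamingToDoc = name === 'start_streaming_edit'
  }

  onTextDelta(token: string): void {
    if (this.streamingToDoc) this.docText += token
    else this.chat.push(token)
  }
}

const router = new OutputRouter()
router.onTextDelta('Sure, drafting now. ')
router.onToolCall('start_streaming_edit')
router.onTextDelta('## A new section\n')
router.onToolCall('stop_streaming_edit')
router.onTextDelta('Done!')
console.log(router.docText) // → '## A new section\n'
console.log(router.chat)    // → ['Sure, drafting now. ', 'Done!']
```

The model generates plain text either way; only the destination changes, which is why no streaming tool-call API is needed.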
Absolute positions in a document shift when other users type, making them useless for concurrent editing. [Yjs](https://docs.yjs.dev/api/relative-positions) solves this with relative positions, which are anchored to CRDT items rather than offsets. They stay correct regardless of what other users do to the document.
When the user sends a message, the client encodes its current cursor and selection as relative anchors and sends them to the server as part of the chat context. This gives the agent awareness of what the user is doing and where they're looking in the document. The agent doesn't have to use this context, but when the user says "rewrite this paragraph" or "insert something here", it can target the exact semantic position the user intended.
```ts
import * as Y from 'yjs'

// Client: encode the cursor position as a Yjs relative position
// (`ytext` is the shared text type, `cursorIndex` the editor cursor offset)
const relPos = Y.createRelativePositionFromTypeIndex(ytext, cursorIndex)
const encoded = Y.encodeRelativePosition(relPos)

// Server: resolve it back to an absolute position against the live document
const abs = Y.createAbsolutePositionFromRelativePosition(
  Y.decodeRelativePosition(encoded),
  serverDoc
)
```

The edits stream into the document in real-time. The user sees text appear as the model generates it.
AI models have been trained on markdown as a native format. They naturally use it to express formatting and emphasis. Rather than fight against this by inventing a custom format or requiring tool calls for every formatting operation, we lean into markdown as an intermediate representation. A streaming pipeline incrementally parses the token stream and converts it into native Yjs document nodes as they arrive — `**bold**` becomes a bold mark, `## heading` becomes a heading node, `- item` becomes a list item. The model writes in the format it's best at, and the document receives properly structured rich text.
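A toy version of that conversion, working line by line rather than token by token for brevity — the node shapes are illustrative, not the demo's actual Yjs/ProseMirror schema:

```ts
// Map a line of streamed markdown onto a rich-text node shape.
// The real pipeline parses incrementally as tokens arrive and emits
// native Yjs nodes; this sketch only shows the classification step.
type RichNode =
  | { type: 'heading'; level: number; text: string }
  | { type: 'listItem'; text: string }
  | { type: 'paragraph'; text: string }

function lineToNode(line: string): RichNode {
  const heading = line.match(/^(#{1,6})\s+(.*)$/)
  if (heading) return { type: 'heading', level: heading[1].length, text: heading[2] }
  const item = line.match(/^-\s+(.*)$/)
  if (item) return { type: 'listItem', text: item[1] }
  return { type: 'paragraph', text: line }
}

console.log(lineToNode('## heading')) // → { type: 'heading', level: 2, text: 'heading' }
console.log(lineToNode('- item'))     // → { type: 'listItem', text: 'item' }
```

The payoff of this approach is that the model keeps writing in its native format while the document only ever receives structured nodes.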
## Durable chat
The chat sidebar uses [TanStack AI](https://tanstack.com/ai)'s `useChat` hook with no custom component code. The only difference from a standard chat setup is the connection adapter: instead of the default request/response model, a durable connection routes messages through a Durable Stream. Messages POST to `/api/chat`, and the client subscribes to `/api/chat-stream`.
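Based on the description above, the wiring might look roughly like this — the package names, `createDurableChatConnection` options, and hook shape are assumptions about the API, not verified signatures:

```ts
// Hypothetical sketch — consult the @durable-streams/tanstack-ai-transport
// docs for the real API and option names.
import { useChat } from '@tanstack/ai-react' // assumed package name
import { createDurableChatConnection } from '@durable-streams/tanstack-ai-transport'

// Messages POST to /api/chat; the client subscribes to /api/chat-stream,
// so a refresh resumes the generation from the stream's last offset.
const connection = createDurableChatConnection({
  postUrl: '/api/chat',
  streamUrl: '/api/chat-stream',
})

function ChatSidebar() {
  const { messages, sendMessage } = useChat({ connection })
  // ...render the transcript and input as in any TanStack AI chat UI
}
```

Swapping the connection adapter is the only change; the chat components themselves stay stock.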
Because the chat session lives on a Durable Stream, it's resilient by default. Refresh the page mid-generation and the chat picks up where it left off. Close the tab entirely, come back later, and the full conversation history is there, including any generations that completed while you were away.
The chat and document streams are linked but independent. You can have the chat open without the editor, or vice versa.
When the agent calls a document tool, the edit is applied to the Yjs document and surfaces in the chat as a tool call indicator.
Cancellation is clean: stopping a generation tears down both the chat stream and any in-flight document edits. You can see this flow end-to-end in the [demo source code](https://github.com/electric-sql/collaborative-ai-editor).