Matthew Weidner |
Feb 13th, 2024
Home | RSS Feed
Keywords: CRDTs, bibliography
This blog post is Part 4 of a series.
This post concludes my CRDT survey by describing further topics that are outside my focus area. It is not a proper survey of those topics, but instead a lightweight bibliography, with links to a relevant resource or two. To find more papers about a given topic, see crdt.tech/papers.html.
My previous posts made several implicit assumptions, which I consider part of the “traditional” CRDT model:
The further topics below concern collaborative apps that violate at least one of these assumptions. I’ll end with some alternatives to CRDT design and an afterword.
Strong consistency is (roughly) the guarantee that all replicas see all operations in the same order. This contrasts with CRDTs’ eventual consistency guarantee, which allows different replicas to see concurrent operations in different orders. Existing work explores systems that mix CRDTs and strong consistency.
Some apps need strong consistency for certain operations but allow CRDT-style optimistic updates for others. For example, a calendar app might use strong consistency when reserving a room, to prevent double-booking. Several papers describe mixed consistency systems or languages that are designed for such apps.
Besides providing strong consistency for certain operations, a central server can provide strong consistency for parts of the protocol while clients use CRDTs for optimistic local updates. For example, the server can decide which operations are valid or invalid, assign a final total order to operations, or coordinate “garbage collection” actions that are only possible in the absence of concurrent operations (e.g., deleting tombstones).
Part 2 described exact undo, in which you apply your normal semantics to the operation history less undone operations. The same principle applies to other situations where you deliberately omit operations: rejected/reverted changes, partial replication, “cherry-picking” across branches, etc.
However, applying your normal semantics to “the operation history less undone operations” is more difficult that it sounds, because many semantics and algorithms assume causal-order delivery. For example:
It is an open problem to formulate undo-tolerant variants of various CRDTs.
Previous posts assumed that all collaborators always load an entire document and sychronize all operations on that document. In partial replication, replicas only store and synchronize parts of a document. Partial replication can be used as part of access control, or as an optimization for large documents - clients can leave most of the document on disk or on a storage server.
Existing CRDT libraries (Yjs, Automerge, etc.) do not have built-in partial replication, but instead encourage you to split collaborative state into multiple documents if needed, so that you can share or load each document separately. However, you then lose causal consistency guarantees between operations in different documents. It is also challenging to design systems that perform the actual replication, e.g., intelligently deciding which documents to fetch from a storage server.
In a traditional collaborative app, all users want to see the same state, although they may temporarily diverge due to network latency. Version control systems instead support multiple branches that evolve independently, except during explicit merges. CRDTs are a natural fit for version control because their concurrency-aware semantics may lead to better merge results (operations in parallel branches are considered concurrent).
The original local-first essay already discussed versions and branches on top of CRDTs. Recent work has started exploring these ideas in practice.
Traditional CRDTs assume that all devices are trustworthy and follow the given protocol. A malicious or buggy device can easily deviate from the protocol in a way that confuses other users.
In particular, a malicious device could assign the same UID to two versions of an operation, then broadcast each version to a different subset of collaborators. Collaborators will likely process only the first version they receive, then reject the second as redundant. Thus the group will end up in a permanently inconsistent state.
A few papers explore how to make CRDTs Byzantine fault tolerant so that these inconsistent states do not occur, even in the face of arbitrary behavior by malicious collaborators.
In a purely local-first setting, access control - controlling who can read and write a collaborative document - is tricky to implement or even define. For example:
A few systems implement protocols that handle these situations.
In a collaborative text document, when one users inserts or deletes text, they shift later characters’ indices. This would cause trouble for other users’ concurrent editing operations if you interpreted their original indices literally:
Alice typed " the" at index 17, but concurrently, Bob typed " gray" in front of her. From Bob's perspective, Alice's insert should happen at index 22.
The CRDT way to fix this problem is to use immutable list CRDT positions instead of indices. Another class of algorithms, called Operational Transformation (OT), instead “transform” indices that you receive over the network to account for concurrent operations. So in the above figure, Bob would receive “Index 17” from Alice, but add 5 to it before processing the insertion, to account for the 5 characters that he concurrently inserted in front of it.
I personally consider list CRDT positions to be the simpler mental model, because they stay the same over time. They also work in arbitrary networks. (Most deployed OT algorithms require a central server; decentralized OT exists but is notoriously complicated.) Nonetheless, OT algorithms predate CRDTs and are more widely deployed - in particular, by Google Docs.
Traditional CRDT papers are algorithm papers that design a CRDT by hand and prove eventual consistency with pen-and-paper. Some more recent papers instead try to synthesize CRDTs automatically from some specification of their behavior - e.g., a sequential (single-threaded) data structure plus some invariants that it should preserve even in the face of concurrency.
Note that while synthesis may be easier than designing a CRDT from scratch, there is little guarantee that the synthesized semantics are reasonable in the eyes of your users. Indeed, some existing papers choose rather unusual semantics.
I hope you’ve enjoyed reading this blog series. Besides satisfying your curiosity, though, you might wonder: Why learn this in the first place?
Indeed, one approach to CRDTs is to let experts implement them in a library, then use those implementations without thinking too hard about how they work. That is the approach we take for ordinary local data structures, like Java Collections. It works well when you use a CRDT library for the task it was built for - e.g., Yjs makes it easy to add central-server live collaboration to various rich-text editors.
However, in my experience so far - both using and implementing CRDT libraries - it is hard to use a CRDT library in any way that the library creator didn’t anticipate. So if your app ends up needing some of the Further Topics above, or if you need to tune the collaborative semantics, you may be forced to develop a custom system. For that, CRDT theory will come in handy. (Though still consider using established tools when possible - especially for tricky parts like list CRDT positions.)
Even when you use an existing CRDT library, you might have practical questions about it, like:
Understanding a bit of CRDT theory will help you answer these questions.
This blog post is Part 4 of a series.
Home • Matthew Weidner • PhD student at CMU CSD • mweidner037 [at] gmail.com • @MatthewWeidner3 • LinkedIn • GitHub