Efficient Synchronization of State-based CRDTs

Abstract

Data consistency often needs to be sacrificed in order to ensure high-availability in large scale distributed systems. Conflict-free Replicated Data Types relax consistency by always allowing query and update operations at the local replica without remote synchronization. Consistency is then re-established by a background mechanism that synchronizes the replicas in the system. In state-based CRDTs replicas synchronize by periodically sending their local state to other replicas and by merging the received remote states with the local state. This synchronization can become very costly and unacceptable as the local state grows. Delta-state-based CRDTs solve this problem by producing smaller messages to be propagated. However, it requires each replica to store additional metadata with the messages not seen by its direct neighbors in the system. This metadata may not be available after a network partition, since a replica can be forced to garbage-collect it (due to storage/memory limitations), or when the set of direct neighbors of a replica changes (due to dynamic memberships). In this dissertation we further improve the synchronization of state-based CRDTs, by introducing the concept of Join Decomposition of a state-based CRDT and explaining how it can be used to reduce the synchronization cost of this variant of CRDTs. We validate our proposal experimentally on Google Cloud Platform by comparing the state-based synchronization algorithm against the classic and improved versions of the delta-state-based algorithm. The results of this comparison show that our proposed techniques can greatly reduce state transmission, even under normal operation when the network is stable.

Publication
MsC Thesis