Planet-Scale Leaderless Consensus


Modern web applications replicate their data across the globe and require strong consistency guarantees for their most critical data. These guarantees are usually provided via state-machine replication (SMR). Recent advances in SMR have focused on leaderless protocols, which improve the performance and availability of traditional Paxos-based solutions. Although leaderless protocols have shown great promise, they are poorly suited to planet-scale systems as they leverage large quorums, offer unpredictable performance and have complex recovery mechanisms. In this thesis we propose two leaderless protocols, Atlas and Tempo, tailored to planet-scale systems. Atlas minimizes the size of its quorums by making use of the observation that concurrent data center failures are rare. It also processes a high percentage of accesses in a single round trip, even when these conflict. Atlas achieves this while having a recovery mechanism that is significantly simpler than that of previous leaderless protocols. Tempo builds upon Atlas, but achieves superior throughput and offers predictable performance even in contended workloads. To achieve these benefits, Tempo timestamps each application command and executes it only after the timestamp becomes stable, i.e., all commands with a lower timestamp are known. Both the timestamping and stability detection mechanisms are fully decentralized, thus obviating the need for a leader replica. We evaluate Atlas and Tempo in both real and simulated geo-distributed environments and demonstrate that they outperform state-of-the-art alternatives.

PhD Thesis