I’ve often struggled to give good reference material when talking to
people about the blockchain. The landscape is filled with obscure
buzzwords and overzealous hype and making one’s way through it is a
challenge. This post is my attempt to provide a map, or rather a map of
maps . I will start with some academic papers on distributed
consensus before moving into more dangerous waters towards the end.
For anyone interested in distributed systems, I recommend starting with
The quest for a scalable blockchain fabric: Proof-of-work vs. BFT
replication. It’s probably the best overview paper I’ve come across so
far. It describes key concepts —blockchains/distributed ledgers,
bitcoin, ethereum, proof-of-work (PoW), smart contracts, state machine
replication (SMR) and crash fault-tolerant (CFT) vs Byzantine
fault-tolerant (BFT) consensus protocols. It then weaves them into a
wider narrative where PoW based consensus —as implemented in public
blockchains— is compared with traditional CFT/BFT solutions to
consensus —which is where so-called permissioned blockchains are
turning their attention.
Moreover, an analysis of blockchains is provided in terms
of their consensus algorithm and the implications this choice has in
aspects ranging from identity management to consensus finality, scalability,
performance (latency, throughput, power consumption), tolerated power of
adversary, network synchrony assumptions, and existence of correctness
proofs of the underlying protocols.
PoW-based blockchains and BFT consensus algorithms are placed at
opposite ends of the scalability spectrum: PoW offers high scalability
but poor performance, whereas current BFT (and CFT) consensus algorithms
offer good performance but their scalability is poorly understood and
they are seldom deployed with more than a handful of replicas . The
paper then proceeds to discuss current efforts and suggestions for
improving blockchain scalability, before concluding with a short list of
open problems.
The major contribution in my opinion is its function as a curated and
comprehensive list of references which cover all of the above aspects in
detail for anyone looking to dive deeper . It also provides an
informed but sober —though not completely unbiased— overview of the
current state of affairs, from an industrial research perspective .
A good follow-up paper is Rethinking Permissioned Blockchains which
explores the author’s proposal for the redesign of the architecture of
HyperLedger Fabric.
All references in that paper are worth looking into, but I would
strongly recommend State Machine Replication for the Masses with
BFT-SMaRt for anyone interested in the practicalities of
implementing a BFT consensus algorithm. The full technical report
describes the design and implementation of a production BFT consensus
algorithm in Java, available under an open source license. The
architecture is modular, the types of failure tolerated are configurable
—from crashes to malicious Byzantine nodes— and the set of replicas
can be adjusted dynamically while the system is running —something
often missing from available BFT implementations.
The implementation’s API and programming model are discussed, and the
system is evaluated against alternative BFT implementations (PBFT and
UpRight) as well as CFT ones (JPaxos). Finally, key take-aways from
building and maintaining the system are provided.
This is an excellent starting point for anyone interested in
implementing a practical distributed consensus system, be it a CFT or
BFT one. The main contribution in my opinion is the lessons and
take-aways from developing and maintaining the system over the course of
many years and across changing teams. The description of how the
low-level design was tackled and how the system was tested and evaluated
is also very useful.
Another very interesting reference is The Honey Badger of BFT
Protocols. This paper describes an asynchronous, non-deterministic BFT
consensus algorithm with reasonable performance and scalability, robust
against adverse network conditions. Evaluation with 100 nodes on AWS
results in throughput in the order of a few thousand transactions per
second, but latency in the order of hundreds of seconds. The algorithm
is based on an atomic broadcast protocol rather than state machine
replication —see Byzantine Consensus in Asynchronous Message-Passing
Systems: a Survey for a formal description and useful survey of BFT
concepts and algorithms. Moreover, the research webpage of Andrew
Miller —co-author of the Honey Badger paper— contains a wealth of
resources on many other aspects of blockchains and cryptocurrencies.
For anyone with a further interest in distributed systems and
particularly consensus protocols, Can’t we all just agree is a series
of posts from Adrian Colyer’s morning paper blog and a good starting
point. It covers some classical CFT consensus protocols such as
Viewstamped Replication, Paxos, Zab and Raft in the form of paper
summaries. Studying at least one of those designs in more detail is
highly recommended; Raft is probably the easiest one and also widely
implemented. A useful —and often entertaining— companion resource in
this area is Kyle Kingsbury’s blog on distributed systems, often
elaborating on how many of them fail to provide their touted guarantees
in practice.
So far we have focused mostly on academic issues in distributed systems.
For those preferring less formalised descriptions, the Ethereum
whitepaper works as a good introduction to both Bitcoin and Ethereum,
the motivation and technical decisions behind it and some of the
potential applications.
But what about all the business hype behind the blockchain? This is
where things get tricky. If you have any exposure to the blockchain
space and the companies in it, you’ve probably come across one or two
technology whitepapers . Their typical promise is infinite
scalability, seamless interoperability and an end to world hunger —but
you can only choose two at a time, we are realists after all. The
problem with many whitepapers is that they are mostly marketing material
dressed in technical jargon. They use impressive-sounding language and
give out just enough to hint at a competitive advantage, but not enough
to reveal its true extent —or lack thereof. This makes it challenging
to distinguish between serious efforts and good ol’ snake oil (but see
Attack of the 50-foot blockchain for a colourfully illustrated guide).
If you want to explore this area further —despite these warnings— I
recommend taking a look at the blog of George Samman which has
summarised a lot of these efforts, mostly in the financial sector. I’ll
admit I am intrigued by Kadena (based on Juno which was itself based
on a BFT extension of Raft) and HashGraph (a non-deterministic BFT
consensus algorithm), although the veracity of some of the claims made
remains to be verified.
Finally, if your buzzword tolerance is high enough , Blockchains
and Smart Contracts for the Internet of Things may serve as an
introduction into other popular areas of potential application. The
report provides a blockchain taxonomy, examines the potential benefits
and drawbacks for IoT and hints at one of the other major open problems
in the area besides distributed consensus: privacy and confidentiality
of transactions (a third one is interoperability, both with legacy
systems and other blockchains). The reference section is extensive,
linking to a plethora of websites and blogs in the wider blockchain space.