A Primer on Ethereum Blockchain Light Clients
A large number of projects working on blockchain-based, peer-to-peer protocols make grandiose claims about performance and throughput. With so much innovation happening in the R&D stages of these projects, many don’t account for the real challenges of adoption encountered once these protocols are live.
It is easy to assume that most people running a network fall within some confidence interval of latency and computational capacity, and just as easy to forget the barrier to entry most users face when interacting with nodes on the blockchain. Unfortunately, running a full node is prohibitively expensive and slow for most, so a large portion of users rely on “light” nodes that piggyback off the security of full nodes without the extensive resource requirements.
Ethereum’s light client mode allows devices as lightweight as a Raspberry Pi to join the network, download block headers as they appear, and validate only certain pieces of state on demand, as required by their users. In Ethereum in particular, the network is so easy to saturate with these clients that full, archival nodes have Raspberry Pis latching onto them faster than you can say “Merkle Tree”. Cryptoeconomic incentives alone are not enough to get people to run full nodes, because the resource cost of doing so creates a bottleneck in balancing a distributed network. It is very hard to predict exactly what balance of full to light nodes a network will have in the wild. There have been some discussions on how to balance these incentives and make it easier for users to justify running a full node.
Introducing Light Clients: Key Actors in Ethereum
The key idea behind a light client is that it can fetch only the parts of the state that concern its user, on demand. It assumes an honest model in which miners correctly follow the rules of Ethereum and at least one full node in the system is completely honest.
A light client’s basic functionality is to download block headers as they appear on the network and issue on-demand requests for Merkle proofs of the pieces of state the client is using. Instead of using local storage, light clients on Ethereum use a distributed hash table to keep track of trie nodes. Because Ethereum’s state is represented as large Merkle trees, a Merkle root together with the path of nodes along one branch of the tree serves as a lightweight proof of the integrity of a piece of information. This ultimately relies on trusting that the Merkle root provided is correct.
Light client requests include, but are not limited to, checking the balance of an account, verifying that a transaction was confirmed, and checking event logs from a contract deployed on the network. All of these can be reduced to sublinear complexity via Merkle proof validation. When data from the blockchain is unavailable, or a proof does not check out when verifying a state transition index, clients are allowed to raise alarms to other participants in the peer-to-peer network.
The Geth client specifies a fundamentally different configuration and protocol manager when in light mode. For those wanting to know exactly what happens when Geth spins up a light client, check out my GitHub issue on the topic here.
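To make the branch-plus-root idea concrete, here is a minimal sketch of Merkle proof verification. It is a simplification and not how Ethereum itself does it: it assumes a plain binary Merkle tree over SHA-256 with a non-empty list of leaves, whereas Ethereum uses Merkle Patricia tries hashed with Keccak-256. The function names (merkle_root, merkle_proof, verify_proof) are made up for this illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for Ethereum's Keccak-256.
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Hash adjacent pairs to build the parent level of the tree.
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root of a simple binary Merkle tree (leaves must be non-empty)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """What a full node would send: the sibling hash at every level."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    """What the light client runs: recompute the root from leaf and siblings."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The client only needs the leaf, a proof whose length grows with the logarithm of the number of leaves, and a root it already trusts from a block header. That is where the sublinear cost comes from.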
The Underlying Consensus Mechanism
The current light client protocol assumes proof of work consensus operating on the main chain via full nodes. In proof of work, there exists a mathematical function by which we can verify that a block header is valid. That is, a valid header is hard to produce but easy to verify. Light clients, upon launch, look for the longest chain of block headers, and the cost for an attacker to spoof this chain by producing faulty headers is prohibitively high.
Proof of work converts physical work, in the form of electricity, into security for the chain, yet validating the produced headers is very efficient. Light clients are useful in a proof of work context because each header can be verified in constant time, but we do not obtain the same guarantees in a proof of stake context.
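The “hard to produce, easy to verify” asymmetry can be sketched in a few lines. This is a toy model under loud assumptions: a header here has only four fields, SHA-256 replaces Ethereum’s Ethash, and the chain choice uses summed difficulty as a stand-in for “longest”. The Header class and all function names are hypothetical, not Geth’s.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    # Assumption: a toy header; real Ethereum headers carry many more fields.
    parent_hash: bytes
    state_root: bytes
    difficulty: int
    nonce: int

    def digest(self) -> bytes:
        payload = (
            self.parent_hash
            + self.state_root
            + self.difficulty.to_bytes(32, "big")
            + self.nonce.to_bytes(8, "big")
        )
        return hashlib.sha256(payload).digest()  # stand-in for Ethash


def pow_valid(header: Header) -> bool:
    """Easy side: one hash and one comparison per header."""
    target = 2**256 // header.difficulty
    return int.from_bytes(header.digest(), "big") <= target


def mine(parent_hash: bytes, state_root: bytes, difficulty: int) -> Header:
    """Hard side: brute-force nonces until the hash falls under the target."""
    nonce = 0
    while True:
        header = Header(parent_hash, state_root, difficulty, nonce)
        if pow_valid(header):
            return header
        nonce += 1


def chain_valid(headers) -> bool:
    """Every header must link to its parent and carry valid proof of work."""
    linked = all(
        cur.parent_hash == prev.digest()
        for prev, cur in zip(headers, headers[1:])
    )
    return linked and all(pow_valid(hd) for hd in headers)


def best_chain(candidates):
    """Pick the valid candidate with the most accumulated difficulty."""
    valid = [c for c in candidates if chain_valid(c)]
    return max(valid, key=lambda c: sum(hd.difficulty for hd in c)) if valid else None
```

A light client only ever runs pow_valid and chain_valid. An attacker who wants to feed it a fake chain has to pay the cost of mine for every header.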
Light Clients in Proof of Stake: Is Proof of Work Necessary?
The problem with light clients in proof of stake, put simply, is that block headers are not tied to any amount of “real” work that actors must put in to produce them. That is, the strength of this consensus protocol comes from punishments that deter byzantine actors rather than from a reward for solving a computationally expensive puzzle by spending electrical energy. Actors trying to grow the wrong chain in proof of stake get punished, whereas in a proof of work system, actors mining on the wrong chain end up on a fork and do not reap the rewards of mining on the canonical chain.
Proof of stake provides in-protocol mechanisms for finalizing block headers deterministically. Once these headers are trusted, accessing the data they commit to is logarithmic in complexity, i.e. fetching nodes from a Merkle tree. However, headers do not contain scalar values we can check against a proof of work solution, so in a naive implementation verifying a header takes at least logarithmic rather than constant time.
We can, however, do better, at least with the syncing efficiency of light clients. As proposed by Vitalik in his post on the matter, a more light-client-friendly proof of stake system can be constructed via a system of checkpoints. A checkpoint occurs every fixed number of blocks, must be agreed on by 2/3 of participants via cryptographic signatures, and must contain a hash of the previous checkpoint. In this new light client sync, the client downloads only the checkpoints and verifies the participants’ signatures on them. This reduces the overhead of having to download every single header, as the current proof of work light client mode requires.
This approach, however, does not fix the issue of validating the headers, so a proof of stake protocol could potentially include a small amount of proof of work in block header creation for the benefit of light clients.
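A checkpoint-only sync could look roughly like the sketch below. It is an illustration under several assumptions, not the scheme from the post: the validator set is fixed and known to the client, every validator counts equally (a real system would weight by stake), and signature checking is delegated to a caller-supplied verify_sig function, because the Python standard library has no public-key signature scheme. The Checkpoint class and all names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class Checkpoint:
    height: int
    block_hash: bytes
    prev_checkpoint_hash: bytes  # links this checkpoint to the previous one
    signatures: Dict[str, bytes] = field(default_factory=dict)  # validator id -> signature

    def digest(self) -> bytes:
        # The message validators sign; signatures themselves are excluded.
        return hashlib.sha256(
            self.height.to_bytes(8, "big") + self.block_hash + self.prev_checkpoint_hash
        ).digest()


# Assumption: verify_sig(validator_id, message, signature) -> bool is provided
# by the caller and wraps whatever signature scheme the protocol really uses.
VerifySig = Callable[[str, bytes, bytes], bool]


def has_supermajority(cp: Checkpoint, validators: Set[str], verify_sig: VerifySig) -> bool:
    """True if at least 2/3 of the known validators validly signed cp."""
    msg = cp.digest()
    signers = {
        v for v, sig in cp.signatures.items()
        if v in validators and verify_sig(v, msg, sig)
    }
    return 3 * len(signers) >= 2 * len(validators)


def sync_checkpoints(
    trusted: Checkpoint,
    checkpoints: Iterable[Checkpoint],
    validators: Set[str],
    verify_sig: VerifySig,
) -> Checkpoint:
    """Walk forward from a trusted checkpoint, stopping at the first bad one."""
    head = trusted
    for cp in checkpoints:
        if cp.prev_checkpoint_hash != head.digest():
            break  # does not extend the checkpoint we already trust
        if not has_supermajority(cp, validators, verify_sig):
            break  # not enough valid signatures
        head = cp
    return head
```

The client’s work scales with the number of checkpoints rather than the number of blocks, which is the saving described above. Checking the signatures on each checkpoint is still the expensive part.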
Does a Hybrid Proof of Work / Proof of Stake System Help Light Clients?
There could well be a role for computational power in the production of block headers for the purpose of light client validation in a proof of stake system, so long as the power required to validate these headers is very small.
If you enjoyed learning more about this, check out the work the Ethereum community is doing on sharding, including my team, Prysmatic Labs! In a sharded model, light clients are incredibly useful as one of the goals of sharding is to reduce the computational requirements for nodes. One place to begin is on ETHResearch, where most of the latest sharding developments are posted: