According to ethereum.org, data availability (DA) is “the guarantee that the block proposer published all transaction data for a block and that the transaction data is available to other network participants.”
But how exactly do you guarantee data is available?
For most Layer 1s (L1s), it’s pretty straightforward. L1 nodes know transaction data is available by downloading and executing it themselves. This is how nodes verify blocks and is at the core of how blockchains work.
Layer 2s (L2s) change the paradigm. L2s (specifically rollups) use fraud proofs or validity proofs to guarantee blocks are valid without nodes having to execute every transaction. This unlocks massive benefits and new L1 designs!
But not so fast. Rollups still need data to be available, just for different reasons: without the underlying transaction data, users can't reconstruct the rollup's state, challenge invalid blocks, or safely withdraw their funds.
So how do we scale? Seems like we are back to where we started.
Introducing DA layers
DA layers specialize in, as you might expect, assuring nodes that data is available. This can take different forms, including:
DA blockchains
DA committees
DA middleware
Data sharding
We’re only going to discuss the first two, but here are a few resources if you want to learn about DA middleware and data sharding.
Because it’s still very expensive to post data on Ethereum, most rollup teams are posting their data off-chain. This design technically classifies them as validiums.
Ethereum’s data-sharding roadmap solves the problem and enables cheap rollup data, but to be safe, let’s assume we’re a year away from the first major upgrade. In the meantime, rollup teams have two major options: DA committees and DA blockchains.
DA committees are selected entities that hold off-chain copies of the transaction data and promise to make it available in case of emergency. These committees often have 7-10 members and are a slight improvement over fully relying on the rollup operator.
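To make the idea concrete, here's a minimal sketch of an m-of-n attestation check in Python. The names and the keyed-hash "signature" are illustrative assumptions, not any specific committee's protocol; a real DAC would verify BLS or ECDSA signatures over a commitment to the batch data.

```python
from dataclasses import dataclass
from hashlib import sha256

# Illustrative sketch of a DA committee (DAC) attestation check.
# A keyed hash stands in for real signatures so the example runs on its own.

@dataclass(frozen=True)
class Attestation:
    member_id: int    # which committee member is attesting
    data_hash: bytes  # commitment to the off-chain batch data
    sig: bytes        # member's "signature" over data_hash

def mock_sign(secret: bytes, data_hash: bytes) -> bytes:
    return sha256(secret + data_hash).digest()

def committee_accepts(attestations, member_secrets, data_hash, threshold):
    """Accept a batch only if `threshold` distinct members attest that
    they hold a copy of the data and will serve it on request."""
    signers = set()
    for att in attestations:
        if att.data_hash != data_hash:
            continue  # attestation is for some other batch
        secret = member_secrets.get(att.member_id)
        if secret and att.sig == mock_sign(secret, data_hash):
            signers.add(att.member_id)
    return len(signers) >= threshold

# A hypothetical 5-of-7 committee attesting to one batch:
secrets = {i: f"member-{i}".encode() for i in range(7)}
batch_hash = sha256(b"rollup batch #42").digest()
atts = [Attestation(i, batch_hash, mock_sign(secrets[i], batch_hash))
        for i in range(5)]
assert committee_accepts(atts, secrets, batch_hash, threshold=5)
```

The threshold is the whole point: as long as that many honest members actually hold the data, it can be recovered even if the rollup operator and the remaining members disappear.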
DA blockchains take the idea a few steps further by replacing small, permissioned committees with large, permissionless validator sets that have strong economic incentives (staked collateral that can be slashed) to behave.
A common mistake is to equate data availability with data storage. The two are related but distinct.
An easy way to see the difference is along the time dimension.
DA layers make sure nodes can access data over a short time horizon. Their main goal is to keep blockchain state progressing smoothly, and they typically make no assurances over longer horizons. As ethereum.org puts it, “data availability is relevant when a block is yet to pass consensus.”
In fact, DA layers might even discard the data after a few weeks. In Ethereum’s next major upgrade, this data will be pruned after ~2 weeks.
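For a rough sense of that window, here's the back-of-envelope arithmetic, using the blob retention parameter from EIP-4844 draft specs (the exact value could change before the upgrade ships):

```python
# Back-of-envelope for the blob retention window in EIP-4844 drafts.
EPOCHS = 4096          # MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS (draft value)
SLOTS_PER_EPOCH = 32   # Ethereum mainnet
SECONDS_PER_SLOT = 12  # Ethereum mainnet

retention_days = EPOCHS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT / 86_400
print(f"~{retention_days:.1f} days")  # ~18.2 days, on the order of two weeks
```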
Data storage layers make sure data is available on a longer time horizon and are closer to the cloud storage solutions most web2 developers are familiar with. Of course, it’s not hard to imagine web3 developers opting for decentralized versions like Arweave.
There are many things that can be built on top of DA layers. Let’s touch on three:
As we mentioned earlier, validiums are common today. Even after Ethereum has implemented its own sharded DA layer, it's likely that rollup teams will still use off-chain data to reduce costs. Developers have always pushed the boundaries of what's possible.
Sovereign rollups not only use DA layers for data availability but also for consensus. Applications are likely good candidates to become sovereign rollups (rather than smart contract rollups or validiums) if they need full control over state transitions yet don’t want to worry about a validator set.
In his recent talk, Balaji Srinivasan envisions a future where “fiat information” competes with “crypto information.” He describes “reliable data feeds” using crypto oracles like Chainlink, where IRL metadata is posted on chain. That data could be posted onto DA layers.
It’s the early days for DA layers. Polygon Avail, EigenDA, and Celestia are all still in testnet, and Ethereum data sharding is 1-3 years away, depending on the upgrade in question.
However, there’s plenty to look forward to. Let’s highlight what seems to be a common endgame across the board. Most teams envision something like this:
Progressively increasing block sizes and sharding them across the network
Using KZG commitments so nodes can verify data without downloading full blocks
Keeping verification costs low with data availability sampling (sketched below)
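To see why sampling works, consider a block erasure-coded so that any 50% of its chunks suffice to reconstruct it (a common assumption in 2x-extension designs; exact parameters vary by protocol). An attacker must then withhold more than half the chunks, so each random sample has a better-than-even chance of hitting a hole. A quick sketch of the math:

```python
# Why a handful of random samples suffice: if more than half the chunks
# must be withheld to make a block unrecoverable, each random sample hits
# a missing chunk with probability > 1/2, and the chance that all k
# samples miss shrinks as (1/2)**k.

def detection_probability(samples: int, withheld_fraction: float = 0.5) -> float:
    """P(at least one sample hits a withheld chunk).

    Assumes independent samples, a good approximation when the total
    number of chunks is large relative to the number of samples."""
    return 1 - (1 - withheld_fraction) ** samples

for k in (1, 5, 10, 20, 30):
    print(f"{k:2d} samples -> P(detect withholding) = {detection_probability(k):.10f}")
# 30 samples already push the failure probability below one in a billion,
# which is why a phone can check availability without downloading the block.
```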
Eventually, we get to a place where DA layers enable high-throughput applications while trust-minimized light clients verify availability from mobile devices.
That's right: performance and decentralization!
Hopefully, this article helped you gain more familiarity with data availability. The goal was to offer a broad overview and address common misconceptions about the topic.
There are many deep dives into how it works, so if you want to jump down the rabbit hole, here are some resources:
As always, this article is based on a snapshot in time, and web3 moves very quickly. The technology and timelines mentioned might change.
To keep up with the latest, I recommend following along with sources like the Polygon website, the Polygon DAO blog, and The Village Times newsletter. And to get involved, come join us at Polygon DAO.