Linea: State Management
EVM State Manager. The Linea's state management ecosystem
- The state manager is the part of execution client responsible for updating the state of the network globally, and the state of every account individually.
- The state manager audits the "read" access made in the EVM, meaning it monitors, verifies, and logs all operations where EVM needs to read data from the blockchain.
- The main task of the state manager is to receive blocks that have been executed by the sequencer and use the trace data from their execution to update the state of the network.
- Linea uses two data structure types to manage state:
- A Merkle Patricia Trie to record the world state, maintain consensus, and process blocks. This mirrors how consensus and state are managed by Ethereum Mainnet.
- A variant of regular Merkle tree called a sparse Merkle Tree (SMT), which is used to more efficiently track, manage, and update storage slots representing accounts.
- It then passes this updated network state information to the prover in the form of Merkle proofs for submission to Ethereum Mainnet (L1).
Merkle Trees
- The Merkle tree and its variations are commonly used across EVM chains to store and retrieve data about the state of every account on the blockchain.
- A Merkle tree is comprised of 'nodes' that branch off from each other.
- At the base is the 'root' or state root, from which branches stem, and leaves stem from the branches.
- Each node, regardless of type, is represented by a cryptographic hash which encodes data about its properties - for example, the contents of your account.
- Each hash encodes the hashes of its child nodes.
- Taken to its full extent, this cascading system means the root encodes data of the state of every single account on the blockchain.
- Cryptographic hashes are deterministic, which means you can reverse the hash of the function to get the data which it encoded.
- If you have the hash of the root - the only node without a parent - you can theoretically derive from it the data of any node in the entire tree.
- As a layer (L2) network, Linea is in the business of making transacting faster and more efficient.
- Linea implements a sparse Merkle tree to track account state and generate and store proofs, and unlock greater efficiency when compared to standard Merkle trees, which require recomputation for every block, leading to excessive computational demands.
Sparse Merkle Trees
- Linea's state management uses sparse Merkle trees to minimize computation and contribute to blockchain's efficiency.
- A sparse Merkle tree is a variation of a standard Merkle tree where not all leaf nodes are filled with data; instead, data is only stored in nodes where it is needed.
- It is a complete tree of fixed depth, meaning that all branches of the tree have the same length - i.e. the same number of leaves.
- At initialization - the beginning of the chain's history - all leaf nodes are set to a default value, which is typically a hash of a specific value, such as zero.
- Because all leaf nodes have same hash value, the parent nodes and higher level nodes also have the same hash value.
- A node whose hash is the default value for its level is therefore considered to represent an empty subtree.
- In the example above, the children of node A (leaves) contain null values, which means node A does too. Node B, meanwhile reflects that its children also contain values.
- With this construction, we do not need to keep track of individual node hash.
- Instead we can assume hashes that reflect the default value are empty, and the subtree or node that lies further down the chain of nodes can be disregarded.
- We only need to pay attention to the ones that correspond to non-empty subtrees.
Cryptographic Accumulator
- In this context, we can consider Linea's sparse Merkle tree as a type of cryptographic accumulator.
- A cryptographic accumulator is a type of cryptographic encoding a collection of items into very short strings and allowing read/write operations to be proven.
- Merkle trees and sparse merkle trees are elementary examples of accumulators but there are others with more powerful capabilities.
- Linea's state manager uses an extended version of a sparse Merkle tree that enables it to prove all CRUD operations for a key-address database.
- As an outline, the construction uses a sparse merkle tree to store the nodes of a sorted double linked list that encodes all the non-zero items of the state.
- Linea's state manager uses the accumulator to track the account trie of Linea but also the storage of every contract separately.
- The leaves of the tree have the following structure
prev || next || hKey || hVal
.
hKey
andhVal
are the hashes of the key and the value of the stored state entry, respectively.prev
andnext
are pointers storing the position of the leaves whosehkeys
are immediately lower and higher, respectively, following the lexicographic order.- The first two leaves of the SMT are called the head and tail, and are special in that they do not encode a stored tuple.
- The head is the lowest possible
hkey
, while the tail is the highest possiblehKey
. - They are therefore situated at the beginning and the end of the linked list, respectively.
- Starting from the head, we can access the SMT leaf stored at
head.next
to get the lowest actually stored item. - Further incrementing the
next
value will give us the second-lowest stored item and so on. - Repeating the process walks us through the entire set of stored items before we end up at the tail node, marking the final step.
- Leaves can also be referred to as storage slots, in that they contain data about the contents of the account in the question.
Tracking Empty Leaves
- All leaves in the tree are populated with default/zero values at initialization.
- Since a deterministic hashing function will ensure that these leaves are always represented by the same hash, empty leaves can be easily recognized by the accumulator.
- However, in order for the state manager to update a storage slot with data about a Linea account's contents, it must know which empty leaf to overwrite, and exactly where these empty leaves are.
- A further consideration is that we require the index of a 'new' leaf - an empty leaf being updated so that it stores data - to be overwritten in a deterministic way.
- This requirement means that anyone can theoretically reconstruct the tree simply by looking at transaction history.
- To ensure consistency in the leaves' position, the state manager only ever inserts 'new' leaves to the left of the previous leaf in the tree.
- If this wasn't the case and the state manager was able to insert in any position, it would be impossible to reconstitute the tree in the exact same configuration, severely impacting the ability of L1 to verify the Merkle proof provided.
Applying the Accumulator
- The EVM uses a variant of Merkle Tree known as MPT to track
- World state, which keeps track of accounts,
- Account storage state, which keeps track of contents of each account.
- On Linea, we adapt this structure.
- The MPT is still used for world state, but the custom cryptographic accumulator described above is used for account storage state.
- The accumulator can perform the following operations:
- Insertion : Adding a new storage slot to the tree, triggered by storing a non-zero value in previously zero-valued slot.
- Update : Changing the value of an existing slot in the tree, triggered by storing a new non-zero value in a previously non-zero slot.
- Deletion : Removing a storage slot from the tree, triggered by storing the zero value in previously non-zero slot.
- Read Zero : Proving non-membership, triggered when a storage slot has been accessed, but not updated, and its value is zero.
- Read non-zero : Proving membership triggered when a storage slot has been accessed, but not updated and its value is non-zero.
- These operations are applied two trees; world state and account storage state.
World State
- The world state tree maps all accounts that exist on the blockchain - contracts and externally owned accounts (EOAs) - and points towards the account account storage for each.
- While on Ethereum mainnet, this data is stored in a standard trie, Linea uses the accumulator to map accounts as key:value pairs.
- Otherwise, the implementation is similar to the EVM.
- The structure is as follows:
HKey
:Val
:
- Critically every piece of data fed into the
Val
hash function must have a finite field interpretation. - The data must be formatted this way to enable the Linea prover to correctly access the world state when verifying proofs.
- Each element is formatted as follows.
- : The nonce is written in big-endian into a
byte32
. For instance if the nonce is10
then the nonce should be encoded as0x000000000000000000000000000000000000000000000000000000000000000a
. - : Formatted the same as the nonce, big-endian
byte32
. - : The storage root should not be the Keccak of the Patricia trie root as in the EVM, but the "custom merkle tree" root of the account storage state.
- : The code hash should not be the Keccak of the code, it should be the one described in the following section.
- 2 field elements for : One for the 128 most significant bits and one for the 128 least significant bits. The Keccak code hash corresponds exactly to the Keccak hash as specified by the EVM. (i.e the output of EXTCODEHASH)
- : The code size should be the same value as that returned by the CODESIZE/EXTCODESIZE opcodes.
- : The nonce is written in big-endian into a
Account Storage State
- Also referred to as the storage trie, the account storage state is the database the state manager accesses to retrieve data about the contents of accounts.
- Account storage is mainly relevant for contract accounts; for EOAs, the data about assets and transactions is stored in the world state
Val
, and thecodeHash
and its variants are empty. - Since the main function of account storage is to record contracts in such a way that they can be easily retrieved and processed, it must efficiently encode the contract.
- It does this using the following format:
HKey
:Val
:
- In both cases, the
MSB
refers to the first 16 bytes of a 'word', and LSB the last 16. - 'Word' in the context refers to the natural unit of data used by the EVM, which is a 256-bit (32 byte) chunks.
- For example, if the data regarding the contract's code was encoded in a
byte32
, the standard data type for words on the EVM and equivalents like Linea, it might look like this.
[a0, a1, a2, …., a15, b0, b1, …, b15]
- That
byte32
would be split into anMSB
andLSB
like thisMSB
:[0, 0, .... a15]
LSB
:[0, 0, .... b15]
Generating state-root transition witnesses
- The accumulator, built using a sparse Merkle tree, is simultaneously:
- A data structure on which we can perform operations;
- A dataset that we can summarize using a short string at any time
- A tool that can be used by the Linea protocol to verify that a given operation triggered transition from hash a to hash b.
- Once the accumulator has processed the trace information it receives about a new block and updated state accordingly, it can pass a new state root hash to the prover, via the coordinator.
- The state root hash can then be used by the prover as a "witness": a verifiable method of proving that transactions in each block have taken place, without having to divulge the nature of those transactions.