Database Backend: BoltDB

BoltDB is the persistent key-value store that the Prysm client uses as its database for Ethereum 2.0.

One piece of software we inherited from our days as a Geth fork is the storage engine, LevelDB. As a simple, embedded key-value store written in Go, LevelDB worked well. However, after seeing an implementation of Badger in Geth, along with database corruption issues, we decided to set aside time to survey other options ourselves.

Our first requirement was that the storage engine be embedded. Validating needs to be simple, and a separate database process would hinder that goal. The second requirement was that the database be written in Go. RocksDB and LMDB look good on paper, but we felt that the overhead of using C bindings would be too high. After considering those requirements, we decided to do a deeper dive into three options: Bolt, Badger, and LevelDB. Bolt uses a B+tree to index key-value pairs, whereas LevelDB and Badger use an LSM-tree as the underlying data structure. One difference between LevelDB and Badger is that in Badger only the keys are indexed in the LSM-tree, while values are written to an append-only log. In practice this results in faster reads and writes, particularly on SSDs and with large values.

After testing and benchmarking all three options, we decided that Bolt would be the best option for our use case. Although LevelDB and Badger performed better in write-heavy benchmarks, as expected for an LSM-tree, the difference wasn’t substantial. On the other hand, Bolt performed much better on read-heavy benchmarks. We also noticed that Bolt consumed more space on disk than our other two options. However, a critical requirement was that writes to the database not be lost, and Bolt provides the strongest guarantees there. LevelDB has longstanding issues with data corruption and lost writes.

How to Use BoltDB in Prysm

We keep all database-related logic in a db/ package defined in the Prysm repository. Because the backend is a key-value store, we define what we call “buckets” (akin to tables in relational databases) as the data stores for Bolt. We define general categories to “bucket” data into, such as blocks, transactions, state, and attestations. If you want to add a new bucket to our database, for example myNewStuffBucket, add it to our schema in db/schema.go as follows:

// The Schema will define how to store and retrieve data from the db.
// Currently we store blocks by prefixing `block` to their hash and
// using that as the key to store blocks.
// `block` + hash -> block
//
// We store the crystallized state using the crystallized state lookup key, and
// also the genesis block using the genesis lookup key.
// The canonical head is stored using the canonical head lookup key.
// The fields below define the suffix of keys in the db.
var (
	attestationBucket = []byte("attestation-bucket")
	blockBucket       = []byte("block-bucket")
	mainChainBucket   = []byte("main-chain-bucket")
	myNewStuffBucket  = []byte("my-new-stuff-bucket")
	...
)

And then add your bucket to our NewDB constructor in db/db.go as follows:

// NewDB initializes a new DB. If the genesis block and states do not exist, this method creates them.
func NewDB(dirPath string) (*BeaconDB, error) {
	// Make sure the data directory exists before opening the database file.
	if err := os.MkdirAll(dirPath, 0700); err != nil {
		return nil, err
	}
	datafile := path.Join(dirPath, "beaconchain.db")
	boltDB, err := bolt.Open(datafile, 0600, nil)
	if err != nil {
		return nil, err
	}
	db := &BeaconDB{db: boltDB, DatabasePath: dirPath}
	// Create every bucket in the schema inside a single write transaction.
	if err := db.update(func(tx *bolt.Tx) error {
		return createBuckets(tx, myNewStuffBucket, …otherBuckets)
	}); err != nil {
		return nil, err
	}
	return db, nil
}

Transactions in Bolt and Defining New DB Functions

Once your bucket is defined, you can easily create functions that save and fetch information from persistent storage with Bolt. Take, for example, two functions: one that checks whether a block exists in the database and one that saves a block to the database. In Prysm, we define them as follows in db/block.go:

// HasBlock accepts a block hash and returns true if the block exists.
func (db *BeaconDB) HasBlock(hash [32]byte) bool {
	hasBlock := false
	// The closure always returns nil, so the error from view is ignored.
	// #nosec G104
	_ = db.view(func(tx *bolt.Tx) error {
		b := tx.Bucket(blockBucket)
		hasBlock = b.Get(hash[:]) != nil
		return nil
	})
	return hasBlock
}

// SaveBlock accepts a block and writes it to disk.
func (db *BeaconDB) SaveBlock(block *types.Block) error {
	hash, err := block.Hash()
	if err != nil {
		return fmt.Errorf("failed to hash block: %v", err)
	}
	enc, err := block.Marshal()
	if err != nil {
		return fmt.Errorf("failed to encode block: %v", err)
	}
	// Store the encoded block under its hash in a write transaction.
	return db.update(func(tx *bolt.Tx) error {
		b := tx.Bucket(blockBucket)
		return b.Put(hash[:], enc)
	})
}

The general skeleton of a Bolt-enabled function is as follows:

func (db *BeaconDB) InsertSomethingToDB(key [32]byte, value []byte) error {
	return db.update(func(tx *bolt.Tx) error {
		b := tx.Bucket(myNewStuffBucket)
		// Put expects a byte slice, so slice the fixed-size key array.
		return b.Put(key[:], value)
	})
}
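
A read function follows the same shape, using db.view instead of db.update. One caveat worth keeping in mind: Bolt documents that the slice returned by Get is only valid for the life of the transaction, so a fetch function should copy the value before returning it. The sketch below illustrates only that copy step. The names kvBucket, memBucket, and fetchCopy are hypothetical and not part of Prysm, and the in-memory bucket is a stand-in so that the example runs without a database file. In real code, the bucket passed in would be the one returned by tx.Bucket(myNewStuffBucket) inside db.view.

```go
package main

import "fmt"

// kvBucket is a hypothetical interface listing the two methods this sketch
// needs. Bolt's *bolt.Bucket has Get and Put methods with these signatures.
type kvBucket interface {
	Get(key []byte) []byte
	Put(key []byte, value []byte) error
}

// memBucket is an in-memory stand-in for a Bolt bucket (assumption: used
// here only so the sketch is runnable on its own).
type memBucket map[string][]byte

// Get returns the stored slice, or nil if the key is absent.
func (m memBucket) Get(key []byte) []byte { return m[string(key)] }

// Put stores a private copy of value under key.
func (m memBucket) Put(key []byte, value []byte) error {
	m[string(key)] = append([]byte(nil), value...)
	return nil
}

// fetchCopy returns a copy of the value stored under key, or nil if the key
// is absent. Copying matters with Bolt because the slice returned by Get
// must not be used after the transaction ends.
func fetchCopy(b kvBucket, key [32]byte) []byte {
	v := b.Get(key[:])
	if v == nil {
		return nil
	}
	out := make([]byte, len(v))
	copy(out, v)
	return out
}

func main() {
	b := memBucket{}
	key := [32]byte{1}
	if err := b.Put(key[:], []byte("value")); err != nil {
		panic(err)
	}
	got := fetchCopy(b, key)
	got[0] = 'V' // mutating the copy leaves the stored value untouched
	fmt.Printf("copy=%s stored=%s\n", got, b.Get(key[:]))
}
```

Returning nil for a missing key mirrors how HasBlock above treats a nil result from Get as "not found".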

To learn more about Bolt’s API, including Put, Get, Update, and View, check out the official BoltDB godocs.