System Design: Storage and Retrieval
Image source: SqlPac (own work), CC BY-SA 3.0
Databases are designed to store data and then "give back" that data when requested. To understand how this works, there are two main concepts we need to be mindful of: logs and indexes. A log here does not mean the text an application prints to describe what it is doing; it refers to an append-only sequence of records, i.e., the data file that the database keeps growing. To understand an index, we need a bit of a back story, so here is a basic bash script that writes data to a database and retrieves it on request:
#!/bin/bash

# Append a "key,value" line to the data file (an ever-growing log).
db_set () {
    echo "$1,$2" >> database
}

# Scan the whole file for the key and return the latest value written.
db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
Our write function (db_set) works well for our application because appending to a file is cheap. The challenge is with our read function (db_get): assuming we have over a million lines of data in our database, every time we want to retrieve something we have to scan through the entire file from start to finish, so lookups get slower as the data grows. Therefore, indexes were introduced as an additional structure derived from the primary data, speeding up retrieval at the cost of some extra work on every write. Now that we have that out of the way, let's look at some indexing structures:
Hash Indexes
For key-value stores (a data structure like a dictionary), the simplest way of indexing is to use a hash map (hash table). This works by keeping an in-memory hash map where every key is mapped to a byte offset in the data file (the location at which the value can be found). When a new key-value pair is appended, the hash map is updated to record the offset of the new data; on retrieval, the hash map is consulted to get the offset in the data file, the file is seeked to that location, and the value is read (a minimal sketch follows the list below). The limitations of a hash index include:
- The hash map must fit in memory, so a very large number of keys becomes difficult to handle.
- Range queries are not efficient: you cannot easily scan over all keys between two values, you would have to look up each key individually in the hash map.
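To make the byte-offset idea concrete, here is a minimal Python sketch of a hash index layered over the same append-only "key,value" file used by the bash example above. It is an illustration of the technique, not how any particular storage engine implements it, and the file name "database" is simply carried over from the script.

index = {}  # in-memory hash map: key -> byte offset in the data file

def db_set(key, value):
    # Append the record to the log and remember where it starts.
    with open("database", "ab") as f:
        index[key] = f.tell()
        f.write(f"{key},{value}\n".encode())

def db_get(key):
    # Jump straight to the recorded offset instead of scanning the file.
    offset = index.get(key)
    if offset is None:
        return None
    with open("database", "rb") as f:
        f.seek(offset)
        line = f.readline().decode().rstrip("\n")
        return line.split(",", 1)[1]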
SSTables and LSM-Trees
Sorted String Tables (SSTables) work similarly to the hash-indexed log described above; the main difference is that an SSTable requires the sequence of key-value pairs to be sorted by key, and each key appears only once within each merged segment file. Merging segments works like the merge step of mergesort: read the input files side by side, look at the first key in each file, copy the lowest key (according to the sort order) to the output file, and repeat. This produces a new merged segment file, also sorted by key. Because the keys are sorted, you can find a particular key in a file without keeping an index of all the keys in memory (a sparse index of some keys is enough), and range queries become efficient.
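Below is a small Python sketch of that merge step, assuming each segment is already a key-sorted list of (key, value) pairs held in memory and that segments are ordered oldest to newest; a real storage engine would stream the entries from files rather than lists.

import heapq

def merge_segments(segments):
    # Tag each pair with its segment index so that, for equal keys,
    # the entry from the newest segment sorts last.
    tagged = ([(key, i, value) for key, value in seg]
              for i, seg in enumerate(segments))
    merged = []
    # heapq.merge reads the inputs side by side, always yielding the
    # lowest key next, just like the merge step of mergesort.
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, value)   # newer segment wins, key kept once
        else:
            merged.append((key, value))
    return merged  # sorted by key, each key appears exactly once

# merge_segments([[("apple", 1), ("dog", 2)], [("cat", 9), ("dog", 7)]])
# -> [("apple", 1), ("cat", 9), ("dog", 7)]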
A Log-Structured Merge-Tree (LSM-tree) builds on SSTables: when a write request comes in, it is added to an in-memory balanced tree data structure (a memtable). When the memtable grows beyond a few megabytes, it is written out to disk as an SSTable file. For a read request, the key is first looked up in the memtable, then in the most recent on-disk segment, then in the next-older segment, and so on. From time to time, a merging and compaction process runs in the background to combine segment files and discard overwritten or deleted values.
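Here is a toy Python sketch of that write and read path. A plain dict plus a sort on flush stands in for the balanced tree, in-memory lists stand in for on-disk SSTable files, the four-entry flush threshold stands in for "a few megabytes", and background compaction is omitted entirely.

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}    # stands in for a balanced tree (e.g. a red-black tree)
        self.segments = []    # flushed, key-sorted segments, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: write the memtable out as a new sorted segment.
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Read path: memtable first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            if key in segment:
                return segment[key]
        return None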
B-Trees
Log-structured indexes break the database down into variable-sized segments, whereas B-trees break it down into fixed-size blocks or pages, traditionally 4 KB in size, and read or write one page at a time. Each page can be identified using an address or location, which allows one page to refer to another, similar to a pointer, but on disk instead of in memory. These page references are used to construct a tree of pages: a lookup starts at the root page and follows references downward until it reaches a leaf page containing the key.
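The sketch below shows the lookup side of this in Python, with ordinary object references standing in for on-disk page addresses; real B-trees also keep keys within a page ordered and split pages that grow too full, which is omitted here.

class Page:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # boundary keys (interior) or stored keys (leaf)
        self.children = children  # child page references; None for a leaf page
        self.values = values      # values stored in a leaf page

def lookup(page, key):
    # Descend one page at a time, choosing the child whose range covers the key.
    while page.children is not None:
        i = 0
        while i < len(page.keys) and key >= page.keys[i]:
            i += 1
        page = page.children[i]
    # Leaf page reached: the value, if present, lives here.
    if key in page.keys:
        return page.values[page.keys.index(key)]
    return None

# leaf1 = Page(keys=[100, 150], values=["a", "b"])
# leaf2 = Page(keys=[200, 250], values=["c", "d"])
# root = Page(keys=[200], children=[leaf1, leaf2])
# lookup(root, 250)  -> "d"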
Compared to LSM-trees, B-trees are typically faster for reads but slower for writes: a read in an LSM-tree may have to check the memtable and several segment files, while a write to a B-tree has to overwrite whole pages in place (and is usually also recorded in a write-ahead log), whereas an LSM-tree turns writes into sequential appends.
Transaction Processing or Analytics?
Increasingly, databases are used for data analytics as well, not just for returning raw records to users. An analytic query typically scans over a huge number of records, reads only a few columns per record, and calculates aggregate statistics (such as count, sum, or average) rather than returning individual rows.
Data Warehousing
A data warehouse contains a read-only copy of the data from all the various OLTP (Online Transaction Processing) systems in a company. Data is extracted from the OLTP databases (using either a periodic data dump or a continuous stream of updates), transformed into an analysis-friendly schema, cleaned up, and then loaded into the data warehouse, a process known as Extract-Transform-Load (ETL).
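A minimal Python sketch of that ETL flow is shown below; the row shapes and the in-memory "warehouse" list are hypothetical placeholders for an OLTP source and a real warehouse loader.

def extract(oltp_rows):
    # Extract: pull raw rows out of the OLTP system (here, a periodic dump).
    return list(oltp_rows)

def transform(rows):
    # Transform: clean up and reshape into an analysis-friendly schema.
    return [
        {"order_id": r["id"], "amount_usd": r["amount_cents"] / 100}
        for r in rows
        if r.get("amount_cents") is not None
    ]

def load(warehouse, rows):
    # Load: append the cleaned rows to the read-only analytics copy.
    warehouse.extend(rows)

warehouse = []
load(warehouse, transform(extract([{"id": 1, "amount_cents": 1999}])))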
Column Oriented Storage
Given that fact tables can have trillions of rows and petabytes of data, storing them and running queries over them becomes a challenge. Although fact tables are often over 100 columns wide, a typical data warehouse query only accesses 4 or 5 of them at one time. The idea behind column-oriented storage is simple: don't store all the values from one row together; instead, store all the values from each column together. If each column is stored in a separate file, a query only needs to read and parse the columns that it actually uses, which can save a lot of work.
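The contrast is easy to see in a few lines of Python; the tiny table below is made up for illustration, and a real column store keeps each column in its own file on disk rather than in a list.

rows = [
    {"date": "2024-01-01", "product": "apples",  "quantity": 3},
    {"date": "2024-01-01", "product": "bananas", "quantity": 5},
    {"date": "2024-01-02", "product": "apples",  "quantity": 2},
]

# Row-oriented -> column-oriented: one array per column, aligned by row number.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# A query like "SELECT SUM(quantity) WHERE product = 'apples'" only has to
# touch two of the columns; the date column is never read.
total = sum(q for p, q in zip(columns["product"], columns["quantity"])
            if p == "apples")
print(total)  # 5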
Column Compression
Often, the number of distinct values in a column is small compared to the number of rows. As a result, column-oriented storage can be compressed quite well. One common technique is bitmap encoding, which works by taking a column with n distinct values and turning it into n separate bitmaps: one bitmap for each distinct value, with one bit for each row. The bit is 1 if the row has that value, and 0 if not.
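A small Python sketch of bitmap encoding, using a made-up product column (0/1 values in a list stand in for actual bits):

column = ["apples", "bananas", "apples", "cherries", "bananas"]

# One bitmap per distinct value, one bit per row.
bitmaps = {value: [1 if v == value else 0 for v in column]
           for value in set(column)}
# bitmaps["apples"]   -> [1, 0, 1, 0, 0]
# bitmaps["bananas"]  -> [0, 1, 0, 0, 1]
# bitmaps["cherries"] -> [0, 0, 0, 1, 0]

# A query like "WHERE product IN ('apples', 'bananas')" becomes a bitwise OR.
matches = [a | b for a, b in zip(bitmaps["apples"], bitmaps["bananas"])]

When the bitmaps are mostly zeros, they can additionally be run-length encoded, making the representation even more compact.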
Sort Order in Column Storage
The simplest option is to store the rows of a column store in the order in which they were inserted, since the order does not affect correctness. However, we can choose to impose an order and use it as an indexing mechanism: most commonly the data is sorted by a date column, so that queries filtering on a date range only need to scan the matching rows. Note that the data has to be sorted an entire row at a time, even though it is stored column by column, so that the k-th entry of one column still belongs to the same row as the k-th entry of every other column.
Writing to Column Oriented Storage
These optimizations make sense in data warehouses because most of the load consists of large read-only queries run by analysts. Column-oriented storage, compression, and sorting all help to make those read queries faster; however, they have the downside of making writes more difficult. To address this challenge, all writes first go to an in-memory store, where they are added to a sorted structure and prepared for writing to disk; it doesn't matter whether the in-memory store is row-oriented or column-oriented. When enough writes have accumulated, they are merged with the column files on disk and written out to new files in bulk, which is essentially what LSM-trees do.
Reference
Kleppmann, Martin. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media, 2017.
This blog post is a summary of my personal notes and understanding from reading "Designing Data-Intensive Applications" by Martin Kleppmann. All credit for the original ideas belongs to the author.