System design: Data Models

July 17, 2025

System design: Data Models

Systems Architecture Word Cloud Royalty Free Vector Image

Data models have a profound effect: not only on how the software is written, but also on how we think about the problem that we are solving. As most applications are built by by layering one data model on top of another; applications use APIs to manipulate real world information modeled as objects or data structures, these objects or data structures are stored in JSON/XML format, tables in a relational database or a graph model. The JSON/XML format, tables or graphs are stored as bytes in memory, on disk, or on a network, allow the data to be queried, searched, manipulated, and processed in various ways. On yet lower levels, bytes are represented in terms of electrical currents, pulses of light, magnetic fields, and more.

It can take a lot of effort to master just one data model (think how many books there
are on relational data modeling). Building software is hard enough, even when working with just one data model and without worrying about its inner workings. But
since the data model has such a profound effect on what the software above it can
and can’t do, it’s important to choose one that is appropriate to the application.

Relational Model Versus Document Model

The best-known data model today is probably that of SQL, based on the relational
model proposed by Edgar Codd in 1970: data is organized into relations (called
tables in SQL), where each relation is an unordered collection of tuples (rows in SQL). As computers became vastly more powerful and networked, they started being used for increasingly diverse purposes. And remarkably, relational databases turned out to generalize very well, beyond their original scope of business data processing, to a broad variety of use cases. Much of web today is still powered by relational databases, be it online publishing, discussion, social networking, ecommerce, games, software-as-a-service productivity applications, or much more.

The Birth of NoSQL

Down the line in the 2010s, NoSQL(originally intended simply as a catchy Twitter hashtag for a meetup on open source, distributed, nonrelational databases in 2009) was birthed out of the following concerns:

A need for greater scalability than relational databases can easily achieve, including very large datasets or very high write throughput.
A widespread preference for free and open source software over commercial
database products.
Specialized query operations that are not well supported by the relational model
Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model.

The Object-Relational Mismatch

Most modern applications are built using object-oriented programming languages, which bring to light a key concern of SQL, i.e., if data is stored in relational tables, an awkward translation layer is required between the objects in the application code and the database model of tables, rows, and columns. This disconnect between the models is sometimes called an impedance mismatch.

Compared to JSON format, which has all the relevant information is in one place, and one query is sufficient. To query a multi-table schema, you need to either perform multiple queries or perform a messy multiway join between the users table and its subordinate tables.

Many-to-One and Many-to-Many Relationships

Image source: rezi.ai

Let's take a look at a linked in profile, a user may have education and position fields on their profile. Now given that one user may have attended different schools and as such have multiple schools as well as positions in their profile, this creates a many-to-one relationship. However, each school or company may have their own profile page with a logo, location, industry, employees etc, creating a many-to-many relationship.

Are Document Databases Repeating History?

While many-to-many relationships and joins are routinely used in relational databases, document databases and NoSQL reopened the debate on how best to represent such relationships in a database. Most document databases and NoSQL excel at many-to-one relationships however struggle with many-to-many relationships.

In the 1960s and 70s, developers had to decide whether to duplicate (denormalize) data or to manually resolve references from one record to another. The two most prominent solutions were the relational model (which became SQL, and took over the world) and the network model (which initially had a large following but eventually faded into obscurity).

Document databases reverted back to the hierarchical model in one aspect: storing
nested records within their parent record rather than in a separate table. However, when it comes to representing many-to-one and many-to-many relationships, relational and document databases are not fundamentally different: in both cases, the related item is referenced by a unique identifier, which is called a foreign key in the relational model and a document reference in the document model.

Relational Versus Document Databases Today

Document data model offer schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application. The relational model counters by providing better support for joins, and many-to-one and many-to-many relationships.

Which data model leads to simpler application code?

This depends on the nature of your application and the most prevalent type of relationships present. If your application has more prevalent many-to-one relationships, then a document model makes the most sense. On the other hand if your application has many-to-many relationships being the most prevalent, then the better choice would be an SQL data model.

Reference

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Book by Martin Kleppmann

This blog post is a summary of my personal notes and understanding from reading "Designing Data-Intensive Applications" by Martin Kleppmann. All credit for the original ideas belongs to the author.

Search This Blog

Gilbs' Tech Hub

System design: Data Models

Relational Model Versus Document Model

The Birth of NoSQL

The Object-Relational Mismatch

Many-to-One and Many-to-Many Relationships

Are Document Databases Repeating History?

Relational Versus Document Databases Today

Which data model leads to simpler application code?

Reference

Comments

Post a Comment

Popular Posts

Webstack monitoring

Postmortem