System design: Data Models and Query Languages
![]() |
Image source: VectorStock |
It can take a lot of effort to master just one data model (think how many books there
are on relational data modeling). Building software is hard enough, even when working with just one data model and without worrying about its inner workings. But
since the data model has such a profound effect on what the software above it can
and can’t do, it’s important to choose one that is appropriate to the application.
Relational Model Versus Document Model
The best-known data model today is probably that of SQL, based on the relational
model proposed by Edgar Codd in 1970: data is organized into relations (called
tables in SQL), where each relation is an unordered collection of tuples (rows in SQL). As computers became vastly more powerful and networked, they started being used for increasingly diverse purposes. And remarkably, relational databases turned out to generalize very well, beyond their original scope of business data processing, to a broad variety of use cases. Much of web today is still powered by relational databases, be it online publishing, discussion, social networking, ecommerce, games, software-as-a-service productivity applications, or much more.
The Birth of NoSQL
Down the line in the 2010s, NoSQL(originally intended simply as a catchy Twitter hashtag for a meetup on open source, distributed, nonrelational databases in 2009) was birthed out of the following concerns:
- A need for greater scalability than relational databases can easily achieve, including very large datasets or very high write throughput.
- A widespread preference for free and open source software over commercial
database products. - Specialized query operations that are not well supported by the relational model
- Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model.
The Object-Relational Mismatch
Most modern applications are built using object-oriented programming languages, which bring to light a key concern of SQL, i.e., if data is stored in relational tables, an awkward translation layer is required between the objects in the application code and the database model of tables, rows, and columns. This disconnect between the models is sometimes called an impedance mismatch.
Compared to JSON format, which has all the relevant information is in one place, and one query is sufficient. To query a multi-table schema, you need to either perform multiple queries or perform a messy multiway join between the users table and its subordinate tables.
Many-to-One and Many-to-Many Relationships
![]() |
Image source: rezi.ai |
Let's take a look at a linked in profile, a user may have education and position fields on their profile. Now given that one user may have attended different schools and as such have multiple schools as well as positions in their profile, this creates a many-to-one relationship. However, each school or company may have their own profile page with a logo, location, industry, employees etc, creating a many-to-many relationship.
Are Document Databases Repeating History?
While many-to-many relationships and joins are routinely used in relational databases, document databases and NoSQL reopened the debate on how best to represent such relationships in a database. Most document databases and NoSQL excel at many-to-one relationships however struggle with many-to-many relationships.
In the 1960s and 70s, developers had to decide whether to duplicate (denormalize) data or to manually resolve references from one record to another. The two most prominent solutions were the relational model (which became SQL, and took over the world) and the network model (which initially had a large following but eventually faded into obscurity).
Document databases reverted back to the hierarchical model in one aspect: storing
nested records within their parent record rather than in a separate table. However, when it comes to representing many-to-one and many-to-many relationships, relational and document databases are not fundamentally different: in both cases, the related item is referenced by a unique identifier, which is called a foreign key in the relational model and a document reference in the document model.
Relational Versus Document Databases Today
Document data model offer schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application. The relational model counters by providing better support for joins, and many-to-one and many-to-many relationships.
Which data model leads to simpler application code?
This depends on the nature of your application and the most prevalent type of relationships present. If your application has more prevalent many-to-one relationships, then a document model makes the most sense. On the other hand if your application has many-to-many relationships being the most prevalent, then the better choice would be an SQL data model.
Reference
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Book by Martin Kleppmann
Comments
Post a Comment