System Design: Query Languages for Data

Before the introduction of SQL, most databases were queried using imperative code. However the SQL, came with the declarative code approach, the main difference between the two approaches was in how verbose a command had to be. That is, an imperative language tells the computer to perform certain operations in a certain order whereas in a declarative language, you just specify the desire outcome but not how to achieve that goal.

This approach presents many advantages from being typically more concise and easier to work with imperative languages to being more beneficial in automatic optimizations and parallel processing applications. 

Declarative Queries on the Web

In CSS, when you style a child component of an element in a corresponding HTML file,  that style is applied to all child components that will be added in the future no further code is required to apply that styling to the new components.

<ul>
<li class="selected">
<p>Desserts</p>
<ul>
<li>Ice cream</li>
<li>Chocolate</li>
<li>Marshmallow</li>
</ul>
</li>
<li>
<p>Bread</p>
<ul>
<li>Rye</li>
<li>Sourdough</li>
<li>Baguette</li>
</ul>
</li>
</ul>

 

li.selected > p {
background-color: #77dd77;
}

Doing the same thing in JavaScript would require you to write more code for each new child component that may be added in the future. Therefore, inn a web browser, using declarative CSS styling is much better than manipulating styles imperatively in JavaScript. 

MapReduce Querying

MapReduce is a programming model for processing large amounts of data in bulk across many machines. A limited form of MapReduce is supported by some NoSQL datastores, including MongoDB and CouchDB, as a mechanism for performing read-only queries across many documents.

db.observations.mapReduce(
function map() {
var year = this.observationTimestamp.getFullYear();
var month = this.observationTimestamp.getMonth() + 1;
emit(year + "-" + month, this.numFood);
},
function reduce(key, values) {
return Array.sum(values);
},
{
query: { type: "Dessert" },
out: "monthlySharkReport"
}
);

The above code, queries the database first by mapping all desserts served to the date and time in which they were served and subsequently uses the reduce function to sum the total desserts served in a given day and return it. 

Graph-Like Data Models

A graph consists of two kinds of objects: vertices (also known as nodes or entities) and edges (also known as relationships or arcs). This approach was developed as a way to simplify the complex data connections that can arise from many-to-many relationships in SQL databases. The schema below uses the PostgreSQL json datatype to store the properties of each vertex or edge. 

CREATE TABLE vertices (

    vertex_id   integer PRIMARY KEY,
    properties  json
);

CREATE TABLE edges (
    edge_id     integer PRIMARY KEY,
    tail_vertex integer REFERENCES vertices (vertex_id),
    head_vertex integer REFERENCES vertices (vertex_id),
    label       text,
    properties  json
);

CREATE INDEX edges_tails ON edges (tail_vertex);
CREATE INDEX edges_heads ON edges (head_vertex);

This presents some important aspects such as:

  1. Any vertex can have an edge connecting it with any other vertex.
  2. Given any vertex, you can efficiently find both its incoming and its outgoing edges, and thus traverse the graph, by following a path through a chain of vertices, both forward and backward.
  3. By using different labels for different kinds of relationships, you can store several different kinds of information in a single graph, while still maintaining a clean data model.

The Cypher Query Language

Cypher is a declarative query language for property graphs, created for the Neo4j graph database. It is named after a character in The Matrix movie and is not related to ciphers in cryptography. The code below is used to insert data into a graph database

CREATE
(NAmerica:Location {name:'North America', type:'continent'}),
(USA:Location {name:'United States', type:'country' }),
(Idaho:Location {name:'Idaho', type:'state' }),
(John:Person {name:'John' }),
(California) -[:WITHIN]-> (USA) -[:WITHIN]-> (NAmerica),
(John) -[:BORN_IN]-> (California)

After, the database can be queried to find people who emigrated from the US to Europe using the queries below: 

MATCH
(person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (us:Location {name:'United States'}),
(person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})
RETURN person.name

 

Reference

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Book by Martin Kleppmann

 

This blog post is a summary of my personal notes and understanding from reading "Designing Data-Intensive Applications" by Martin Kleppmann. All credit for the original ideas belongs to the author. 

Comments

Popular Posts