Cosmos DB — Under the Hood

Mar 21, 2021

We all know the features offered by Cosmos DB. I’ve listed them below for reference:

Globally Distributed
Linearly Scalable
Schema-Agnostic Auto Indexing
Multi-Model
Multi-API and Multi-Language Support
High Availability
Guaranteed Low Latency
Multi-Master Support

When I first came across Cosmos DB, I was intrigued by its multi-API and multi-model capabilities. We have all used different databases for different use cases: relational databases for strong ACID guarantees, key-value NoSQL databases for low-latency applications, and graph databases for nested, related, and hierarchical data. Yet here was a single service offering all of these in one. I started to wonder how it actually does that. How does it support these capabilities internally? After all, each database model needs its own representation at the data layer.

In this part of the article, I would like to share what I have learned about the internal architecture of Cosmos DB: how it supports multi-model and multi-API access, and how it provides schema-agnostic automatic indexing. Let’s start with multi-model and multi-API support.

Cosmos DB is deployed and managed on clusters of machines. Once deployed, it manifests itself as an overlay network of machines, referred to as a federation. Each machine hosts replicas corresponding to various resource partitions, and replicas are load balanced across the federation. Each replica hosts a database engine, the heart of Cosmos DB, which manages both the documents and the index. The database engine consists of components including a replicated state machine (RSM) for coordination, the JavaScript language runtime, the query processor, and the storage and indexing subsystems responsible for transactional storage and indexing of documents. To provide durability and high availability, the database engine persists data on local SSDs and replicates it among the database engine instances within the replica set.

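Cosmos DB stores data in a Bw-tree, a latch-free variant of the B-tree. It achieves very high performance by avoiding latches and by making effective use of the processor caches of modern multi-core chips. On top of this storage layer, the engine represents data using the atom-record-sequence (ARS) type system: atoms are primitive values such as strings, booleans, and numbers; records are structs; and sequences are arrays of atoms, records, or sequences.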
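To make the latch-free idea a little more concrete, here is a minimal single-threaded sketch of the delta-update technique described in the Bw-tree paper listed in the references: updates are prepended as delta records to a page, and the new chain head is installed in a mapping table with a compare-and-swap. The class and function names are my own, the compare-and-swap is only simulated with an identity check, and nothing here reflects the actual Cosmos DB implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

_DELETED = object()  # sentinel marking a delete delta (hypothetical helper)


@dataclass(frozen=True, eq=False)
class BasePage:
    records: dict  # consolidated key -> value state of the page


@dataclass(frozen=True, eq=False)
class Delta:
    key: str
    value: object
    next: "Node"  # older delta or the base page


Node = Union[BasePage, Delta]


class MappingTable:
    """Maps logical page ids to the current head of each page's delta chain."""

    def __init__(self) -> None:
        self._slots: Dict[int, Node] = {}

    def get(self, page_id: int) -> Node:
        return self._slots[page_id]

    def compare_and_swap(self, page_id: int, expected: Optional[Node], new: Node) -> bool:
        # Stand-in for an atomic hardware CAS: succeeds only if nobody
        # replaced the head since we read it.
        if self._slots.get(page_id) is expected:
            self._slots[page_id] = new
            return True
        return False


def upsert(table: MappingTable, page_id: int, key: str, value: object) -> None:
    # Never modify the page in place: prepend a delta and retry if the CAS fails.
    while True:
        head = table.get(page_id)
        if table.compare_and_swap(page_id, head, Delta(key, value, head)):
            return


def delete(table: MappingTable, page_id: int, key: str) -> None:
    upsert(table, page_id, key, _DELETED)


def lookup(table: MappingTable, page_id: int, key: str) -> object:
    # The newest delta for a key wins; fall through to the base page otherwise.
    node = table.get(page_id)
    while isinstance(node, Delta):
        if node.key == key:
            return None if node.value is _DELETED else node.value
        node = node.next
    return node.records.get(key)


def consolidate(table: MappingTable, page_id: int) -> bool:
    # Fold the delta chain into a fresh base page and install it with one CAS.
    head = table.get(page_id)
    deltas, node = [], head
    while isinstance(node, Delta):
        deltas.append(node)
        node = node.next
    records = dict(node.records)
    for delta in reversed(deltas):  # apply oldest delta first
        if delta.value is _DELETED:
            records.pop(delta.key, None)
        else:
            records[delta.key] = delta.value
    return table.compare_and_swap(page_id, head, BasePage(records))
```

Because readers only ever follow immutable chains and writers only ever swap a single pointer, no reader has to wait on a lock. In a real multi-threaded implementation the swap would be an atomic instruction rather than a Python `is` check.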

So underneath every API, the data is physically stored in ARS form and translated to a logical representation based on the chosen API. This suggests that we could store data using one API and access the same data through a different API for a different use case, for example storing data with the SQL API and fetching it with the Graph API. While this looks like a logical option, cross-API access to the underlying ARS data is not currently supported. Still, the shared representation makes something like this seem quite possible.

Since ARS provides a common way to represent any data, it can be mapped to any API. For example, with the MongoDB API an ARS record could represent a document, whereas with the Graph API it could represent nodes and edges.

This way, Cosmos DB can project the same physical representation into the logical form of any API, and this forms the basis of its multi-model, multi-API capabilities.
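As a rough illustration of this idea, the sketch below models ARS values with plain Python types (atoms as primitives, records as dicts, sequences as lists) and shows how a document, a graph vertex, and a graph edge could all be expressed in that one form. The field names such as `kind`, `from`, and `to` are invented for this example and are not the names Cosmos DB uses internally.

```python
from typing import Any, Dict

# Atoms: a small set of primitive types.
ATOMS = (str, int, float, bool, type(None))


def is_ars(value: Any) -> bool:
    """Return True if value is built only from atoms, records, and sequences."""
    if isinstance(value, ATOMS):
        return True
    if isinstance(value, list):  # sequence
        return all(is_ars(item) for item in value)
    if isinstance(value, dict):  # record
        return all(isinstance(k, str) and is_ars(v) for k, v in value.items())
    return False


def document_to_ars(doc: Dict[str, Any]) -> Dict[str, Any]:
    # Document model: a JSON document is already a record.
    return {"kind": "document", "body": doc}


def vertex_to_ars(vertex_id: str, label: str, properties: Dict[str, Any]) -> Dict[str, Any]:
    # Graph model: a vertex becomes a record carrying its label and properties.
    return {"kind": "vertex", "id": vertex_id, "label": label, "properties": properties}


def edge_to_ars(source_id: str, target_id: str, label: str) -> Dict[str, Any]:
    # Graph model: an edge becomes a record pointing at two vertex ids.
    return {"kind": "edge", "label": label, "from": source_id, "to": target_id}


if __name__ == "__main__":
    values = [
        document_to_ars({"name": "Ada", "skills": ["math", "engines"]}),
        vertex_to_ars("v1", "person", {"name": "Ada"}),
        edge_to_ars("v1", "v2", "knows"),
    ]
    for value in values:
        print(value["kind"], is_ars(value))
```

Each API then only needs a translation layer between its own logical model and this shared record shape, which is the essence of the multi-model claim.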

Let’s now look at another feature: schema-agnostic automatic indexing. The schema of a document describes the structure and the type system of the document independent of the document instance. There are several ways of specifying document structure. With the goal of eliminating the impedance mismatch between the database and the application programming models, Cosmos DB exploits the simplicity of JSON and its lack of a schema specification. It makes no assumptions about the documents and allows documents within a Cosmos DB collection to vary in schema, in addition to the instance-specific values. In contrast to other document databases, the Cosmos DB database engine operates directly at the level of JSON grammar, remaining agnostic to the concept of a document schema and blurring the boundary between the structure and instance values of documents. This, in turn, enables it to index documents automatically without requiring a schema or secondary index definitions.

The technique that blurs the boundary between the schema of JSON documents and their instance values is representing documents as trees. Representing JSON documents as trees normalizes both the structure and the instance values across documents into the unifying concept of a dynamically encoded path structure. To represent a JSON document as a tree, each label (including the array indices) in the document becomes a node of the tree. Both the property names and their values are treated as labels in the tree representation.

[Figure: two example JSON documents with different schemas and their corresponding tree representations]
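The tree view described above can be sketched in a few lines. The function below walks a parsed JSON value and yields every root-to-leaf path as a tuple of labels, where property names, array indices, and leaf values are all treated as labels. The two sample documents are made up for illustration and are not the ones from the original figure; `iter_paths` is a hypothetical helper name.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[Any, ...]


def iter_paths(node: Any, prefix: Path = ()) -> Iterator[Path]:
    """Yield each root-to-leaf path of a JSON value as a tuple of labels."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Property names become interior labels.
            yield from iter_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            # Array indices become interior labels too.
            yield from iter_paths(value, prefix + (index,))
    else:
        # A scalar value becomes the final label of the path.
        yield prefix + (node,)


doc1 = {
    "headquarter": "Belgium",
    "locations": [
        {"country": "Germany", "city": "Berlin"},
        {"country": "France", "city": "Paris"},
    ],
}
doc2 = {
    "headquarter": "Italy",
    "locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
}

if __name__ == "__main__":
    for path in iter_paths(doc1):
        print("/".join(str(label) for label in path))
```

Although `doc2` has a `revenue` property that `doc1` lacks, both documents reduce to the same kind of object, a set of label paths, so no schema is needed to compare or index them. Note that in this sketch empty objects and empty arrays produce no paths at all.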

With automatic indexing, every path in a document tree is indexed (unless the developer has explicitly configured the indexing policy to exclude certain path patterns). Each update of a document in a Cosmos DB collection leads to an update of the structure of the index (i.e., causes addition or removal of nodes). To keep query cost low, it uses a normalized path representation on which both the automatic indexing and query subsystems are built. The Cosmos DB query language operates over paths of the document trees, and the engine uses an inverted index to represent them. The inverted index is also a tree, and in fact the index can be serialized to a valid JSON document.

The index tree is a document constructed from the union of all the trees representing individual documents within the collection. The index tree grows over time as documents are added to or updated in the collection. Each node of the index tree is an index entry containing the label and position values (the term), and the ids of the documents (or fragments of a document) containing the specific node (the postings).
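A toy version of such an index can be built by merging the label paths of every document and recording, for each node, which documents contain it. In the sketch below a node is identified by its path prefix (the term) and maps to a set of document ids (the postings). This is only an in-memory illustration under my own naming; it says nothing about how Cosmos DB actually encodes or stores terms and postings.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, Set, Tuple

Path = Tuple[Any, ...]


def iter_paths(node: Any, prefix: Path = ()) -> Iterator[Path]:
    """Yield each root-to-leaf path of a JSON value as a tuple of labels."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_paths(value, prefix + (index,))
    else:
        yield prefix + (node,)


def build_index(docs: Dict[str, Any]) -> Dict[Path, Set[str]]:
    """Union all document trees into one index: term (path prefix) -> postings."""
    index: Dict[Path, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for path in iter_paths(doc):
            # Register every prefix so interior nodes carry postings as well.
            for end in range(1, len(path) + 1):
                index[path[:end]].add(doc_id)
    return dict(index)


docs = {
    "d1": {"headquarter": "Belgium", "locations": [{"country": "Germany"}, {"country": "France"}]},
    "d2": {"headquarter": "Italy", "locations": [{"country": "Germany", "revenue": 200}]},
}
index = build_index(docs)

if __name__ == "__main__":
    # Which documents have Germany as the country of their first location?
    print(sorted(index[("locations", 0, "country", "Germany")]))
    # Which documents have a second location at all?
    print(sorted(index.get(("locations", 1), set())))
```

An equality filter then becomes a single lookup of the full path, and adding a document with new properties simply adds new nodes to the index without any schema change.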

In the section above, we saw how Cosmos DB achieves schema-agnostic indexing by treating documents as trees and the index as a document, which can again be represented as a tree to work with efficiently. In the next parts of this article, we will explore how Cosmos DB provides different levels of consistency and how it partitions data. Stay tuned!

References:

https://www.vldb.org/pvldb/vol8/p1668-shukla.pdf
https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/#:~:text=The%20core%20type%20system%20of,of%20atoms%2C%20records%20or%20sequences.
https://docs.microsoft.com/en-us/azure/cosmos-db/global-dist-under-the-hood
https://vmayakumar.wordpress.com/2020/02/11/azure-cosmos-db-overview/
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/bw-tree-icde2013-final.pdf

Written by Rajneesh Prakash