When we talk about the ‘Planet-Scale’ capabilities of Azure Cosmos DB, we don’t just mean its capacity to distribute globally or scale massively. Another aspect of Cosmos DB truly makes it ‘universal’ (no pun intended): its ability to incorporate multiple data models without giving up any of the core Cosmos DB functionality.
As of now, Cosmos DB supports five different types of APIs to store and process data.
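|#|API|Data Model|Container Name|Item|Support|
|---|---|---|---|---|---|
|1|SQL|Document|Container|Item|GA|
|2|MongoDB|Document|Collection|Document|GA|
|3|Cassandra|Wide-column|Table|Row|GA|
|4|Gremlin|Graph|Graph|Node or Edge|GA|
|5|Table|Key-value|Table|Item|GA|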
Before we delve into the details of each API, let’s first look at how Cosmos DB can support all these different APIs, and whether these APIs are interchangeable or not.
Atom-Record-Sequence (ARS) and Interoperability
Under the hood, Cosmos DB uses a database engine named Atom-Record-Sequence, or ARS. ARS is a proprietary solution from Microsoft that is responsible for the actual persistence of the data. All the supported APIs are projections of the ARS Model.
As of now, Cosmos DB doesn’t allow us to change the API once a database has been created, except between the SQL and Gremlin (graph) APIs. That exception shouldn’t come as a surprise, because Cosmos DB stores the data in essentially the same way for all of these APIs. Moreover, if you have been following Microsoft’s cloud developments, you will have noticed the pace at which it develops and updates flagship products like Cosmos DB and Power BI, so there is a good chance that interoperability between the other APIs will be supported in the future.
SQL API (Previously DocumentDB)
If you have been following this blog series from the start, you will remember that Cosmos DB was initially known as DocumentDB, and it supported only the document data model, queried through a modified version of SQL designed specifically for JSON objects. The current incarnation of Cosmos DB was released in 2017, and DocumentDB’s functionality became available through the SQL API.
As the name suggests, the SQL API lets you query JSON data with SQL, so anyone who already knows SQL faces only a gentle learning curve.
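To make this concrete, here is a minimal sketch in plain Python that emulates what a simple SQL API query returns over a few JSON items. The sample items and the `emulate_query` helper are hypothetical and exist only for illustration; against a real container, the query string would be sent to Cosmos DB instead, for example through the `azure-cosmos` SDK’s `container.query_items(query=QUERY, enable_cross_partition_query=True)`.

```python
from typing import Any, Dict, List

# Hypothetical sample items; the SQL API stores schema-free JSON like this.
items: List[Dict[str, Any]] = [
    {"id": "1", "name": "Ana", "age": 34, "address": {"city": "Lisbon"}},
    {"id": "2", "name": "Ben", "age": 28, "address": {"city": "Oslo"}},
    {"id": "3", "name": "Chen", "age": 41},  # no address property at all
]

# The SQL API query this sketch emulates; "c" is an alias for each item.
QUERY = "SELECT c.name, c.address.city FROM c WHERE c.age > 30"


def emulate_query(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical local stand-in for QUERY: filter on age, project two fields."""
    results = []
    for c in docs:
        if c.get("age", 0) > 30:  # WHERE c.age > 30
            row = {"name": c["name"]}  # SELECT c.name
            city = c.get("address", {}).get("city")  # SELECT c.address.city
            if city is not None:
                # Undefined properties are left out of the projected result.
                row["city"] = city
            results.append(row)
    return results


print(emulate_query(items))  # [{'name': 'Ana', 'city': 'Lisbon'}, {'name': 'Chen'}]
```

The third item has no `address`, so it still matches the filter but comes back without a `city` field, which mirrors how the SQL API omits undefined properties from a projection.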
The SQL API is also a great choice if you are starting a project from scratch, because it supports server-side programming (stored procedures, triggers, and user-defined functions written in JavaScript), a notable advantage over the other APIs.
MongoDB API
MongoDB is another document-based distributed database supported by Cosmos DB. It is one of the most popular NoSQL databases in the open-source community; it stores data as JSON-like documents and provides rich query features over them. Its inclusion in Cosmos DB combines the widespread adoption of MongoDB with the planet-scale capabilities of Cosmos DB.
The MongoDB API and the SQL API (formerly DocumentDB) are similar in that both store data as documents. However, MongoDB queries data with the ‘find’ method rather than with SQL.
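The difference is mostly one of syntax. The sketch below, in plain Python, evaluates a MongoDB-style filter document of the kind you would pass to `find` (with PyMongo, `collection.find({"age": {"$gt": 30}, "city": "Lisbon"})`) and shows the equivalent SQL API query in a comment. The `matches` and `find` helpers are hypothetical and support only equality and `$gt`, a tiny fraction of the real MongoDB query language.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

# SQL API equivalent of the filter used below:
#   SELECT * FROM c WHERE c.age > 30 AND c.city = 'Lisbon'
FILTER: Doc = {"age": {"$gt": 30}, "city": "Lisbon"}


def matches(doc: Doc, flt: Doc) -> bool:
    """Return True if doc satisfies every condition in the filter (implicit AND)."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, operand in cond.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                else:
                    raise ValueError(f"operator {op!r} not supported in this sketch")
        elif value != cond:  # a plain value means equality
            return False
    return True


def find(docs: List[Doc], flt: Doc) -> List[Doc]:
    """Hypothetical local stand-in for collection.find(flt)."""
    return [d for d in docs if matches(d, flt)]


people = [
    {"name": "Ana", "age": 34, "city": "Lisbon"},
    {"name": "Ben", "age": 28, "city": "Lisbon"},
    {"name": "Chen", "age": 41, "city": "Oslo"},
]
print([d["name"] for d in find(people, FILTER)])  # ['Ana']
```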
Cosmos DB supports not only creating a new database with the MongoDB API but also migrating existing MongoDB applications to Cosmos DB with minimal changes. In the most trivial cases, you may only need to change the connection string from MongoDB to Cosmos DB.
Cassandra API
Cassandra is a distributed, open-source, wide-column-store NoSQL database. A wide-column store resembles a relational database in that it has tables made up of columns and rows, but it differs in that the column names and types can vary from row to row within the same table. The Cassandra API uses CQL (Cassandra Query Language), which is very similar to SQL, to store and query data.
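As a rough mental model of rows that carry different columns, here is a minimal Python sketch in which a table maps each row key to only the columns that row actually has. The `users` table and the `select` and `all_columns` helpers are hypothetical. It is a simplification: in CQL a table declares its columns up front, but rows are sparse, so a column with no value in a given row is simply not stored and reads back as null.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical "users" table: row key -> only the columns that row has values for.
users: Dict[str, Row] = {
    "u1": {"name": "Ana", "email": "ana@example.com"},
    "u2": {"name": "Ben", "phone": "+47 555 0100", "last_login": "2019-05-01"},
}


def select(table: Dict[str, Row], columns: List[str], key: str) -> Optional[Row]:
    """Roughly: SELECT <columns> FROM users WHERE user_id = <key>.

    Returns None if the row does not exist; missing columns come back as None,
    similar to how an unset CQL column reads back as null.
    """
    row = table.get(key)
    if row is None:
        return None
    return {col: row.get(col) for col in columns}


def all_columns(table: Dict[str, Row]) -> List[str]:
    """The union of column names actually present across all rows."""
    return sorted({col for row in table.values() for col in row})


print(select(users, ["name", "email"], "u2"))  # {'name': 'Ben', 'email': None}
print(all_columns(users))  # ['email', 'last_login', 'name', 'phone']
```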
Like the MongoDB API, the Cassandra API can be used both to create a new database and to migrate an existing one with minimal changes.
Gremlin API
There is a theory called ‘Six degrees of separation’, which states that any two people in the world are connected by six or fewer social connections. For any two people X and Y, there is a chain of at most six connections, such as X->A->B->C->D->E->Y, that joins them.
We are not here to validate the theory, but if you had to model this kind of relationship in SQL, it would be a nightmare for both the programmer and the database engine. Relational databases define relationships between tables rather than between individual records, so building chains like the one above requires one more join for every hop, across tables that could be storing millions of rows.
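To see how the join count grows with every hop, here is a small sketch that uses Python’s built-in `sqlite3` module as a stand-in relational database. The `friendships` table and the helper names are hypothetical. Finding who is exactly three hops away from a person already needs three copies of the table joined together; a six-hop chain would need six.

```python
import sqlite3
from typing import Iterable, Set, Tuple


def build_db(edges: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory table with one row per (directed) friendship."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendships (person_a TEXT, person_b TEXT)")
    conn.executemany("INSERT INTO friendships VALUES (?, ?)", edges)
    return conn


def three_hops_away(conn: sqlite3.Connection, start: str) -> Set[str]:
    """People reachable from start in exactly three hops.

    Each hop is another self-join on friendships, so the query
    grows by one JOIN for every additional hop.
    """
    sql = """
        SELECT DISTINCT f3.person_b
        FROM friendships AS f1
        JOIN friendships AS f2 ON f2.person_a = f1.person_b  -- hop 2
        JOIN friendships AS f3 ON f3.person_a = f2.person_b  -- hop 3
        WHERE f1.person_a = ?                                -- hop 1 starts here
    """
    return {row[0] for row in conn.execute(sql, (start,))}


conn = build_db([("X", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("X", "E"), ("E", "B")])
print(three_hops_away(conn, "X"))  # {'C'}
```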
This problem is better suited to graph databases. In a graph database, relationships are physically stored with each record or entity. In graph terms, the entities are called vertices (or nodes) and the relationships are called edges. With a graph database, we can traverse multiple levels of connections easily and efficiently.
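The traversal itself amounts to a breadth-first search: starting from one vertex, follow its stored edges one hop at a time until the target turns up. Here is a minimal Python sketch over a hypothetical adjacency list (each person mapped to the people they know), standing in for the edges a graph database keeps with each vertex; every hop is a lookup of the current vertex’s neighbours rather than another join.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def shortest_chain(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the shortest chain from start to goal, or None."""
    if start == goal:
        return [start]
    parent: Dict[str, Optional[str]] = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, []):
            if friend in parent:
                continue
            parent[friend] = person
            if friend == goal:
                # Walk the parent links back to start, then reverse.
                chain = [goal]
                while parent[chain[-1]] is not None:
                    chain.append(parent[chain[-1]])
                return chain[::-1]
            queue.append(friend)
    return None


def degrees_of_separation(graph: Graph, a: str, b: str) -> Optional[int]:
    """Number of connections (edges) on the shortest chain between a and b."""
    chain = shortest_chain(graph, a, b)
    return None if chain is None else len(chain) - 1


# Hypothetical "knows" relationships, stored per person like edges per vertex.
knows: Graph = {
    "X": ["A", "E"],
    "A": ["X", "B"],
    "B": ["A", "C", "E"],
    "C": ["B", "Y"],
    "E": ["X", "B"],
    "Y": ["C"],
}
print(shortest_chain(knows, "X", "Y"))  # ['X', 'A', 'B', 'C', 'Y']
print(degrees_of_separation(knows, "X", "Y"))  # 4
```

In Gremlin, the Apache TinkerPop shortest-path recipe expresses the same idea with a traversal along the lines of `g.V('X').repeat(both().simplePath()).until(hasId('Y')).path().limit(1)`; whether Cosmos DB supports every step used there is worth checking against its Gremlin documentation.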
Cosmos DB provides the Gremlin API to create and query graph databases. In fact, we can also use SQL to query graph data. The Gremlin API’s advantage over other graph databases is that it comes under the umbrella of Cosmos DB and inherits all of its capabilities, along with seamless integration with other Azure services.
Table API
Azure Table Storage is an offering for storing structured NoSQL data as key-value pairs. Azure still offers Table Storage as part of Azure Storage, but the same model has also been brought into Cosmos DB (just as DocumentDB was), where, like the rest of the APIs, it benefits from the premium capabilities of Cosmos DB. The Table API is used to store and query Table Storage data in Cosmos DB.
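In both Azure Table Storage and the Table API, each entity is a set of properties identified by two keys, `PartitionKey` and `RowKey`. The sketch below models that key-value structure in memory; the `InMemoryTable` class and its methods are hypothetical, loosely mirroring operations such as `upsert_entity` and `get_entity` on `TableClient` in the `azure-data-tables` Python SDK.

```python
from typing import Any, Dict, List, Optional, Tuple

Entity = Dict[str, Any]


class InMemoryTable:
    """Hypothetical stand-in for a table: entities keyed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._entities: Dict[Tuple[str, str], Entity] = {}

    def upsert(self, entity: Entity) -> None:
        """Insert the entity, or replace an existing one with the same keys."""
        key = (entity["PartitionKey"], entity["RowKey"])
        self._entities[key] = dict(entity)

    def get(self, partition_key: str, row_key: str) -> Optional[Entity]:
        """Point lookup by both keys, typically the most efficient way to read."""
        return self._entities.get((partition_key, row_key))

    def partition(self, partition_key: str) -> List[Entity]:
        """All entities in one partition, ordered by RowKey."""
        return [e for (pk, _), e in sorted(self._entities.items()) if pk == partition_key]


customers = InMemoryTable()
customers.upsert({"PartitionKey": "Lisbon", "RowKey": "002", "name": "Bruno"})
customers.upsert({"PartitionKey": "Lisbon", "RowKey": "001", "name": "Ana"})
# Entities in the same table may carry different properties.
customers.upsert({"PartitionKey": "Oslo", "RowKey": "001", "name": "Ben", "vip": True})
print(customers.get("Oslo", "001")["name"])  # Ben
print([e["name"] for e in customers.partition("Lisbon")])  # ['Ana', 'Bruno']
```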
The Table API provides more cost-effective storage than the other NoSQL options in Cosmos DB. Also, applications that already use Table Storage can easily be migrated to Cosmos DB. In fact, Microsoft is planning to eventually migrate all of its Table Storage databases to the Cosmos DB Table API.
Conclusion
Azure Cosmos DB provides multi-model databases to store different kinds of data for multiple business use cases. Regardless of the model, it stores the data as ARS (Atom-Record-Sequence) under the hood. Each API brings its own set of benefits to the table, but Cosmos DB’s premium features, such as geo-replication and millisecond-latency data retrieval, are common to all of them.
In the next article, we will dig a little deeper into the Cosmos DB architecture and look at its resource model to understand how it organizes data into databases, containers, and items.