DP-900-4
DP-900 Module 4: Cosmos DB
* Introduction
- Azure Cosmos DB is Microsoft’s multi-model, globally distributed database. It supports both semi-structured and structured data, offering flexibility, scalability, and high performance for modern apps.
* What is Cosmos DB?
- Azure Cosmos DB is a NoSQL (semi-structured) database that can also handle structured data. It is similar to Table Storage but much more capable.
* Key features:
- Supports multiple APIs, each working with a different data model.
- Supports global data distribution; it replicates data to multiple Azure regions.
* Key Features
1: Multiple APIs (Multi-Model Support)
- Cosmos DB supports five main APIs, allowing it to operate like different types of databases based on the use case:
| API Type | Description | Acts Like |
|---|---|---|
| Core (SQL) / NoSQL API | Stores and queries JSON documents (semi-structured). | Document Database |
| MongoDB API | Makes Cosmos DB behave like MongoDB (for JSON-like NoSQL docs). | MongoDB |
| Table API | Makes Cosmos DB function like Azure Table Storage (key-value pairs). | Key-Value Store |
| Gremlin API | Used for graph databases, to represent relationships (nodes & edges). | Graph DB |
| Cassandra API | Stores data in column-family (wide-column) format, where each row can have its own set of columns. | Column-Family Store |
- Note: Each API offers a different way to interact with Cosmos DB, but they all share the same fast, scalable infrastructure.
2: Global Distribution
- Cosmos DB can replicate data across multiple Azure regions. This means users can access data locally, no matter where they are. It ensures low latency, high availability, and fault tolerance.
- Example:
- Your factory in Mumbai creates product data. Your customers in Apollo read a copy of that same data from their nearest region, with low latency.
* Azure Region: A region is a group of data centers located close together and connected by a dedicated low-latency network, so latency between them is very low (though not zero).
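To make the idea of local access concrete, here is a toy Python model of reading from the nearest replica. It is a sketch, not the Cosmos DB SDK: the region names, the `replicas` and `latency_ms` dictionaries, the client names, and the latency figures are all hypothetical. Real client SDKs handle this routing for you, typically through a preferred-regions setting.

```python
# Hypothetical replicas: the same item stored in two regions.
replicas = {
    "Central India": {"product-42": "in stock"},
    "East US": {"product-42": "in stock"},
}

# Hypothetical round-trip times (ms) from each client to each region.
latency_ms = {
    "factory_in_mumbai": {"Central India": 5, "East US": 190},
    "customer_in_us": {"Central India": 200, "East US": 8},
}

def nearest_region(client):
    """Choose the region with the lowest latency for this client."""
    times = latency_ms[client]
    return min(times, key=times.get)

def read(client, key):
    """Read the item from the client's nearest replica."""
    return replicas[nearest_region(client)][key]

print(nearest_region("customer_in_us"), read("customer_in_us", "product-42"))
# East US in stock
```

Both clients see the same data; only the copy they read from differs.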
* Understanding APIs (Application Programming Interfaces)
- An API is like a bridge that allows two programs to communicate.
- Example:
- One app handles orders. Another app stores data in Cosmos DB. They communicate via Cosmos DB’s API, the “public interface” of the system. Each API defines how data is sent, received, and managed, without needing to know internal details.
* Why Use Cosmos DB?
- Supports multiple data models (document, key-value, graph, column).
- Globally distributed; minimizes latency worldwide.
- Schema-flexible; handles changing data structures easily.
- Ideal for IoT, e-commerce, global apps, and real-time analytics.
| Feature | Description |
|---|---|
| Data Type | Semi-structured & structured |
| APIs Supported | SQL (JSON), MongoDB, Table, Gremlin, Cassandra |
| Distribution | Global, multi-region replication |
| Performance | Low latency, high throughput |
| Best For | Global apps, scalable NoSQL workloads |
* In Short
Azure Cosmos DB is:
- Multi-model database (works like many DB types)
- Global replication for low-latency access
- Perfect for modern, cloud-scale applications
Configuring Cosmos DB
* Introduction:
- Azure Cosmos DB is a globally distributed, multi-model NoSQL database. It offers high performance, low latency, and unlimited scalability. It supports several APIs, including SQL, MongoDB, Cassandra, Gremlin, and Table API, but each Cosmos DB account uses only one API, chosen when the account is created.
* Setting Up Cosmos DB:
- Go to the Azure Portal and create a Cosmos DB account.
- Choose your API while creating the account (it cannot be changed later), for example, MongoDB if you are migrating from MongoDB.
- Inside that account, create one or more databases.
- Select the regions that should hold copies of your data, and optionally enable multi-region writes so that every selected region accepts writes.
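* Capacity Management:
- Inside a database, data is stored in containers (the name echoes Blob Storage containers, but a Cosmos DB container holds items such as JSON documents).
- Each container is divided into partitions. A logical partition (all items that share one partition-key value) can hold up to 20 GB, and there is no limit on the container's total size.
- Each container has a partition key, a property of every item that decides which partition the item lands in. Choose one that is frequently used in queries and has many distinct values.
- Ensure even data distribution to avoid hotspots, where one partition receives a disproportionate share of the load.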
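The effect of the partition-key choice can be sketched in a few lines of Python. This is a simplified model under stated assumptions: SHA-256 stands in for Cosmos DB's own internal hash function, and `PHYSICAL_PARTITIONS`, `orders`, and the helper functions are hypothetical. It hashes each item's key value to one of four partitions and counts how many items each partition receives.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 4  # hypothetical partition count

def partition_for(key_value):
    """Map a partition-key value to a partition by hashing it (stand-in hash)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PHYSICAL_PARTITIONS

def load_per_partition(items, key):
    """Count how many items land on each partition when `key` is the partition key."""
    return Counter(partition_for(str(item[key])) for item in items)

# 1,000 hypothetical orders: every customerId is distinct, but all share one country.
orders = [{"id": i, "customerId": f"c{i}", "country": "IN"} for i in range(1000)]

good = load_per_partition(orders, "customerId")
bad = load_per_partition(orders, "country")
print("customerId:", dict(good))
print("country:", dict(bad))
```

With `customerId` the orders spread across all four partitions; with `country`, every order has the value `"IN"`, so a single partition takes the whole load, which is the hotspot described above.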
* Performance (Speed):
- Cosmos DB guarantees latency under 10 ms for 99% of point reads and writes (the 99th percentile) within a region.
- Why not 100%? Network latency and replication delays can make a small share of requests slower.
- Multi-region setups improve reliability, but keeping distant replicas in sync can add delay.
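A percentile guarantee is easiest to see with numbers. The Python sketch below computes nearest-rank percentiles over 100 hypothetical write latencies (invented for illustration, not measurements): the 99th percentile stays under 10 ms even though the single slowest request does not.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    # Ceiling of pct% of the sample count, using integer arithmetic to avoid float rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]

# 100 hypothetical write latencies in milliseconds.
latencies = [4] * 90 + [7] * 9 + [35]  # one request hit a slow network path

p90 = percentile(latencies, 90)
p99 = percentile(latencies, 99)
worst = percentile(latencies, 100)
print(f"p90={p90} ms, p99={p99} ms, worst={worst} ms, p99 under 10 ms: {p99 < 10}")
# p90=4 ms, p99=7 ms, worst=35 ms, p99 under 10 ms: True
```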
* Consistency Levels (Data Sync Options):
- You can choose how synchronized your replicas should be. There are five levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual); the two ends of the range are:
1) Strong Consistency: Data is updated in all regions before a write is confirmed, so every read returns the latest data. This option is slower but always up to date.
2) Eventual Consistency: The write is confirmed locally first, and replicas catch up later. This option is faster, but reads may briefly return stale data.
- Example: Shipping data may not require instant updates, so eventual consistency is suitable.
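The difference between the two levels can be modelled with a toy two-region account in Python. This is a deliberately simplified sketch: `ToyAccount`, its methods, and the region names are hypothetical, and real Cosmos DB replication (quorums, the three intermediate levels) is far more involved. A strong write updates every replica before returning; an eventual write updates the local replica and queues the rest.

```python
class ToyAccount:
    """Hypothetical account with one in-memory replica (a dict) per region."""

    def __init__(self, regions):
        self.replicas = {region: {} for region in regions}
        self.pending = []  # (region, key, value) updates not yet replicated

    def write(self, local_region, key, value, strong):
        self.replicas[local_region][key] = value
        for region, data in self.replicas.items():
            if region == local_region:
                continue
            if strong:
                data[key] = value  # strong: every replica is updated before write() returns
            else:
                self.pending.append((region, key, value))  # eventual: copied later

    def replicate(self):
        """Apply queued updates, standing in for background replication."""
        for region, key, value in self.pending:
            self.replicas[region][key] = value
        self.pending.clear()

    def read(self, region, key):
        return self.replicas[region].get(key)


account = ToyAccount(["Central India", "East US"])

account.write("Central India", "order-1", "shipped", strong=False)
stale = account.read("East US", "order-1")  # None: not replicated yet
account.replicate()
fresh = account.read("East US", "order-1")  # "shipped"

account.write("Central India", "order-2", "packed", strong=True)
immediate = account.read("East US", "order-2")  # "packed" straight away
print(stale, fresh, immediate)  # None shipped packed
```

For shipping data, the brief `None` window of the eventual write is acceptable; for something like an account balance it might not be.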
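* Throughput (Speed + Capacity):
- Throughput is the amount of read and write work the database can handle per second.
- It is measured in Request Units per second (RU/s); each operation costs a number of RUs that depends on its size and complexity.
- Provision RU/s based on your expected activity.
- If requests exceed the provisioned RU/s, Cosmos DB rate-limits them (HTTP 429, "Request rate too large") and clients retry, causing delays. Alternatively, autoscale throughput can scale up to a maximum you set, at extra cost.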
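Rate limiting can be pictured as a per-second RU budget. The Python sketch below is a simplified model, not the service's actual algorithm: `run_second`, `seconds_to_drain`, and the 10 RU cost per write are hypothetical. Requests that do not fit in the current second's budget are rejected (the role HTTP 429 plays) and retried in the following second.

```python
def run_second(costs, provisioned_ru_per_s):
    """Serve requests (given by RU cost) within one second's budget.

    Returns (served, throttled). Throttled requests correspond to
    HTTP 429 responses that a client would retry later.
    """
    budget = provisioned_ru_per_s
    served, throttled = [], []
    for cost in costs:
        if cost <= budget:
            budget -= cost
            served.append(cost)
        else:
            throttled.append(cost)
    return served, throttled

def seconds_to_drain(costs, provisioned_ru_per_s):
    """Count the seconds needed when throttled requests are retried the next second."""
    if any(cost > provisioned_ru_per_s for cost in costs):
        raise ValueError("a single request costs more than the per-second budget")
    queue, seconds = list(costs), 0
    while queue:
        _, queue = run_second(queue, provisioned_ru_per_s)
        seconds += 1
    return seconds

# Hypothetical burst: 60 writes at 10 RU each against 400 RU/s provisioned.
served, throttled = run_second([10] * 60, 400)
print(len(served), "served,", len(throttled), "throttled")  # 40 served, 20 throttled
print(seconds_to_drain([10] * 60, 400), "seconds with retries")  # 2 seconds with retries
```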
* Request Units (RUs):
- Estimate your read and write needs, and provision enough RU/s to cover them.
- For example, provision 900 RU/s for your database.
- You can assign RU/s either:
1) At the Database Level: Shared among all containers in the database.
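2) At the Container Level: Dedicated to a single container, so busy containers can get more; for instance, with two containers, give 800 RU/s to the high-traffic one and 100 RU/s to the other (in practice, manually provisioned container throughput starts at 400 RU/s).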
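The estimate-then-allocate steps above are simple arithmetic, sketched here in Python. A point read of a 1 KB item costs 1 RU; the 5 RU write cost is only a placeholder assumption (real write costs depend on item size and indexing), and the helper names and container names are hypothetical.

```python
RU_PER_READ = 1   # point read of a 1 KB item
RU_PER_WRITE = 5  # placeholder assumption; measure real costs for your own items

def estimate_ru_per_s(reads_per_s, writes_per_s):
    """Expected RU/s for a steady mix of point reads and writes."""
    return reads_per_s * RU_PER_READ + writes_per_s * RU_PER_WRITE

def check_allocation(total_ru_per_s, per_container):
    """Check that dedicated container RU/s fit the budget; return the unassigned remainder."""
    assigned = sum(per_container.values())
    if assigned > total_ru_per_s:
        raise ValueError(f"assigned {assigned} RU/s exceeds the {total_ru_per_s} RU/s budget")
    return total_ru_per_s - assigned

needed = estimate_ru_per_s(reads_per_s=400, writes_per_s=100)  # 400 + 500 = 900
left = check_allocation(needed, {"orders": 800, "catalog": 100})  # 0 left over
print(needed, left)  # 900 0
```

With database-level (shared) throughput there is no per-container split to check: every container draws from the same 900 RU/s pool.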
| Feature | Description |
|---|---|
| APIs | SQL, MongoDB, Cassandra, Gremlin, Table |
| Replication | Multi-region, optional |
| Partitions | Up to 20 GB per logical partition, unlimited total |
| Consistency | Strong ↔ Eventual (choose trade-off) |
| Throughput | Managed via Request Units (RUs) |
| Scalability | Automatic with containers & partitions |
| Speed | <10 ms read and write latency at the 99th percentile |
In Short:
Cosmos DB is a scalable, globally distributed, and customizable NoSQL database. It balances speed, consistency, and cost through its APIs, partitions, and RUs.
Application Programming Interface (APIs)
* Azure Cosmos DB supports multiple APIs that allow you to work with different types of semi-structured data models. The right API for you depends on your data format, current system, and query style.
* Difference between Azure Storage Account and Azure Cosmos DB
| Feature | Azure Storage Account (Table Storage) | Azure Cosmos DB (Table API & others) |
|---|---|---|
| Data Type | Semi-structured key-value data | Semi-structured (supports JSON, documents, graph, wide-column) |
| APIs Supported | Only Table Storage API | Multiple APIs (NoSQL, MongoDB, Cassandra, Gremlin, Table) |
| Scalability Limit | Tens of thousands of reads/writes per second | Tens of millions of reads/writes per second (via Request Units - RUs) |
| Multi-region Writes | Single-region write, geo-redundant read | Multi-region writes supported for global apps |
| Performance Model | Fixed limits based on storage type | Scalable throughput using Request Units (RUs) |
| Use Case | Simple storage, low-cost NoSQL table | High-performance, global NoSQL solution with APIs |
| Consistency Models | Basic consistency | 5 consistency levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual) |
* Why Choose Cosmos DB APIs?
- Cosmos DB has two major advantages over standard Azure Storage:
1) Multi-region writes: every selected region can accept writes, and data is replicated automatically to keep global, geo-redundant copies.
2) Elastic throughput: you can scale reads and writes by purchasing more Request Units per second (RU/s), up to millions of operations per second.
* Cosmos DB APIs:
1) NoSQL (Core) API
- Model Type: JSON-based semi-structured database.
- Why Choose It:
- It's ideal for apps that use JSON documents.
- It supports a SQL-like query language for easy querying.
- Best For: Developers with SQL experience who want NoSQL flexibility.
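To show what "SQL-like" means, the Python sketch below pairs a NoSQL API query string with a local emulation of its result over three hypothetical JSON items. The emulation only mimics the query's meaning for illustration; a real application would send the query string to the service through an SDK rather than filter data locally.

```python
import json

# Hypothetical items in a container, each a JSON document.
items = json.loads("""
[
  {"id": "1", "category": "bikes", "name": "Road Bike", "price": 950},
  {"id": "2", "category": "helmets", "name": "Sport Helmet", "price": 60},
  {"id": "3", "category": "bikes", "name": "Mountain Bike", "price": 1200}
]
""")

# A query in the NoSQL API's SQL-like syntax; 'c' is an alias for each item.
QUERY = "SELECT c.name, c.price FROM c WHERE c.category = 'bikes' ORDER BY c.price"

# Local emulation of what QUERY returns, for illustration only.
result = sorted(
    ({"name": c["name"], "price": c["price"]} for c in items if c["category"] == "bikes"),
    key=lambda row: row["price"],
)
print(result)
# [{'name': 'Road Bike', 'price': 950}, {'name': 'Mountain Bike', 'price': 1200}]
```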
2) MongoDB API
- Model Type: NoSQL, JSON-like documents.
- Why Choose It:
- It is wire compatible with on-premises MongoDB, so existing MongoDB drivers and tools keep working, usually with little more than a connection-string change.
- It's perfect for moving existing MongoDB databases to the cloud.
- Microsoft offers a MongoDB migration tool for easy transfer.
- Best For: Teams already using MongoDB locally or in production apps.
3) Cassandra API
- Model Type: Wide-column NoSQL database.
- Why Choose It:
- It stores data in column families (tables) in which each row can have its own set of columns.
- You can use Cassandra Query Language (CQL) to access data.
- It suits workloads such as analytics that read specific groups of columns rather than whole records.
- It is wire compatible with Apache Cassandra, so existing Cassandra drivers and tools work with it.
- You can migrate using Cassandra’s replication tools, Apache Spark, or Azure Databricks.
- Example Use Case: Inventory or sales systems where you frequently check product counts, shipments, and forecasts.
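The column-family idea can be illustrated with plain Python dictionaries. In this hypothetical `inventory` family, each row key maps to its own set of columns, so rows need not share the same columns, and a query can pull one column across all rows. The CQL in the docstring assumes a hypothetical table layout; it is not taken from the source.

```python
# Hypothetical 'inventory' column family: row key -> {column name: value}.
# Rows do not have to carry the same columns.
inventory = {
    "product-1": {"stock": 120, "shipped_today": 30, "forecast_next_week": 200},
    "product-2": {"stock": 15, "shipped_today": 5},
    "product-3": {"stock": 0, "forecast_next_week": 50},
}

def read_column(family, column):
    """Return {row key: value} for one column, skipping rows that lack it.

    Roughly what a CQL query such as
    SELECT product_id, forecast_next_week FROM inventory;
    returns (Cassandra would report missing values as null instead of skipping rows).
    """
    return {row: cols[column] for row, cols in family.items() if column in cols}

print(read_column(inventory, "forecast_next_week"))  # {'product-1': 200, 'product-3': 50}
```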
4) Gremlin (Graph) API
- Model Type: Graph (Network) Database.
- Why Choose It:
- It stores entities (vertices/nodes) and relationships (edges) between them.
- It is ideal for modeling social networks, organizational charts, or network topologies.
- Each node and relationship can hold its own data.
- Example Use Case:
- Social apps that track friendships, memberships, or co-workers.
- Network mapping for devices and their connections.
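The vertex-and-edge model can be sketched in Python with a small hypothetical social graph, where both vertices and edges carry their own properties. The `out` helper is a hypothetical stand-in that mimics Gremlin's `out()` step; the comment shows roughly how the same friends-of-friends question would be written as a Gremlin traversal.

```python
from collections import defaultdict

# Hypothetical vertices, each with its own properties.
vertices = {
    "alice": {"label": "person", "name": "Alice"},
    "bob": {"label": "person", "name": "Bob"},
    "carol": {"label": "person", "name": "Carol"},
    "contoso": {"label": "company", "name": "Contoso"},
}

# Hypothetical edges: (from, edge label, to, edge properties).
edges = [
    ("alice", "knows", "bob", {"since": 2019}),
    ("bob", "knows", "carol", {"since": 2021}),
    ("alice", "worksAt", "contoso", {"role": "engineer"}),
    ("carol", "worksAt", "contoso", {"role": "manager"}),
]

# Index outgoing edges by source vertex for quick traversal.
out_edges = defaultdict(list)
for src, label, dst, _props in edges:
    out_edges[src].append((label, dst))

def out(vertex_ids, label):
    """Follow outgoing edges with the given label, similar to Gremlin's out() step."""
    return [dst for v in vertex_ids for lbl, dst in out_edges[v] if lbl == label]

# Roughly g.V('alice').out('knows').out('knows').values('name') in Gremlin.
friends_of_friends = [vertices[v]["name"] for v in out(out(["alice"], "knows"), "knows")]
print(friends_of_friends)  # ['Carol']
```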
| API | Data Model | Query Language | Best For |
|---|---|---|---|
| NoSQL (Core) | JSON | SQL-like syntax | JSON apps, easy querying |
| MongoDB | JSON-like | Mongo Queries | MongoDB migrations |
| Cassandra | Wide-column | CQL | High-performance analytics |
| Gremlin | Graph | Gremlin | Social & network data |
* Key Takeaways
- Cosmos DB supports multiple APIs, but only one per account.
- Choose your API based on data type, existing technology, and query needs.
- Cosmos DB’s global replication and elastic throughput make it ideal for modern, scalable cloud applications.
* In Short:
- Cosmos DB allows you to choose the right data model, whether document, column, or graph-based, with global scalability and fast performance.