DP-900-3

 DP-900 
Module 3: Semi-Structured Data & Microsoft Table Storage



* Why Semi-Structured Data?

- Consider Amazon.com as an example, an online transaction system that processes millions of purchases daily around the globe. Relational databases often struggle at this scale for several reasons:

- A single transaction usually needs updates to multiple tables (customer, order, product).

- Each update requires time and adds overhead to keep transaction integrity intact.

- Fetching data from normalized tables can slow down performance.

- Every row in a relational table must have the same columns, which complicates handling diverse product data like books, clothes, and electronics.

- As a result, relational databases lack the flexibility and scalability needed for large, diverse workloads.


* The Solution: Semi-Structured Data

- Semi-structured databases provide a flexible method for storing and processing significant amounts of diverse data efficiently. Here’s how they are different:

- Each transaction is stored in a single row without needing joins or multiple table updates.

- There are no foreign or primary keys; all data for a transaction is stored together.

- Transaction integrity is simpler because only one record gets affected.

- They can easily support millions of transactions each day.

- To achieve this, semi-structured data discards rigid schemas. Each row is self-describing, like JSON or XML, defining its own fields and structure.

Example:

{

  "CustomerName": "Peter Vogel",

  "Product": "Book",

  "Price": 20.99,

  "City": "London"

}

Another transaction might omit “City” or include different fields, and that’s acceptable.


* From Structured to Semi-Structured

- Semi-structured data is essentially denormalized data. Instead of spreading information across several tables, everything is kept together.


* Advantages

- Faster writes and fewer updates.

- No complicated joins or transaction management.

- High scalability for millions of operations daily.


* Limitations

- Data duplication, such as repeating customer addresses in every transaction.

- Limited search capabilities with no standard indexes or WHERE clauses like SQL.

- Less flexibility for complex queries.

- Usually tailored for specific applications, not for organization-wide analytics.

- Searching Challenges

- In relational databases, every row shares the same columns, making indexing and searching straightforward. For instance, you can easily find all customers in London. In semi-structured systems, not all rows have the same attributes, which limits indexing and searching.

- However, since semi-structured databases are designed for specific, high-volume applications, this is often not a significant problem.


* What About SQL?

- SQL (Structured Query Language) was created for structured data, so it doesn’t work directly with semi-structured databases. That’s why they are often referred to as:


* NoSQL databases (Not Only SQL)

- or Non-relational databases.

- These databases use APIs or query models specific to their design rather than SQL commands.

- Microsoft's Solution: Azure Table Storage


* Azure Table Storage is Microsoft's NoSQL key-value store designed for:

- High scalability

- Massive transaction processing

- Low latency operations

- It stores semi-structured data in tables, but unlike SQL tables:

- Each row (called an entity) can have different columns (called properties).

- Every entity must have:

- A PartitionKey to group related data

- A RowKey that acts as a unique identifier within a partition

- This structure allows for fast and cost-effective queries in large-scale systems like e-commerce, IoT, or logging.


* Real-World Use

- Most organizations use both types of databases:

- NoSQL (Table Storage) to handle fast, large-scale transactions.

- Relational Databases (SQL Server, Azure SQL) to store structured, shared data for analytics and reporting.

- For example, Amazon records orders in a semi-structured system for speed and keeps customer profiles in a relational database for easy access.

Feature Relational (SQL)     Semi-Structured (NoSQL)
Data Model     Structured     Flexible (self-describing)
Schema     Fixed     Variable per record
Speed     Slower (joins, transactions)     Faster (single record writes)
Scalability     Limited     Very high
Query Language     SQL     NoSQL / API-based
Example     Azure SQL Database     Azure Table Storage


* Key Takeaways :

- Semi-structured data supports high-speed, high-volume transactions.

- It’s schema-less, denormalized, and self-describing.

- Azure Table Storage is Microsoft’s NoSQL solution for semi-structured data.

- Most enterprises adopt a hybrid approach, combining both structured and semi-structured systems for better performance and flexibility.


Azure Table Storage, Semi-Structured Data Explained  

* What is Table Storage?  

- Table Storage is Microsoft’s semi-structured database for managing large-scale data. 

- It’s a key-value database. Every record (row) is identified by: 

- Partition Key, which groups related rows. 

- Row Key, which uniquely identifies a record within the partition. 

- It can store 25 times more data than a single Azure SQL Database. 

- Every row can have different properties, unlike relational tables with fixed columns.  


Feature Relational DB     Semi-Structured (Table Storage)
Relationships     Supported         Not supported
Views / Stored Procedures     Supported         (some NoSQL DBs support them)
Indexes     Any column         Only Partition + Row Keys
Normalization     Supported         Not applicable
Search     Flexible         Limited but very fast
Speed     Moderate         Millions of transactions/day


Advantage: Fast inserts and reads at scale.  

Trade-off: Less flexible queries and relationships.  


* How Table Storage Works  

- Each row includes: 

- Partition Key, which divides data into groups for scalability.  

- Row Key, which is unique within each partition.  

- Timestamp, which auto-updates when data changes.  

- The best performance occurs when both the Partition Key and Row Key are known.  


* Choosing a Good Partition Key  

- The partition key should split data into balanced, evenly active groups.  

- Avoid hot partitions, which cause an uneven load.  

- For example, if customers are grouped by country, some, like India or the USA, may have too much activity.  

- It’s better to partition by something evenly distributed, like region, customer type, or date range.  


* Flexibility in Use  

- In relational databases, any column can be indexed or searched.  

- In Table Storage, searches depend only on the Partition Key and Row Key.  

- So, the design relies on specific application needs:  

- For a Customer Management System, country might work as a partition key.  

- For a Lead Generation App, it might not work, since new leads’ countries aren’t known.  


* Timestamp Property  

- This is an auto-generated field that shows the last update time.  

- It is used for concurrency control; you can check if another user modified the record since you last retrieved it.  


* Tools to Work with Table Storage  

1. Azure Storage Browser (Web Tool)  

- You can access this via the Azure Portal.  

- It lets you add, edit, or delete rows.  

- You can manage files, blobs, containers, and tables directly in your browser.  


2. Azure Storage Explorer (Desktop App)  

- This is a downloadable tool for Windows, Mac, and Linux.  

- It’s similar to File Explorer but for Azure resources.  

- You can manage tables, blobs, files, and queues offline.  


* Summary  

- Table Storage is Azure’s key-value, semi-structured data store.  

- It supports massive scale, fast operations, and a flexible schema.  

- It is ideal for high-volume apps like IoT, logs, and user data.  

- Choose partition keys wisely to avoid performance bottlenecks.  


Demo - Table Storage



Comments

Popular posts from this blog

AI-900-3,4

AI-900 12,13

AI-900 10,11