DP-900-3

DP-900
Module 3: Semi-Structured Data & Microsoft Table Storage

* Why Semi-Structured Data?

- Consider Amazon.com as an example, an online transaction system that processes millions of purchases daily around the globe. Relational databases often struggle at this scale for several reasons:

- A single transaction usually needs updates to multiple tables (customer, order, product).

- Each update requires time and adds overhead to keep transaction integrity intact.

- Fetching data from normalized tables can slow down performance.

- Every row in a relational table must have the same columns, which complicates handling diverse product data like books, clothes, and electronics.

- As a result, relational databases lack the flexibility and scalability needed for large, diverse workloads.

* The Solution: Semi-Structured Data

- Semi-structured databases provide a flexible method for storing and processing significant amounts of diverse data efficiently. Here’s how they are different:

- Each transaction is stored in a single row without needing joins or multiple table updates.

- There are no foreign or primary keys; all data for a transaction is stored together.

- Transaction integrity is simpler because only one record gets affected.

- They can easily support millions of transactions each day.

- To achieve this, semi-structured data discards rigid schemas. Each row is self-describing, like JSON or XML, defining its own fields and structure.

Example:

{

"CustomerName": "Peter Vogel",

"Product": "Book",

"Price": 20.99,

"City": "London"

}

Another transaction might omit “City” or include different fields, and that’s acceptable.

* From Structured to Semi-Structured

- Semi-structured data is essentially denormalized data. Instead of spreading information across several tables, everything is kept together.

* Advantages

- Faster writes and fewer updates.

- No complicated joins or transaction management.

- High scalability for millions of operations daily.

* Limitations

- Data duplication, such as repeating customer addresses in every transaction.

- Limited search capabilities with no standard indexes or WHERE clauses like SQL.

- Less flexibility for complex queries.

- Usually tailored for specific applications, not for organization-wide analytics.

- Searching Challenges

- In relational databases, every row shares the same columns, making indexing and searching straightforward. For instance, you can easily find all customers in London. In semi-structured systems, not all rows have the same attributes, which limits indexing and searching.

- However, since semi-structured databases are designed for specific, high-volume applications, this is often not a significant problem.

* What About SQL?

- SQL (Structured Query Language) was created for structured data, so it doesn’t work directly with semi-structured databases. That’s why they are often referred to as:

* NoSQL databases (Not Only SQL)

- or Non-relational databases.

- These databases use APIs or query models specific to their design rather than SQL commands.

- Microsoft's Solution: Azure Table Storage

* Azure Table Storage is Microsoft's NoSQL key-value store designed for:

- High scalability

- Massive transaction processing

- Low latency operations

- It stores semi-structured data in tables, but unlike SQL tables:

- Each row (called an entity) can have different columns (called properties).

- Every entity must have:

- A PartitionKey to group related data

- A RowKey that acts as a unique identifier within a partition

- This structure allows for fast and cost-effective queries in large-scale systems like e-commerce, IoT, or logging.

* Real-World Use

- Most organizations use both types of databases:

- NoSQL (Table Storage) to handle fast, large-scale transactions.

- Relational Databases (SQL Server, Azure SQL) to store structured, shared data for analytics and reporting.

- For example, Amazon records orders in a semi-structured system for speed and keeps customer profiles in a relational database for easy access.

Feature	Relational (SQL)	Semi-Structured (NoSQL)
Data Model	Structured	Flexible (self-describing)
Schema	Fixed	Variable per record
Speed	Slower (joins, transactions)	Faster (single record writes)
Scalability	Limited	Very high
Query Language	SQL	NoSQL / API-based
Example	Azure SQL Database	Azure Table Storage

* Key Takeaways :

- Semi-structured data supports high-speed, high-volume transactions.

- It’s schema-less, denormalized, and self-describing.

- Azure Table Storage is Microsoft’s NoSQL solution for semi-structured data.

- Most enterprises adopt a hybrid approach, combining both structured and semi-structured systems for better performance and flexibility.

Azure Table Storage, Semi-Structured Data Explained

* What is Table Storage?

- Table Storage is Microsoft’s semi-structured database for managing large-scale data.

- It’s a key-value database. Every record (row) is identified by:

- Partition Key, which groups related rows.

- Row Key, which uniquely identifies a record within the partition.

- It can store 25 times more data than a single Azure SQL Database.

- Every row can have different properties, unlike relational tables with fixed columns.

Feature	Relational DB	Semi-Structured (Table Storage)
Relationships	Supported	Not supported
Views / Stored Procedures	Supported	(some NoSQL DBs support them)
Indexes	Any column	Only Partition + Row Keys
Normalization	Supported	Not applicable
Search	Flexible	Limited but very fast
Speed	Moderate	Millions of transactions/day

Advantage: Fast inserts and reads at scale.

Trade-off: Less flexible queries and relationships.

* How Table Storage Works

- Each row includes:

- Partition Key, which divides data into groups for scalability.

- Row Key, which is unique within each partition.

- Timestamp, which auto-updates when data changes.

- The best performance occurs when both the Partition Key and Row Key are known.

* Choosing a Good Partition Key

- The partition key should split data into balanced, evenly active groups.

- Avoid hot partitions, which cause an uneven load.

- For example, if customers are grouped by country, some, like India or the USA, may have too much activity.

- It’s better to partition by something evenly distributed, like region, customer type, or date range.

* Flexibility in Use

- In relational databases, any column can be indexed or searched.

- In Table Storage, searches depend only on the Partition Key and Row Key.

- So, the design relies on specific application needs:

- For a Customer Management System, country might work as a partition key.

- For a Lead Generation App, it might not work, since new leads’ countries aren’t known.

* Timestamp Property

- This is an auto-generated field that shows the last update time.

- It is used for concurrency control; you can check if another user modified the record since you last retrieved it.

* Tools to Work with Table Storage

1. Azure Storage Browser (Web Tool)

- You can access this via the Azure Portal.

- It lets you add, edit, or delete rows.

- You can manage files, blobs, containers, and tables directly in your browser.

2. Azure Storage Explorer (Desktop App)

- This is a downloadable tool for Windows, Mac, and Linux.

- It’s similar to File Explorer but for Azure resources.

- You can manage tables, blobs, files, and queues offline.

* Summary

- Table Storage is Azure’s key-value, semi-structured data store.

- It supports massive scale, fast operations, and a flexible schema.

- It is ideal for high-volume apps like IoT, logs, and user data.

- Choose partition keys wisely to avoid performance bottlenecks.

Demo - Table Storage

https://diigitaldrafts.blogspot.com/2025/11/dp-900-creating-semi-structure-database.html

Search This Blog

DigitalDrafts

DP-900-3

DP-900
Module 3: Semi-Structured Data & Microsoft Table Storage

Azure Table Storage, Semi-Structured Data Explained

Demo - Table Storage

Comments

Post a Comment

Popular posts from this blog

AI-900-3,4

AI-900 12,13

AI-900 10,11

DP-900-3

DP-900 Module 3: Semi-Structured Data & Microsoft Table Storage

Azure Table Storage, Semi-Structured Data Explained

Demo - Table Storage

Comments

Post a Comment

Popular posts from this blog

AI-900-3,4

AI-900 12,13

AI-900 10,11

DP-900
Module 3: Semi-Structured Data & Microsoft Table Storage