DP-900-2

 DP - 900 
Module 2 - Structured Data and Relational Database Solutions 


* When you move your databases to Azure, you have different solutions based on the level of control or management you want.




1. Running Databases on Azure Virtual Machines (IaaS)

You can transfer your existing Database Management System (DBMS) to Azure by installing it on an Azure Virtual Machine.
Examples: Microsoft SQL Server, MySQL, etc.
You manage:

- OS upgrades and patches
- DBMS upgrades and patches
- Data access and data import

This option gives you full control but requires high maintenance.

2. Serverless Database Solutions (PaaS)

If you want to reduce administrative tasks, consider a serverless approach.
Azure automatically handles servers, upgrades, and patches.
You will still manage data access and data import.
There’s no need to manage VMs or DBMS manually.
Azure SQL Database Family provides several options:


1). Azure SQL Database

This is a fully managed database service.
It works with most features of on-prem SQL Server, but not all.
Missing features include:

- Service Broker (for scheduling tasks)
- Database Mail (for sending emails from DBMS)

This option is ideal for single databases, but it has limited feature compatibility.

2). Azure SQL Managed Instance

This offers nearly 100% compatibility with on-prem SQL Server.
It provides more control over update and patch timings.
This is great for migration with minimal changes.

3). Elastic Pools

This allows multiple databases to share resources like CPU and memory.
Example: 
OLTP (sales) uses resources during the day.
Inventory (batch jobs) uses them at night.
This saves costs by optimizing resource use.

4). Azure SQL Edge



This is designed for IoT environments.
It processes continuous data streams in real time.
It uses rolling time windows (e.g., 5-minute sliding window) instead of separate transactions.

5). Open Source Database Options


Azure also supports popular open-source databases:
- MySQL / MariaDB
- PostgreSQL
Use Azure Database for MySQL/PostgreSQL/MariaDB for managed open-source solutions. You can find third-party database engines on Azure Marketplace or install your own on a VM.

* Summary  

Option Type     Management Best For
Azure VM + SQL Server     IaaS     You manage all         Full control
Azure SQL Database     PaaS     Azure manages         Single DBs
Managed Instance     PaaS     Shared control         Migration needs
Elastic Pools     PaaS    Azure manages         Cost-saving multi-DB setup
SQL Edge     PaaS     Azure manages         IoT & real-time data
Open-source DBs PaaS/IaaS     Azure or You         Flexible tech stack

Creating Azure SQL


Normalizing Data


* Structured data and relational databases are closely related; you can’t understand one without the other.

* Structured Data

- Structured Data is organized into tables, each with rows and columns.
- Each row represents a unique entity or record, such as one customer.
- Each column represents an attribute, like Customer ID, First Name, or Last Name.
- Each column has a specific data type, including text, number, or date.
Example:

Customer ID  First Name  Last Name  
1  Peter  Vogel  
2  Jan  Vogel  

- It is easy to query, organize, and maintain. However, it can lead to duplication if not structured properly.

* Normalization



- Normalization is the process of organizing structured data into tables to reduce redundancy and improve consistency.
- Goal: Each table contains only relevant data for a single entity.  
Example: The Customer table should have only customer-related information.

- Problem with Denormalized Tables
- A denormalized table mixes different types of data, such as customer and sales information. This can lead to issues like:
- Duplicate data where the same customer information repeats in multiple rows.  
- Inconsistency where an address changes in one row but not in others.  
Wasted storage.

Example of Denormalized Data:
Sales Order ID  Customer Name  Address  Product  Quantity  
001  Peter Vogel  Toronto  Laptop  2  
002  Peter Vogel  Toronto  Mouse  1  

- If Peter changes his address, it must be updated in multiple rows, which likely causes errors.

- Normalized Solution

- Split the data into separate tables:
- Customer Table stores customer info.  
- Sales Order Table stores order info.  
- Product Table stores product info.  

- Each record links through unique identifiers (keys), not by repeating data.

** Primary Key

- A Primary Key (PK) uniquely identifies each row in a table.
- Example:
CustomerID in the Customer Table.  
SalesOrderID in the Sales Order Table.  
- No two rows can share the same primary key value.

* Database Schema

- After normalization, we define a database schema, which is the structure of the database:
- It lists all tables in the database.  
- It defines each table’s schema, including:
- Columns  
- Data types  
- Primary key  
- Relationships with other tables  

- Example:
Customer Table → CustomerID (PK), FirstName, LastName  
Product Table → ProductID (PK), ProductName, Price  
SalesOrder Table → SalesOrderID (PK), CustomerID (FK), ProductID (FK), Quantity  

* Relationships Between Tables

- To combine data from multiple tables, use relationships (foreign keys) to link related rows.

Example:

- CustomerID in the SalesOrder Table links to CustomerID in the Customer Table.  
- This makes it easy to find all orders for a specific customer.

* Key Takeaway:  
- Structured data in relational databases must be normalized to avoid duplication and ensure data integrity. Through schemas and relationships, we create organized, efficient, and easily searchable databases.


Relationships in Structured Data


* After normalizing data, we have multiple related tables, each representing a single business entity. Most real-world transactions involve several entities. To connect these tables, we use relationships.

* Why Relationships Are Needed

- Example: A Sales Order involves:

- The Customer who made the purchase
- The Product that was sold

To provide complete information about one sales order, we must link the:

- Sales Order Table
- Customer Table
- Product Table

* Primary Key and Foreign Key

- Each table has a Primary Key (PK), which is a unique identifier for each record.
- To create a relationship, we copy the primary key from one table and place it as a new column in another table. That new column becomes the Foreign Key (FK).

- Example: CustomerID is the Primary Key in the Customer Table. The same CustomerID is added as a Foreign Key in the SalesOrder Table to link each order to its customer.
- So, if a user asks, “Who placed Sales Order A123?” Look up the order in the SalesOrder Table, then find CustomerID = 0002. Go to the Customer Table and find customer 0002 = Peter Vogel.

- That’s how a relationship connects data across tables.

Table Key Type Description
Customer Table     Primary Key → CustomerID     Identifies each customer
Product Table     Primary Key → ProductID     Identifies each product
SalesOrder Table     Foreign Keys → CustomerID,     ProductID     Connects customer and product for each order


Thus, the SalesOrder Table depends on both the Customer and Product tables.

* Table Dependencies
Relationships create dependencies between tables.
Example: 
- SalesOrder depends on Customer (who made the purchase)
- SalesOrder depends on Product (what was purchased)
- This structure ensures data integrity. Every order must link to a valid customer and product.

*Performance Impact

- While relationships improve data structure and accuracy, they can slow down performance. Each lookup requires checking related rows in multiple tables. With many customers and products, these lookups can become time-consuming.

- Tip: Proper indexing and database optimization can reduce lookup delays.

Concept Description
Relationship Link between tables using keys
Primary Key (PK) Unique identifier for each record
Foreign Key (FK) Copy of another table’s PK to create link
Dependency One table relies on another (e.g., SalesOrder → Customer)
Challenge Multiple lookups can affect performance


Key Takeaway: Relationships in structured data link multiple tables together using Primary and Foreign Keys. They ensure data consistency and logical connections, forming the backbone of relational databases.


Other DBMS Features

* When we normalize data, it gets stored in multiple related tables. For example, a sales order might pull information from the Customer, SalesOrder, and Product tables. To make data access faster, simpler, and more consistent, relational databases use Indexes, Views, and Stored Procedures.

1) Indexes: Speed Up Data Retrieval

- An Index works like an index in a book. It helps find data quickly without scanning the whole table.

- If you often search by City, create an index on that column. Now, rows are grouped and sorted by city for quick lookups.

a) Types of Indexes

- 1. Clustered Index



- Controls the physical row order in the table.
- Data is stored in sorted order (like 001, 002, 003).
- Usually on the Primary Key column.
- Only one clustered index is allowed per table.

- 2. Non-Clustered Index


- A separate structure that stores pointers to actual rows.
- Does not affect the physical order.
- You can create multiple non-clustered indexes for different columns.
- Too many indexes can slow down updates. Each insert, update, or delete must also update all related indexes. The ideal balance is around 6 to 7 indexes per table.

2) Views: Simplify and Secure Data Access


- A View is a virtual table. It shows data from one or more tables but does not store it physically.
- Use Views when you often need the same combination of columns or joins. For example, a SalesSummary View may combine columns from the SalesOrder, Product, and Customer tables.

** Benefits

- Simplifies queries with a predefined data structure; there's no need for repeated joins.
- Protects sensitive data by showing only selected columns.
- Restricts access, allowing apps to access the view instead of the actual tables.
- This is ideal for maintaining data privacy and ease of access.


3) Stored Procedures: Improve Performance and Consistency




- A Stored Procedure is a saved block of SQL commands that runs as one unt. Instead of multiple calls from the app to the server, one call executes all steps, improving speed.

**Benefits

- Better performance with fewer network calls and faster processing.
- Consistency as all applications call the same procedure, ensuring uniform results.
- Centralized logic; if business logic changes, you only need to update the procedure, not every app.
- For example, a procedure named ShipSalesOrder might handle all shipping tasks—updating orders, inventory, and status in one go. If the shipping process changes, you only need to update this procedure, and all apps stay consistent automatically.


Feature Purpose Benefits Notes
Index Speed up data search     Faster queries     Keep limited indexes
Clustered Index Physical row order     Quick access via PK     Only one per table
Non-Clustered Index Logical mapping     Flexible searches     Multiple allowed
View Virtual table     Simplifies queries, hides data     Improves privacy
Stored Procedure Predefined SQL set     Faster, consistent, reusable     Centralized logic


Querying and Updating the data


* SQL :

- SQL (Structured Query Language), pronounced either “S-Q-L” or “Sequel,” is the industry-standard language used for:
- Retrieving data from relational databases.
- Updating and modifying existing data.
- Defining and controlling database structures and access.
- The American National Standards Institute (ANSI) manages SQL with guidance from the International Standards Organization (ISO).
- Different database vendors offer their own extended versions:

- Microsoft SQL Server: Transact-SQL (T-SQL)
- Oracle: PL/SQL
- MySQL: MySQL SQL dialect

- The core SQL syntax remains consistent, making it transferable across systems.

* 3 Main Parts of SQL :



- SQL is divided into three main components, each serving a specific purpose:

1. DML (Data Manipulation Language)

This is used for retrieving and modifying data in tables. The main commands include:

- SELECT: Retrieve data from tables.
- UPDATE: Modify existing data in tables.
- DELETE: Remove rows from a table.
- INSERT: Add new rows to a table.

Common clauses are:

- FROM: Specifies the table to work with.
- JOIN: Combines data from multiple tables.
- WHERE: Filters rows based on conditions.

Tip: SQL can get complex with many clauses. Simple tasks are easy, but advanced queries can be challenging.

Examples:

-- SELECT  
SELECT CustID, FirstName, LastName  
FROM Customers  
WHERE CustID = '0001';  

-- UPDATE  
UPDATE Customers  
SET LastName = 'Irvin'  
WHERE CustID = '0001';  

-- DELETE  
DELETE FROM Customers  
WHERE CustID = '0001';  

-- INSERT  
INSERT INTO Customers (CustID, FirstName, LastName)  
VALUES ('0003', 'Jason', 'Vandeville');  

Always include a WHERE clause in DELETE to avoid removing all rows!

2. DDL (Data Definition Language)

This is used to define and manage database structures like tables, views, and indexes.

Main commands include:

- CREATE: Add new tables, views, or procedures.
- DROP: Delete existing tables or database objects.
- ALTER: Modify existing database structures.

Example:

CREATE TABLE Customers (  
  CustID VARCHAR(10),  
  FirstName VARCHAR(50),  
  LastName VARCHAR(50)  
);  

3. DCL (Data Control Language)

This is used to control access and permissions for users.

Main commands include:

- GRANT: Give users permission to read or modify data.
- REVOKE: Remove permissions previously granted.

Example:

GRANT SELECT, UPDATE ON Customers TO User1;  
REVOKE UPDATE ON Customers FROM User1;  

SQL Category Purpose Common Commands
DML Manipulate data     SELECT, INSERT, UPDATE, DELETE
DDL Define structure     CREATE, ALTER, DROP
DCL Control access     GRANT, REVOKE

* Key Takeaways

- SQL is the universal language for relational databases.
- DML handles data operations, DDL manages structure, and DCL controls access.
- Knowing standard SQL lets you work across different DBMS like SQL Server, MySQL, PostgreSQL, and others.
- Always use WHERE clauses carefully to avoid unwanted data changes or deletions.

Running SQL Statements


Transaction Integrity (ACID Properties)

Transaction Integrity, also known as Transactional Integrity or Managing Transactions.


* What Is a Transaction?
- A transaction is a set of operations that must be treated as a single logical unit. 
- For example, when you transfer $10 from your Savings Account to your Checking Account, it involves two updates:
1. Subtract $10 from Savings
2. Add $10 to Checking
- Together, these two updates form one transaction.

* What Can Go Wrong?

- If the first update succeeds but the second fails, $10 disappears. If another user looks at the data between those two steps, they see inconsistent information. For instance, they might see $10 missing from savings but not yet added to checking.
- The bank's total amount ($70) and your accounts ($60) don’t match, causing inconsistency!

* Transaction Integrity Ensures:

- Data stays consistent
- All updates succeed or none do
- No half-completed transactions are visible
- Data remains safe even after a system failure

The 4 Rules of Transaction Integrity — ACID

Property Meaning Example (Bank Transaction)
 Atomicity All operations in a transaction are treated as one single unit. Either all happen or none happen. Both updates (savings -$10, checking +$10) occur together or not at all.
Consistency
Transaction must move the database from one valid state to another valid state.

Total balance before and after transaction = $70.
Isolation
Ongoing transactions are invisible to others until complete.

Other users can only see the initial or final state, not in-between.
Durability
Once committed, the changes persist permanently, even if a system crash occurs.

After transaction completes, balances stay $10 and $60 even after reboot.


Atomic + Consistent + Isolated + Durable = ACID

Rollback vs Commit

Commit → Transaction completes successfully → changes are saved.

Rollback → Something fails → all updates are undone → data returns to its original consistent state.

For example, if the transfer fails after deducting from savings, the rollback restores both accounts to:

$20 in Savings

$50 in Checking

Data is safe and consistent again.

Concept Description
Transaction     Group of SQL operations treated as one unit
Goal     Maintain accuracy, consistency, and reliability
ACID     Key rules ensuring transaction integrity
Rollback     Reverts to previous consistent state if any error occurs
Commit     Confirms all updates and saves final state

In Short:

Transaction Integrity = ACID. It ensures the database stays accurate, consistent, isolated, and durable. This means no lost money, no half updates, and no corrupted data.

Costs & Benefits of Relational Databases

* Costs and benefits of using relational databases and why we sometimes need other storage solutions like unstructured data or data warehouses.

* Benefits of Relational Databases
- Relational databases are popular for their flexibility, accuracy, and consistency.

1) Flexible Design:  
- Data is spread across multiple related tables.  
- Example: From Customer to Sales Order to Product, we can easily answer questions like “Which customers bought which products?”

2) Supports Many Business Scenarios:  
- Because of relationships and normalization, we can combine tables in many ways to answer different business questions.

3) No Duplicate Data:  
- Each entity’s data is stored only once.  
- Example: All customer info lives in the Customer table, avoiding redundancy and ensuring consistency.

* Costs / Limitations of Relational Databases

- Even though relational databases are powerful, they face performance and structural challenges.

1. Multiple Table Updates  
- Most business transactions, like transferring money, involve updating several tables.  
- This slows down online transaction processing (OLTP), especially when many users are active at the same time.

2. Fixed Schema (Rigid Structure)  
- Every row in a table must have the same columns.  
- But in real life, data types differ.  
- Example:  
Books have ISBN  
Stoves have safety warnings  
Computers have specifications  
Forcing these diverse entities into one rigid structure causes inefficiency and complexity.

3. Reporting Performance  
- Reports often require processing thousands of rows.  
- To create reports, the system must join multiple tables (Customer, SalesOrder, Product).  
- This slows down query performance; large reports can take too long to generate.

Solutions to These Problems 
 
Problem Best Solution Description
Too many transactions per second (performance issue) Unstructured Data Systems Handle large, fast data streams better.
Different data shapes / flexible formats needed Unstructured or Semi-Structured Storage Works better when entities don’t share the same schema.
Reporting is too slow due to joins Data Warehouse (Denormalized Data) Combines tables (Customer + Sales + Product) into one big table for faster reporting.
* Note:  
- Denormalized data means faster reads and slower updates.  
- It’s great for reporting, but not for real-time transactions.

Aspect Structured Data (Relational DB) Unstructured / Data Warehouse

Design

    Highly structured (tables, rows, columns)

    Flexible / combined tables

Speed

    Slower with large joins

    Faster for bulk reads

Use Case
    
    Daily transactions (OLTP)
    
    Analytics, reports (OLAP)

Example
    
    Bank, E-commerce apps
    
    Business intelligence dashboards

In Short

Relational Databases are consistent, reliable, and flexible.  
Yet, they struggle with high-speed transactions, diverse data, and large-scale reporting.  
Therefore, we use Unstructured Data for flexibility and Data Warehouses for efficient reporting.




Comments

Popular posts from this blog

AI-900-3,4

AI-900 12,13

AI-900 10,11