DP-900-2
DP - 900
Module 2 - Structured Data and Relational Database Solutions
* When you move your databases to Azure, you have different solutions based on the level of control or management you want.
1. Running Databases on Azure Virtual Machines (IaaS)
You can transfer your existing Database Management System (DBMS) to Azure by installing it on an Azure Virtual Machine.
Examples: Microsoft SQL Server, MySQL, etc.
You manage:
- OS upgrades and patches
- DBMS upgrades and patches
- Data access and data import
This option gives you full control but requires high maintenance.
2. Serverless Database Solutions (PaaS)
If you want to reduce administrative tasks, consider a serverless approach.
Azure automatically handles servers, upgrades, and patches.
You will still manage data access and data import.
There’s no need to manage VMs or DBMS manually.
Azure SQL Database Family provides several options:
1). Azure SQL Database
This is a fully managed database service.
It works with most features of on-prem SQL Server, but not all.
Missing features include:
- Service Broker (for scheduling tasks)
- Database Mail (for sending emails from DBMS)
This option is ideal for single databases, but it has limited feature compatibility.
2). Azure SQL Managed Instance
This offers nearly 100% compatibility with on-prem SQL Server.
It provides more control over update and patch timings.
This is great for migration with minimal changes.
3). Elastic Pools
This allows multiple databases to share resources like CPU and memory.
Example:
OLTP (sales) uses resources during the day.
Inventory (batch jobs) uses them at night.
This saves costs by optimizing resource use.
4). Azure SQL Edge
This is designed for IoT environments.
It processes continuous data streams in real time.
It uses rolling time windows (e.g., 5-minute sliding window) instead of separate transactions.
5). Open Source Database Options
- MySQL / MariaDB
- PostgreSQL
Use Azure Database for MySQL/PostgreSQL/MariaDB for managed open-source solutions. You can find third-party database engines on Azure Marketplace or install your own on a VM.
* Summary
| Option | Type | Management | Best For |
|---|---|---|---|
| Azure VM + SQL Server | IaaS | You manage all | Full control |
| Azure SQL Database | PaaS | Azure manages | Single DBs |
| Managed Instance | PaaS | Shared control | Migration needs |
| Elastic Pools | PaaS | Azure manages | Cost-saving multi-DB setup |
| SQL Edge | PaaS | Azure manages | IoT & real-time data |
| Open-source DBs | PaaS/IaaS | Azure or You | Flexible tech stack |
Creating Azure SQL
Normalizing Data
* Structured data and relational databases are closely related; you can’t understand one without the other.
* Structured Data
- Structured Data is organized into tables, each with rows and columns.
- Each row represents a unique entity or record, such as one customer.
- Each column represents an attribute, like Customer ID, First Name, or Last Name.
- Each column has a specific data type, including text, number, or date.
Example:
Customer ID First Name Last Name
1 Peter Vogel
2 Jan Vogel
- It is easy to query, organize, and maintain. However, it can lead to duplication if not structured properly.
* Normalization
- Normalization is the process of organizing structured data into tables to reduce redundancy and improve consistency.
- Goal: Each table contains only relevant data for a single entity.
Example: The Customer table should have only customer-related information.
- Problem with Denormalized Tables
- A denormalized table mixes different types of data, such as customer and sales information. This can lead to issues like:
- Duplicate data where the same customer information repeats in multiple rows.
- Inconsistency where an address changes in one row but not in others.
Wasted storage.
Example of Denormalized Data:
Sales Order ID Customer Name Address Product Quantity
001 Peter Vogel Toronto Laptop 2
002 Peter Vogel Toronto Mouse 1
- If Peter changes his address, it must be updated in multiple rows, which likely causes errors.
- Normalized Solution
- Split the data into separate tables:
- Customer Table stores customer info.
- Sales Order Table stores order info.
- Product Table stores product info.
- Each record links through unique identifiers (keys), not by repeating data.
** Primary Key
- A Primary Key (PK) uniquely identifies each row in a table.
- Example:
CustomerID in the Customer Table.
SalesOrderID in the Sales Order Table.
- No two rows can share the same primary key value.
* Database Schema
- After normalization, we define a database schema, which is the structure of the database:
- It lists all tables in the database.
- It defines each table’s schema, including:
- Columns
- Data types
- Primary key
- Relationships with other tables
- Example:
Customer Table → CustomerID (PK), FirstName, LastName
Product Table → ProductID (PK), ProductName, Price
SalesOrder Table → SalesOrderID (PK), CustomerID (FK), ProductID (FK), Quantity
* Relationships Between Tables
- To combine data from multiple tables, use relationships (foreign keys) to link related rows.
Example:
- CustomerID in the SalesOrder Table links to CustomerID in the Customer Table.
- This makes it easy to find all orders for a specific customer.
* Key Takeaway:
- Structured data in relational databases must be normalized to avoid duplication and ensure data integrity. Through schemas and relationships, we create organized, efficient, and easily searchable databases.
Relationships in Structured Data
* After normalizing data, we have multiple related tables, each representing a single business entity. Most real-world transactions involve several entities. To connect these tables, we use relationships.
* Why Relationships Are Needed
- Example: A Sales Order involves:
- The Customer who made the purchase
- The Product that was sold
To provide complete information about one sales order, we must link the:
- Sales Order Table
- Customer Table
- Product Table
* Primary Key and Foreign Key
- Each table has a Primary Key (PK), which is a unique identifier for each record.
- To create a relationship, we copy the primary key from one table and place it as a new column in another table. That new column becomes the Foreign Key (FK).
- Example: CustomerID is the Primary Key in the Customer Table. The same CustomerID is added as a Foreign Key in the SalesOrder Table to link each order to its customer.
- So, if a user asks, “Who placed Sales Order A123?” Look up the order in the SalesOrder Table, then find CustomerID = 0002. Go to the Customer Table and find customer 0002 = Peter Vogel.
- That’s how a relationship connects data across tables.
| Table | Key Type | Description |
|---|---|---|
| Customer Table | Primary Key → CustomerID |
Identifies each customer |
| Product Table | Primary Key → ProductID |
Identifies each product |
| SalesOrder Table | Foreign Keys → CustomerID, ProductID |
Connects customer and product for each order |
Thus, the SalesOrder Table depends on both the Customer and Product tables.
* Table Dependencies
Relationships create dependencies between tables.
Example:
- SalesOrder depends on Customer (who made the purchase)
- SalesOrder depends on Product (what was purchased)
- This structure ensures data integrity. Every order must link to a valid customer and product.
*Performance Impact
- While relationships improve data structure and accuracy, they can slow down performance. Each lookup requires checking related rows in multiple tables. With many customers and products, these lookups can become time-consuming.
- Tip: Proper indexing and database optimization can reduce lookup delays.
| Concept | Description |
|---|---|
| Relationship | Link between tables using keys |
| Primary Key (PK) | Unique identifier for each record |
| Foreign Key (FK) | Copy of another table’s PK to create link |
| Dependency | One table relies on another (e.g., SalesOrder → Customer) |
| Challenge | Multiple lookups can affect performance |
Key Takeaway: Relationships in structured data link multiple tables together using Primary and Foreign Keys. They ensure data consistency and logical connections, forming the backbone of relational databases.
Other DBMS Features
* When we normalize data, it gets stored in multiple related tables. For example, a sales order might pull information from the Customer, SalesOrder, and Product tables. To make data access faster, simpler, and more consistent, relational databases use Indexes, Views, and Stored Procedures.
1) Indexes: Speed Up Data Retrieval
- An Index works like an index in a book. It helps find data quickly without scanning the whole table.
- If you often search by City, create an index on that column. Now, rows are grouped and sorted by city for quick lookups.
a) Types of Indexes
- 1. Clustered Index
- Controls the physical row order in the table.
- Data is stored in sorted order (like 001, 002, 003).
- Usually on the Primary Key column.
- Only one clustered index is allowed per table.
- 2. Non-Clustered Index
- Does not affect the physical order.
- You can create multiple non-clustered indexes for different columns.
- Too many indexes can slow down updates. Each insert, update, or delete must also update all related indexes. The ideal balance is around 6 to 7 indexes per table.
2) Views: Simplify and Secure Data Access
- A View is a virtual table. It shows data from one or more tables but does not store it physically.
- Use Views when you often need the same combination of columns or joins. For example, a SalesSummary View may combine columns from the SalesOrder, Product, and Customer tables.
** Benefits
- Simplifies queries with a predefined data structure; there's no need for repeated joins.
- Protects sensitive data by showing only selected columns.
- Restricts access, allowing apps to access the view instead of the actual tables.
- This is ideal for maintaining data privacy and ease of access.
3) Stored Procedures: Improve Performance and Consistency
- A Stored Procedure is a saved block of SQL commands that runs as one unt. Instead of multiple calls from the app to the server, one call executes all steps, improving speed.
**Benefits
- Better performance with fewer network calls and faster processing.
- Consistency as all applications call the same procedure, ensuring uniform results.
- Centralized logic; if business logic changes, you only need to update the procedure, not every app.
- For example, a procedure named ShipSalesOrder might handle all shipping tasks—updating orders, inventory, and status in one go. If the shipping process changes, you only need to update this procedure, and all apps stay consistent automatically.
| Feature | Purpose | Benefits | Notes |
|---|---|---|---|
| Index | Speed up data search | Faster queries | Keep limited indexes |
| Clustered Index | Physical row order | Quick access via PK | Only one per table |
| Non-Clustered Index | Logical mapping | Flexible searches | Multiple allowed |
| View | Virtual table | Simplifies queries, hides data | Improves privacy |
| Stored Procedure | Predefined SQL set | Faster, consistent, reusable | Centralized logic |
Querying and Updating the data
* SQL :
- SQL (Structured Query Language), pronounced either “S-Q-L” or “Sequel,” is the industry-standard language used for:
- Retrieving data from relational databases.
- Updating and modifying existing data.
- Defining and controlling database structures and access.
- The American National Standards Institute (ANSI) manages SQL with guidance from the International Standards Organization (ISO).
- Different database vendors offer their own extended versions:
- Microsoft SQL Server: Transact-SQL (T-SQL)
- Oracle: PL/SQL
- MySQL: MySQL SQL dialect
- The core SQL syntax remains consistent, making it transferable across systems.
* 3 Main Parts of SQL :
- SQL is divided into three main components, each serving a specific purpose:
1. DML (Data Manipulation Language)
This is used for retrieving and modifying data in tables. The main commands include:
- SELECT: Retrieve data from tables.
- UPDATE: Modify existing data in tables.
- DELETE: Remove rows from a table.
- INSERT: Add new rows to a table.
Common clauses are:
- FROM: Specifies the table to work with.
- JOIN: Combines data from multiple tables.
- WHERE: Filters rows based on conditions.
Tip: SQL can get complex with many clauses. Simple tasks are easy, but advanced queries can be challenging.
Examples:
-- SELECT
SELECT CustID, FirstName, LastName
FROM Customers
WHERE CustID = '0001';
-- UPDATE
UPDATE Customers
SET LastName = 'Irvin'
WHERE CustID = '0001';
-- DELETE
DELETE FROM Customers
WHERE CustID = '0001';
-- INSERT
INSERT INTO Customers (CustID, FirstName, LastName)
VALUES ('0003', 'Jason', 'Vandeville');
Always include a WHERE clause in DELETE to avoid removing all rows!
2. DDL (Data Definition Language)
This is used to define and manage database structures like tables, views, and indexes.
Main commands include:
- CREATE: Add new tables, views, or procedures.
- DROP: Delete existing tables or database objects.
- ALTER: Modify existing database structures.
Example:
CREATE TABLE Customers (
CustID VARCHAR(10),
FirstName VARCHAR(50),
LastName VARCHAR(50)
);
3. DCL (Data Control Language)
This is used to control access and permissions for users.
Main commands include:
- GRANT: Give users permission to read or modify data.
- REVOKE: Remove permissions previously granted.
Example:
GRANT SELECT, UPDATE ON Customers TO User1;
REVOKE UPDATE ON Customers FROM User1;
| SQL Category | Purpose | Common Commands |
|---|---|---|
| DML | Manipulate data | SELECT, INSERT, UPDATE, DELETE |
| DDL | Define structure | CREATE, ALTER, DROP |
| DCL | Control access | GRANT, REVOKE |
* Key Takeaways
- SQL is the universal language for relational databases.
- DML handles data operations, DDL manages structure, and DCL controls access.
- Knowing standard SQL lets you work across different DBMS like SQL Server, MySQL, PostgreSQL, and others.
- Always use WHERE clauses carefully to avoid unwanted data changes or deletions.
Running SQL Statements
Transaction Integrity (ACID Properties)
Transaction Integrity, also known as Transactional Integrity or Managing Transactions.
- A transaction is a set of operations that must be treated as a single logical unit.
- For example, when you transfer $10 from your Savings Account to your Checking Account, it involves two updates:
1. Subtract $10 from Savings
2. Add $10 to Checking
- Together, these two updates form one transaction.
* What Can Go Wrong?
- If the first update succeeds but the second fails, $10 disappears. If another user looks at the data between those two steps, they see inconsistent information. For instance, they might see $10 missing from savings but not yet added to checking.
- The bank's total amount ($70) and your accounts ($60) don’t match, causing inconsistency!
* Transaction Integrity Ensures:
- Data stays consistent
- All updates succeed or none do
- No half-completed transactions are visible
- Data remains safe even after a system failure
The 4 Rules of Transaction Integrity — ACID
| Property | Meaning | Example (Bank Transaction) |
|---|---|---|
| Atomicity | All operations in a transaction are treated as one single unit. Either all happen or none happen. | Both updates (savings -$10, checking +$10) occur together or not at all. |
| Consistency | Transaction must move the database from one valid state to another valid state. |
Total balance before and after transaction = $70. |
| Isolation | Ongoing transactions are invisible to others until complete. |
Other users can only see the initial or final state, not in-between. |
| Durability | Once committed, the changes persist permanently, even if a system crash occurs. |
After transaction completes, balances stay $10 and $60 even after reboot. |
Atomic + Consistent + Isolated + Durable = ACID
Rollback vs Commit
Commit → Transaction completes successfully → changes are saved.
Rollback → Something fails → all updates are undone → data returns to its original consistent state.
For example, if the transfer fails after deducting from savings, the rollback restores both accounts to:
$20 in Savings
$50 in Checking
Data is safe and consistent again.
| Concept | Description |
|---|---|
| Transaction | Group of SQL operations treated as one unit |
| Goal | Maintain accuracy, consistency, and reliability |
| ACID | Key rules ensuring transaction integrity |
| Rollback | Reverts to previous consistent state if any error occurs |
| Commit | Confirms all updates and saves final state |
In Short:
Transaction Integrity = ACID. It ensures the database stays accurate, consistent, isolated, and durable. This means no lost money, no half updates, and no corrupted data.
Costs & Benefits of Relational Databases
* Costs and benefits of using relational databases and why we sometimes need other storage solutions like unstructured data or data warehouses.
* Benefits of Relational Databases
- Relational databases are popular for their flexibility, accuracy, and consistency.
1) Flexible Design:
- Data is spread across multiple related tables.
- Example: From Customer to Sales Order to Product, we can easily answer questions like “Which customers bought which products?”
2) Supports Many Business Scenarios:
- Because of relationships and normalization, we can combine tables in many ways to answer different business questions.
3) No Duplicate Data:
- Each entity’s data is stored only once.
- Example: All customer info lives in the Customer table, avoiding redundancy and ensuring consistency.
* Costs / Limitations of Relational Databases
- Even though relational databases are powerful, they face performance and structural challenges.
1. Multiple Table Updates
- Most business transactions, like transferring money, involve updating several tables.
- This slows down online transaction processing (OLTP), especially when many users are active at the same time.
2. Fixed Schema (Rigid Structure)
- Every row in a table must have the same columns.
- But in real life, data types differ.
- Example:
Books have ISBN
Stoves have safety warnings
Computers have specifications
Forcing these diverse entities into one rigid structure causes inefficiency and complexity.
3. Reporting Performance
- Reports often require processing thousands of rows.
- To create reports, the system must join multiple tables (Customer, SalesOrder, Product).
- This slows down query performance; large reports can take too long to generate.
Solutions to These Problems
| Problem | Best Solution | Description |
|---|---|---|
| Too many transactions per second (performance issue) | Unstructured Data Systems | Handle large, fast data streams better. |
| Different data shapes / flexible formats needed | Unstructured or Semi-Structured Storage | Works better when entities don’t share the same schema. |
| Reporting is too slow due to joins | Data Warehouse (Denormalized Data) | Combines tables (Customer + Sales + Product) into one big table for faster reporting. |
* Note:
- Denormalized data means faster reads and slower updates.
- It’s great for reporting, but not for real-time transactions.
| Aspect | Structured Data (Relational DB) | Unstructured / Data Warehouse |
|---|---|---|
Design |
Highly structured (tables, rows, columns) |
Flexible / combined tables |
Speed |
Slower with large joins |
Faster for bulk reads |
Use Case |
Daily transactions (OLTP) |
Analytics, reports (OLAP) |
Example |
Bank, E-commerce apps |
Business intelligence dashboards |
In Short
Relational Databases are consistent, reliable, and flexible.
Yet, they struggle with high-speed transactions, diverse data, and large-scale reporting.
Therefore, we use Unstructured Data for flexibility and Data Warehouses for efficient reporting.
Comments
Post a Comment