All About Snowflake Database

Creating a Snowflake Account

Step 1 : Go to signup.snowflake.com, add your details, and click Continue.


Step 2 : Add details.



Step 3 : Check the documentation and click Virtual Hands-on Lab.





Step 4 : Activate your Snowflake account.


Step 5 : Add username and password.



When you activated your account, it opened your trial in a new browser tab. If you closed that tab or lost track of it, you can always go back to your email inbox and find the email with the link to your Snowflake trial account.


Creating a database


Step 1 : Create a new database. From the sidebar, go to Catalog, then Database Explorer, and click + Database.




Step 2 : Add the details for your database and click Create.




Explore the database

Step 3 : Review the privileges, schemas, views, and objects inside each schema.








Databases group datasets, or tables, together.

Within a database, a schema acts as a second-level organizational grouping.

When a database is created in Snowflake, it automatically generates two schemas:

INFORMATION_SCHEMA : contains a collection of views and cannot be deleted, renamed, or moved.

PUBLIC : is created empty and can store tables, views, and other objects. It can be deleted, renamed, or moved.
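Since INFORMATION_SCHEMA is a collection of views, you can query it directly. A minimal sketch; GARDEN_PLANTS is the database built later in this guide, so substitute your own database name:

```sql
-- List every table in a database by querying its INFORMATION_SCHEMA views
SELECT table_schema, table_name
FROM GARDEN_PLANTS.INFORMATION_SCHEMA.TABLES
ORDER BY table_schema, table_name;
```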

Transfer Ownership of your database to SYSADMIN Role.

Step 4 : Click the three dots, select Transfer Ownership, choose SYSADMIN, and click Transfer.




Step 5 : Switch your active role to SYSADMIN. Click your profile name at the bottom of the sidebar and select SYSADMIN.
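Both steps can also be done in SQL instead of clicking through the UI. A sketch, assuming a hypothetical database named MY_DB (substitute your own):

```sql
-- Hand ownership of the database to SYSADMIN, keeping any grants already made
GRANT OWNERSHIP ON DATABASE MY_DB TO ROLE SYSADMIN COPY CURRENT GRANTS;

-- Switch your active role to SYSADMIN for the current session
USE ROLE SYSADMIN;
```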




Identity Vs. Access

Identity refers to who a user is.

Access refers to what a user can see or do.

Authentication is the process of verifying a user’s identity and proving who they are.

Authorization is the process of confirming which actions or resources a user is allowed to access.

Authentication and authorization are separate processes, but they often happen at the same time.

With the move to cloud-based systems, these processes have become more distinct and complex.

The authenticator verifies the user’s identity and checks if their credentials are valid.

After authentication, the user becomes an authenticated account in the system.

The authorizer manages permissions and decides what the authenticated user can do.

Role-Based Access Control (RBAC) is used to handle authorization.

In RBAC, roles are assigned to users, and permissions are connected to roles rather than to individual users.

Users can change roles to complete specific tasks, and access is granted or denied based on the current role.
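These RBAC ideas map onto a handful of SQL statements. A sketch with hypothetical names (an ANALYST role, a user JANE, a database MY_DB):

```sql
-- Permissions attach to roles, not to individual users
CREATE ROLE ANALYST;
GRANT USAGE ON DATABASE MY_DB TO ROLE ANALYST;

-- Users receive roles, and act through whichever role is currently active
GRANT ROLE ANALYST TO USER JANE;
USE ROLE ANALYST;
```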


Changing Your Role Setting

Roles in Snowflake can easily be changed through a menu on the home screen or worksheet.

All users are automatically given the PUBLIC role, which has the fewest privileges.

Trial account users are automatically assigned the ACCOUNTADMIN role, which has the highest level of database privileges.

The ACCOUNTADMIN role includes three additional roles through RBAC inheritance.

In production environments:

SYSADMIN is usually used for creating and managing databases, warehouses, tables, and views.

SECURITYADMIN and USERADMIN manage security and user tasks.

ACCOUNTADMIN is rarely used and only for specific administrative tasks.

Roles can be seen as "hats" with different levels of access and responsibilities.

ACCOUNTADMIN provides full administrative (VIP) access.

SECURITYADMIN and USERADMIN control and enforce user access.

SYSADMIN manages data object creation and management.

The ORGADMIN (Organizational Administrator) role is predefined and can:

Create new Snowflake accounts (not individual users).

Edit high-level configuration settings across accounts.

Beginners typically start with ACCOUNTADMIN for simplicity and later move to SYSADMIN for daily operations.

Step 6 : Go to the Governance & Security tab in the sidebar and click Users & Roles.



Go to the roles tab.


Go to the graph tab.


Read the details of each role in the image below.


RBAC Review


Identity refers to the user, including their username and password.

Authentication happens once during login to confirm the user's identity.

Access refers to the role a user is using in Snowflake.

Authorization occurs continuously while using the platform, deciding what actions a user can take.

Snowflake provides six predefined roles, and users can also create their own custom roles.

ACCOUNTADMIN is the highest-level role with the most privileges.

When you assign ACCOUNTADMIN, it automatically grants access to SYSADMIN and two security roles through inheritance.

You can change roles from the home page or any worksheet.

Role settings can vary between different interfaces.

If you encounter a "does not exist" error, the first step is to check the current role you are using.
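A quick way to check is to ask Snowflake which role (and context) is active in your session. For example:

```sql
-- If an object "does not exist or not authorized", check your context first
SELECT CURRENT_ROLE(), CURRENT_DATABASE(), CURRENT_SCHEMA();

-- Then switch to a role that owns (or was granted access to) the object
USE ROLE SYSADMIN;
```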


DAC (Discretionary Access Control)


Snowflake uses RBAC (Role-Based Access Control) for security management. It also uses DAC (Discretionary Access Control). Discretionary Access Control means that the owner of an object controls who can access it. In Snowflake, the owner is a role, not a specific user. The owning role decides which other roles can access or modify objects. For example, the SYSADMIN role can grant or restrict access to specific tables or objects for other roles.


Snowflake’s access model includes Discretionary Access Control (DAC) and Role-Based Access Control (RBAC). 

DAC follows the principle of “you create it, you own it.” 

The role that creates an object automatically becomes its owner. 

The owning role has full control; it can delete, rename, or modify the object. 

This is similar to personal file ownership models, like MS Word, email, and Google Sheets. 

In Snowflake, ownership and permissions are determined by the role that was active when the object was created, not by the individual user.
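Under DAC, the owning role hands out access explicitly. A minimal sketch, assuming SYSADMIN owns the objects and grants read access to a hypothetical ANALYST role:

```sql
USE ROLE SYSADMIN;  -- the owning role decides who gets access

-- Grant another role the ability to find and read a table
GRANT USAGE ON DATABASE GARDEN_PLANTS TO ROLE ANALYST;
GRANT USAGE ON SCHEMA GARDEN_PLANTS.VEGGIES TO ROLE ANALYST;
GRANT SELECT ON TABLE GARDEN_PLANTS.VEGGIES.ROOT_DEPTH TO ROLE ANALYST;
```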

Default Role Assignment


Default Role is a user-specific setting in Snowflake that provides convenience.

Each user automatically gets a default role assigned.

For trial account users, the default role is ACCOUNTADMIN.

The default role determines which role is active right after login.

Users can change their default role to another one if needed.

For trial accounts, it’s best to keep the ACCOUNTADMIN role as the default since workshop labs are designed with that in mind.

The default role works independently of role design, hierarchy, inheritance, or administrative oversight.
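Changing the default role is a one-line ALTER USER. A sketch with a hypothetical username MY_USER:

```sql
-- The default role determines which role is active right after login
ALTER USER MY_USER SET DEFAULT_ROLE = SYSADMIN;

-- Verify the setting
DESCRIBE USER MY_USER;
```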

Challenge Lab : Make the warehouse available to SYSADMIN.

Step 1 : Go to the Warehouses section, under Compute in the Manage section of the sidebar.



Step 2 : Go to the SYSTEM$STREAMLIT_NOTEBOOK_WH warehouse.
 



Click on Transfer Ownership and click Transfer.
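In SQL, the same transfer looks like this (the warehouse name comes from the step above):

```sql
-- Give SYSADMIN ownership of the warehouse, preserving existing grants
GRANT OWNERSHIP ON WAREHOUSE SYSTEM$STREAMLIT_NOTEBOOK_WH
  TO ROLE SYSADMIN COPY CURRENT GRANTS;
```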



Data Containers

From Spreadsheets to Databases: A Smarter Way to Manage Data

Many small businesses begin by managing data on paper or in spreadsheets. At first, this works well. However, as the amount of data increases across multiple folders, files, and sheets, it becomes hard to combine and analyze.

Imagine having folders like House Plants and Garden Plants. You might have workbooks for Vegetables, Fruits, and Flowers, each containing several sheets for plant height, root depth, or soil needs. Managing all of this in spreadsheets can quickly get overwhelming.

That’s where databases come in. They organize data more efficiently while keeping a structure similar to spreadsheets.

Spreadsheet vs. Database Hierarchy

Spreadsheet Concept    Database Equivalent
Folder                 Database
Workbook               Schema
Worksheet              Table

A database has schemas, and each schema has tables. This is similar to how folders hold workbooks and workbooks hold sheets.

Why Move to a Database?

Better organization: Centralizes and simplifies data management.  

Scalability: Handles large datasets smoothly.  

Better analysis: Makes it easier to combine and query data.  

Cost-efficient: Cloud platforms like Snowflake remove high upfront costs.  

Example Structure  

Database: Garden_Plants  

Schemas: Vegetables, Fruits, Flowers  

Tables: Plant_Height, Root_Depth, Soil_Needs  

This setup resembles spreadsheets but offers much greater flexibility and analytical power.  

Key Takeaway  

When data outgrows spreadsheets, moving to a database is the next logical step. With cloud solutions like Snowflake, anyone can organize, analyze, and scale their data easily without complexity or high cost.

Create Database And Schemas

Step 1 : Set your role to SYSADMIN. Go to your profile, click Switch Role, and switch to the SYSADMIN role.




Step 2 : Create a database named GARDEN_PLANTS.


Click on +Database.


Click on the database name.




Step 3 : Drop the public schema and create new schemas VEGGIES, FRUITS, FLOWERS.

Go to the schema tab.


Go to the public schema and select drop.




Click on +Schema button to create new schemas.
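Steps 2 and 3 can also be run as SQL instead of clicking through the UI:

```sql
USE ROLE SYSADMIN;

CREATE DATABASE GARDEN_PLANTS;

-- Replace the default PUBLIC schema with three purpose-built ones
DROP SCHEMA GARDEN_PLANTS.PUBLIC;
CREATE SCHEMA GARDEN_PLANTS.VEGGIES;
CREATE SCHEMA GARDEN_PLANTS.FRUITS;
CREATE SCHEMA GARDEN_PLANTS.FLOWERS;
```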





Creating Worksheets in Snowflake.

Step 1 : Create a SQL file. Go to the home page, click the + icon, and select SQL File.



Rename this file to My Worksheet.







Knowing What will Run!!



The Object Pickers


Run the SHOW DATABASES command.



Running a SHOW DATABASES command is just like being at the first level of an Object Picker (but with more details, and the ability to cut and paste the info into a spreadsheet). 

Run the show schemas command.


Table, Data types and Loading Data

Scenario Summary

A database with three schemas has been created.
The next step is to create tables and load data using Structured Query Language (SQL).
The example involves a spreadsheet that tracks the root depth of vegetables, categorized as shallow, medium, and deep.


Key Concepts Introduced

Structured Query Language (SQL)
The language used to create, manage, and manipulate databases.
Also pronounced "S-Q-L" or "sequel".

Data Modeling and Normalization

Techniques used to organize data efficiently.
These help avoid redundancy and improve database structure.

Steps in Designing the Table

Review and Adjust Data Structure
Add a unique ID to each row for primary key identification.
Add a single-letter code for convenience.
Split the measurement data into three separate columns for clarity and normalization.

Unit Conversion

Convert measurements from inches to centimeters to maintain consistency.

Define Table Details

Choose column names that clearly describe the data.
Assign appropriate data types, such as INT, VARCHAR, and DECIMAL.
Decide on the maximum length for each column's data.

Write the SQL Command

Use the defined structure to write a CREATE TABLE statement that includes:
Column names
Data types
Length constraints
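Putting those pieces together, the CREATE TABLE statement might look like the sketch below. The column names, types, and lengths are illustrative; match them to your own spreadsheet:

```sql
CREATE TABLE GARDEN_PLANTS.VEGGIES.ROOT_DEPTH (
    ROOT_DEPTH_ID   INT,          -- unique ID for each row
    ROOT_DEPTH_CODE VARCHAR(1),   -- single-letter convenience code
    ROOT_DEPTH_NAME VARCHAR(7),   -- Shallow, Medium, or Deep
    UNIT_OF_MEASURE VARCHAR(2),   -- cm, after converting from inches
    RANGE_MIN       NUMBER(8,2),  -- measurements split into separate columns
    RANGE_MAX       NUMBER(8,2)
);
```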

Key Takeaways

SQL is essential for creating and managing tables in databases.
Data modeling helps ensure data is structured logically and efficiently.
Adding unique IDs, using clear column names, and defining data types are crucial for good database design.
Normalization and unit consistency, like using centimeters, improve data accuracy and usability.

Create A Root Depth Table


Step 1 : Go to the Projects section in the sidebar and select Workspaces.


Step 2 : Click on Add new.





Make sure the role is SYSADMIN.


Choose GARDEN_PLANTS as the database and VEGGIES as the schema.




Step 3 : Create the table.



Find your created table.


View the definition.



Step 4 : Insert Data into the table.

Select Preview table.



Write the query given below.



Error : SQL compilation error: Table 'ROOT_DEPTH' does not exist or not authorized.

To solve this error, follow the image below.
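If you prefer fixing it in SQL, the error means the worksheet's context doesn't point at the table. Either set the context or fully qualify the table name (the column names and values here are illustrative):

```sql
-- Option 1: set the context, then use the short table name
USE DATABASE GARDEN_PLANTS;
USE SCHEMA VEGGIES;
INSERT INTO ROOT_DEPTH (ROOT_DEPTH_ID, ROOT_DEPTH_CODE) VALUES (1, 'S');

-- Option 2: fully qualify the table name instead
INSERT INTO GARDEN_PLANTS.VEGGIES.ROOT_DEPTH (ROOT_DEPTH_ID, ROOT_DEPTH_CODE)
VALUES (1, 'S');
```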


Run the insert query again.




Select Star (SELECT *) and Limits  

SELECT * (Select Star):  

It is a SQL statement used to request all columns in a table without listing them one by one.  
The asterisk (*) is pronounced as "Star" in this context (e.g., "Select Star").  

Using SELECT * with LIMIT:  

If you want all columns but not all rows, you can combine SELECT * with the LIMIT clause.  
Purpose of LIMIT: It ensures you get only a small set of rows.  

Benefit: This prevents wasting compute power by retrieving millions of rows when only a few are needed for preview or inspection.  

Example Syntax:

SELECT *
FROM ROOT_DEPTH
LIMIT 1;

(This example retrieves all columns and limits the result to just 1 row.)

Snowflake Worksheets and Warehouses

Snowflake Worksheets & Context Settings

Worksheets  
Worksheets are the main place in Snowflake where users write, run, save, and share code (SQL queries). Users can create new worksheets by clicking the plus sign (+). Worksheets have blue buttons that can toggle different parts of the display on and off.  

Context Settings  
Every worksheet has four context settings that are saved with it. These settings provide the default environment needed for running code.  

1. Data Storage Location (Database and Schema)  
These two settings give a shortcut to the data storage location, allowing users to write shorter code.  
- Database: Sets the default database (e.g., GARDEN_PLANTS). If this is incorrect, the SELECT statement won't run.  
- Schema: Sets the default schema within the database (e.g., VEGGIES). If this isn't set, the SELECT statement won't run because it won't know where to find the table.  
Note: If the database and schema aren't set in the context, they can be added before the table name (e.g., DATABASE_NAME.SCHEMA_NAME.TABLE_NAME) to make the code run.  
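For example, even with no database or schema set in the worksheet context, this still runs because the table name is fully qualified:

```sql
-- Fully qualified: database.schema.table
SELECT *
FROM GARDEN_PLANTS.VEGGIES.ROOT_DEPTH
LIMIT 10;
```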

2. User Role and Warehouse  
- User Role: This setting controls permissions and access.  
- Warehouse: This is the computing power used to process the data. It is not for storing data.  

Snowflake Warehouse (Compute)  
A warehouse is the machine that processes the data. Warehouses can be large (process data quickly) or smaller (process data more slowly). The default warehouse for a trial account is COMPUTE_WH (an extra small warehouse). Warehouses can be set to turn off and on automatically as needed.  
- Automatic Shutdown: The warehouse will automatically shut down after 10 minutes without any code running, which saves costs.  
- Automatic Startup: The warehouse will turn back on when code is ready to run. 
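These auto on/off settings live on the warehouse itself. A sketch using the trial's default warehouse:

```sql
-- AUTO_SUSPEND is in seconds (600 = 10 minutes of idle time)
ALTER WAREHOUSE COMPUTE_WH
  SET AUTO_SUSPEND = 600
      AUTO_RESUME  = TRUE;
```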


The Key Difference: Required vs. Suggested Context Settings  
Of the four context settings (drop menus) in a Snowflake worksheet, two are required for executing a query, while the other two are suggested starting points that make your code simpler.

1. The Required Settings (ROLE and WAREHOUSE)  
You need these settings defined for any SELECT statement to run because they provide the basic structure:

ROLE: This setting is required because it determines your access permissions. Without an active ROLE, Snowflake can't know if you have the right to view or use the data, so the query won't run.

WAREHOUSE: This setting is also required because it represents the compute power needed to process the query. Without an active WAREHOUSE, which is the "machine" that processes the data, the query cannot be executed.

2. The Suggested Settings (DATABASE and SCHEMA)  
These settings are suggested starting points or "home bases" meant for convenience, not for running the query:

DATABASE and SCHEMA: These settings exist mainly to simplify your SQL code. They tell Snowflake where to look first for tables and views, such as ROOT_DEPTH. Since they are just suggestions, you can still query data from any other database or schema in the same worksheet by specifying the full object path (e.g., OTHER_DB.OTHER_SCHEMA.TABLE_NAME).
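All four context settings can also be set with USE statements inside the worksheet itself:

```sql
USE ROLE SYSADMIN;           -- required: permissions
USE WAREHOUSE COMPUTE_WH;    -- required: compute power
USE DATABASE GARDEN_PLANTS;  -- suggested: shortens object names
USE SCHEMA VEGGIES;          -- suggested: shortens object names
```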


Snowflake Warehouse: The Compute Layer

The term "warehouse" in Snowflake refers to a metaphor based on real-world warehouses where goods are stored and processed.

Real-World Metaphor    Snowflake Terminology    Function
Goods                  Storage (The Data)       The data itself is stored separately from the compute.
Workers                Compute (The Servers)    The processing power that crunches the data.

Virtual Warehouses

Snowflake allows multiple virtual warehouses to be available. These virtual warehouses come in different sizes (e.g., Small, Medium). A larger size means more compute power, which means more "workers." A warehouse is not a place for storing data, as the metaphor suggests; it is the machine that processes the data.


Snowflake Warehouses vs. Data Marts

In traditional data warehousing, companies sometimes created data marts. These were separate warehouses assigned to specific business areas (e.g., fruit in one, cookies in another). Data marts led to issues like data siloing, inefficient data retrieval, and data replication that caused number mismatches. Snowflake Warehouses are not like data marts. Snowflake never sections off or silos data. Every warehouse has access to all your data all the time. Snowflake's design ensures efficient retrieval without the need to replicate data.

Scaling a Snowflake Warehouse

Each individual Snowflake warehouse consists of just one cluster of servers. Scaling changes the number of servers within that single cluster.

1. Scaling Up and Down (Vertical Scaling)
   - Scaling Up: This means manually increasing the size of an existing warehouse (e.g., Small to Medium) by adding servers to the single cluster.
   - Scaling Down: This involves manually decreasing the size of an existing warehouse (e.g., Medium to Small). This process must be done manually on an existing warehouse.

2. Scaling Out (Horizontal/Elastic Scaling)
   This is also known as Elastic Data Warehousing or Multi-cluster Warehouses in Snowflake terms. It allows the warehouse to automatically stretch to handle extra work and return to normal when the workload decreases. It scales by adding more clusters, up to 10 clusters wide, during peak workloads. It can be set to auto-scale to a specified maximum width. 

Availability: Multi-cluster warehousing is only available for Enterprise and above Snowflake editions.
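In SQL, scaling up changes WAREHOUSE_SIZE, while scaling out is configured with cluster counts (Enterprise edition or above). A sketch with a hypothetical warehouse name MY_WH:

```sql
-- Scale up: more servers in the single cluster
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'MEDIUM';

-- Scale out: let the warehouse add clusters automatically under load
ALTER WAREHOUSE MY_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3;  -- can go up to 10 clusters wide
```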


Defining the Snowflake Warehouse

In Snowflake, the term Warehouse refers to the computing power used to process data, rather than a place to store data. When you create a Warehouse, you are essentially defining a computing workforce.

Teams and Servers (Clusters and Scaling)

1. Warehouse Structure (Teams and Team Members)  
The workforce of any single warehouse is a Cluster, similar to a single team. The members of the Cluster are Servers, like team members. All Snowflake Warehouse sizes (e.g., eXtra-Small, Small, Medium) have only one Cluster. A larger warehouse size means that the single cluster consists of more servers.

2. Scaling Up and Down (Vertical Scaling)  
Definition: Changing the size of an existing warehouse (e.g., Small to Medium or Medium to Small).  
Mechanism: This action changes the number of servers within the one and only cluster.  
Scaling Up: Adding servers to the cluster.  
Scaling Down: Removing servers from the cluster.

3. Scaling In and Out (Horizontal/Elastic Scaling)  
Availability: This feature, called Multi-cluster Warehousing, is available for the Enterprise edition or above.  
Definition: The ability for a warehouse to automatically add or remove temporary Clusters during periods of increased demand.  
Mechanism:  
Scaling Out: Temporary clusters are added to handle increased demand.  
Scaling In: Temporary clusters are removed when demand decreases.  
Consistency: The number of servers in the main cluster determines the number of servers in the temporary clusters added during scaling out. 
