Snowflake
Snowflake is a SaaS based solution, it is cloud based data warehouse
- Benefit
o
There
is no hardware (virtual or physical) to select, install, configure, or manage.
o
There
is virtually no software to install, configure, or manage.
o
Self-Manage
handled by Snowflake:
§ maintenance, management, upgrades, tuning.
Categorization
of Snowflake
1. Snowflake Architecture
·
Storage
Layer
·
Compute
Layer / Query processing
·
Consumption
Layer
2. Snowflake Eco-system
1- Snowflake
Architecture
·
Storage Layer
§ Structure data: Row and column-based data: CSV, SQL based data, Etc.
§ Semi-structured data: JSON, XML, ORC, etc
§ Organized data: Schemas, Tables, etc.
·
Compute Layer / Query processing
§ Virtual Warehouse – cluster (vertical or horizontal
scale) of computes resources/VNet
· Loading data into or retrieve data
from it
· Scalabilities: request grow cluster
grows if load decrease cluster goes down
· Number of servers in clusters
· It can be Auto suspended – based on
time inactive
· or auto resume – based on activity
started
· Queue created when request comes in -
request will be processed when resources is ready to process
§ Services Layer – managed by snowflake (usages
multiple availabilities zones for high availabilities)
·
Manage
overall Snowflake
o
Authentication
and authorization users
o
Manages
Sessions
o
Secure
data
o
Query
compilation
§ When query submitted,
§ Authenticate and authorized user
§ Create optimized data plan
· Send execution to Virtual Warehouse
· VWH allocation resources to perform
operation and send the request to Storage layer
· Data retrieved and processed and sent
back to User
o
Optimization
o
Manages
Virtual warehouse
· Metadata store
o
Zero
copy cloning
§ Instead of copying prod data
§ Not duplicate the production data
§ Retain connection to prod tables
§ To cloning environment
o
Time
travel
§ Avoid deleting rows or deleting table
· Query data in past and
· Clone entire tables, Schemas, DB from
specific time of period
· Up to 90 days
o
Data
Sharing
§ Make data available to another
companies / org
§ No duplication of data
§ No data will be replicated will be
shared secured way
§ Build process to send the data and
consumer can build process to consume the data
§ Consumption Layer
· Enables users to access, analyze, and
visualize data stored in Snowflake. Conceptual layer built on top of
Snowflake’s physical architecture (Storage, Compute, Cloud Services).
· Data access, users access data via SQL
queries, views, or external tools. Use Snowflake’s web UI, SnowSQL CLI, or
third-party tools (e.g., Tableau, Power BI, Looker). Secure access is managed
via roles and privileges defined in the Cloud Services layer.
· Transform raw data into usable
formats. Create views to simplify complex joins and aggregations. Use create
Table As Select or materialized views for performance optimization. Apply data
masking and row-level security for sensitive data.
· Build semantic layers for business
logic, Define star or snowflake schemas for reporting.
· Capability: Generate insights through
dashboards and reports. Connect BI tools to Snowflake using ODBC/JDBC or native
connectors. Run scheduled queries for periodic reporting. Use SQL functions,
window functions, and aggregations for deep analysis.
· Share data securely across teams or
organizations. Use Snowflake Secure Data Sharing to share live data without
copying. Collaborate with external partners or internal departments. Monitor
usage and access via usage dashboards.
· Enable predictive modeling and data
science workflows. integrate with platforms like DataRobot, Amazon SageMaker
etc. Use Snowpark for writing code in Python, Java, or Scala directly in
Snowflake. Perform feature engineering and model scoring within Snowflake.
· Track performance and optimize
queries. Use Query Profile to analyze execution plans. Monitor warehouse usage,
query history, and costs. Apply clustering keys and caching for faster
performance.
2- Snowflake Eco-system
- Data Integration
o ETL: BOOMI, DBT, Fivetran, Goggle Cloud (GCP), Informatica, Pentaho, SAP, STICH
- Advanced Analytics
o Big Data, ML and Data science: DataBricks, Data Robot, BigSquid
- Governance and security
o DataDog, Satori, Alation etc.
- Business Intelligence
o
Analysis,
Discover,
o
Reporting
ops or analytics reporting to leadership to help their decision, data Visualize
· TBLU, IBM, ADOBE, SAP, QLIK, LOOKER
- Programming development: SNOW SQL, Snowflake UI - WorkSheet, DBEAVER, SeekWell, Agile Data Engine
- Native Programming
o Interface: Python interface, § PHP
o Connectors: JDBC, § ODBC
Snowflake
Summary:
Storage
Layer
- Function: Stores all data including
tables and metadata.
- How it Works: Data is automatically
reorganized into compressed, columnar micro-partitions stored in cloud
object storage (AWS S3, Azure Blob Storage, or Google Cloud Storage).
- Key Feature: Decoupled from compute,
enabling independent scaling. Data is immutable; updates create new
micro-partitions.
Compute
Layer
- Function: Executes queries using virtual
warehouses (clusters of CPU and memory).
- How it Works: Queries are processed by
virtual warehouses that can be started, stopped, or resized (from X-small
to 6X-large).
- Key Feature: Independent from storage.
Scaling compute does not affect storage. Each query runs within a single
warehouse.
Cloud
Services Layer
- Function: Acts as the control center,
managing and coordinating all other layers.
- How it Works: Handles authentication, access
control, query parsing and optimization, transaction management, and
metadata tracking.
- Key Feature: Central entry point for all
Snowflake operations and user interactions.
Consumption
Layer
- Function: Represents how data is accessed
and used for analytics and reporting.
- How it Works: Not a physical layer; built on
top of Snowflake. Users interact with data via views, reports, and
applications.
- Key Feature: Transforms raw data into
actionable insights through simplified and joined datasets.