Snowflake
Snowflake is a SaaS based solution:
o
Cloud
based data warehouse
o
Benefit
§
There
is no hardware (virtual or physical) to select, install, configure, or manage.
§
There
is virtually no software to install, configure, or manage.
§
Self
Manage handled by Snowflake:
·
maintenance,
·
management,
·
upgrades,
· tuning.
Categorization of
Snowflake
v
Snowflake
Architecture
v
Snowflake Eco-system
v Snowflake Architecture
- Storage Layer
o
Structure
data
§ Row and column-based data
·
CSV
·
SQL
based data
·
Etc.
o
Semi-structured
data
§ JSON
§ XML
§ ORC
§ Etc
- - Organized data
o
Schemas
o
Tables
o
Etc.
- - Compute Layer / Query processing
o
Virtual
Warehouse – cluster (vertical or horizontal scale) of compute resources/VNet
§ Loading data into or retrieve data
from it
§ Scalabilities
·
Request
grow cluster grows if load decrease cluster goes down
§ Number of servers in clusters
§ It can be Auto suspend – base don
time inactive
§ or auto resume – based on activity
started
§ Queue created when request comes in
·
Request
will be processed when resources is ready to process
·
Etc.
o
Services
Layer – managed by snowflake (usages multiple availabilities zones for high availabilities)
§ Manage overall Snowflake
·
Authentication
and authorization users
·
Manages
Sessions
·
Secure
data
·
Query
compilation
o
When
query submitted,
o
Authenticate
and authorized user
o
Create
optimized data plan
§ Send execution to Virtual Warehouse
§ VWH allocation resources to perform
operation and send the request to Storage layer
§ Data retrieved and processes and send
back to User
·
Optimization
·
Manages
Virtual warehouse
§ Metadata store
·
Zero
copy cloning
o
Instead
of copy prod data
o
Not
duplicate the production data
o
Retain
connection to prod tables
o
To
cloning environment
·
Time
travel
o
Avoid
deleting rows or deleting table
§ Query data in past and
§ Clone entire tables, Schemas, DB from
specific time of period
§ Upto 90 days
·
Data
Sharing
o
Make
data available to another companies / org
o
No
duplicate of data
o
No
data will be replicated will be shared secured way
o Build process to send the data and consumer can build process to consume the data
- Consumption Layer
• Data access, users access data via SQL queries, views, or external tools. Use Snowflake’s web UI, SnowSQL CLI, or third-party tools (e.g., Tableau, Power BI, Looker). Secure access is managed via roles and privileges defined in the Cloud Services layer.
• Transform raw data into usable formats. Create views to simplify complex joins and aggregations. Use create Table As Select or materialized views for performance optimization. Apply data masking and row-level security for sensitive data.
• Build semantic layers for business logic, Define star or snowflake schemas for reporting.
• Capability: Generate insights through dashboards and reports. Connect BI tools to Snowflake using ODBC/JDBC or native connectors. Run scheduled queries for periodic reporting. Use SQL functions, window functions, and aggregations for deep analysis.
• Share data securely across teams or organizations. Use Snowflake Secure Data Sharing to share live data without copying. Collaborate with external partners or internal departments. Monitor usage and access via usage dashboards.
• Enable predictive modeling and data science workflows. integrate with platforms like DataRobot, Amazon SageMaker etc. Use Snowpark for writing code in Python, Java, or Scala directly in Snowflake. Perform feature engineering and model scoring within Snowflake.
• Track performance and optimize queries. Use Query Profile to analyze execution plans. Monitor warehouse usage, query history, and costs. Apply clustering keys and caching for faster performance.
v Snowflake Eco-system
- - Data Integration
o
ETL
§ BOOMI
§ DBT
§ Fivetran
§ Goggle Cloud (GCP)
§ Informatica
§ Pentaho
§ SAP
§ STICH
§ Ext.
- - Advanced Analytics
o
Big
Data, ML and Data science
§ DataBricks
§ Data Robot
§ BigSquid
§ Etc.
- - Governance and security
o
DataDog
o
Satori
o
Alation
o
Etc.
- - Business Intelligence
o
Analysis
o
Discover
o
Reporting
ops or analytics reporting to leadership to help their decision
§ Data Visualize
·
TBLU
·
IBM
·
ADOBE
·
SAP
·
QLIK
·
LOOKER
·
Etc.
- - Programming development
o
SNOW
SQL
o
Snowflake
UI - WorkSheet
o
DBEAVER
o
SeekWell
o
Agile
Data Engine
- - Native Programming
o
Interface
§ Python interface
§ PHP
o
Connectors
§ JDBC
§ ODBC
§ Etc.
Snowflake Summary:
Storage Layer
- Function: Stores all data including tables and metadata.
- How it Works: Data is automatically reorganized into compressed, columnar micro-partitions stored in cloud object storage (AWS S3, Azure Blob Storage, or Google Cloud Storage).
- Key Feature: Decoupled from compute, enabling independent scaling. Data is immutable; updates create new micro-partitions.
Compute Layer
- Function: Executes queries using virtual warehouses (clusters of CPU and memory).
- How it Works: Queries are processed by virtual warehouses that can be started, stopped, or resized (from X-small to 6X-large).
- Key Feature: Independent from storage. Scaling compute does not affect storage. Each query runs within a single warehouse.
Cloud Services Layer
- Function: Acts as the control center, managing and coordinating all other layers.
- How it Works: Handles authentication, access control, query parsing and optimization, transaction management, and metadata tracking.
- Key Feature: Central entry point for all Snowflake operations and user interactions.
Consumption Layer
- Function: Represents how data is accessed and used for analytics and reporting.
- How it Works: Not a physical layer; built on top of Snowflake. Users interact with data via views, reports, and applications.
- Key Feature: Transforms raw data into actionable insights through simplified and joined datasets.
No comments:
Post a Comment