Snowflake
Snowflake is a SaaS-based solution:
  o Cloud-based data warehouse
  o Benefits
    § There is no hardware (virtual or physical) to select, install, configure, or manage.
    § There is virtually no software to install, configure, or manage.
    § Self-managed: the following are handled by Snowflake:
      · Maintenance
      · Management
      · Upgrades
      · Tuning
Categorization of Snowflake
v Snowflake Architecture
v Snowflake Eco-system
v Snowflake Architecture
  - Storage Layer
    o Structured data
      § Row- and column-based (tabular) data
        · CSV
        · SQL-based (relational) data
        · Etc.
    o Semi-structured data
      § JSON
      § XML
      § ORC
      § Etc.
    o Data is organized into:
      § Schemas
      § Tables
      § Etc.
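Semi-structured data such as JSON is typically loaded into a VARIANT column. Nested fields are then queried with path notation, for example src:customer.name or src:items[0].sku, and a missing path yields NULL. The Python sketch below only illustrates that lookup logic on a parsed document. The helper name get_path and the sample record are hypothetical, and this is not how Snowflake implements VARIANT access.

```python
import json
import re

# Hypothetical helper: walks a parsed JSON document with a path such as
# "customer.name" or "items[0].sku", loosely mirroring Snowflake-style
# path access (src:customer.name) on a VARIANT column.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def get_path(document, path):
    """Return the value at `path`, or None if any step is missing (like SQL NULL)."""
    current = document
    for key, index in _TOKEN.findall(path):
        if key:  # object field access, e.g. "customer"
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
        else:  # array element access, e.g. "[0]"
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return None
            current = current[i]
    return current


raw = '{"customer": {"name": "Ada", "tier": "gold"}, "items": [{"sku": "A-1", "qty": 2}]}'
doc = json.loads(raw)
print(get_path(doc, "customer.name"))  # Ada
print(get_path(doc, "items[0].qty"))   # 2
print(get_path(doc, "items[5].sku"))   # None
```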
  - Compute Layer / Query processing
    o Virtual Warehouse: a cluster of compute resources that can scale vertically (resize) or horizontally (add clusters)
      § Used to load data into tables or to retrieve data from them
      § Scalability
        · As requests grow, the warehouse scales out; as load decreases, it scales back in
      § Number of servers per cluster (determined by the warehouse size)
      § Can auto-suspend, based on time inactive
      § Can auto-resume, based on activity (e.g., when a query is started)
      § Requests are queued as they come in
        · A request is processed when resources are ready to process it
        · Etc.
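The following toy Python model of a multi-cluster virtual warehouse makes the scaling, auto-suspend/auto-resume, and queueing behaviour above concrete. It is a simplification built on these assumptions:
- Time advances in discrete ticks.
- Each cluster runs a fixed number of queries per tick.
- All class and parameter names are hypothetical.

In Snowflake itself, the corresponding behaviour is configured on the warehouse through settings such as AUTO_SUSPEND, AUTO_RESUME, MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT.

```python
from collections import deque
from dataclasses import dataclass, field

# Toy model only: MultiClusterWarehouse, slots_per_cluster, tick() and the
# other names are hypothetical and are not part of any Snowflake API.


@dataclass
class MultiClusterWarehouse:
    min_clusters: int = 1
    max_clusters: int = 3
    slots_per_cluster: int = 2    # queries one cluster runs per tick (assumed)
    auto_suspend_after: int = 3   # idle ticks before auto-suspend (assumed unit)
    running_clusters: int = 0     # 0 means the warehouse is suspended
    idle_ticks: int = 0
    queue: deque = field(default_factory=deque)

    @property
    def suspended(self) -> bool:
        return self.running_clusters == 0

    def submit(self, query: str) -> None:
        """Queue a request; auto-resume the warehouse if it is suspended."""
        self.queue.append(query)
        if self.suspended:
            self.running_clusters = self.min_clusters
            self.idle_ticks = 0

    def tick(self) -> list:
        """Advance one time step and return the queries executed in it."""
        if self.suspended:
            return []
        # Scale out (horizontal scaling) while requests would still have to wait.
        while (len(self.queue) > self.running_clusters * self.slots_per_cluster
               and self.running_clusters < self.max_clusters):
            self.running_clusters += 1
        capacity = self.running_clusters * self.slots_per_cluster
        executed = [self.queue.popleft() for _ in range(min(capacity, len(self.queue)))]
        if executed:
            self.idle_ticks = 0
            return executed
        # No work this tick: scale in one cluster at a time, down to the minimum.
        self.idle_ticks += 1
        if self.running_clusters > self.min_clusters:
            self.running_clusters -= 1
        # Auto-suspend after enough consecutive idle ticks.
        if self.idle_ticks >= self.auto_suspend_after:
            self.running_clusters = 0
        return executed


wh = MultiClusterWarehouse()
for i in range(8):
    wh.submit(f"q{i}")   # the first submit auto-resumes the warehouse
print(wh.tick())         # the first six queries run on three clusters
print(list(wh.queue))    # ['q6', 'q7'] are still waiting in the queue
```

Scaling out like this only helps concurrency (more queries at once). Making a single query faster is a matter of resizing the warehouse (vertical scaling), which this model does not cover.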
  - Services Layer: managed by Snowflake (uses multiple availability zones for high availability)
    o Manages Snowflake overall
      § Authentication and authorization of users
      § Session management
      § Data security
      § Query compilation
        · When a query is submitted:
          o The user is authenticated and authorized
          o An optimized query plan is created
          o The plan is sent to a virtual warehouse for execution
          o The virtual warehouse allocates resources to perform the operation and requests the data from the storage layer
          o The data is retrieved, processed, and sent back to the user
      § Optimization
      § Management of virtual warehouses
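    o Metadata store
      § Zero-copy cloning
        · Clones production data instead of copying it
        · Does not duplicate the production data: the clone shares the underlying storage with the production tables, and only changes made after cloning are stored separately
        · Useful for creating cloned environments (e.g., dev or test from prod)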
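      § Time Travel
        · Recover from accidentally deleted rows or dropped tables
        · Query data as it existed in the past
        · Clone entire tables, schemas, and databases as of a specific point in time
        · Retention period of up to 90 days (1 day by default; more than 1 day requires Enterprise Edition or higher)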
      § Data Sharing
        · Makes data available to other companies/organizations
        · No copy of the data is made: nothing is replicated, and the data is shared in a secure way
        · The provider builds a process to share the data, and the consumer can build a process to consume it
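The query flow described under the services layer can be traced in the toy Python model below. The flow runs: authenticate and authorize, build a plan, execute on a virtual warehouse, read from storage, return the result.

All names here are hypothetical stand-ins:
- The dictionaries play the roles of the storage layer and the services layer's security metadata.
- The plan's column list stands in for the optimizer limiting what is read.

Real Snowflake plans are far more involved.

```python
from dataclasses import dataclass

# Toy model only: STORAGE, SESSIONS, GRANTS and the functions below are
# hypothetical stand-ins for the three layers, not Snowflake APIs.
STORAGE = {  # storage layer: table name -> rows
    "orders": [
        {"id": 1, "region": "EU", "amount": 120},
        {"id": 2, "region": "US", "amount": 80},
        {"id": 3, "region": "EU", "amount": 45},
    ]
}
SESSIONS = {"token-123": "analyst"}  # services layer: session token -> user
GRANTS = {"analyst": {"orders"}}     # services layer: user -> readable tables


@dataclass(frozen=True)
class Plan:
    table: str
    columns: tuple       # read only these columns (a very simple optimization)
    filter_column: str
    filter_value: object


def compile_query(token, table, columns, filter_column, filter_value):
    """Services layer: authenticate, authorize, and build a query plan."""
    user = SESSIONS.get(token)
    if user is None:
        raise PermissionError("authentication failed")
    if table not in GRANTS.get(user, set()):
        raise PermissionError(f"{user} is not authorized to read {table}")
    return Plan(table, tuple(columns), filter_column, filter_value)


def execute(plan):
    """Virtual warehouse: fetch rows from storage, filter, and project."""
    rows = STORAGE[plan.table]  # request the data from the storage layer
    return [
        {col: row[col] for col in plan.columns}
        for row in rows
        if row[plan.filter_column] == plan.filter_value
    ]


def run_query(token, table, columns, filter_column, filter_value):
    plan = compile_query(token, table, columns, filter_column, filter_value)
    return execute(plan)  # the result is sent back to the user


print(run_query("token-123", "orders", ["id", "amount"], "region", "EU"))
# [{'id': 1, 'amount': 120}, {'id': 3, 'amount': 45}]
```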
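Zero-copy cloning, Time Travel, and Data Sharing are all possible because Snowflake keeps table data in immutable storage (micro-partitions). Its metadata tracks which pieces belong to which table version. The Python sketch below is a heavily simplified model of that idea, not Snowflake's implementation:
- A table is a list of versions.
- Each version is a tuple of references to immutable partitions.
- The names Table, clone, rows(version=...) and ReadOnlyShare are hypothetical.

```python
from dataclasses import dataclass, field

# Conceptual sketch only. Table, clone(), rows(version=...) and ReadOnlyShare
# are hypothetical names; this is not how Snowflake is implemented.


@dataclass
class Table:
    # Each version is a tuple of references to immutable partitions
    # (each partition is itself a tuple of rows). Version 0 is empty.
    versions: list = field(default_factory=lambda: [()])

    @property
    def current(self):
        return self.versions[-1]

    def insert(self, rows):
        # New rows go into a new partition; existing partitions are reused as-is.
        self.versions.append(self.current + (tuple(rows),))

    def delete_where(self, predicate):
        # Only partitions containing matching rows are rewritten; older versions
        # are kept so earlier states remain queryable (Time Travel).
        new_parts = []
        for part in self.current:
            survivors = tuple(r for r in part if not predicate(r))
            if len(survivors) == len(part):
                new_parts.append(part)        # untouched: share the same object
            elif survivors:
                new_parts.append(survivors)   # rewritten partition
        self.versions.append(tuple(new_parts))

    def rows(self, version=-1):
        # version=-1 is the current state; smaller indexes are earlier states.
        return [r for part in self.versions[version] for r in part]

    def clone(self, version=-1):
        # Zero-copy clone: copies partition references, never the rows.
        return Table(versions=[self.versions[version]])


class ReadOnlyShare:
    """Consumer view of a provider's table: reads live data, exposes no writes."""

    def __init__(self, table):
        self._table = table

    def rows(self):
        return self._table.rows()


prod = Table()
prod.insert([(1, "EU"), (2, "US")])
prod.insert([(3, "EU")])
dev = prod.clone()
print(dev.current is prod.current)  # True: the clone shares storage
dev.delete_where(lambda r: r[0] == 1)
print(dev.rows())   # [(2, 'US'), (3, 'EU')]
print(prod.rows())  # [(1, 'EU'), (2, 'US'), (3, 'EU')]
```

A real retention period (up to 90 days) would also purge versions older than the window; that is left out here. In Snowflake SQL, the corresponding operations are:
- CREATE TABLE ... CLONE ...
- queries with an AT(...) or BEFORE(...) clause
- UNDROP TABLE
- CREATE SHARE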
v Snowflake Eco-system
  - Data Integration
    o ETL
      § Boomi
      § dbt
      § Fivetran
      § Google Cloud (GCP)
      § Informatica
      § Pentaho
      § SAP
      § Stitch
      § Etc.
  - Advanced Analytics
    o Big data, ML, and data science
      § Databricks
      § DataRobot
      § BigSquid
      § Etc.
  - Governance and Security
    o Datadog
    o Satori
    o Alation
    o Etc.
  - Business Intelligence
    o Analysis
    o Discovery
    o Reporting: operational or analytical reporting to leadership to support decision-making
      § Data visualization
        · Tableau
        · IBM
        · Adobe
        · SAP
        · Qlik
        · Looker
        · Etc.
  - Programming / Development
    o SnowSQL (command-line client)
    o Snowflake UI (Worksheets)
    o DBeaver
    o SeekWell
    o Agile Data Engine
  - Native Programming
    o Interfaces
      § Python
      § PHP
    o Connectors
      § JDBC
      § ODBC
      § Etc.
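The Snowflake Connector for Python follows the Python DB-API 2.0 specification (PEP 249). Code written against its connection and cursor objects therefore looks much like code for other databases. To keep this sketch runnable without a Snowflake account, it uses the standard library's sqlite3 module as a stand-in. The table and query are illustrative only.

With the real connector, you would obtain the connection from snowflake.connector.connect(...) instead. Also note that its default placeholder style is %s rather than sqlite3's ?.

```python
import sqlite3

# Works with any DB-API 2.0 (PEP 249) connection. sqlite3 is only a local
# stand-in; with Snowflake the connection would come from
# snowflake.connector.connect(user=..., password=..., account=...).


def fetch_rows(conn, sql, params=()):
    """Run a parameterized query and return all rows, always closing the cursor."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 45.0)],
)
print(fetch_rows(conn, "SELECT id, amount FROM orders WHERE region = ? ORDER BY id", ("EU",)))
# [(1, 120.0), (3, 45.0)]
conn.close()
```

Passing values as parameters, rather than formatting them into the SQL string, lets the driver handle quoting. This is the usual way to avoid SQL injection with any DB-API driver.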