Monday, January 10, 2022

Snowflake - Overview

Snowflake 

Snowflake is a SaaS-based solution:

o   Cloud-based data warehouse

o   Benefits

§  There is no hardware (virtual or physical) to select, install, configure, or manage.

§  There is virtually no software to install, configure, or manage.

§  Ongoing administration is handled by Snowflake:

·        maintenance,

·        management,

·        upgrades,

·        tuning.

Categorization of Snowflake 

v Snowflake Architecture

v Snowflake Eco-system

v Snowflake Architecture

          -         Storage Layer

o   Structured data

§  Row and column-based data

·        CSV

·        SQL based data

·        Etc.

o   Semi-structured data

§  JSON

§  XML

§  ORC

§  Etc.

          -         Organized data

o   Schemas

o   Tables

o   Etc.
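To make the difference between structured and semi-structured data concrete, here is a minimal Python sketch using only the standard library. The sample records and field names are made up for illustration; nothing here touches Snowflake itself.

```python
import csv
import io
import json

# Structured data: every row has the same flat set of columns.
csv_text = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured data: fields can be nested or repeated and may vary per record.
json_text = '{"id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["vip", "beta"]}'
doc = json.loads(json_text)

structured_city = rows[0]["city"]      # flat column lookup
nested_city = doc["address"]["city"]   # path into a nested object
print(structured_city, nested_city, doc["tags"])
```

In Snowflake, semi-structured records like the JSON one are typically loaded into a VARIANT column and queried by path.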

          -         Compute Layer / Query processing

o   Virtual Warehouse – a cluster of compute resources that scales vertically (resize) or horizontally (more clusters)

§  Used to load data into Snowflake or retrieve data from it

§  Scalability

·        When requests increase, the cluster grows; when load decreases, the cluster shrinks

§  Number of servers in each cluster

§  Auto-suspend – based on time inactive

§  Auto-resume – when activity starts again

§  A queue is created when requests come in

·        Requests are processed when resources are ready

·        Etc.
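The warehouse settings above (size, auto-suspend, auto-resume, cluster count) map onto options of Snowflake's CREATE WAREHOUSE statement. Below is a small Python sketch that only builds the SQL text; the helper name create_warehouse_sql and its defaults are assumptions for illustration, and a MAX_CLUSTER_COUNT above 1 requires an edition that supports multi-cluster warehouses.

```python
def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=300,
                         auto_resume=True, min_clusters=1, max_clusters=1):
    """Build a CREATE WAREHOUSE statement (hypothetical helper, text only)."""
    if not name.isidentifier():
        raise ValueError("warehouse name must be a simple identifier")
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH "
        f"WAREHOUSE_SIZE = '{size}' "                       # vertical scale
        f"AUTO_SUSPEND = {auto_suspend_seconds} "           # seconds of inactivity
        f"AUTO_RESUME = {'TRUE' if auto_resume else 'FALSE'} "
        f"MIN_CLUSTER_COUNT = {min_clusters} "              # horizontal scale
        f"MAX_CLUSTER_COUNT = {max_clusters}"
    )


sql = create_warehouse_sql("REPORTING_WH", size="SMALL",
                           auto_suspend_seconds=120, max_clusters=3)
print(sql)
```

The resulting statement can be pasted into a worksheet or run through any of the connectors listed later in this post.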

o   Services Layer – managed by Snowflake (uses multiple availability zones for high availability)

§  Manages Snowflake overall

·        Authenticates and authorizes users

·        Manages sessions

·        Secures data

·        Query compilation

o   When a query is submitted:

o   Authenticate and authorize the user

o   Create an optimized query plan

§  Send the plan to the Virtual Warehouse for execution

§  The Virtual Warehouse (VWH) allocates resources to perform the operation and sends the request to the Storage layer

§  Data is retrieved, processed, and sent back to the user

·        Optimization

·        Manages virtual warehouses
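The query flow described above (authenticate and authorize, build a plan, hand it to a virtual warehouse, read from storage, return rows) can be pictured with a toy model. This is only a sketch of the three layers talking to each other, not Snowflake's implementation; every class and method name here is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageLayer:
    tables: dict  # table name -> list of row dicts

    def scan(self, table):
        return list(self.tables[table])


@dataclass
class VirtualWarehouse:
    storage: StorageLayer

    def execute(self, plan):
        # Compute layer: pull rows from storage and apply the plan's filter.
        rows = self.storage.scan(plan["table"])
        return [row for row in rows if plan["predicate"](row)]


@dataclass
class ServicesLayer:
    grants: dict  # user name -> set of tables the user may read
    warehouse: VirtualWarehouse

    def run_query(self, user, table, predicate):
        if user not in self.grants:                 # authentication
            raise PermissionError(f"unknown user: {user}")
        if table not in self.grants[user]:          # authorization
            raise PermissionError(f"{user} may not read {table}")
        plan = {"table": table, "predicate": predicate}  # stand-in for a query plan
        return self.warehouse.execute(plan)


storage = StorageLayer({"orders": [{"id": 1, "amount": 40}, {"id": 2, "amount": 250}]})
services = ServicesLayer({"alice": {"orders"}}, VirtualWarehouse(storage))
large_orders = services.run_query("alice", "orders", lambda row: row["amount"] > 100)
print(large_orders)
```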

§  Metadata store

·        Zero-copy cloning

o   Used instead of copying production data

o   Does not duplicate the production data

o   The clone keeps pointing at the production tables' underlying data

o   Used to create a cloned environment

·        Time travel

o   Guards against accidentally deleted rows or dropped tables

§  Query data as it was in the past

§  Clone entire tables, schemas, or databases as of a specific point in time

§  Up to 90 days (depending on edition; the default retention is 1 day)

·        Data Sharing

o   Make data available to other companies / organizations

o   No duplication of data

o   No data is replicated; it is shared in a secure way

o   The provider builds a process to share the data, and the consumer builds a process to consume it
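The three features above each correspond to a short SQL statement. The Python sketch below only assembles the statement text; the helper names and the table, share, and account names are placeholders, and the exact options available depend on your Snowflake edition and privileges.

```python
def clone_table_sql(source, target, seconds_ago=None):
    """Zero-copy clone, optionally as of a past point in time (Time Travel)."""
    if seconds_ago is not None and seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    at_clause = "" if seconds_ago is None else f" AT(OFFSET => -{seconds_ago})"
    return f"CREATE TABLE {target} CLONE {source}{at_clause}"


def time_travel_select_sql(table, seconds_ago):
    """Query a table as it looked seconds_ago seconds in the past."""
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago})"


def share_table_sql(share, database, schema, table, consumer_account):
    """Statements a provider would run to share one table with a consumer."""
    return [
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share}",
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share}",
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


print(clone_table_sql("prod_orders", "dev_orders"))
print(clone_table_sql("prod_orders", "orders_restored", seconds_ago=3600))
print(time_travel_select_sql("prod_orders", 3600))
for statement in share_table_sql("sales_share", "sales_db", "public", "orders", "partner_acct"):
    print(statement)
```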

v Snowflake Eco-system

          -         Data Integration

o   ETL

§  Boomi

§  dbt

§  Fivetran

§  Google Cloud (GCP)

§  Informatica

§  Pentaho

§  SAP

§  Stitch

§  Etc.

          -         Advanced Analytics

o   Big Data, ML and Data science

§  Databricks

§  DataRobot

§  Big Squid

§  Etc.

          -         Governance and security

o   Datadog

o   Satori

o   Alation

o   Etc.

          -         Business Intelligence

o   Analysis

o   Discovery

o   Reporting: operational or analytics reporting to leadership to support their decisions

§  Data visualization

·        Tableau

·        IBM

·        Adobe

·        SAP

·        Qlik

·        Looker

·        Etc.

          -         Programming development

o   SnowSQL

o   Snowflake UI - Worksheets

o   DBeaver

o   SeekWell

o   Agile Data Engine

          -         Native Programming

o   Interfaces

§  Python interface

§  PHP

o   Connectors

§  JDBC

§  ODBC

§  Etc.
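As an example of the Python interface, here is a minimal sketch using the Snowflake Connector for Python (installed with pip install snowflake-connector-python). The environment variable names and the COMPUTE_WH default are assumptions chosen for this example, and the connector is imported inside the function so the parameter helper can be tried without the package installed.

```python
import os


def connection_params(env=None):
    """Collect connection settings from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    return {
        "account": env["SNOWFLAKE_ACCOUNT"],
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        "warehouse": env.get("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH"),
    }


def current_version(params):
    """Connect, run one query, and return the Snowflake version string."""
    import snowflake.connector  # third-party package, imported lazily

    conn = snowflake.connector.connect(**params)
    try:
        cur = conn.cursor()
        try:
            cur.execute("SELECT CURRENT_VERSION()")
            return cur.fetchone()[0]
        finally:
            cur.close()
    finally:
        conn.close()


if __name__ == "__main__":
    print(current_version(connection_params()))
```

Reading credentials from the environment keeps them out of the script; key-pair or SSO authentication would need different parameters.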