Monday, January 10, 2022

Snowflake - Overview

Snowflake 

Snowflake is a SaaS based solution:

o   Cloud based data warehouse

o   Benefit

§  There is no hardware (virtual or physical) to select, install, configure, or manage.

§  There is virtually no software to install, configure, or manage.

§  Self Manage handled by Snowflake:

·        maintenance,

·        management,

·        upgrades,

                                                  ·        tuning. 

Categorization of Snowflake 

v Snowflake Architecture

v Snowflake Eco-system

v Snowflake Architecture

          -         Storage Layer

o   Structure data

§  Row and column-based data

·        CSV

·        SQL based data

·        Etc.

o   Semi-structured data

§  JSON

§  XML

§  ORC

§  Etc

-              -         Organized data

o   Schemas

o   Tables

o   Etc.

-              -      Compute Layer / Query processing

o   Virtual Warehouse – cluster (vertical or horizontal scale) of compute resources/VNet

§  Loading data into or retrieve data from it

§  Scalabilities

·        Request grow cluster grows if load decrease cluster goes down

§  Number of servers in clusters

§  It can be Auto suspend – base don time inactive

§  or auto resume – based on activity started

§  Queue created when request comes in

·        Request will be processed when resources is ready to process

·        Etc.

o   Services Layer – managed by snowflake (usages multiple availabilities zones for high availabilities)

§  Manage overall Snowflake

·        Authentication and authorization users

·        Manages Sessions

·        Secure data

·        Query compilation

o   When query submitted,

o   Authenticate and authorized user

o   Create optimized data plan

§  Send execution to Virtual Warehouse

§  VWH allocation resources to perform operation and send the request to Storage layer  

§  Data retrieved and processes and send back to User

·        Optimization

·        Manages Virtual warehouse

§  Metadata store

·        Zero copy cloning

o   Instead of copy prod data

o   Not duplicate the production data

o   Retain connection to prod tables

o   To cloning environment

·        Time travel

o   Avoid deleting rows or deleting table

§  Query data in past and

§  Clone entire tables, Schemas, DB from specific time of period

§  Upto 90 days

·        Data Sharing

o   Make data available to another companies / org

o   No duplicate of data

o   No data will be replicated will be shared secured way

o   Build process to send the data and consumer can build process to consume the data


  -      Consumption Layer

• Enables users to access, analyze, and visualize data stored in Snowflake. Conceptual layer built on top of Snowflake’s physical architecture (Storage, Compute, Cloud Services).
• Data access, users access data via SQL queries, views, or external tools. Use Snowflake’s web UI, SnowSQL CLI, or third-party tools (e.g., Tableau, Power BI, Looker). Secure access is managed via roles and privileges defined in the Cloud Services layer.
• Transform raw data into usable formats. Create views to simplify complex joins and aggregations. Use create Table As Select or materialized views for performance optimization. Apply data masking and row-level security for sensitive data.
• Build semantic layers for business logic, Define star or snowflake schemas for reporting.
• Capability: Generate insights through dashboards and reports. Connect BI tools to Snowflake using ODBC/JDBC or native connectors. Run scheduled queries for periodic reporting. Use SQL functions, window functions, and aggregations for deep analysis.
• Share data securely across teams or organizations. Use Snowflake Secure Data Sharing to share live data without copying. Collaborate with external partners or internal departments. Monitor usage and access via usage dashboards.
• Enable predictive modeling and data science workflows. integrate with platforms like DataRobot, Amazon SageMaker etc. Use Snowpark for writing code in Python, Java, or Scala directly in Snowflake. Perform feature engineering and model scoring within Snowflake.
Track performance and optimize queries. Use Query Profile to analyze execution plans. Monitor warehouse usage, query history, and costs. Apply clustering keys and caching for faster performance.

v Snowflake Eco-system

-                        -         Data Integration

o   ETL

§  BOOMI

§  DBT

§  Fivetran

§  Goggle Cloud (GCP)

§  Informatica

§  Pentaho

§  SAP

§  STICH

§  Ext.

-        -                      Advanced Analytics

o   Big Data, ML and Data science

§  DataBricks

§  Data Robot

§  BigSquid

§  Etc.

-         -                   Governance and security

o   DataDog

o   Satori

o   Alation

o   Etc.

-         -                   Business Intelligence

o   Analysis

o   Discover

o   Reporting ops or analytics reporting to leadership to help their decision

§  Data Visualize

·        TBLU

·        IBM

·        ADOBE

·        SAP

·        QLIK

·        LOOKER

·        Etc.

-       -                      Programming development

o   SNOW SQL

o   Snowflake UI - WorkSheet

o   DBEAVER

o   SeekWell

o   Agile Data Engine

-         -                   Native Programming

o   Interface

§  Python interface

§  PHP

o   Connectors

§  JDBC

§  ODBC

§  Etc.



Snowflake Summary:

Storage Layer

  • Function: Stores all data including tables and metadata.
  • How it Works: Data is automatically reorganized into compressed, columnar micro-partitions stored in cloud object storage (AWS S3, Azure Blob Storage, or Google Cloud Storage).
  • Key Feature: Decoupled from compute, enabling independent scaling. Data is immutable; updates create new micro-partitions.

Compute Layer
  • Function: Executes queries using virtual warehouses (clusters of CPU and memory).
  • How it Works: Queries are processed by virtual warehouses that can be started, stopped, or resized (from X-small to 6X-large).
  • Key Feature: Independent from storage. Scaling compute does not affect storage. Each query runs within a single warehouse.

Cloud Services Layer
  • Function: Acts as the control center, managing and coordinating all other layers.
  • How it Works: Handles authentication, access control, query parsing and optimization, transaction management, and metadata tracking.
  • Key Feature: Central entry point for all Snowflake operations and user interactions.

Consumption Layer
  • Function: Represents how data is accessed and used for analytics and reporting.
  • How it Works: Not a physical layer; built on top of Snowflake. Users interact with data via views, reports, and applications.
  • Key Feature: Transforms raw data into actionable insights through simplified and joined datasets.

No comments:

Post a Comment