AWS SRE Strategy
Site reliability engineering (SRE) is the practice of using software
tools to automate IT infrastructure tasks such as system management and
application monitoring. Organizations use SRE to ensure their software
applications remain reliable amidst frequent updates from development teams.
SRE especially improves the reliability of scalable software systems because
managing a large system using software is more sustainable than manually
managing hundreds of machines.
Why is site reliability engineering important?
Site reliability describes the stability and quality of service that an
application offers after being made available to end users. Software maintenance
sometimes affects software reliability if technical issues go undetected. For
example, when developers make new changes, they might inadvertently impact the
existing application and cause it to crash for certain use cases.
The following are some benefits of
site reliability engineering (SRE) practices:
Improved collaboration:
SRE improves collaboration between
development and operations teams. Developers often have to make rapid changes
to an application to release new features or fix critical bugs. On the other
hand, the operations team has to ensure seamless service delivery. Hence, the
operations team uses SRE practices to closely monitor every update and promptly
respond to any issues that arise due to changes.
Enhanced customer experience:
Organizations use an SRE model to
ensure software errors do not impact the customer experience. For example,
software teams use SRE tools to automate the software development lifecycle.
This reduces errors, meaning the team can prioritize new feature development
over bug fixes.
Improved operations planning:
SRE teams accept that there is a realistic chance of software failure. Therefore, teams plan the appropriate incident response to minimize the impact of downtime on the business and end users. They can also better estimate the cost of downtime and understand the impact of such incidents on business operations.
The following are some
key principles of site reliability engineering (SRE):
Application monitoring:
SRE teams accept that errors are a
part of the software deployment process. Instead of striving for a perfect
solution, they monitor software performance in terms of service-level
agreements (SLAs), service-level indicators (SLIs), and service-level
objectives (SLOs). They observe and monitor performance metrics after deploying
the application in production environments.
Gradual change implementation:
SRE practices encourage the release of frequent but small changes to maintain system reliability. SRE automation tools use consistent, repeatable processes to do the following:
o Reduce risks due to changes
o Provide feedback loops to measure system performance
o Increase the speed and efficiency of change implementation
Automation for reliability
improvement:
SRE uses policies and
processes that embed reliability principles in every step of the delivery
pipeline. Some strategies that automatically resolve problems include the
following:
o Developing quality gates
based on service-level objectives to detect issues earlier
o Automating build testing
using service-level indicators
o Making architectural
decisions that ensure system resiliency at the outset of software development
Site reliability engineering (SRE) teams measure the quality of service delivery and reliability using the following metrics:
Service-level objectives:
Service-level objectives (SLOs) are
specific and quantifiable goals that you are confident the software can achieve
at a reasonable cost to other metrics, such as the following:
o Uptime, or the time a
system is in operation
o System throughput
o System output
o Download rate, or the
speed at which the application loads
An SLO is a promise the software makes to the customer. For example, you might set a 99.95% uptime SLO for your company's food delivery app.
Service-level indicators:
Service-level indicators (SLIs) are
the actual measurements of the metric an SLO defines. In real-life situations,
you might get values that match or differ from the SLO. For example, your
application is up and running 99.92% of the time, which is lower than the promised
SLO.
Service-level agreements:
Service-level agreements (SLAs) are legal documents that state what happens when one or more SLOs are not met. For example, an SLA might state that the technical team will resolve your customer's issue within 24 hours after a report is received. If your team cannot resolve the problem within the specified duration, you might be obligated to refund the customer.
Error budgets:
Error budgets are the noncompliance tolerance for the SLO. For example, an uptime SLO of 99.95% means the allowed downtime is 0.05%. If the software downtime exceeds the error budget, the software team devotes all resources and attention to stabilizing the application.
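As a worked example of the error budget arithmetic (assuming a 30-day month):
Error budget = (1 − 0.9995) × 30 days × 24 hours × 60 minutes ≈ 21.6 minutes of allowable downtime per month.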
Availability Definition
High availability means running instances of the same application across multiple Availability Zones (for example, a multi-AZ Auto Scaling group behind a multi-AZ load balancer). High availability usually goes hand in hand with horizontal scaling: it means running your application or system in at least two data centers (that is, Availability Zones). The goal of high availability is to survive the loss of a data center. High availability can be passive (a Multi-AZ standby, for example) or active (as with horizontal scaling).
The following elements help you implement highly available systems:
Redundancy: ensure that critical system components have an identical backup component with the same data that can take over in case of failure. Horizontally scale your application to improve reliability, and dynamically acquire computing resources to meet the demand you are monitoring.
Monitoring: identify problems in production systems that may disrupt or degrade service. Monitor the demand, capacity, utilization, and size of your application using appropriate tools.
Failover: the ability to switch from an active system component to a redundant component in case of failure, imminent failure, degraded performance, or degraded functionality.
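As an illustration of failover, a hedged Terraform sketch of a Route 53 DNS failover pair; the zone ID, record names, and endpoints are placeholders, not values from this document:

```hcl
# Health check drives automatic failover from primary to standby.
resource "aws_route53_health_check" "primary" {
  fqdn              = "app.example.com" # placeholder endpoint
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id         = var.zone_id
  name            = "www.example.com"
  type            = "CNAME"
  ttl             = 60
  records         = ["primary.example.com"]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "www.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["standby.example.com"]
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY" # receives traffic only when the primary is unhealthy
  }
}
```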
For a typical microservices architecture, the focus for disaster recovery should be on the downstream services that maintain the state of the application, such as file systems, databases, and queues. When creating a disaster recovery strategy, organizations most commonly plan for the recovery time objective and the recovery point objective.
Recovery time objective is the maximum
acceptable delay between the interruption of service and restoration of
service. This objective determines what is considered an acceptable time window
when service is unavailable and is defined by the organization.
Recovery point objective is the maximum acceptable amount of time since the last data recovery point. This objective determines what is considered an acceptable loss of data between the last recovery point and the interruption of service and is defined by the organization.
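The recovery point objective is bounded largely by backup frequency. A hedged Terraform sketch using AWS Backup, where an hourly schedule bounds the RPO at roughly one hour (the vault, plan names, and retention are illustrative):

```hcl
resource "aws_backup_vault" "main" {
  name = "main-vault"
}

resource "aws_backup_plan" "hourly" {
  name = "hourly-backups"

  rule {
    rule_name         = "every-hour"
    target_vault_name = aws_backup_vault.main.name
    schedule          = "cron(0 * * * ? *)" # top of every hour -> RPO of about 1 hour

    lifecycle {
      delete_after = 14 # retain backups for 14 days
    }
  }
}
```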
Automatic recovery and failback: to mitigate disruptions, the ability to switch back from a redundant component to the primary active component once it has recovered from failure.
Deployments / automate change: use automation to deploy, develop, and modify your application. Manual steps lead to poor results; reduce toil wherever possible. Implement change management in a way that de-conflicts potential changes. Aim for the ability to deploy applications with minimal downtime (for example, using Terragrunt/Terraform).
Test: create and regularly test failure and recovery procedures.
Availability best practices
For each service or pattern below, Description explains what it is, Service Implementation explains how it provides high availability, and Usage explains how to apply it.
Load Balancer
Description: Distributes incoming connections across a group of servers/services. When designing applications, use a load balancer whenever possible; this is the first step toward high availability.
Service Implementation: Network Load Balancer supports TCP/TLS/UDP, handles millions of requests, and works at layer 4. A load balancer serves as the single point of contact for clients. Clients send requests to the load balancer, and the load balancer sends them to targets in one or more Availability Zones. To configure your load balancer, you create target groups and then register targets with your target groups.
Usage: A load balancer is most effective when each enabled Availability Zone has at least one registered target, so configure more than one AZ. When you enable an Availability Zone for the load balancer, it starts routing traffic to the registered targets in that zone. Load balancer endpoints should be in public subnets, so make sure there is one public subnet in each enabled Availability Zone. Create security groups, associate them with the load balancer, and allow traffic on the required port in both directions. The load balancer continually checks the health of its targets and stops sending requests to unhealthy targets. Make sure the EKS cluster has been created before configuring the load balancer.
- Create two AZs and enable them for the load balancer
- Configure a target group
- Register targets in each Availability Zone
- Configure a load balancer and a listener
- Create security groups for your ALB
- Associate the security groups with the load balancer
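A minimal Terraform sketch of this checklist, assuming a VPC with one public subnet per AZ and an ACM certificate already exist (the var.* references and names are placeholders):

```hcl
resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_lb" "app" {
  name               = "app-alb"
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids # one public subnet per enabled AZ
}

resource "aws_lb_target_group" "app" {
  name        = "app-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # "IP mode", suitable for pods running in EKS

  health_check {
    path = "/healthz" # unhealthy targets stop receiving traffic
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
```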
Compute
Description: Compute services are also known as Infrastructure as a Service (IaaS). AWS compute services provide virtual server instances, storage, and APIs that let users migrate workloads to virtual machines.
Service Implementation: Amazon EC2 and other services that let you provision computing resources provide high-availability features such as load balancing, auto scaling, and provisioning across Availability Zones (AZs), which are isolated locations within an AWS Region.
Usage: Use when consumers may send a huge volume of requests within a defined time period and the system is deployed in multiple Regions with multiple AZs. EC2 Auto Scaling groups are configured to launch instances that automatically join the Kubernetes cluster.
- Use two AZs in the same Region
- Spread worker nodes and workload across multiple AZs
- Create the EKS cluster
- Create EC2 Auto Scaling groups and attach them to EKS
AWS Compute Service Level Agreement:
- Region-level SLA: at least 99.99%
- Instance-level SLA: at least 99.5%
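A hedged Terraform sketch of a multi-AZ Auto Scaling group; the AMI, instance type, and subnet IDs are placeholders:

```hcl
resource "aws_launch_template" "worker" {
  name_prefix   = "worker-"
  image_id      = var.ami_id # placeholder AMI
  instance_type = "m5.large"
}

resource "aws_autoscaling_group" "worker" {
  name                = "worker-asg"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = var.private_subnet_ids # subnets in at least two AZs
  health_check_type   = "ELB" # replace instances the load balancer marks unhealthy

  launch_template {
    id      = aws_launch_template.worker.id
    version = "$Latest"
  }
}
```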
Databases
Description: A key-value NoSQL database designed to run high-performance applications at any scale.
Service Implementation: Creates a replica set, a group of mongod instances that hold the same data. The purpose of replication is to ensure high availability in case one of the servers goes down. For replica sets, the reference deployment launches multiple servers in different Availability Zones. When a primary instance fails, one of the secondary instances from another Availability Zone becomes the new primary node, guaranteeing automatic failover.
Usage: For new platforms, create replicas deployed in multiple Availability Zones.
Amazon EKS
Description: Amazon Elastic Kubernetes Service (Amazon EKS) runs Kubernetes control plane and data plane instances across multiple Availability Zones to ensure high availability within an AWS Region.
Service Implementation: Amazon EKS automatically detects and replaces unhealthy control plane instances, and it provides automated version upgrades and patching for them. The control plane consists of at least two API server nodes and three etcd nodes that run across three Availability Zones within a Region. Amazon EKS uses the architecture of AWS Regions to maintain high availability.
Usage: Managed node groups automate the provisioning and lifecycle management of EC2 nodes. Use the EKS API (for example, through Terraform) to create, scale, and upgrade managed nodes. A VPC and subnets must exist, or be created from a Terraform template, before creating an Amazon EKS cluster; each cluster runs in its own VPC. In Amazon EKS, the maximum number of pods per node depends on the node type and ranges from 4 to 737; in this solution the number of pods is defined inside the Terraform script. When a new Kubernetes version is available, update the EKS cluster to the latest version. Delete the resources associated with an EKS cluster if it is no longer used.
- Create two AZs and a VPC spanning them
- Create two or more subnets in a single VPC
- Create the EKS cluster and use the Kubernetes Cluster Autoscaler to scale nodes
- Create the control plane and worker nodes, and create pods under a node group
- Create at least two nodes in two separate Availability Zones
- Keep the EKS cluster balanced across AZs
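A minimal Terraform sketch of an EKS cluster with a managed node group spread across multiple AZs; the IAM role ARNs and subnet IDs are placeholders:

```hcl
resource "aws_eks_cluster" "main" {
  name     = "main"
  role_arn = var.cluster_role_arn # cluster IAM role, assumed to exist

  vpc_config {
    subnet_ids = var.private_subnet_ids # subnets in at least two AZs
  }
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.private_subnet_ids # spreads nodes across the AZs

  scaling_config {
    min_size     = 2
    desired_size = 2
    max_size     = 6
  }
}
```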
AWS Lambda
Description: AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.
Service Implementation: Lambda runs your function in multiple Availability Zones to ensure that it is available to process events even during a service interruption in a single zone.
Usage: If you configure your function to connect to a virtual private cloud (VPC) in your account, specify subnets in multiple Availability Zones to ensure high availability.
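A hedged Terraform sketch of the multi-AZ VPC configuration for a function; the handler package, role, and security group are placeholders:

```hcl
resource "aws_lambda_function" "handler" {
  function_name = "handler"
  role          = var.lambda_role_arn # execution role, assumed to exist
  runtime       = "python3.12"
  handler       = "app.handler"
  filename      = "handler.zip" # placeholder deployment package

  vpc_config {
    subnet_ids         = var.private_subnet_ids # subnets in multiple AZs
    security_group_ids = [var.lambda_sg_id]
  }
}
```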
Apache Kafka
Description: Apache Kafka is an open-source distributed event streaming platform. Achieve high availability with a Kafka cluster that spans multiple Availability Zones or Regions.
Service Implementation: This solution creates a cluster that spans two Regions; if the main Availability Zone is unavailable for some reason, the service automatically fails over to the other Availability Zones.
Usage: This ensures that other copies remain available even if an Availability Zone experiences failures. One typical (all-active) deployment pattern is a single AWS Region with two Availability Zones: one Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances.
AWS and Kafka cluster deployment:
- Kafka producers and a Kafka cluster are deployed in each AZ.
- Data is distributed evenly across the two Kafka clusters.
- Kafka consumers aggregate data from the different Kafka clusters.
MSK
Description: Amazon Managed Streaming for Apache Kafka (Amazon MSK) provides the control plane operations, such as those for creating, updating, and deleting clusters. It uses Apache Kafka data plane operations and runs open-source versions of Apache Kafka.
Service Implementation: Amazon MSK automatically provisions, configures, and manages your Apache Kafka cluster operations and Apache ZooKeeper nodes. All clusters are distributed across multiple AZs (three is the default), are covered by the Amazon MSK service-level agreement, and are supported by automated systems that detect and respond to issues within the cluster infrastructure and Apache Kafka software.
Usage: If a component fails, Amazon MSK automatically replaces it without downtime to your applications. Amazon MSK manages the availability of the Apache ZooKeeper nodes, so you don't need to start, stop, or directly access them. It also automatically deploys software patches as needed to keep your cluster up to date and running smoothly.
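A minimal Terraform sketch of a three-AZ MSK cluster; the Kafka version, instance type, and subnet variable are illustrative assumptions:

```hcl
resource "aws_msk_cluster" "events" {
  cluster_name           = "events"
  kafka_version          = "3.5.1"
  number_of_broker_nodes = 3 # one broker per AZ, the MSK default layout

  broker_node_group_info {
    instance_type  = "kafka.m5.large"
    client_subnets = var.private_subnet_ids # three subnets, three AZs

    storage_info {
      ebs_storage_info {
        volume_size = 100 # GiB per broker
      }
    }
  }
}
```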
Service Mesh (Gloo Mesh)
Description: Gloo Mesh is enterprise Istio with multi-cluster and multi-mesh management capabilities across multiple clusters and VMs. It controls, secures, and observes the traffic flow between your microservices, regardless of where they are running.
Service Implementation: A Gloo Mesh setup consists of one management cluster and one or more workload clusters that run service meshes registered with and managed by the management cluster. The management cluster serves as the management plane, and the workload clusters serve as the data plane. Gloo Mesh can discover meshes and workloads, establish federated identity, and enable global traffic routing and load balancing.
Usage: Gloo Mesh can discover services, coordinate service meshes, configure and observe behavior, federate policies, and enforce security consistently. Istio provides the core building blocks of external authorization and request routing. Load balancer endpoints provide access for clients to the application over the internet and allow reachability to pods deployed in EKS, where the Envoy proxy, oauth2-proxy, and application run. Install the Istio agents on the EKS cluster.
- Configure a VirtualService to route traffic
- Enable an OIDC provider
- Use the ingress gateway (an Envoy reverse proxy) to transparently relay authorization requests
- Create a namespace for the deployment
- Deploy the Envoy proxy in a dedicated namespace
- Deploy the OIDC proxies in a dedicated namespace
- Deploy the Istio services in the istio-system Kubernetes namespace
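One way to bootstrap Istio on the EKS cluster is through its official Helm charts; a hedged Terraform sketch using the Helm provider's v2 set-block syntax (chart versions and release names are assumptions):

```hcl
resource "helm_release" "istio_base" {
  name             = "istio-base"
  repository       = "https://istio-release.storage.googleapis.com/charts"
  chart            = "base"
  namespace        = "istio-system"
  create_namespace = true
}

resource "helm_release" "istiod" {
  name       = "istiod"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "istiod"
  namespace  = "istio-system"
  depends_on = [helm_release.istio_base] # control plane needs the base CRDs first
}
```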
AWS API Gateway
Description: Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and protecting REST APIs at any scale.
Service Implementation: Supported by Route 53 routing policies that direct traffic from the APIs to more than one infrastructure in different Regions, the service allows you to balance requests according to the infrastructure capacity of each Region.
Usage: To prevent your APIs from being overwhelmed by too many requests, API Gateway throttles requests to your APIs. Specifically, API Gateway sets a limit on the steady-state rate and on bursts of request submissions against all APIs in your account.
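A hedged Terraform sketch of stage-level throttling, assuming an existing REST API and stage (aws_api_gateway_rest_api.api and aws_api_gateway_stage.prod are placeholders); the limits are illustrative:

```hcl
resource "aws_api_gateway_method_settings" "throttle" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  method_path = "*/*" # apply to all methods in the stage

  settings {
    throttling_rate_limit  = 100 # steady-state requests per second
    throttling_burst_limit = 200 # short burst allowance
  }
}
```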
Mongo DB
Description: MongoDB is an open-source NoSQL database that provides support for JSON-styled, document-oriented storage systems.
Service Implementation: It supports a flexible data model that enables you to store data of any structure and provides a rich set of features, including full index support, sharding (distribution of data across multiple nodes), and replication (replica sets in different Availability Zones).
Usage: Self-service deployment: build a MongoDB cluster by automating configuration and deployment tasks. The Quick Start reference deployment provides a self-service deployment of the MongoDB replica set cluster.
MongoDB Atlas
Description: MongoDB Atlas is a global cloud document database service for modern applications. Deploying fully managed MongoDB helps ensure availability, scalability, and security compliance.
Service Implementation: For a fully managed database, use MongoDB Atlas instead of deploying the Quick Start yourself. MongoDB Atlas creates a new VPC for your managed databases and automates potentially time-consuming administration tasks such as managing, monitoring, and backing up your MongoDB deployments.
Usage: Database replica sets (primary and secondary), failover of a primary replica, deploying Atlas across multiple cloud zones and cloud providers, and adjusting the level of availability guarantees using write and read concerns.
Amazon ECS
Description: Amazon ECS is a regional service that simplifies running containers in a highly available manner across multiple Availability Zones within an AWS Region.
Service Implementation: Amazon ECS includes multiple scheduling strategies that place containers across your clusters based on your resource needs (for example, CPU or RAM) and availability requirements.
Usage: Single points of failure (SPOF) are commonly eliminated with an N+1 or 2N redundancy configuration, where N+1 is achieved via load balancing among active-active nodes, and 2N is achieved by a pair of nodes in active-standby configuration. Use a scalable technique such as a load-balanced cluster, or assume an active-standby pair.
Dynamo DB
Description: Amazon DynamoDB is a fast and flexible NoSQL database service with high availability and high durability. DynamoDB enables customers to offload to AWS the administrative burdens of operating and scaling distributed databases (hardware provisioning, setup and configuration, throughput capacity planning, replication, software patching, and cluster scaling).
Service Implementation: DynamoDB is designed internally to automatically partition data and spread replication and incoming traffic across multiple partitions. Partitions are stored on numerous backend servers distributed across multiple Availability Zones within a single Region.
Usage:
- Use global tables: replicate DynamoDB tables automatically in selected AWS Regions, with multi-master read/write capability and eventual consistency
- Use DynamoDB Accelerator (DAX): DAX is an in-memory caching service (roughly 10x faster than DynamoDB alone)
- Configure client wait: change the DynamoDB client's wait time to 3 seconds for a response before timing out
AWS Service Level Agreements (SLAs):
- AWS promises a monthly uptime percentage of 99.99% for DynamoDB
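A hedged Terraform sketch of a global table (global tables require streams enabled; the table schema and replica Region are illustrative assumptions):

```hcl
resource "aws_dynamodb_table" "orders" {
  name             = "orders" # hypothetical table
  billing_mode     = "PAY_PER_REQUEST"
  hash_key         = "order_id"
  stream_enabled   = true # required for global table replication
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "order_id"
    type = "S"
  }

  replica {
    region_name = "us-west-2" # second Region, example value
  }
}
```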
Scalability Definition
Scalability is the ability of a workload to perform its agreed function when load or scope changes: an application or system can handle greater loads by adapting. Scalability is linked to, but different from, high availability. There are two kinds of scalability:
- Vertical Scalability: increasing the size of the instance. There is usually a limit to how much you can vertically scale (a hardware limit). Vertical scaling means increasing or decreasing instance size (scale up / scale down).
- Horizontal Scalability (elasticity): increasing the number of instances or systems for your application. Horizontal scaling implies distributed systems and is very common for web and modern applications, which are easy to scale horizontally. Horizontal scaling means increasing or decreasing the number of instances (scale out / scale in).
- Auto Scaling: in AWS, a scaling plan is a set of instructions for scaling your resources up or down. Use a scaling plan to configure auto scaling for related or associated scalable resources in a matter of minutes. First, determine the consistency of usage patterns, as well as the frequency and intensity of traffic spikes. Then define your priorities. The types of autoscaling are:
· Demand/Reactive Scaling: with a reactive autoscaling method, resources are scaled up and down in response to surges in traffic. Demand-based scaling is highly responsive to fluctuating traffic and helps accommodate traffic spikes you cannot predict.
· Predictive Scaling: a predictive autoscaling method uses machine learning and artificial intelligence tools to evaluate traffic loads and anticipate when you'll need more or fewer resources. Combine AWS Auto Scaling with Amazon EC2 Auto Scaling to scale resources across many applications with predictive scaling. This includes three sub-options:
o Load Forecasting: this predictive method analyzes up to 14 days of history to forecast demand for the following two days. The data is updated every day and reflects one-hour intervals.
o Scheduled Scaling Actions: this option adds or removes resources according to a load forecast, keeping resource use stable and set at your pre-defined value.
o Maximum Capacity Behavior: designate a minimum and a maximum capacity value for every resource, and AWS Auto Scaling will keep each resource within that range. This gives AWS some flexibility within set parameters, and you can control whether applications can add more resources when demand is forecast to exceed maximum capacity.
· Scheduled Scaling: users choose the time range in which additional resources will be added. Scheduled autoscaling is a hybrid approach that operates in real time, predicts known changes in traffic loads, and responds to such changes at predetermined intervals. Scaling events can be set to occur automatically at a certain date and time, which is especially helpful when you can accurately forecast demand. What is different about this strategy is that following a schedule fixes the number of available resources at a given time in advance (see the Terraform sketch after this list).
· Manual Scaling: the number of instances is adjusted by hand. You can manually increase or decrease the number of instances through a CLI or the console. Manual scaling is a good choice when your users don't need automatic scaling.
· Dynamic Scaling: another type of auto scaling in which the number of EC2 instances changes automatically depending on the signals received. Dynamic scaling is a good choice when there is a high volume of unpredictable traffic.
- AWS Cluster Autoscaler: responsible for ensuring that the cluster has enough nodes to schedule your pods without wasting resources.
· Scale-up event: the Cluster Autoscaler watches for pods in a pending state due to insufficient resources, creates new worker nodes, and schedules the pods on those worker nodes.
· Scale-in event: the Cluster Autoscaler watches for nodes that are underutilized and terminates them, avoiding wasted resources.
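A hedged Terraform sketch of the scheduled and dynamic scaling options above, reusing the hypothetical aws_autoscaling_group.worker from the compute sketch earlier; schedules and thresholds are illustrative:

```hcl
# Scheduled scaling: add capacity every weekday morning.
resource "aws_autoscaling_schedule" "business_hours" {
  scheduled_action_name  = "business-hours"
  autoscaling_group_name = aws_autoscaling_group.worker.name
  recurrence             = "0 8 * * 1-5" # cron, UTC: 08:00 Monday-Friday
  min_size               = 4
  max_size               = 10
  desired_capacity       = 6
}

# Dynamic scaling: target-tracking on average CPU utilization.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target"
  autoscaling_group_name = aws_autoscaling_group.worker.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60 # keep average CPU around 60%
  }
}
```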
Scaling best practices
For each practice or service below, Description explains what it is and When to use explains when to apply it.
Async over Sync
Description: When designing applications, use asynchronous communication across assets when possible. This is the first step in scaling individual assets according to usage.
When to use: A distributed architecture with two or more assets communicating with each other to share data.
Caching
Description: Make use of caching services to avoid unnecessary scaling for repetitive requests.
When to use: When consumers may request the same data repeatedly within a defined time period.
Monitoring
Description: Every asset should follow observability best practices and by default incorporate monitoring of CPU, memory, I/O, and network usage.
When to use: For new platforms, incorporate the observability practices mentioned on the previous page.
Databases
Description: A key-value NoSQL database designed to run high-performance applications at any scale.
When to use: Creates a replica set, a group of mongod instances that hold the same data. The purpose of replication is to ensure high availability in case one of the servers goes down. For replica sets, the reference deployment launches multiple servers in different Availability Zones. It also provides failover.
Dynamo DB
Description: Amazon DynamoDB is a fast and flexible NoSQL database service with high availability and high durability. DynamoDB enables customers to offload to AWS the administrative burdens of operating and scaling distributed databases (hardware provisioning, setup and configuration, throughput capacity planning, replication, software patching, and cluster scaling).
When to use: Configure auto scaling in DynamoDB: set the minimum and maximum levels of read and write capacity in addition to the target utilization percentage. Use database replication across different AZs.
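A hedged Terraform sketch of DynamoDB read-capacity auto scaling via Application Auto Scaling (this applies to tables in provisioned billing mode; the table name and limits are illustrative):

```hcl
resource "aws_appautoscaling_target" "reads" {
  service_namespace  = "dynamodb"
  resource_id        = "table/orders" # hypothetical table name
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  min_capacity       = 5
  max_capacity       = 100
}

resource "aws_appautoscaling_policy" "reads" {
  name               = "orders-read-scaling"
  service_namespace  = aws_appautoscaling_target.reads.service_namespace
  resource_id        = aws_appautoscaling_target.reads.resource_id
  scalable_dimension = aws_appautoscaling_target.reads.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }
    target_value = 70 # target utilization percentage
  }
}
```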
Replication
Description: Replicate databases for recovery as well as to offload reads to multiple instances. When necessary, create a read replica of the transactional database. When there is a valid case, implement the CQRS pattern to make the best use of data and resources.
When to use: When read-only consumers need a copy of the transactional data in the same or a different format, with a high throughput/volume SLA.
Prefer Eventual Consistency (BASE) over ACID
Description: A BASE data store values availability (since that's important for scale) but doesn't offer guaranteed consistency of replicated data at write time.
When to use: In cases where data can be momentarily stale, prefer a BASE database over an ACID database. This helps in scaling databases horizontally.
Sharding / tenancy
Description: Split the application/databases (row-wise or table-wise) by function so they can scale individually.
When to use: When different functions or tenants have different SLAs and transactional needs. Your application may have compliance requirements to segregate data owned by different tenants; if the scaling requirements vary by tenant, apply this principle.
Canary deployments
Description: The ability to deploy applications with minimal downtime (for example, with Terragrunt/Terraform). Roll out new versions to only a small subset of the servers and redirect users to the new servers without bringing the entire application down. As users migrate to the new version, implement autoscaling for both the new and the old infrastructure.
When to use: When the application has a high availability SLO and near-zero downtime expectations. Canary deployments also help roll out features quickly and roll them back as needed, helping with stability.
Rollback
Description: Incorporate the ability to roll back a specific feature of an application from source control. As part of the rollback, any infrastructure template (Terraform, etc.) associated with the previous version of the application should also be executed to deploy the resources accordingly.
When to use: Tag every asset with release version information to be able to roll back as needed.
Load Balancer
Description: Distributes incoming connections across a group of servers/services. When designing applications, use a load balancer whenever possible; this is the first step toward high availability. The Network Load Balancer is API-compatible with the Application Load Balancer.
When to use: Each Network Load Balancer provides a single IP address per Availability Zone. The IP-per-AZ feature reduces latency, improves performance, and improves availability through isolation and automatic failover. Use an EC2 Auto Scaling group with Elastic Load Balancing across multiple Availability Zones for a load-balanced application. Use "IP mode" for load balancers.
Compute
Description: Compute services are also known as Infrastructure as a Service (IaaS). AWS compute services provide virtual server instances, storage, and APIs that let users migrate workloads to virtual machines.
When to use: The ability to increase or decrease the compute capacity of your application: instruct an Auto Scaling group to either launch or terminate Amazon EC2 instances. To scale AWS resources, including Amazon EC2 instances and Amazon DynamoDB tables and indexes, use AWS Auto Scaling, and set up application scaling for multiple resources across multiple services.
Amazon EKS
Description: Amazon Elastic Kubernetes Service (Amazon EKS) runs Kubernetes control plane and data plane instances across multiple Availability Zones to ensure high availability in an AWS Region. The Kubernetes Cluster Autoscaler automatically adjusts the number of nodes in the cluster.
When to use: When pods fail or are rescheduled onto other nodes. The Cluster Autoscaler is typically installed as a Deployment in an existing cluster. It uses leader election to ensure high availability, but scaling is done by only one replica at a time. Use the Kubernetes Cluster Autoscaler to scale an Amazon EKS cluster; it automatically adjusts the number of nodes when pods fail or are rescheduled onto other nodes. Use it for clusters with large numbers of worker nodes.
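One common way to install the Cluster Autoscaler is through its official Helm chart; a hedged Terraform sketch using the Helm provider's v2 set-block syntax (the cluster name and Region variables, plus the required IAM permissions, are assumed to exist):

```hcl
resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"

  set {
    name  = "autoDiscovery.clusterName"
    value = var.cluster_name # EKS cluster name placeholder
  }
  set {
    name  = "awsRegion"
    value = var.region
  }
}
```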
AWS Lambda
Description: AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.
When to use: When a function receives a request while it is processing a previous request, Lambda launches another instance of the function to handle the increased load. Lambda automatically scales to handle 1,000 concurrent executions per Region, and this quota can be increased.
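Lambda concurrency can also be managed explicitly. A hedged Terraform sketch reserving provisioned concurrency for a published alias (the alias resource and the earlier aws_lambda_function.handler are assumptions, not from this document):

```hcl
resource "aws_lambda_provisioned_concurrency_config" "live" {
  function_name                     = aws_lambda_function.handler.function_name
  qualifier                         = aws_lambda_alias.live.name # published alias, assumed to exist
  provisioned_concurrent_executions = 50 # pre-warmed instances, illustrative value
}
```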
Async / Apache Kafka
Description: Apache Kafka is an open-source distributed event streaming platform. Running your Kafka deployment on Amazon EC2 provides a high-performance, scalable solution for streaming data. Deployment considerations include factors like the number of messages, message size, monitoring, failure handling, and other operational issues.
When to use: Creates a cluster that spans two different Regions; if the main Availability Zone is unavailable for some reason, the service automatically fails over to the other Availability Zones. Use an auto-scaling policy for scaling the Kafka cluster. Use consumer groups to scale data consumption from a Kafka topic. Start a new instance of the application to scale a Kafka Streams application.
MSK
Description: Amazon Managed Streaming for Apache Kafka (Amazon MSK) provides the control plane operations, such as those for creating, updating, and deleting clusters.
When to use: To automatically expand the cluster's storage in response to increased usage, configure an Application Auto Scaling policy for Amazon MSK. To scale up, set up a three-AZ cluster.
Service Mesh (Gloo Mesh)
Description: Gloo Mesh is enterprise Istio with multi-cluster and multi-mesh management capabilities across multiple clusters and VMs. It controls, secures, and observes the traffic flow between your microservices, regardless of where they are running. An Istio service mesh is logically split into a data plane and a control plane. Service meshes can direct egress traffic from a pod to endpoints in the local Availability Zone. Istio is a service mesh built on the Envoy proxy that provides routing features, and Istio service mesh options are available for EKS. A service mesh sits on top of the Kubernetes infrastructure and is responsible for communication between services: it manages the network traffic between services, and Istio enables routing traffic to the pods or services within the same Availability Zone.
When to use: Multi-cluster dynamic routing and dynamic scaling to multiple nodes in different Availability Zones. Use dynamic scaling to multiple nodes in different Availability Zones. Use a two-node EKS cluster and set up Istio on top of it. Enable Istio and associate it with application services; this enables Istio to use Availability Zone information.
AWS API Gateway
Description: Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and protecting REST APIs at any scale. API Gateway acts as a proxy to the backend operations that you have configured.
When to use: Amazon API Gateway automatically scales (across multiple Availability Zones and different Regions) to handle the amount of traffic your API receives.
Mongo DB
Description: MongoDB is an open-source NoSQL database that provides support for JSON-styled, document-oriented storage systems.
When to use: Creates a MongoDB replica set. Use the self-service deployment of the MongoDB replica set cluster. Use horizontal scaling to overcome the limitations of single nodes and avoid single points of failure.
MongoDB Atlas
Description: MongoDB Atlas is a global cloud document database service for modern applications.
When to use: Deploying fully managed MongoDB helps ensure availability, scalability, and security compliance. MongoDB Atlas supports cluster auto-scaling, an intelligent and fully automated capacity management service.