Lecture # 35.3 : Reliability

cloud

PATH-AWS

Mar 26, 2024

15 minutes

Reliability is the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
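One common way to mitigate transient network issues is to retry the failed call with exponential backoff and jitter. The block below is a minimal sketch of that idea, not a specific AWS API: `TransientError`, `call_with_retries`, and `make_flaky` are hypothetical names introduced for illustration, and the delay values are arbitrary assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a temporary network failure."""


def call_with_retries(operation, max_attempts=5, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Double the delay ceiling on each attempt, capped at max_delay,
            # then wait a random time below that ceiling (full jitter).
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


def make_flaky(failures_before_success):
    """Build a hypothetical operation that fails a fixed number of times."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("simulated transient network issue")
        return "ok after %d calls" % state["calls"]

    return operation


if __name__ == "__main__":
    # Two simulated failures, then the third call succeeds.
    print(call_with_retries(make_flaky(2)))
```

The jitter spreads retries from many clients over time, so a recovering service is less likely to be hit by all of them at once. Only errors that are safe to retry should be treated as transient.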

Design Principles

- Test Recovery Procedures:
    Use automation to simulate different failures or to recreate scenarios that led to failures before.
- Automatically Recover from Failure:
    Monitor key metrics and trigger automated recovery when a threshold is breached, so failures can be anticipated and remediated before they occur.
- Scale Horizontally to Increase Aggregate System Availability:
    Distribute requests across multiple smaller resources, and make sure those resources don't share a common point of failure.
- Stop Guessing Capacity:
    Monitor demand and keep capacity at the level that satisfies it, without over- or under-provisioning, e.g. by using Auto Scaling.
- Manage Change in Automation:
    Use automation to make changes to infrastructure.
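The horizontal-scaling and capacity principles above can be made concrete with a little arithmetic. The sketch below assumes that instances fail independently (which rarely holds exactly in practice) and uses a simple proportional scaling rule; it does not reproduce the algorithm AWS Auto Scaling uses. The function names, default bounds, and example numbers are hypothetical.

```python
import math


def aggregate_availability(instance_availability, instance_count):
    """Probability that at least one of N independent instances is up."""
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if instance_count < 1:
        raise ValueError("need at least one instance")
    # The system is down only if every instance is down at the same time.
    return 1.0 - (1.0 - instance_availability) ** instance_count


def desired_capacity(current_instances, observed_metric, target_metric,
                     min_size=1, max_size=10):
    """Proportional rule: resize so the per-instance metric returns to target."""
    if target_metric <= 0:
        raise ValueError("target metric must be positive")
    wanted = math.ceil(current_instances * observed_metric / target_metric)
    # Clamp to the configured bounds, as a scaling group would.
    return max(min_size, min(max_size, wanted))


if __name__ == "__main__":
    # Availability grows quickly as independent instances are added.
    for count in (1, 2, 3):
        print(count, aggregate_availability(0.99, count))

    # 4 instances at 90% average CPU with a 60% target: scale out.
    print(desired_capacity(4, observed_metric=90, target_metric=60))
```

Under the independence assumption, two instances that are each available 99% of the time are both down only about 0.01% of the time, which is why several small resources can be more available than one large one. The clamp to `min_size` and `max_size` keeps the rule from scaling to zero or growing without bound.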

AWS Services For Reliability

- Foundation:           IAM, VPC, Service Quotas (service limits), Trusted Advisor
- Change Management:    AWS Auto Scaling, CloudWatch, CloudTrail, AWS Config
- Failure Management:   AWS Backup, CloudFormation, S3, S3 Glacier, Route 53