Lecture # 35.3 : Reliability
cloud PATH-AWS

15 minutes



Reliability is the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
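
One common way to mitigate a transient network issue is to retry the failed call with exponential backoff. The sketch below is illustrative only and is not taken from the lecture: `TransientError` and `call_with_retries` are hypothetical names, and the attempt count and delays are arbitrary assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The delay ceiling doubles on each attempt; random jitter keeps
            # many clients from retrying at the same instant.
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)
```

The `sleep` parameter is injectable only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.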

Design Principles
- Test "Recovery Procedure" : 
    Use automation to simulate different failures or to recreate scenarios that led to failures before.
- Automatically Recover from failure : 
    Monitor the system and trigger automation when a threshold is breached, so that failures are anticipated and remediated before they occur.
- Scale Horizontally to Increase Aggregate System Availability:
    Distribute requests across multiple smaller resources so that a single failure has limited impact, and ensure that they don't share a common point of failure.
- Stop Guessing Capacity:
    Monitor demand and add or remove resources automatically to maintain the optimal level that satisfies demand without over- or under-provisioning, e.g. with Auto Scaling.
- Manage Change in Automation:
    Use automation to make changes to infrastructure.
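
The horizontal scaling principle can be made concrete with a little arithmetic. The sketch below rests on two simplifying assumptions that are not stated in the lecture: the resources fail independently of each other, and any single healthy resource is enough to keep serving requests. The function name `aggregate_availability` is hypothetical.

```python
def aggregate_availability(per_resource: float, count: int) -> float:
    """Availability of `count` redundant resources that fail independently.

    The group counts as available while at least one resource is up, so it
    is unavailable only when all of them are down at the same time.
    """
    if not 0.0 <= per_resource <= 1.0:
        raise ValueError("per_resource must be between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    all_down = (1.0 - per_resource) ** count
    return 1.0 - all_down


if __name__ == "__main__":
    # Compare one resource against two and three redundant ones.
    for n in (1, 2, 3):
        print(n, f"{aggregate_availability(0.99, n):.6%}")
```

Under these assumptions, three resources that are each 99% available give roughly 99.9999% aggregate availability. A shared dependency (the "common point of failure" in the principle) breaks the independence assumption and lowers this figure.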

AWS Services for Reliability
- Foundation:           IAM, VPC, Service Limits, Trusted Advisor
- Change Management:    AWS Auto Scaling, CloudWatch, CloudTrail, Config
- Failure Management:   AWS Backup, CloudFormation, S3, S3 Glacier, Route 53