One of the most powerful capabilities of cloud computing is the ability to automatically add or remove resources based on demand. AWS Auto Scaling does exactly that — it monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. Instead of over-provisioning servers to handle peak load, you scale out when traffic spikes and scale in when it drops, paying only for what you actually use.
Types of Auto Scaling in AWS
- EC2 Auto Scaling: Automatically adds or removes EC2 instances in an Auto Scaling Group (ASG)
- Application Auto Scaling: Scales other AWS resources, such as ECS tasks, DynamoDB table throughput, Lambda provisioned concurrency, and Aurora replicas
- AWS Auto Scaling (the service): A unified interface for managing scaling across multiple services
EC2 Auto Scaling Groups (ASG)
An Auto Scaling Group is a collection of EC2 instances treated as a logical unit for scaling and management. You define:
- Launch Template: The configuration used to launch new instances (AMI, instance type, key pair, security groups, IAM instance profile)
- Minimum capacity: The fewest instances that must always be running
- Maximum capacity: The hard ceiling on instance count
- Desired capacity: The current target number of instances
# Create a Launch Template
aws ec2 create-launch-template \
  --launch-template-name my-web-server \
  --version-description "v1" \
  --launch-template-data '{
    "ImageId": "ami-0c94855ba95c71c99",
    "InstanceType": "t3.micro",
    "KeyName": "my-key-pair",
    "SecurityGroupIds": ["sg-0abc12345def67890"],
    "IamInstanceProfile": {"Name": "MyEC2WebRole"}
  }'
# Create an Auto Scaling Group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-web-asg \
  --launch-template "LaunchTemplateName=my-web-server,Version=1" \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 3 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb" \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:ACCOUNT:targetgroup/my-tg/abc
Scaling Policies
Scaling policies define when and how to scale. AWS offers three main types:
1. Target Tracking Scaling (Recommended)
You set a target metric value (e.g. keep average CPU at 50%), and AWS automatically adds or removes instances to maintain that target. This is the simplest and most effective policy for most use cases:
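# EC2 Auto Scaling target tracking does not accept ScaleInCooldown or
# ScaleOutCooldown (those belong to Application Auto Scaling); it uses an
# instance warmup time instead.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-web-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --estimated-instance-warmup 60 \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'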
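As rough intuition for what target tracking does, the desired capacity moves roughly in proportion to how far the metric sits from its target. The sketch below is only an approximation under that assumption: `estimate_capacity` is a hypothetical local helper, not an AWS command, and the real calculation is more involved. The result would still be bounded by the group's minimum and maximum.

```shell
# Approximate the capacity target tracking would aim for.
# Assumption: capacity scales in proportion to metric / target.
estimate_capacity() {
  current=$1   # instances currently in service
  metric=$2    # observed average CPU, as an integer percent
  target=$3    # target CPU, as an integer percent
  # Integer ceiling of current * metric / target
  echo $(( (current * metric + target - 1) / target ))
}

# 3 instances averaging 80% CPU against a 50% target
estimate_capacity 3 80 50   # prints 5
```

Rounding up errs on the side of having enough capacity, since a fractional instance cannot be launched.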
2. Step Scaling
Triggers scaling actions based on the size of the metric breach. E.g. add 1 instance when CPU is 60–80%, add 3 instances when CPU exceeds 80%. More granular but more complex to configure.
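The CPU bands in that example could be expressed as step adjustments along the following lines. This is a sketch under assumptions: the policy name `cpu-step-scale-out` and the 60% alarm threshold are made up for illustration, and `step_for_cpu` is a hypothetical local helper that only mirrors the bands so they can be checked without touching AWS. Step bounds are offsets from the alarm threshold, and the policy still has to be attached to a CloudWatch alarm before it does anything.

```shell
# Offsets are relative to an assumed alarm threshold of 60% CPU:
#   60% up to 80%  -> add 1 instance
#   80% and above  -> add 3 instances
STEP_ADJUSTMENTS='[
  {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 1},
  {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3}
]'

# Local mirror of the bands above (not an AWS call).
step_for_cpu() {
  cpu=$1
  if [ "$cpu" -ge 80 ]; then
    echo 3
  elif [ "$cpu" -ge 60 ]; then
    echo 1
  else
    echo 0
  fi
}

# Wrapped in a function so nothing is sent to AWS until it is called.
create_step_policy() {
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name my-web-asg \
    --policy-name cpu-step-scale-out \
    --policy-type StepScaling \
    --adjustment-type ChangeInCapacity \
    --metric-aggregation-type Average \
    --step-adjustments "$STEP_ADJUSTMENTS"
}

step_for_cpu 70   # prints 1
```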
3. Scheduled Scaling
Scale proactively based on known traffic patterns. If you know your application gets heavy traffic every weekday morning, pre-scale before the traffic arrives:
# Scale out every weekday morning at 7 AM UTC
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-web-asg \
  --scheduled-action-name scale-up-morning \
  --recurrence "0 7 * * MON-FRI" \
  --desired-capacity 6 \
  --min-size 4
# Scale in at 8 PM UTC
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-web-asg \
  --scheduled-action-name scale-down-evening \
  --recurrence "0 20 * * MON-FRI" \
  --desired-capacity 2 \
  --min-size 2
Attach a Load Balancer
Auto Scaling Groups integrate with Application Load Balancers (ALB). New instances are automatically registered with the target group, and instances being terminated are deregistered first so in-flight requests can drain. This helps scaling happen without dropping traffic. If the group was created without --target-group-arns, attach a target group afterwards:
# Attach an ALB target group to the ASG
aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name my-web-asg \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:ACCOUNT:targetgroup/my-tg/abc
Health Checks
Auto Scaling monitors instance health via EC2 status checks (default) or ELB health checks (recommended when using a load balancer). If an instance fails health checks, it is terminated and replaced automatically. Enable ELB health checks:
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-web-asg \
  --health-check-type ELB \
  --health-check-grace-period 120
Cooldown Periods
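Cooldowns prevent Auto Scaling from launching or terminating additional instances before the previous scaling activity has taken effect. In EC2 Auto Scaling, the default cooldown is 300 seconds and applies to simple scaling policies; target tracking and step scaling policies rely on an instance warmup time instead. Application Auto Scaling target tracking policies accept separate ScaleOutCooldown and ScaleInCooldown values. Where you can set both, keep the scale-out cooldown short (60s) to respond quickly to spikes, and the scale-in cooldown longer (300s) to avoid premature termination.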
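The core idea of a cooldown can be sketched as a simple time comparison. The `in_cooldown` helper and the timestamps below are hypothetical and purely illustrative; the service tracks this internally.

```shell
# Succeeds (exit status 0) while a new scaling activity would still be
# blocked by the cooldown.
in_cooldown() {
  last_activity=$1   # epoch seconds when the previous scaling activity completed
  now=$2             # current time in epoch seconds
  cooldown=$3        # cooldown length in seconds
  [ $(( now - last_activity )) -lt "$cooldown" ]
}

# 120 seconds after the last activity, with a 300-second cooldown
if in_cooldown 1000 1120 300; then
  echo "still cooling down"
else
  echo "free to scale"
fi
```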
Warm Pools (Reduce Scale-Out Latency)
For applications with slow startup times, use Warm Pools: a pool of pre-initialized instances, kept stopped by default, that can be started quickly when needed. This can substantially reduce the time from scaling trigger to serving traffic.
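A warm pool for the group used in this article might be set up roughly as follows. Treat this as a sketch: `warm_pool_size` is a hypothetical helper illustrating the default sizing (the group's maximum capacity minus its desired capacity, when no prepared-capacity limit is set), and the AWS call is wrapped in a function so it only runs when invoked.

```shell
# Default warm pool size: maximum capacity minus desired capacity.
warm_pool_size() {
  echo $(( $1 - $2 ))   # $1 = max size, $2 = desired capacity
}

# With max 10 and desired 3, as in the group created earlier
warm_pool_size 10 3   # prints 7

# Wrapped in a function so nothing is sent to AWS until it is called.
create_warm_pool() {
  aws autoscaling put-warm-pool \
    --auto-scaling-group-name my-web-asg \
    --pool-state Stopped
}
```

Stopped instances do not incur compute charges while they wait, though attached storage is still billed.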
Summary
AWS Auto Scaling helps keep your application's compute capacity matched to demand. Target Tracking policies are a sensible default for most web applications. Pair your ASG with an Application Load Balancer and proper health checks, and your application can handle traffic spikes and instance failures without manual intervention.
Frequently Asked Questions
- What is the difference between horizontal and vertical scaling in AWS?
Horizontal scaling (scaling out/in) adds or removes instances, which is what Auto Scaling Groups do. It is the recommended pattern in AWS because it provides fault tolerance (no single point of failure), capacity that is not capped by the largest instance size, and no required downtime during scaling events. Vertical scaling (scaling up/down) means moving to a larger or smaller instance type for more CPU, RAM, or storage. It requires stopping and restarting the instance, causing downtime, and has a hard ceiling at the largest available instance type. Use horizontal scaling as the default; use vertical scaling when your application cannot be distributed across multiple instances (some legacy databases, for example).
- How long does it take for Auto Scaling to add a new instance?
Typically 3 to 5 minutes from when the scaling trigger fires to when the new instance is in service and handling traffic, though this varies with the AMI and bootstrap work. The time breaks down into: instance launch (1-2 minutes), operating system and user-data script initialization (1-3 minutes depending on complexity), and the load balancer health checks that must pass before traffic is routed. The health check grace period (configurable, often 60-300 seconds) does not add delay by itself; it controls how long Auto Scaling waits before acting on failed health checks for a new instance. Because scaling takes time, configure your scale-out policies to trigger early, at 60-70% CPU rather than 90%, to have new instances ready before traffic saturates your existing fleet.
- What is a Target Tracking Scaling Policy?
Target Tracking is the recommended policy type for most use cases. You specify a target metric value, such as "maintain 50% average CPU utilization" or "keep 1000 requests per minute per instance", and Auto Scaling continuously adjusts capacity to keep the metric at that target. It automatically creates the CloudWatch alarms needed for both scale-out and scale-in. Compared to Step Scaling (where you define threshold bands and corresponding adjustment sizes), Target Tracking is simpler to configure and responds more smoothly to gradual load changes.
- What is the difference between minimum, maximum, and desired capacity?
The minimum capacity is the floor — Auto Scaling never scales in below this count, ensuring baseline availability. The maximum capacity is the ceiling — Auto Scaling never scales out beyond this, providing a cost guard against runaway scaling. The desired capacity is the current target number of instances that Auto Scaling actively maintains. When a scaling policy fires, it changes the desired capacity within the min/max bounds, and Auto Scaling launches or terminates instances to match. Setting minimum = maximum = desired creates a fixed-size fleet that never scales.
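The way a scaling adjustment is kept inside the minimum and maximum can be sketched with a little arithmetic. `apply_adjustment` is a hypothetical helper written for illustration, using the bounds from the example group (min 2, max 10).

```shell
# Apply a change to the desired capacity, then keep it within min and max.
apply_adjustment() {
  desired=$1   # current desired capacity
  change=$2    # instances to add (positive) or remove (negative)
  min=$3
  max=$4
  wanted=$(( desired + change ))
  if [ "$wanted" -lt "$min" ]; then wanted=$min; fi
  if [ "$wanted" -gt "$max" ]; then wanted=$max; fi
  echo "$wanted"
}

apply_adjustment 9 3 2 10    # prints 10: capped at the maximum
apply_adjustment 3 -2 2 10   # prints 2: held at the minimum
```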