You've deployed your application on AWS — but how do you know if it's healthy? How do you catch a failing EC2 instance, a Lambda function timing out, or an RDS database running out of storage? That's where AWS CloudWatch comes in. CloudWatch is AWS's native observability service, providing metrics, logs, alarms, and dashboards for virtually every AWS service. This guide explains how it works and how to use it effectively.
What CloudWatch Monitors
CloudWatch collects two fundamental types of data:
- Metrics — Numerical time-series data. Examples: EC2 CPU utilization, Lambda invocation count, S3 bucket size, RDS free storage space.
- Logs — Text log data from applications, Lambda functions, VPC flow logs, CloudTrail, and more.
AWS services publish metrics to CloudWatch automatically. You can also send custom metrics and custom logs from your applications.
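As a rough illustration of what a custom metric looks like, the sketch below builds a request in the shape the PutMetricData API expects. The namespace `MyApp/Production`, the metric name `OrdersProcessed`, and the `Service` dimension are hypothetical names chosen for the example, not anything CloudWatch defines.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None, timestamp=None):
    """Return one MetricData entry in the shape PutMetricData expects."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in (dimensions or {}).items()],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


# Hypothetical application metric: orders processed by a "checkout" service.
request = {
    "Namespace": "MyApp/Production",  # assumed custom namespace
    "MetricData": [
        build_metric_datum("OrdersProcessed", 42, dimensions={"Service": "checkout"}),
    ],
}

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
print(request["MetricData"][0]["MetricName"], request["MetricData"][0]["Value"])
```

Building the payload separately from the API call keeps the example runnable offline and makes the request easy to inspect before anything is sent.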
CloudWatch Metrics
Navigate to CloudWatch > Metrics > All Metrics to explore all available metrics for your account. Key EC2 metrics available by default:
- CPUUtilization: Percentage of CPU in use
- NetworkIn / NetworkOut: Bytes transferred in/out
- DiskReadOps / DiskWriteOps: Disk I/O operations (instance store only)
- StatusCheckFailed: Detects instance or system-level failures
Note: Memory utilization and disk space usage (including on EBS volumes) are not published automatically, because they are only visible from inside the operating system. You need the CloudWatch Agent for those.
Install the CloudWatch Agent for Custom Metrics
# Install the CloudWatch agent on Amazon Linux 2023
sudo dnf install -y amazon-cloudwatch-agent
# Create a configuration file
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
# Load the configuration from SSM Parameter Store and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c ssm:/AmazonCloudWatch-linux \
  -s
Once configured, the agent ships memory, disk, and custom application metrics to CloudWatch every 60 seconds (or more frequently if configured).
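For reference, here is a minimal sketch of an agent configuration that collects memory and disk usage, generated from Python so it can be checked before it is written to disk. The 60-second interval matches the default mentioned above. Monitoring only the `/` mount point is an assumption for the example, and the configuration wizard will typically produce a longer file.

```python
import json


def build_agent_config(interval_seconds=60, mount_points=("/",)):
    """Return a small CloudWatch agent config collecting memory and disk usage."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": list(mount_points),  # assumed mount points
                },
            }
        },
    }


config = build_agent_config()
print(json.dumps(config, indent=2))
```

Lowering `interval_seconds` gives finer-grained data at the cost of publishing more datapoints.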
CloudWatch Logs
CloudWatch Logs stores log data from AWS services and your applications. Key concepts:
- Log Group: A container for log streams from a particular source (e.g. /aws/lambda/my-function)
- Log Stream: A sequence of log events from a single source instance
- Retention Policy: Set how long logs are kept (from 1 day to 10 years, or Never Expire, which is the default)
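Retention is not a free-form number: the API accepts only a fixed set of day counts. The helper below is a small sketch that rounds a desired retention up to the next accepted value. The list reflects the values documented for PutRetentionPolicy as best understood here, so verify it against the current API reference before relying on it.

```python
import bisect

# Day counts accepted by PutRetentionPolicy (assumed current; check the API reference).
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def nearest_allowed_retention(desired_days):
    """Round up to the next accepted retention; None means 'never expire'."""
    if desired_days < 1:
        raise ValueError("retention must be at least 1 day")
    index = bisect.bisect_left(ALLOWED_RETENTION_DAYS, desired_days)
    if index == len(ALLOWED_RETENTION_DAYS):
        return None  # longer than 10 years: leave the log group without a policy
    return ALLOWED_RETENTION_DAYS[index]


print(nearest_allowed_retention(45))
```

Rounding up rather than down is a deliberate choice here, so that logs are never deleted earlier than requested.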
# Create a log group
aws logs create-log-group --log-group-name /myapp/production
# Set retention to 30 days
aws logs put-retention-policy \
  --log-group-name /myapp/production \
  --retention-in-days 30
# Query logs using CloudWatch Logs Insights (date -d requires GNU date)
# start-query returns a queryId; fetch results with: aws logs get-query-results --query-id <queryId>
aws logs start-query \
  --log-group-name /myapp/production \
  --start-time $(date -d "1 hour ago" +%s) \
  --end-time $(date +%s) \
  --query-string "fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20"
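Logs Insights queries run asynchronously, so the query ID returned by start-query has to be polled until the query finishes. The sketch below shows one way to do that. `wait_for_query` expects any object with a `get_query_results(queryId=...)` method, such as `boto3.client("logs")`. The `FakeLogsClient` class is a hypothetical stand-in included only so the example runs without AWS access.

```python
import time

TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled", "Timeout"}


def wait_for_query(client, query_id, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll get_query_results until the query finishes; return rows as dicts."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] in TERMINAL_STATUSES:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still {response['status']}")
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"query {query_id} ended as {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} pairs.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]


class FakeLogsClient:
    """Stand-in for boto3.client("logs") so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = 0

    def get_query_results(self, queryId):
        self.calls += 1
        if self.calls < 3:
            return {"status": "Running", "results": []}
        return {"status": "Complete", "results": [[
            {"field": "@timestamp", "value": "2024-01-01 00:00:00.000"},
            {"field": "@message", "value": "ERROR something failed"},
        ]]}


fake_client = FakeLogsClient()
rows = wait_for_query(fake_client, "hypothetical-query-id", poll_seconds=0)
print(rows)
```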
CloudWatch Alarms
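A metric alarm watches a single metric (or a metric math expression) and triggers actions when the value crosses a threshold for a set number of evaluation periods. Actions can include sending an SNS notification, triggering an Auto Scaling policy, or stopping, terminating, rebooting, or recovering an EC2 instance.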
# Create an alarm for high CPU utilization on an EC2 instance
# (two consecutive 5-minute periods above 80% = 10 minutes)
aws cloudwatch put-metric-alarm \
  --alarm-name "HighCPU-WebServer" \
  --alarm-description "Trigger when average CPU exceeds 80% for 10 minutes" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:YOUR_ACCOUNT:AlertsTopic \
  --ok-actions arn:aws:sns:us-east-1:YOUR_ACCOUNT:AlertsTopic
Alarm states: OK (metric is within threshold), ALARM (threshold breached), INSUFFICIENT_DATA (not enough data yet).
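To make the state transitions concrete, here is a deliberately simplified model of how an alarm with a GreaterThanThreshold comparison could be evaluated. It assumes every one of the last N datapoints must breach the threshold. Real CloudWatch alarms also support "M out of N" datapoints and configurable handling of missing data, which this sketch ignores.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return the alarm state for a list of per-period values (oldest first)."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # GreaterThanThreshold is strict: a value equal to the threshold does not breach.
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# Threshold 80 and 2 evaluation periods, matching the example alarm.
print(evaluate_alarm([50, 85, 90], threshold=80, evaluation_periods=2))
print(evaluate_alarm([85, 90, 70], threshold=80, evaluation_periods=2))
print(evaluate_alarm([95], threshold=80, evaluation_periods=2))
```

Requiring several consecutive breaching periods is what keeps a single short CPU spike from paging anyone.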
CloudWatch Dashboards
Dashboards let you create custom operational views combining metrics from multiple services. Build a dashboard that shows EC2 CPU, Lambda invocations, API Gateway latency, and RDS connections on a single screen. Share dashboards across your team for a unified operational view.
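A dashboard is defined by a JSON body made up of widgets. The sketch below assembles a four-widget body for the metrics mentioned above. The resource names (`my-function`, `my-api`, `my-db`, and the instance ID) are placeholders, and the layout assumes the dashboard's 24-column grid with two widgets per row.

```python
import json


def metric_widget(title, namespace, metric, dim_name, dim_value, x, y,
                  stat="Average", region="us-east-1"):
    """Return one metric widget occupying half of the 24-column grid."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "metrics": [[namespace, metric, dim_name, dim_value]],
            "stat": stat,
            "period": 300,
            "region": region,
        },
    }


# Placeholder resource identifiers; replace with your own.
widgets = [
    metric_widget("EC2 CPU", "AWS/EC2", "CPUUtilization",
                  "InstanceId", "i-0123456789abcdef0", 0, 0),
    metric_widget("Lambda invocations", "AWS/Lambda", "Invocations",
                  "FunctionName", "my-function", 12, 0, stat="Sum"),
    metric_widget("API Gateway latency", "AWS/ApiGateway", "Latency",
                  "ApiName", "my-api", 0, 6),
    metric_widget("RDS connections", "AWS/RDS", "DatabaseConnections",
                  "DBInstanceIdentifier", "my-db", 12, 6),
]

dashboard_body = json.dumps({"widgets": widgets}, indent=2)
print(dashboard_body)
```

The resulting string can be passed as the dashboard body to `aws cloudwatch put-dashboard`, or kept in version control so the dashboard is reproducible.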
CloudWatch Logs Insights
Logs Insights lets you search and analyze log data at scale using a purpose-built query language. The example query below, run against a Lambda function's log group (e.g. /aws/lambda/my-function), finds the 10 slowest invocations:
fields @timestamp, @duration, @requestId
| filter @type = "REPORT"
| sort @duration desc
| limit 10
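To see what this query computes, the sketch below does the same thing locally on a handful of sample lines: it keeps only Lambda REPORT lines, extracts the duration, and sorts in descending order. The sample lines and request IDs are made up for illustration, and the regular expression assumes the usual tab-separated REPORT format.

```python
import re

# Matches the request ID and the first "Duration" field of a Lambda REPORT line.
REPORT_PATTERN = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+Duration: (?P<duration>[\d.]+) ms"
)


def slowest_invocations(log_lines, limit=10):
    """Return (request_id, duration_ms) pairs, slowest first."""
    reports = []
    for line in log_lines:
        match = REPORT_PATTERN.search(line)
        if match:
            reports.append((match.group("request_id"), float(match.group("duration"))))
    return sorted(reports, key=lambda report: report[1], reverse=True)[:limit]


# Hypothetical log lines for illustration.
sample_lines = [
    "START RequestId: req-1 Version: $LATEST",
    "REPORT RequestId: req-1\tDuration: 120.50 ms\tBilled Duration: 121 ms\tMemory Size: 128 MB",
    "REPORT RequestId: req-2\tDuration: 950.10 ms\tBilled Duration: 951 ms\tMemory Size: 128 MB",
    "REPORT RequestId: req-3\tDuration: 15.74 ms\tBilled Duration: 16 ms\tMemory Size: 128 MB",
]

print(slowest_invocations(sample_lines, limit=2))
```

Logs Insights does this parsing for you by exposing `@duration` and `@requestId` as discovered fields, which is why the query itself is so short.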
Composite Alarms
Composite alarms combine the states of multiple alarms with AND, OR, and NOT logic to reduce alert noise. Instead of alerting on every individual metric spike, you alert only when several indicators are unhealthy at the same time, which cuts down on false positives in production.
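A composite alarm is driven by a rule expression that references other alarms by name. The sketch below builds such an expression and evaluates the same logic locally. The alarm name `HighLatency-WebServer` is hypothetical, and only plain AND/OR combinations of `ALARM(...)` terms are covered.

```python
def build_alarm_rule(alarm_names, operator="AND"):
    """Build a rule such as: ALARM("a") AND ALARM("b")."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


def composite_state(child_states, operator="AND"):
    """Evaluate the same logic locally from a {alarm_name: state} mapping."""
    in_alarm = [state == "ALARM" for state in child_states.values()]
    fired = all(in_alarm) if operator == "AND" else any(in_alarm)
    return "ALARM" if fired else "OK"


rule = build_alarm_rule(["HighCPU-WebServer", "HighLatency-WebServer"])
print(rule)
print(composite_state({"HighCPU-WebServer": "ALARM", "HighLatency-WebServer": "OK"}))
```

The rule string is what you would supply as the alarm rule when creating the composite alarm, for example through `aws cloudwatch put-composite-alarm --alarm-rule`.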
CloudWatch Pricing
- Free tier: 10 custom metrics, 10 alarms, 5 GB of log ingestion/month
- Custom metrics: $0.30/metric/month (first 10,000)
- Log ingestion: $0.50/GB
- Logs Insights queries: $0.005/GB scanned
- Dashboards: $3/dashboard/month (first 3 are free)
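To get a feel for how these numbers add up, here is a rough estimator that uses only the prices and free-tier allowances listed above. The usage figures are invented for the example, alarms are left out because no per-alarm price is listed here, and actual prices vary by region and over time.

```python
# Prices and free-tier allowances copied from the list above (USD per month).
PRICES = {
    "custom_metric": 0.30,
    "log_ingestion_gb": 0.50,
    "insights_scanned_gb": 0.005,
    "dashboard": 3.00,
}
FREE_TIER = {
    "custom_metric": 10,
    "log_ingestion_gb": 5,
    "insights_scanned_gb": 0,
    "dashboard": 3,
}


def estimate_monthly_cost(usage):
    """Return a per-item cost breakdown plus a total, after free-tier deductions."""
    breakdown = {}
    for item, quantity in usage.items():
        billable = max(0, quantity - FREE_TIER[item])
        breakdown[item] = round(billable * PRICES[item], 2)
    breakdown["total"] = round(sum(breakdown.values()), 2)
    return breakdown


# Hypothetical monthly usage for a small production workload.
usage = {
    "custom_metric": 50,
    "log_ingestion_gb": 100,
    "insights_scanned_gb": 200,
    "dashboard": 5,
}
print(estimate_monthly_cost(usage))
```

In this made-up scenario, log ingestion dominates the bill, which is common in practice and a good reason to set retention policies and trim noisy log output early.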
Summary
CloudWatch is the nervous system of your AWS infrastructure. Set up alarms for CPU, memory, error rates, and latency before you launch any production workload. Use Logs Insights to investigate incidents quickly. With proper CloudWatch instrumentation, you move from reactive firefighting to proactive operational management.