Performance monitoring is a daily sysadmin task. A server may be slow due to CPU saturation, memory pressure, disk I/O bottlenecks, or network congestion. The key is knowing which tool to reach for and how to interpret what it tells you. This guide covers the essential Linux performance monitoring toolkit.
The Four Resources to Monitor
- CPU — is it overloaded? Is the right process consuming it?
- Memory — is RAM full? Is the system swapping?
- Disk I/O — are reads/writes saturating the disk?
- Network — is bandwidth saturated? Are there errors?
top — The Classic Process Monitor
top
The header shows key system stats:
- load average: 1, 5, 15-minute load. Values above your CPU core count indicate overload.
- %us: user-space CPU, %sy: kernel CPU, %wa: waiting for I/O (high = disk bottleneck), %id: idle
- Mem: total, free, used, buff/cache
top -b -n 1 # Single-shot batch output
top -b -n 1 -o %CPU | head -20 # Sort by CPU; first 20 lines of output (summary header included)
htop — Interactive Process Viewer
sudo apt install htop -y # or: sudo dnf install htop
htop
htop shows per-core CPU bars, memory bar, swap bar, and a sortable process table. Key shortcuts: F6 sort, F4 filter, F9 send signal, F5 tree view, Space tag process.
vmstat — Virtual Memory Statistics
vmstat 2 5 # Report every 2 seconds, 5 times
vmstat -s # Summary statistics
vmstat -d # Disk statistics
Key columns to watch:
- r: runnable processes, either running or waiting for CPU time (high = CPU bottleneck)
- b: processes in uninterruptible sleep (high = I/O bottleneck)
- si/so: swap in/out (non-zero = memory pressure)
- wa: CPU time waiting for I/O (should be below 5%)
- us/sy: user/system CPU usage
# Sample output interpretation:
# procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
# r b swpd free buff cache si so bi bo in cs us sy id wa st
# 2 0 0 512000 32000 640000 0 0 20 50 500 1000 25 5 68 2 0
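The column rules above can be turned into a small filter. The following is a minimal sketch, not part of vmstat itself: the function name vmstat_check and the flag words it prints are hypothetical, and the 5% wa threshold is simply the figure quoted above. It locates columns by header name, so it should tolerate builds that add extra columns.

```shell
#!/bin/sh
# vmstat_check (hypothetical helper): read vmstat output on stdin and
# print one verdict per sample line. Flag names are illustrative only.
vmstat_check() {
    LC_ALL=C awk '
        # The column-name header ("r b swpd ...") gives each field position.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Skip the "procs ---memory---" banner and any other non-data line.
        !("r" in col) || $1 !~ /^[0-9]+$/ { next }
        {
            msg = ""
            if ($(col["si"]) > 0 || $(col["so"]) > 0) msg = msg " swapping"
            if ($(col["wa"]) >= 5)                    msg = msg " high-iowait"
            if ($(col["b"]) > 0)                      msg = msg " blocked-procs"
            print (msg == "" ? "ok" : substr(msg, 2))
        }
    '
}
```

Typical use would be vmstat 2 5 | vmstat_check. The first sample vmstat prints is an average since boot, so it is usually more informative to judge the later lines.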
iostat — I/O Statistics
sudo apt install sysstat -y # Install sysstat package
iostat # CPU and I/O summary
iostat -x 2 5 # Extended I/O stats, every 2s, 5 reports
iostat -d sda 2 # Watch specific disk
Key columns in iostat -x:
- r/s, w/s: reads/writes per second
- rkB/s, wkB/s: read/write throughput (KB/s)
- await: average I/O wait time in milliseconds (under 10ms = healthy SSD); recent sysstat versions report this separately as r_await and w_await
- %util: disk utilization (near 100% = saturated disk)
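To pick saturated devices out of a long iostat -x report, a filter along these lines may help. It is a sketch under assumptions: iostat_saturated is a hypothetical name, the default limit of 90 is an arbitrary stand-in for "near 100%", and the %util column is found by its header because column order differs between sysstat versions.

```shell
#!/bin/sh
# iostat_saturated [LIMIT] (hypothetical helper): read `iostat -x` output on
# stdin and print "device %util" for devices at or above LIMIT (default 90,
# an assumed threshold).
iostat_saturated() {
    LC_ALL=C awk -v limit="${1:-90}" '
        # A device table starts with a header line naming the columns.
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") u = i; next }
        # A blank line ends the table; forget the column until the next header.
        NF == 0 { u = 0; next }
        # Inside a table, report rows whose %util reaches the limit.
        u && $u + 0 >= limit { print $1, $u }
    '
}
```

Typical use would be iostat -x 2 5 | iostat_saturated 95. As with vmstat, the first report covers the time since boot, so later reports reflect current activity better.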
free — Memory Usage
free -h # Human-readable memory stats
free -h -s 2 # Update every 2 seconds
# Example output:
# total used free shared buff/cache available
# Mem: 7.8Gi 2.1Gi 3.2Gi 50Mi 2.5Gi 5.4Gi
# Swap: 2.0Gi 0B 2.0Gi
The available column is what matters — it shows how much memory can be given to new processes without swapping. buff/cache is used by the kernel for caching but will be freed when needed.
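The available figure comes from the MemAvailable field in /proc/meminfo, so it can be read directly when a script needs a number rather than a table. A minimal sketch, assuming a kernel recent enough to expose MemAvailable; the function name mem_available_pct is hypothetical:

```shell
#!/bin/sh
# mem_available_pct (hypothetical helper): read /proc/meminfo-style text on
# stdin and print available memory as a percentage of total, one decimal.
mem_available_pct() {
    LC_ALL=C awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%.1f\n", 100 * avail / total }
    '
}

# Live reading, only where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    echo "available: $(mem_available_pct < /proc/meminfo)% of RAM"
fi
```

What percentage counts as too low depends on the workload, so the sketch reports the figure and leaves the threshold to the caller.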
sar — System Activity Reporter
sar -u 2 5 # CPU utilization every 2s
sar -r 2 5 # Memory every 2s
sar -b 2 5 # I/O every 2s
sar -n DEV 2 5 # Network interface stats
sar -f /var/log/sa/sa20 # Historical data for the 20th of the month (Debian/Ubuntu: /var/log/sysstat/sa20)
sar requires the sysstat service to be running to collect historical data.
lsof and ss for Network Performance
ss -s # Socket summary statistics
ss -Htn state established | wc -l # Count established TCP connections
lsof -i -n -P | grep ESTABLISHED # Active connections with process info
Checking System Load History
uptime # Current load averages
cat /proc/loadavg # Load averages (raw)
w # Who is logged in + load average
A Quick Diagnostic Workflow
# 1. Check overall load
uptime
# 2. Check CPU and processes
top -b -n 1 | head -20
# 3. Check memory and swap
free -h
vmstat 1 5 | grep -v "^proc"
# 4. Check disk I/O
iostat -x 1 3
# 5. Check for disk full
df -h
# 6. Check network connections
ss -s
Summary
Performance monitoring is about quickly identifying which resource is the bottleneck. Start with top or htop for a general overview. Use vmstat for memory pressure and I/O wait, iostat for disk-level detail, and free for memory. Combine these tools and you can diagnose most Linux performance problems in minutes.
Interpreting Load Average: When Should You Worry?
The three load average numbers shown by uptime, top, and htop represent the average number of processes that are running, waiting to run (in the runnable queue), or, on Linux, blocked in uninterruptible sleep (usually waiting on disk I/O) over the past 1, 5, and 15 minutes respectively. A value of 1.0 corresponds to one fully occupied CPU core. On a 4-core server, a load average of 4.0 means every core is busy: that is 100% utilization, but nothing is queuing. Load above the core count means work is waiting, either for CPU time or for I/O.
The rule of thumb: load average / CPU core count gives you the saturation ratio. A ratio below 1.0 is fine; around 1.0 means you are at capacity; above 1.0 means work is queuing. A load of 2.5 on a 4-core server (ratio 0.625) is completely normal. A load of 8.0 on a 2-core server (ratio 4.0) needs immediate investigation.
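The ratio is easy to compute in a script. The sketch below uses two hypothetical helper names, load_ratio and load_verdict, and collapses the rule of thumb into two buckets (below 1.0, and 1.0 or above), which is a simplification rather than a standard classification.

```shell
#!/bin/sh
# load_ratio LOAD CORES (hypothetical helper): print load / cores, 3 decimals.
load_ratio() {
    LC_ALL=C awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.3f\n", load / cores }'
}

# load_verdict LOAD CORES (hypothetical helper): two-bucket reading of the ratio.
load_verdict() {
    LC_ALL=C awk -v load="$1" -v cores="$2" 'BEGIN {
        if (load / cores < 1.0) print "below capacity"
        else                    print "at or above capacity"
    }'
}

# Live reading, only where /proc/loadavg and nproc are available.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one _rest < /proc/loadavg
    cores=$(nproc)
    echo "1-min load $one on $cores cores: ratio $(load_ratio "$one" "$cores") ($(load_verdict "$one" "$cores"))"
fi
```

With the figures from the paragraph above, load_ratio 2.5 4 prints 0.625 and load_ratio 8.0 2 prints 4.000.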
High load does not always mean high CPU usage. It is important to distinguish CPU-bound from I/O-bound load:
- CPU-bound: top shows high %us (user) or %sy (system); vmstat 1 shows many processes in the r (runnable) column with low wa (I/O wait).
- I/O-bound: top shows high %wa (I/O wait); vmstat 1 shows processes accumulating in the b (blocked) column; CPU may be mostly idle even though load is high.
# Check load average and core count together
uptime
nproc
# Distinguish CPU-bound vs I/O-bound with vmstat
vmstat 1 5
# Identify I/O-bound processes specifically
iostat -x 1
sudo iotop -o # Only processes currently doing I/O; needs root and the iotop package
A temporary spike in load average — for instance, during a nightly backup or a batch job — is expected and does not require action. Watch the 5-minute and 15-minute averages: if they are trending upward and the 15-minute figure is as high as the 1-minute figure, the system has been under sustained load, which warrants investigation.
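One way to express the spike-versus-sustained distinction in a script is to compare the 1-minute and 15-minute figures against the core count. This is only one possible reading of the guidance above: the function name load_kind, the labels, and the choice of the core count as the cut-off are all assumptions made for illustration.

```shell
#!/bin/sh
# load_kind CORES (hypothetical helper): read a /proc/loadavg-style line on
# stdin and label it. The cut-off (load above the core count) is an assumption.
load_kind() {
    LC_ALL=C awk -v cores="$1" '{
        if ($3 > cores)      print "sustained"   # 15-minute average is high
        else if ($1 > cores) print "spike"       # only the 1-minute average is high
        else                 print "normal"
    }'
}

# Live reading, only where /proc/loadavg and nproc are available.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    echo "load pattern: $(load_kind "$(nproc)" < /proc/loadavg)"
fi
```

A "spike" result during a known backup window can usually be ignored, while "sustained" is the case worth investigating.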
Frequently Asked Questions
- Is a load average of 2.0 always bad?
  No. It depends entirely on the number of CPU cores. On a single-core system, load 2.0 means processes are queuing and the system is overloaded. On an 8-core server, load 2.0 represents only 25% utilization and is completely healthy. Always check nproc or lscpu to know how many cores you have before interpreting load average figures. The ratio (load / cores) is the meaningful number; the raw load value alone tells you nothing about whether there is a problem.
- What is the fastest way to find what is causing high load?
  Run top immediately and press P to sort by CPU, or M to sort by memory. Note the top two or three processes by resource usage. If the culprit is a web server or database, check its slow query log or access log. For I/O-bound load, switch to iostat -x 1 to see which disk devices are saturated, then use iotop -o to identify which processes are generating the I/O. For a combined view, dstat 1 (from the dstat package) shows CPU, disk, network, and memory in a single scrolling output.
- How do I monitor performance over time, not just right now?
  Install the sysstat package (sudo apt install sysstat or sudo dnf install sysstat), which provides sar (System Activity Reporter) and its background data collector. Once collection is enabled, it records CPU, memory, disk, and network statistics every 10 minutes by default and stores them in /var/log/sa/ (/var/log/sysstat/ on Debian and Ubuntu). Review the current day's history with sar -u (CPU), sar -r (memory), or sar -b (disk I/O); adding an interval and count, as in sar -u 1 10, takes live samples instead. For more sophisticated long-term monitoring, consider Prometheus with Node Exporter and Grafana, which provide dashboards and alerting across a fleet of servers.