grep, awk, and sed are the three pillars of Linux text processing. They work on plain text — logs, config files, CSV data, command output — transforming and filtering it with surgical precision. Mastering these tools means you can extract exactly what you need from any file or data stream without writing a full script.
grep — Search and Filter
grep searches for lines matching a pattern and prints them.
grep "error" /var/log/syslog # Lines containing "error"
grep -i "error" /var/log/syslog # Case-insensitive
grep -v "^#" /etc/ssh/sshd_config # Exclude comment lines
grep -n "listen" /etc/nginx/nginx.conf # Show line numbers
grep -c "Failed" /var/log/auth.log # Count matching lines
grep -l "nginx" /etc/systemd/system/*.service # Files containing "nginx"
grep -r "database" /etc/myapp/ # Recursive search
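To see a few of these flags working together, here is a minimal sketch using a throwaway sample file (the /tmp path and its contents are invented for illustration):

```shell
# Create a tiny sample config (hypothetical content)
printf '%s\n' '# main config' 'listen 80' '# tls disabled' 'listen 443' > /tmp/demo.conf

grep -v '^#' /tmp/demo.conf      # non-comment lines: listen 80, listen 443
grep -c 'listen' /tmp/demo.conf  # count of matching lines: 2
grep -n '443' /tmp/demo.conf     # match with its line number: 4:listen 443
```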
grep with Regular Expressions
grep '^root' /etc/passwd # Lines starting with "root"
grep 'bash$' /etc/passwd # Lines ending with "bash"
grep -E 'error|fail|warn' /var/log/syslog # Extended regex: OR
grep -E '[0-9]{1,3}\.[0-9]{1,3}' access.log # Match IP-like patterns (escape the dot)
grep -P '\d{4}-\d{2}-\d{2}' logfile # Perl regex: date pattern
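In a regex, an unescaped dot matches any character, so IP-style patterns should escape it as `\.`. A quick sketch with made-up input shows the difference:

```shell
printf '%s\n' '10.0.0.1' '10x0x0x1' > /tmp/ips.txt

grep -cE '[0-9]+\.[0-9]+' /tmp/ips.txt  # escaped dot: only the real IP matches (1)
grep -cE '[0-9]+.[0-9]+'  /tmp/ips.txt  # bare dot also matches the "x" line (2)
```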
Context Lines
grep -A 3 "error" logfile # 3 lines After match
grep -B 3 "error" logfile # 3 lines Before match
grep -C 3 "error" logfile # 3 lines of Context (before and after)
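A small sketch with an invented log makes the context flags concrete:

```shell
printf '%s\n' 'setup' 'connect' 'error: timeout' 'retry' 'ok' > /tmp/run.log

grep -A 1 'error' /tmp/run.log  # prints "error: timeout" plus the following "retry"
```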
sed — Stream Editor
sed processes text line by line. Its most common use is find-and-replace, but it can also delete lines, print specific lines, and more.
Substitution (the s command)
sed 's/old/new/' file.txt # Replace first occurrence per line
sed 's/old/new/g' file.txt # Replace all occurrences (global)
sed -i 's/http:/https:/g' config.txt # Edit file in-place
sed -i.bak 's/old/new/g' config.txt # In-place with backup
sed 's/[Ee]rror/ERROR/g' logfile # Case variations
sed 's/^/  /' file.txt # Add 2 spaces at start of each line
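The difference between first-occurrence and global replacement is easiest to see on a single line; this sketch pipes literal text through sed:

```shell
echo 'foo foo foo' | sed 's/foo/bar/'   # first match only: bar foo foo
echo 'foo foo foo' | sed 's/foo/bar/g'  # every match: bar bar bar
echo 'http://a http://b' | sed 's/http:/https:/g'
```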
Deleting Lines
sed '/^#/d' config.txt # Delete comment lines
sed '/^$/d' config.txt # Delete blank lines
sed '3d' file.txt # Delete line 3
sed '3,7d' file.txt # Delete lines 3 through 7
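Line addresses work on positions, not content; a quick sketch with five throwaway lines:

```shell
printf '%s\n' one two three four five > /tmp/nums.txt

sed '2,4d' /tmp/nums.txt  # deletes lines 2-4, leaving: one, five
```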
Printing Specific Lines
sed -n '5p' file.txt # Print line 5 only
sed -n '5,10p' file.txt # Print lines 5 to 10
sed -n '/pattern/p' file.txt # Print lines matching pattern
Multiple Expressions
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
sed '/^#/d; /^$/d' config.txt # Strip comments and blank lines
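Chaining expressions lets one sed call both clean and rewrite a file; a sketch with a made-up config:

```shell
printf '%s\n' '# generated' '' 'debug=true' > /tmp/app.conf

sed '/^#/d; /^$/d; s/true/false/' /tmp/app.conf  # -> debug=false
```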
awk — Pattern Scanning and Data Extraction
awk processes text field by field, making it ideal for extracting specific columns from structured data. Always wrap awk programs in single quotes so the shell does not expand field references like $1 or $NF before awk sees them.
Field Extraction
awk '{print $1}' file.txt # Print first field (whitespace-delimited)
awk '{print $1, $3}' file.txt # Print fields 1 and 3
awk -F: '{print $1}' /etc/passwd # Colon-delimited, print username
awk -F: '{print $1, $6}' /etc/passwd # Print username and home dir
awk -F, '{print $2}' data.csv # CSV: print second column
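A passwd-style sketch (invented users, colon-delimited) shows field extraction without touching the real /etc/passwd:

```shell
printf '%s\n' 'alice:x:1001:/home/alice' 'bob:x:1002:/home/bob' > /tmp/users.txt

awk -F: '{print $1, $4}' /tmp/users.txt  # username and home dir per line
```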
Filtering with Pattern Matching
awk '/error/' logfile # Print lines containing "error" (like grep)
awk -F: '$3 >= 1000' /etc/passwd # Print regular users (UID >= 1000)
awk '$NF > 100' sizes.txt # Lines where last field exceeds 100
awk 'NR==5' file.txt # Print line 5 (NR = line number)
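Numeric filters compare fields as numbers; this sketch keeps only the invented entry with a UID of 1000 or more:

```shell
printf '%s\n' 'root:x:0' 'daemon:x:1' 'alice:x:1001' > /tmp/pw.txt

awk -F: '$3 >= 1000 {print $1}' /tmp/pw.txt  # -> alice
```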
Computing and Summarizing
# Sum column 5 (file size in ls -l output)
ls -l | awk '{sum += $5} END {print sum, "bytes"}'
# Count lines per HTTP status code in nginx access log
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
# Average response time (assuming last field is time in ms)
awk '{sum += $NF; count++} END {print sum/count, "ms avg"}' timings.log
# Print lines where field 3 matches a value
awk -F: '$3 == 0' /etc/passwd # Print root user (UID 0)
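The accumulate-then-print pattern works on any numeric column; a minimal sketch summing and averaging piped numbers:

```shell
printf '%s\n' 10 20 30 | awk '{sum += $1} END {print sum}'       # -> 60
printf '%s\n' 10 20 30 | awk '{s += $1; n++} END {print s / n}'  # mean -> 20
```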
BEGIN and END Blocks
awk -F: 'BEGIN {print "Username UID"} {print $1, $3} END {print "Done"}' /etc/passwd
Combining grep, sed, and awk
These tools compose beautifully in pipelines:
# Extract all IPs from auth.log that had failed logins
grep "Failed password" /var/log/auth.log | awk '{print $11}' | sort | uniq -c | sort -rn
# Clean a config file: strip comments and blanks, then replace a value
grep -v '^#' /etc/app.conf | grep -v '^$' | sed 's/debug/production/'
# Get top 5 most-requested URLs from nginx log
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -5
# Extract error lines and format them
grep "ERROR" app.log | sed 's/ERROR: //' | awk '{print NR ". " $0}'
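The same count-and-rank pipeline can be tried on invented data before pointing it at a real log:

```shell
# Hypothetical mini access log: method and path per line
printf '%s\n' 'GET /home' 'GET /about' 'GET /home' > /tmp/access.txt

awk '{print $2}' /tmp/access.txt | sort | uniq -c | sort -rn | head -1
# top entry: "2 /home" (uniq -c pads the count with leading spaces)
```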
Quick Reference
- grep — filter: which lines match?
- sed — transform: change content in place
- awk — extract and compute: process structured columns
Summary
grep, awk, and sed are the duct tape of Linux administration. You will use them to parse logs, transform config files, extract data from command output, and build quick one-liners that replace hours of manual work. Learn the basics of each, and practice combining them in pipes — that is where their real power lies.