Overview

AWK is a powerful text processing language designed for pattern scanning and processing. It’s particularly useful for structured text data like logs, CSV files, and formatted output.
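
For instance, a one-liner like the following sums the third column of a comma-separated file (sales.csv is a placeholder name):

# Sum the third column of a CSV file
awk -F',' '{ total += $3 } END { print total }' sales.csv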

Key Features

  • Pattern scanning
  • Field processing
  • Arithmetic operations
  • String manipulation
  • Regular expressions
  • Custom functions
  • Control structures
  • Built-in variables
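
Several of these features often appear in a single rule; the sketch below (app.log is a placeholder name) combines a pattern, arithmetic, an END block, and the built-in variable NR:

# Count lines containing "ERROR" and report the total alongside the line count
awk '/ERROR/ { errors++ } END { print errors+0, "errors in", NR, "lines" }' app.log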

Basic Syntax

Basic Structure

# Basic pattern-action
pattern { action }
 
# Print all lines
awk '{ print }' file.txt
 
# Print specific fields
awk '{ print $1, $3 }' file.txt
 
# Using field separator
awk -F',' '{ print $1 }' file.csv

Common Options

# Set field separator
awk -F':' '{ print $1 }' /etc/passwd
 
# Set variable
awk -v name=value '{ print name }' file.txt
 
# Run AWK program file
awk -f program.awk input.txt

Pattern Matching

Basic Patterns

# Match lines containing a string
/pattern/ { print }
 
# Match at beginning
/^pattern/ { print }
 
# Match at end
/pattern$/ { print }
 
# Multiple patterns
/pattern1/ && /pattern2/ { print }

Regular Expressions

# Match lines containing digits
/[0-9]+/ { print }
 
# Match email addresses
/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ { print }
 
# Case-insensitive match (keep the pattern lowercase)
tolower($0) ~ /pattern/ { print }

Field Processing

Field Operations

# Print specific fields
{ print $1, $3 }
 
# Sum the first field across all lines
{ sum += $1 } END { print sum }
 
# Field count
{ print NF }
 
# Process specific fields
$1 ~ /pattern/ { print $2 }

Field Manipulation

# Set input and output field separators
BEGIN { FS=","; OFS="|" }
 
# Modify field content
{ $2 = toupper($2); print }
 
# Append a new field
{ $(NF+1) = "new_value"; print }
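
Note that OFS only shows up in the output once a field has been assigned, which forces awk to rebuild the record; the usual idiom is a no-op assignment such as $1 = $1 (file.csv is a placeholder name):

# Reassign a field to itself so the record is rebuilt with the new OFS
awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }' file.csv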

Built-in Variables

Common Variables

# Field and record variables
NF      # Number of fields
NR      # Current record number
FNR     # Record number in current file
$0      # Entire record
$1      # First field
 
# Input/Output variables
FS      # Field separator (input)
OFS     # Field separator (output)
RS      # Record separator (input)
ORS     # Record separator (output)

Special Variables

# File processing
FILENAME # Current filename
ARGC     # Number of arguments
ARGV     # Array of arguments
 
# Numeric formats
CONVFMT  # Number-to-string conversion format (default "%.6g")
OFMT     # Output format for numbers in print (default "%.6g")
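
A quick way to see NR, FNR, and FILENAME interact is to run a sketch like this over two files (a.txt and b.txt are placeholder names):

# Print a header at the start of each file, then per-file and overall line numbers
awk 'FNR == 1 { print "==", FILENAME, "==" } { print FNR, NR, $0 }' a.txt b.txt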

Control Structures

Conditional Statements

# If statement
if ($1 > 100) {
    print "Large value"
}
 
# If-else
if ($1 > 100) {
    print "Large"
} else {
    print "Small"
}
 
# Multiple conditions
if ($1 < 50) {
    print "Small"
} else if ($1 < 100) {
    print "Medium"
} else {
    print "Large"
}
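
Within a program, these statements live inside an action block; for example, as a complete one-liner (data.txt is a placeholder name):

awk '{ if ($1 > 100) print "Large"; else print "Small" }' data.txt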

Loops

# For loop
for (i=1; i<=NF; i++) {
    print $i
}
 
# While loop (getline reads and consumes the next input record)
while (getline > 0) {
    print $0
}
 
# Do-while loop
do {
    print $0
} while (getline > 0)

Functions

String Functions

# Length
length($1)
 
# Substring
substr($1, 1, 5)
 
# Case conversion
toupper($1)
tolower($1)
 
# Replace all matches (sub() replaces only the first)
gsub(/pattern/, "replacement", $1)

Numeric Functions

# Mathematical functions
int(3.14)
sqrt(100)
sin(0)
rand()
 
# Formatting
sprintf("%.2f", $1)
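
rand() produces the same sequence on every run unless the generator is seeded; a common sketch is to call srand() once in BEGIN (with no argument it seeds from the current time):

# Seed the generator, then print a random value per input line
BEGIN { srand() }
{ printf "%.2f\n", rand() * 100 }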

Advanced Features

Arrays

# Associative arrays
{ count[$1]++ }
END { for (key in count) print key, count[key] }
 
# Array sorting (asort is a gawk extension)
{ a[NR] = $0 }
END {
    n = asort(a)
    for (i=1; i<=n; i++) print a[i]
}

User-Defined Functions

# Function definition
function square(x) {
    return x * x
}
 
# Function usage
{ print square($1) }
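
All variables are global by default, so extra parameters are the conventional way to declare locals; this sketch uses i and total only inside the function:

# Extra parameters after the real ones act as local variables
function sum_fields(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += $i
    return total
}
{ print sum_fields(NF) }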

Best Practices

Performance Tips

# Define a reusable pattern once in BEGIN
BEGIN { pattern = "^[0-9]+$" }
$1 ~ pattern { print }
 
# Cache a repeatedly used field in a variable
{ temp = $1; ... }
 
# Use appropriate data structures
# Arrays for counting
{ count[$1]++ }

Error Handling

# Skip records with an empty first field
$1 != "" { print $1 }
 
# Validate numeric input
$1 ~ /^[0-9]+$/ { print }
 
# Handle unreadable files (BEGINFILE and ERRNO are gawk extensions)
BEGINFILE { if (ERRNO) { print "Error:", ERRNO > "/dev/stderr"; nextfile } }
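
For a portable alternative that flags malformed input, a sketch like the following writes to stderr and sets a non-zero exit status (the expected field count of 3 is an arbitrary choice):

# Report records that do not have the expected number of fields
NF != 3 { print "Bad record at line " NR ": " $0 > "/dev/stderr"; status = 1 }
END { exit status }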

Example Scripts

Log Analysis

#!/usr/bin/awk -f
# Analyze Apache access log
BEGIN {
    FS = "\"" 
    print "Status Code Analysis"
    print "==================="
}
 
{
    split($3, status, " ")
    codes[status[1]]++
}
 
END {
    for (code in codes)
        printf "%s: %d\n", code, codes[code]
}

CSV Processing

#!/usr/bin/awk -f
# Process CSV data
BEGIN {
    FS = ","
    OFS = "|"
    print "ID", "Name", "Total"
}
 
NR > 1 {  # Skip header
    sum = 0
    for (i=3; i<=NF; i++)
        sum += $i
    print $1, $2, sum
}

Data Transformation

#!/usr/bin/awk -f
# Transform data format
BEGIN {
    FS = "\t"
    OFS = ","
}
 
function clean(str) {
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", str)   # trim whitespace (\s is gawk-only)
    return str
}
 
{
    for (i=1; i<=NF; i++)
        $i = clean($i)
    if (NF > 0)
        print
}

Remember:

  • Use appropriate field separators
  • Consider performance for large files
  • Handle edge cases
  • Document complex patterns
  • Test with sample data
  • Use functions for reusable code

For detailed information, consult the AWK manual (man awk).