AWK - Text Processing Language Guide

Overview
Basic Syntax
Pattern Matching
Field Processing
Built-in Variables
Control Structures
Functions
Advanced Features
Best Practices

Overview

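AWK is a text-processing language built around pattern scanning: it reads input one record at a time (by default, one line), splits each record into fields, and runs the actions whose patterns match. It’s particularly useful for structured text such as logs, CSV files, and column-formatted command output.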

Key Features

Pattern scanning
Field processing
Arithmetic operations
String manipulation
Regular expressions
Custom functions
Control structures
Built-in variables

Basic Syntax

Basic Structure

# Basic pattern-action
pattern { action }
 
# Print all lines
awk '{ print }' file.txt
 
# Print specific fields
awk '{ print $1, $3 }' file.txt
 
# Using field separator
awk -F',' '{ print $1 }' file.csv
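
As a minimal sketch of the pattern-action model, the following runs a one-liner on inline sample data. The three-line input and the high_earners variable name are invented for illustration.

```shell
# Sample input is invented for illustration: name, department, salary.
sample='alice eng 120
bob ops 95
carol eng 130'

# The pattern ($3 > 100) selects records; the action prints fields 1 and 3.
high_earners=$(printf '%s\n' "$sample" | awk '$3 > 100 { print $1, $3 }')
printf '%s\n' "$high_earners"
```

A rule with no pattern runs for every record, and a rule with no action prints the matching record.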

Common Options

# Set field separator
awk -F':' '{ print $1 }' /etc/passwd
 
# Set variable
awk -v name=value '{ print name }' file.txt
 
# Run AWK program file
awk -f program.awk input.txt
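
The -v option is the usual way to hand a shell value to an AWK program. A small sketch, assuming a hypothetical shell variable named threshold:

```shell
# threshold is a hypothetical shell variable; -v copies it into the awk variable "min".
threshold=10
above=$(printf '5\n12\n30\n' | awk -v min="$threshold" '$1 >= min { print $1 }')
printf '%s\n' "$above"
```

Passing the value with -v avoids splicing shell text into the program source, which keeps quoting simple.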

Pattern Matching

Basic Patterns

# Match lines containing the regex (anywhere in the line)
/pattern/ { print }
 
# Match at beginning
/^pattern/ { print }
 
# Match at end
/pattern$/ { print }
 
# Multiple patterns: both must match (use || to accept either)
/pattern1/ && /pattern2/ { print }
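
To illustrate anchors combined with &&, here is a sketch over a made-up request list (method, path, status); the data and the get_failures name are assumptions for the example.

```shell
lines='GET /index.html 200
POST /login 500
GET /missing 404
GET /api 500'

# Both conditions must hold for the same record:
# the line starts with GET and ends with 500.
get_failures=$(printf '%s\n' "$lines" | awk '/^GET/ && /500$/ { print $2 }')
printf '%s\n' "$get_failures"
```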

Regular Expressions

# Match lines that contain at least one digit
/[0-9]+/ { print }
 
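# Match lines containing something shaped like an email address
# (a rough check; the {2,} interval is not supported by some older awk implementations)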
/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ { print }
 
# Case-insensitive match (the regex itself must be written in lowercase)
tolower($0) ~ /pattern/ { print }
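
A short sketch of the case-insensitive idiom on invented log lines. It relies only on tolower(), so it should behave the same across awk implementations.

```shell
log='ERROR disk full
info started
Error: timeout
warning: error rate high'

# tolower() folds the record first, so the regex must be lowercase.
# ^ anchors the match to the start of the record.
starts_with_error=$(printf '%s\n' "$log" |
  awk 'tolower($0) ~ /^error/ { print NR ": " $0 }')
printf '%s\n' "$starts_with_error"
```

The original record is printed unchanged because tolower() returns a copy rather than modifying $0.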

Field Processing

Field Operations

# Print specific fields
{ print $1, $3 }
 
# Calculate field sum
{ sum += $1 } END { print sum }
 
# Field count
{ print NF }
 
# Print field 2 when field 1 matches a regex
$1 ~ /pattern/ { print $2 }
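
The sum idiom extends naturally to an average. A sketch with invented numbers in the second column; the NR guard is there so empty input does not divide by zero.

```shell
# Sum and average the second column; the sample numbers are invented.
stats=$(printf 'a 10\nb 20\nc 30\n' |
  awk '{ sum += $2 } END { if (NR > 0) printf "sum=%d avg=%.1f\n", sum, sum / NR }')
printf '%s\n' "$stats"
```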

Field Manipulation

# Set input and output field separators (OFS is applied when fields are printed with commas or the record is rebuilt)
BEGIN { FS=","; OFS="|" }
 
# Modify field content
{ $2 = toupper($2); print }
 
# Add new field (assigning past NF appends a field and rebuilds $0 with OFS)
{ $(NF + 1) = "new_value"; print }
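
A common surprise is that setting OFS alone does not change the output of a plain print. The sketch below compares three cases on one invented CSV line; the variable names are for illustration only.

```shell
# Assigning to any field (even $1 = $1) makes awk rebuild $0 using OFS.
rebuilt=$(printf 'a,b,c\n' | awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }')

# Without an assignment, print emits the original record unchanged.
untouched=$(printf 'a,b,c\n' | awk 'BEGIN { FS = ","; OFS = "|" } { print }')

# Assigning past NF appends a field.
appended=$(printf 'a,b,c\n' | awk 'BEGIN { FS = ","; OFS = "|" } { $(NF + 1) = "d"; print }')

printf '%s\n' "$rebuilt" "$untouched" "$appended"
```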

Built-in Variables

Common Variables

# Field and record variables
NF      # Number of fields
NR      # Number of records read so far (across all input files)
FNR     # Record number within the current file
$0      # Entire record
$1      # First field
 
# Input/Output variables
FS      # Field separator (input)
OFS     # Field separator (output)
RS      # Record separator (input)
ORS     # Record separator (output)

Special Variables

# File processing
FILENAME # Current filename
ARGC     # Number of command-line arguments (counting ARGV[0])
ARGV     # Array of command-line arguments, indexed 0 to ARGC-1
 
# Numeric formats
CONVFMT  # Number-to-string conversion format (default "%.6g")
OFMT     # Output format for numbers in print (default "%.6g")
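
The difference between NR and FNR only shows up with more than one input file. A sketch using two throwaway files in a temporary directory (the file names a.txt and b.txt are arbitrary):

```shell
tmpdir=$(mktemp -d)
printf 'x\ny\n' > "$tmpdir/a.txt"
printf 'z\n' > "$tmpdir/b.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counters=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' a.txt b.txt)
printf '%s\n' "$counters"

rm -rf "$tmpdir"
```

The test NR == FNR is therefore true only while the first file is being read, which is the basis of many two-file lookups.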

Control Structures

Conditional Statements

# If statement
if ($1 > 100) {
    print "Large value"
}
 
# If-else
if ($1 > 100) {
    print "Large"
} else {
    print "Small"
}
 
# Multiple conditions
if ($1 < 50) {
    print "Small"
} else if ($1 < 100) {
    print "Medium"
} else {
    print "Large"
}
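
The statements above must appear inside an action. As a sketch, the if/else-if chain can be wrapped in a rule like this (the input values and the label variable are invented):

```shell
sizes=$(printf '20\n75\n300\n' | awk '{
    if ($1 < 50)       label = "Small"
    else if ($1 < 100) label = "Medium"
    else               label = "Large"
    print $1, label
}')
printf '%s\n' "$sizes"
```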

Loops

# For loop
for (i=1; i<=NF; i++) {
    print $i
}
 
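# While loop (inside a rule, plain getline replaces $0 with the next record,
# so this loop consumes the rest of the input)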
while (getline > 0) {
    print $0
}
 
# Do-while loop (the body runs once before the first getline test)
do {
    print $0
} while (getline > 0)
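
A frequent use of while with getline is reading a side file from a BEGIN block. The sketch below assumes a temporary file created just for the example; path, line, and n are illustrative names.

```shell
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

# "getline line < path" returns 1 per record, 0 at end of file, and -1 on error.
# Testing for "> 0" matters: a bare while (getline line < path) would loop
# forever on an unreadable file, because -1 counts as true.
numbered=$(awk -v path="$tmpfile" 'BEGIN {
    while ((getline line < path) > 0)
        printf "%d:%s\n", ++n, line
    close(path)
}')
printf '%s\n' "$numbered"

rm -f "$tmpfile"
```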

Functions

String Functions

# Length
length($1)
 
# Substring
substr($1, 1, 5)
 
# Case conversion
toupper($1)
tolower($1)
 
# String replacement (modifies $1 in place and returns the number of substitutions)
gsub(/pattern/, "replacement", $1)
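
To show how these functions behave together, here is a sketch on a single invented input line. Note that substr() positions start at 1, and that gsub() edits its target rather than returning the new string.

```shell
result=$(printf 'hello world\n' | awk '{
    n = gsub(/o/, "0")    # no target given, so $0 is edited; n is the count
    print n, $0
    print length($1), substr($1, 1, 3), toupper($2)
}')
printf '%s\n' "$result"
```

Because $0 was modified, the fields are re-split, so $1 and $2 on the second line reflect the substituted text.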

Numeric Functions

# Mathematical functions
int(3.14)
sqrt(100)
sin(0)
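rand()      # Pseudo-random number in [0, 1); call srand() to seed, since many implementations otherwise repeat the same sequence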
 
# Formatting
sprintf("%.2f", $1)
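
These functions return values and print nothing by themselves, so a quick way to try them is a BEGIN-only program. A small sketch:

```shell
nums=$(awk 'BEGIN {
    print int(3.99), int(-3.99)       # int() truncates toward zero
    print sprintf("%.2f", 3.14159)    # sprintf() returns the formatted string
    print sqrt(100), sin(0)
}')
printf '%s\n' "$nums"
```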

Advanced Features

Arrays

# Associative arrays
{ count[$1]++ }
END { for (key in count) print key, count[key] }
 
# Array sorting (asort() is a gawk extension, not part of POSIX awk)
{ a[NR] = $0 }
END {
    n = asort(a)
    for (i=1; i<=n; i++) print a[i]
}
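
The order of a for (key in array) loop is unspecified, and asort() is only available in gawk. One portable sketch is to let the external sort command order the output; the input words here are invented.

```shell
counts=$(printf 'b\na\nb\nc\nb\na\n' |
  awk '{ count[$1]++ } END { for (key in count) print key, count[key] }' |
  sort)
printf '%s\n' "$counts"
```

Sorting outside AWK keeps the program itself portable across gawk, mawk, and other implementations.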

User-Defined Functions

# Function definition
function square(x) {
    return x * x
}
 
# Function usage
{ print square($1) }
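
AWK has no local-variable declaration; by convention, extra parameters that callers leave out serve as locals. A sketch with a hypothetical row_sum function:

```shell
total=$(printf '1 2 3\n4 5 6\n' | awk '
# Extra parameters (i, s) act as local variables; callers simply omit them.
function row_sum(    i, s) {
    for (i = 1; i <= NF; i++) s += $i
    return s
}
{ print row_sum() }')
printf '%s\n' "$total"
```

Without the extra parameters, i and s would be global and could clash with variables of the same name elsewhere in the program.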

Best Practices

Performance Tips

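# Reuse a regex by storing it in a variable
# (this is a dynamic regex; it is not pre-compiled, and a /.../ constant is usually at least as fast)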
BEGIN { pattern = "^[0-9]+$" }
$1 ~ pattern { print }
 
# Minimize field references
{ temp = $1; ... }
 
# Use appropriate data structures
# Arrays for counting
{ count[$1]++ }

Error Handling

# Check field existence
$1 != "" { print $1 }
 
# Validate numeric input (accepts non-negative integers only)
$1 ~ /^[0-9]+$/ { print }
 
# Handle missing files (BEGINFILE and ERRNO are gawk extensions)
BEGINFILE { if (ERRNO) { print "Error:", ERRNO > "/dev/stderr"; nextfile } }
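
Where gawk is not available, one possible portable check is the return value of getline, which is -1 when a file cannot be read. A sketch, assuming the path below does not exist:

```shell
# Hypothetical path that is assumed not to exist.
status=$(awk -v path="/nonexistent/input.txt" 'BEGIN {
    if ((getline line < path) < 0) { print "unreadable"; exit }
    close(path)
    print "ok"
}')
printf '%s\n' "$status"
```

In a real script the message would normally go to standard error, and exit would carry a non-zero status.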

Example Scripts

Log Analysis

#!/usr/bin/awk -f
# Analyze Apache access log
BEGIN {
    FS = "\""    # Split on double quotes: $2 is the request line, $3 starts with the status code
    print "Status Code Analysis"
    print "==================="
}
 
{
    split($3, status, " ")
    codes[status[1]]++
}
 
END {
    for (code in codes)
        printf "%s: %d\n", code, codes[code]
}
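
To check the field logic, the same rules can be run as a one-liner on a few invented lines in Apache common log format. The header lines are left out and the output is piped through sort, since for-in order is unspecified.

```shell
log='127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 2326
127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /b HTTP/1.0" 404 512
127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /c HTTP/1.0" 200 100'

codes=$(printf '%s\n' "$log" |
  awk 'BEGIN { FS = "\"" }
       { split($3, status, " "); codes[status[1]]++ }
       END { for (code in codes) printf "%s: %d\n", code, codes[code] }' |
  sort)
printf '%s\n' "$codes"
```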

CSV Processing

#!/usr/bin/awk -f
# Process CSV data (simple CSV only: quoted fields containing commas are not handled)
BEGIN {
    FS = ","
    OFS = "|"
    print "ID", "Name", "Total"
}
 
NR > 1 {  # Skip header
    sum = 0
    for (i=3; i<=NF; i++)
        sum += $i
    print $1, $2, sum
}
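
A sketch of the same logic run inline on a made-up four-column file, to show what the header skip and per-row total produce:

```shell
csv='id,name,q1,q2
1,Ann,10,20
2,Bob,5,7'

report=$(printf '%s\n' "$csv" |
  awk 'BEGIN { FS = ","; OFS = "|"; print "ID", "Name", "Total" }
       NR > 1 { sum = 0; for (i = 3; i <= NF; i++) sum += $i; print $1, $2, sum }')
printf '%s\n' "$report"
```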

Data Transformation

#!/usr/bin/awk -f
# Transform data format
BEGIN {
    FS = "\t"
    OFS = ","
}
 
function clean(str) {
    # Strip leading and trailing blanks; [ \t] is used because \s is a gawk extension
    gsub(/^[ \t]+|[ \t]+$/, "", str)
    return str
}
 
{
    for (i=1; i<=NF; i++)
        $i = clean($i)
    if (NF > 0)
        print
}
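
A quick way to exercise the trimming logic is a single tab-separated line with stray spaces. This sketch uses the portable [ \t] bracket expression rather than \s, which only gawk understands.

```shell
cleaned=$(printf ' a \t b\tc \n' |
  awk 'BEGIN { FS = "\t"; OFS = "," }
       function clean(str) { gsub(/^[ \t]+|[ \t]+$/, "", str); return str }
       { for (i = 1; i <= NF; i++) $i = clean($i); if (NF > 0) print }')
printf '%s\n' "$cleaned"
```

Assigning to each $i rebuilds the record, which is why the output uses the comma from OFS.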

Remember:

Use appropriate field separators
Consider performance for large files
Handle edge cases
Document complex patterns
Test with sample data
Use functions for reusable code

For detailed information, consult the AWK manual (man awk).
