matches, the associated action is executed. An expression pattern matches if it is boolean true (see the end of section 2). A
BEGIN pattern matches before any input has been read, and an END pattern matches after all input has been read. A range pattern, expr1,expr2, matches every record between the match of expr1 and the match of expr2,
inclusive.
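As a minimal sketch of a range pattern, the command below prints every record from the first one matching /start/ through the next one matching /stop/, inclusive. The five-line input and the shell variable out are made up for illustration, and the sketch assumes mawk (or another POSIX awk) is installed under the name awk.

```shell
# Hypothetical input: the range opens on "start" and closes on "stop".
out=$(printf 'a\nstart\nb\nstop\nc\n' |
      awk '/start/,/stop/ { print NR ": " $0 }')
echo "$out"
```

Records 2 through 4 are printed; records 1 and 5 lie outside the range and are skipped.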
When end of file occurs on the input stream, the remaining command line arguments are examined for a file argument, and if there is one it is opened, else the END pattern is considered matched and all END actions are
executed.
In the example, the assignment v=1 takes place after the BEGIN actions are executed, and the value placed in v is typed as both number and string. Input is then read from file A. On end of file A, t is set to the string
"hello", and B is opened for input. On end of file B, the END actions are executed.
Program flow at the pattern {action} level can be changed with the
next
nextfile
exit opt_expr
statements:
• A next statement causes the next input record to be read and pattern testing to restart with the first pattern {action} pair in the program.
• A nextfile statement tells mawk to stop processing the current input file. It then updates FILENAME to the next file listed on the command line, and resets FNR to 1.
• An exit statement causes immediate execution of the END actions, or program termination if there are none or if the exit occurs in an END action. The opt_expr sets the exit value of the program unless overridden
by a later exit or subsequent error.
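The effect of next and exit can be sketched with a short pipeline. The input lines, the exit value 3, and the shell variables out and status are illustrative assumptions; the sketch also assumes mawk (or another POSIX awk) is installed under the name awk.

```shell
status=0
out=$(printf '# comment\n1\n2\nquit\n3\n' | awk '
    /^#/    { next }       # skip this record; later rules never see it
    /^quit/ { exit 3 }     # stop reading input and run the END actions
            { sum += $1 }
    END     { print "sum=" sum }') || status=$?
echo "$out (exit status $status)"
```

Only records 1 and 2 are summed: the comment line is dropped by next, and exit stops input before the final record is read. The END action still runs, and because no later exit overrides it, the program returns the value given to exit.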
EXAMPLES
1. emulate cat.
     { print }
2. emulate wc.
     { chars += length($0) + 1  # add one for the \n
       words += NF
     }
     END{ print NR, words, chars }
3. count the number of unique “real words”.
     BEGIN { FS = "[^A-Za-z]+" }
     { for(i = 1 ; i <= NF ; i++)  word[$i] = "" }
     END { delete word[""]
           for ( i in word )  cnt++
           print cnt
     }
4. sum the second field of every record based on the first field.
     $1 ~ /credit|gain/ { sum += $2 }
     $1 ~ /debit|loss/  { sum -= $2 }
     END { print sum }
5. sort a file, comparing as string
     { line[NR] = $0 "" }  # make sure of comparison type
                           # in case some lines look numeric
     END { isort(line, NR)
           for(i = 1 ; i <= NR ; i++) print line[i]
     }
     #insertion sort of A[1..n]
     function isort( A, n,    i, j, hold)
     {
       for( i = 2 ; i <= n ; i++)
       {
         hold = A[j = i]
         while ( A[j-1] > hold )
         { j-- ; A[j+1] = A[j] }
         A[j] = hold
       }
       # sentinel A[0] = "" will be created if needed
     }
COMPATIBILITY ISSUES
MAWK 1.3.3 versus POSIX 1003.2 Draft 11.3
The POSIX 1003.2 (draft 11.3) definition of the AWK language is AWK as described in the AWK book, with a few extensions that appeared in System V Release 4 nawk. The extensions are:
• New functions: toupper() and tolower().
• New variables: ENVIRON[] and CONVFMT.
• ANSI C conversion specifications for printf() and sprintf().
• New command options: -v var=value, multiple -f options and implementation options as arguments to -W.
• For systems (MS‐DOS or Windows) which provide a setmode function, an environment variable MAWKBINMODE and a built‐in variable BINMODE. The bits of the BINMODE value tell mawk how to modify the RS and ORS
variables:
0 set standard input to binary mode, and if BIT‐2 is unset, set RS to "\r\n" (CR/LF) rather than "\n" (LF).
1 set