matches r repeated one or more times.
r? matches r zero times or once.
(repetition).
(r) matches r
(grouping).
r{n} matches r exactly n times.
r{n,} matches r repeated n or more times.
r{n,m} matches r repeated n to m (inclusive) times.
r{,m} matches r repeated 0 to m times (a non‐standard option).
In order of increasing precedence, the regular expression operators are:
alternation concatenation repetition grouping
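Here is a minimal sketch of how this ordering changes what a pattern matches, written as a POSIX shell script. The helper `matches` and the variable `AWK` are assumptions of this sketch. `AWK` defaults to `awk`; set `AWK=mawk` to run mawk itself. Each regular expression is passed in as a string and used dynamically on the right of `~`.

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Hypothetical helper: print 1 if string $1 matches regular expression $2, else 0.
matches() {
  "$AWK" -v s="$1" -v re="$2" 'BEGIN { r = (s ~ re); print r }'
}

matches abab '^ab*$'       # 0: repetition binds tighter, so this is a(b*)
matches abbb '^ab*$'       # 1
matches abab '^(ab)*$'     # 1: grouping makes * apply to ab
matches abx  '^ab|cd$'     # 1: alternation is lowest, so (^ab)|(cd$)
matches abx  '^(ab|cd)$'   # 0
```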
For example,
/^[_a-zA-Z][_a-zA-Z0-9]*$/ and
/^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/
match AWK identifiers and AWK numeric constants, respectively. Note that “.” has to be escaped to be recognized as a decimal point, and that metacharacters are not special inside character classes.
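The two expressions can be tried side by side with a short script. This is a sketch that assumes a POSIX shell. `classify` is a hypothetical helper, and `AWK` defaults to `awk` (set `AWK=mawk` for mawk).

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Hypothetical helper: label each argument with the first pattern it matches.
classify() {
  printf '%s\n' "$@" | "$AWK" '
    /^[_a-zA-Z][_a-zA-Z0-9]*$/                           { print $0 ": identifier"; next }
    /^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/ { print $0 ": number"; next }
                                                         { print $0 ": neither" }'
}

classify foo_1 9lives 3.14 .5 1e10 -2. 1.5e-3 . e5
```

Only `9lives` and `.` match neither pattern. `e5` is labeled an identifier because the numeric pattern requires a digit, or a decimal point followed by a digit, before any exponent.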
Any expression can be used on the right hand side of the ~ or !~ operators or passed to a built‐in that expects a regular expression. If needed, it is converted to a string and then interpreted as a regular expression. For example,
BEGIN { identifier = "[_a-zA-Z][_a-zA-Z0-9]*" }
$0 ~ "^" identifier
prints all lines that start with an AWK identifier.
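The sketch below wraps that program in a shell function and adds two conversions. The first uses a number as a regular expression. The second doubles a backslash inside a string literal so that the regular expression receives `\.`. The function names and `AWK` (default `awk`; set `AWK=mawk` for mawk) are hypothetical.

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Print the input lines that start with an AWK identifier. The regular
# expression is assembled at run time: "^" identifier is a concatenation,
# which binds tighter than ~, so no parentheses are needed.
starts_with_identifier() {
  "$AWK" 'BEGIN { identifier = "[_a-zA-Z][_a-zA-Z0-9]*" }
          $0 ~ "^" identifier'
}

# Non-string right-hand sides are converted to strings first, and inside a
# string literal "\\." becomes \. so the regex sees an escaped dot.
conversion_demo() {
  "$AWK" 'BEGIN {
    a = ("a1b" ~ 1)          # the number 1 becomes the regex "1"
    b = ("axb" ~ "a.b")      # an unescaped . matches any character
    c = ("axb" ~ "a\\.b")    # \. matches only a literal dot
    d = ("a.b" ~ "a\\.b")
    print a, b, c, d
  }'
}

printf '%s\n' 'x = 1' '42 apples' '_tmp done' | starts_with_identifier
conversion_demo
```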
mawk recognizes the empty regular expression, //, which matches the empty string and hence is matched by any string at the front, back and between every character. For example,
echo abc | mawk '{ gsub(//, "X") ; print }'
XaXbXcX
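Because gsub returns the number of substitutions it made, you can count the matches of the empty regular expression directly. A string of length n gives n + 1 matches: one before each character and one at the end. This is a sketch. The helper `empty_regex_demo` and the `AWK` variable (default `awk`; set `AWK=mawk` for mawk) are assumptions.

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Hypothetical helper: print the substitution count, then the rewritten line.
empty_regex_demo() {
  printf '%s\n' "$1" | "$AWK" '{ n = gsub(//, "X"); print n, $0 }'
}

empty_regex_demo abc     # 4 XaXbXcX
empty_regex_demo hello   # 6 XhXeXlXlXoX
```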
4. Records and fields
Records are read in one at a time, and stored in the field variable $0. The record is split into fields which are stored in $1, $2, ..., $NF. The built‐in variable NF is set to the number of fields, and NR and FNR
are incremented by 1. Fields above $NF are set to "".
Assignment to $0 causes the fields and NF to be recomputed. Assignment to NF or to a field causes $0 to be reconstructed by concatenating the $i’s separated by OFS. Assignment to a field with an index greater than NF increases NF and causes $0 to be reconstructed.
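The rules above can be traced step by step. This sketch assumes a POSIX shell and an awk that, like mawk, rebuilds $0 when NF is assigned. `field_demo` and `AWK` (default `awk`; set `AWK=mawk` for mawk) are illustrative names. The input deliberately contains two spaces, so you can see that $0 keeps its original spacing until a field is assigned.

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Hypothetical helper: trace how $0, the fields and NF stay in sync.
field_demo() {
  echo 'a b  c' | "$AWK" '{
    OFS = "-"                          # setting OFS alone leaves $0 untouched
    print $0 " NF=" NF " [" $9 "]"     # a b  c NF=3 []   ($9 is above $NF)
    $1 = $1                            # assigning a field rebuilds $0 with OFS
    print $0                           # a-b-c
    $5 = "e"                           # index > NF: NF grows to 5, $4 is ""
    print $0 " NF=" NF                 # a-b-c--e NF=5
    NF = 2                             # assigning NF also rebuilds $0
    print $0                           # a-b
    $0 = "x y z w"                     # assigning $0 re-splits the fields
    print "NF=" NF " last=" $NF        # NF=4 last=w
  }'
}

field_demo
```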
Data input stored in fields is string, unless the entire field has numeric form, in which case the type is number and string. For example,
echo 24 24E |
mawk '{ print($1>100, $1>"100", $2>100, $2>"100") }'
0 1 1 1
$0 and $2 are string and $1 is number and string. The first comparison is numeric, the second is string, the third is string (100 is converted to "100"), and the last is string.
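A second sketch contrasts an input field with a string constant. A field holding `10.0` compares numerically against a number. A string constant never does, even when its text looks numeric. `type_demo` and `AWK` (default `awk`; set `AWK=mawk` for mawk) are illustrative assumptions.

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Hypothetical helper: a field that looks numeric versus a string constant.
type_demo() {
  echo 10.0 | "$AWK" '{
    s = "10.0"          # a string constant is a string, whatever it looks like
    a = ($1 == 10)      # $1 is number and string: numeric, 10.0 == 10
    b = ($1 == "10")    # string constant operand: "10.0" vs "10"
    c = (s == 10)       # s is a string: 10 becomes "10", string comparison
    print a, b, c       # 1 0 0
  }'
}

type_demo
```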
5. Expressions and operators
The expression syntax is similar to C. Primary expressions are numeric constants, string constants, variables, fields, arrays and function calls. The identifier for a variable, array or function can be a sequence of
letters, digits and underscores that does not start with a digit. Variables are not declared; they exist when first referenced and are initialized to null.
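An uninitialized variable is null, so it acts as 0 in a numeric context and as "" in a string context. Here is a small sketch; `null_demo` is a hypothetical name, and `AWK` defaults to `awk` (set `AWK=mawk` for mawk).

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Hypothetical helper: an unassigned variable reads as 0 and as "".
null_demo() {
  "$AWK" 'BEGIN {
    print x + 0, length(x), "[" x "]"   # 0 0 []
    _count2 = 5                         # letters, digits, underscores
    print _count2 * 2                   # 10
  }'
}

null_demo
```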
New expressions are composed with the following operators in order of increasing precedence.
assignment = += -= *= /= %= ^=
conditional ? :
logical or ||
logical and &&
array membership in
matching ~ !~
relational < > <= >= == !=
concatenation (no explicit operator)
add ops + -
mul ops * / %
unary + -
logical not !
exponentiation ^
inc and dec ++ -- (both post and pre)
field $
Assignment, conditional and exponentiation associate right to left; the other operators associate left to right. Any expression can be parenthesized.
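The following sketch checks several of these rules at once. Each assignment carries a comment giving the parse the table implies. `precedence_demo` and `AWK` (default `awk`; set `AWK=mawk` for mawk) are illustrative names, not part of mawk.

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; set AWK=mawk to use mawk

# Hypothetical helper: each result follows from the precedence table.
precedence_demo() {
  "$AWK" 'BEGIN {
    a = 2 ^ 3 ^ 2             # right to left: 2 ^ (3 ^ 2) = 512
    b = -2 ^ 2                # ^ above unary minus: -(2 ^ 2) = -4
    c = 1 " " 2 + 3           # + above concatenation: 1 " " (2 + 3)
    d = 10 - 3 - 2            # left to right: (10 - 3) - 2 = 5
    e = 1 < 2 ? "yes" : "no"  # relational above conditional
    f = "ab" ~ "a" "b"        # concatenation above matching
    x = y = 7                 # right to left: x = (y = 7)
    print a; print b; print c; print d; print e; print f; print x + y
  }'
}

precedence_demo
```

The script prints 512, -4, 1 5, 5, yes, 1 and 14, one per line.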
6. Arrays
Awk provides one‐dimensional arrays. Array elements