string, the non-string operand is converted to a string and the comparison is a string comparison. The result is numeric, 1 or 0.
In boolean contexts such as if ( expr ) statement, a string expression evaluates true if and only if it is not the empty string ""; a numeric value evaluates true if and only if it is not numerically zero.
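The comparison and truth rules above can be seen in a small shell sketch. It assumes a POSIX-compatible awk on PATH; the shell function names cmp_demo and truth_demo are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: prints the 1/0 result of three comparisons.
cmp_demo() {
  awk 'BEGIN {
    print (10 < 9)       # both operands numeric: numeric comparison, 0
    print ("10" < "9")   # both operands strings: string comparison, 1
    print (10 < "9")     # one string operand: 10 becomes "10", string comparison, 1
  }'
}

# Hypothetical helper: shows which values are true in an if statement.
truth_demo() {
  awk 'BEGIN {
    if ("0") print "string \"0\" is true (not empty)"
    if ("")  print "not printed: empty string"
    if (0)   print "not printed: numeric zero"
    if (0.5) print "number 0.5 is true (not zero)"
  }'
}
```

Running cmp_demo prints 0, 1 and 1 on separate lines. The string "0" is true because it is not empty, even though it looks like a number.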
3. Regular expressions
In the AWK language, records, fields and strings are often tested for matching a regular expression. Regular expressions are enclosed in slashes, and
expr ~ /r/
is an AWK expression that evaluates to 1 if expr “matches” r, which means a substring of expr is in the set of strings defined by r. If there is no match, the expression evaluates to 0; replacing ~ with the “not match” operator, !~, reverses the meaning. As pattern-action pairs,
/r/ { action } and $0 ~ /r/ { action }
are the same, and for each input record that matches r, action is executed. In fact, /r/ is an AWK expression that is equivalent to ($0 ~ /r/) anywhere except when on the right side of a match operator or passed as
an argument to a built‐in function that expects a regular expression argument.
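The equivalence of /r/ and $0 ~ /r/ can be checked with a sketch like the following. It assumes a POSIX-compatible awk on PATH, and all function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: for each input line, print the values of
# ($0 ~ /cat/) and ($0 !~ /cat/). A substring match is enough.
match_demo() {
  awk '{ m = ($0 ~ /cat/); n = ($0 !~ /cat/); print m, n }'
}

# These two programs select the same records.
bare_pattern()     { awk '/cat/'; }
explicit_pattern() { awk '$0 ~ /cat/'; }

# Used as a value, /cat/ is evaluated as ($0 ~ /cat/).
regex_as_value()   { awk '{ hit = /cat/; print hit }'; }
```

Given the lines concatenate and dog, match_demo prints 1 0 and then 0 1, since "concatenate" contains the substring "cat".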
AWK uses extended regular expressions, as with the -E option of grep(1). The regular expression metacharacters, i.e., those with special meaning in regular expressions, are
\ ^ $ . [ ] | ( ) * + ? { }
If the command line option -W traditional is used, these are omitted:
{ }
In that mode they are ordinary characters. Otherwise they are regular expression metacharacters and require escaping to be matched as literal characters.
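Escaping turns a metacharacter back into an ordinary character. A sketch, assuming a POSIX-compatible awk on PATH; the function names are hypothetical:

```sh
#!/bin/sh
# Unescaped, + is the repetition operator: /^1+1$/ means one or more
# 1s followed by a final 1, so it matches "11" but not "1+1".
plus_operator() { awk '/^1+1$/'; }

# Escaped, \+ is a literal plus sign, so /^1\+1$/ matches only "1+1".
plus_literal()  { awk '/^1\+1$/'; }

# Likewise \. is a literal period, while an unescaped . matches any character.
dot_literal()   { awk '/^a\.b$/'; }
```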
Regular expressions are built up from characters as follows:
c              matches any non-metacharacter c.
\c             matches a character defined by the same escape sequences used in string constants, or the literal character c if \c is not an escape sequence.
.              matches any character (including newline).
^              matches the front of a string.
$              matches the back of a string.
[c1c2c3...]    matches any character in the class c1c2c3... . An interval of characters is denoted c1-c2 inside a class [...].
[^c1c2c3...]   matches any character not in the class c1c2c3... .
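The single-character forms above can be exercised one at a time. A sketch assuming a POSIX-compatible awk on PATH; all function names are hypothetical:

```sh
#!/bin/sh
three_chars() { awk '/^...$/'; }          # ^ . $ : exactly three characters
hex_digit()   { awk '/^[0-9a-fA-F]$/'; }  # class with intervals c1-c2
non_digit()   { awk '/^[^0-9]$/'; }       # negated class [^...]
has_tab()     { awk '/\t/'; }             # \t: same escape as in string constants

# . also matches a newline inside a string (input records normally have none).
dot_newline() { awk 'BEGIN { print ("a\nb" ~ /^a.b$/) }'; }
```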
Regular expressions are built up from other regular expressions as follows:
r1r2           matches r1 followed immediately by r2 (concatenation).
r1 | r2        matches r1 or r2 (alternation).
r*             matches r repeated zero or more times,
r+             matches r repeated one or more times, and
r?             matches r zero times or once (repetition).
(r)            matches r (grouping).
r{n}           matches r exactly n times.
r{n,}          matches r repeated n or more times.
r{n,m}         matches r repeated n to m (inclusive) times.
r{,m}          matches r repeated 0 to m times (a non-standard option).
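Several of these rules can be combined in one pattern. The sketch below assumes a POSIX-compatible awk on PATH, uses hypothetical function names, and sticks to concatenation, alternation, *, +, ? and grouping.

```sh
#!/bin/sh
# Optional sign, one or more "ab" groups, then "x" or "yz":
# concatenation, ?, +, grouping and alternation in one pattern.
compound() { awk '/^[-+]?(ab)+(x|yz)$/'; }

# ? makes the u optional; * allows any number of trailing s characters.
colour()   { awk '/^colou?rs*$/'; }
```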
In order of increasing precedence, the operators are:
alternation, concatenation, repetition, grouping.
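Because alternation binds most loosely and repetition binds more tightly than concatenation, parentheses change what a pattern means. A sketch assuming a POSIX-compatible awk on PATH; the function names are hypothetical:

```sh
#!/bin/sh
# /^ab|cd$/ parses as (^ab)|(cd$): starts with "ab" OR ends with "cd".
loose()     { awk '/^ab|cd$/'; }
# /^(ab|cd)$/ matches exactly "ab" or exactly "cd".
grouped()   { awk '/^(ab|cd)$/'; }
# /^ab*$/ is a followed by b*, so "abbb" matches but "abab" does not.
rep_tight() { awk '/^ab*$/'; }
# /^(ab)*$/ repeats the whole group, so "abab" matches.
rep_group() { awk '/^(ab)*$/'; }
```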
For example,
/^[_a-zA-Z][_a-zA-Z0-9]*$/ and
/^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/
are matched by AWK identifiers and AWK numeric constants respectively. Note that “.” has to be escaped to be recognized as a decimal point, and that metacharacters are not special inside character classes.
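The two example expressions can be applied directly to a list of tokens, one per line. A sketch assuming a POSIX-compatible awk on PATH; the function name classify and its output labels are hypothetical:

```sh
#!/bin/sh
# classify: label each input line as an AWK identifier, an AWK numeric
# constant, or neither, using the two regular expressions from the text.
classify() {
  awk '
    /^[_a-zA-Z][_a-zA-Z0-9]*$/                           { print $0 ": identifier"; next }
    /^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/ { print $0 ": number"; next }
                                                         { print $0 ": other" }
  '
}
```

For example, 3.14, .5 and 1e10 are reported as numbers, while 1x and a lone . are reported as other.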
Any expression can be used on the right hand side of the ~ or !~ operators