4th chunk of `awk.man`
 string, the non‐string operand
       is converted and the comparison is string.  The result is numeric, 1 or 0.

       In boolean contexts such as if ( expr ) statement, a string expression evaluates true if and only if it is not the empty string ""; a numeric value evaluates true if and only if it is not numerically zero.
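
       As an illustration of these rules, the sketch below prints how a few constants behave in a boolean context, together with the 1 or 0 result of a numeric and a string comparison.  The function name truth_demo and the labels are made up for this example.

```shell
# truth_demo is a hypothetical helper name used only in this sketch.
truth_demo() {
    awk 'BEGIN {
        # String constants are true unless they are empty, so "0" is true.
        print "empty string:", ("" ? "true" : "false")
        print "string zero:", ("0" ? "true" : "false")
        # Numeric values are true unless they are numerically zero.
        print "number zero:", (0 ? "true" : "false")
        print "number 0.5:", (0.5 ? "true" : "false")
        # A comparison yields the number 1 or 0.
        print "numeric 2 < 10:", (2 < 10)
        print "string 2 < 10:", ("2" < "10")
    }'
}

truth_demo
```

       The last two lines differ because two numeric constants are compared as numbers, while two string constants are compared as strings, and "2" sorts after "10".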

   3. Regular expressions
       In the AWK language, records, fields and strings are often tested for matching a regular expression.  Regular expressions are enclosed in slashes, and

            expr ~ /r/

       is an AWK expression that evaluates to 1 if expr “matches” r, which means a substring of expr is in the set of strings defined by r.  With no match the expression evaluates to 0; replacing ~ with the “not match” operator, !~, reverses the meaning.  As pattern‐action pairs,

            /r/ { action }   and   $0 ~ /r/ { action }

       are the same, and for each input record that matches r, action is executed.  In fact, /r/ is an AWK expression that is equivalent to ($0 ~ /r/) anywhere except when on the right side of a match operator or passed as an argument to a built‐in function that expects a regular expression argument.
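
       The equivalence can be checked with a small sketch.  The sample records and the function name match_demo are assumptions made for this example only.

```shell
# match_demo is a hypothetical helper; the three input records are sample data.
match_demo() {
    printf 'apple\nbanana\ncherry\n' | awk '
        /an/       { print "bare:", $0 }       # shorthand for $0 ~ /an/
        $0 ~ /an/  { print "explicit:", $0 }   # same test written out
        $0 !~ /an/ { print "no match:", $0 }   # reversed meaning
        END {
            # Used as expressions, the match operators yield 1 or 0.
            print "matches nan:", ("banana" ~ /nan/)
            print "does not match nan:", ("banana" !~ /nan/)
        }
    '
}

match_demo
```

       Only the record banana contains the substring an, so it is reported by both the bare and the explicit pattern, while the other two records are reported by the !~ pattern.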

       AWK uses extended regular expressions as with the -E option of grep(1).  The regular expression metacharacters, i.e., those with special meaning in regular expressions, are

            \ ^ $ . [ ] | ( ) * + ? { }

       If the command line option ‐W traditional is used, these are omitted:

            { }

       Otherwise, they are also regular expression metacharacters and must be escaped to match as literal characters.

       Regular expressions are built up from characters as follows:

            c            matches any non‐metacharacter c.

            \c           matches a character defined by the same escape sequences
                         used in string constants, or the literal character c
                         if \c is not an escape sequence.

            .            matches any character (including newline).

            ^            matches the front of a string.

            $            matches the back of a string.

            [c1c2c3...]  matches any character in the class c1c2c3... .
                         An interval of characters is denoted c1-c2
                         inside a class [...].

            [^c1c2c3...] matches any character not in the class c1c2c3... .
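
       A brief sketch of these character‐level building blocks follows.  The test strings and the function name char_demo are chosen for illustration only; each line prints 1 for a match and 0 otherwise.

```shell
# char_demo is a hypothetical helper name used only in this sketch.
char_demo() {
    awk 'BEGIN {
        print "dot matches x:", ("axb" ~ /a.b/)             # 1
        print "escaped dot matches x:", ("axb" ~ /a\.b/)    # 0
        print "escaped dot matches dot:", ("a.b" ~ /a\.b/)  # 1
        print "starts with a-c:", ("cat" ~ /^[a-c]/)        # 1
        print "starts outside a-c:", ("cat" ~ /^[^a-c]/)    # 0
        print "ends with t:", ("cat" ~ /t$/)                # 1
        print "contains a:", ("cat" ~ /a/)                  # 1
        print "starts with a:", ("cat" ~ /^a/)              # 0
    }'
}

char_demo
```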

       Regular expressions are built up from other regular expressions
       as follows:

            r1r2         matches r1 followed immediately by r2 (concatenation).

            r1 | r2      matches r1 or r2 (alternation).

            r*           matches r repeated zero or more times.

            r+           matches r repeated one or more times.

            r?           matches r zero or one time (repetition).

            (r)          matches r (grouping).

            r{n}         matches r exactly n times.

            r{n,}        matches r repeated n or more times.

            r{n,m}       matches r repeated n to m (inclusive) times.

            r{,m}        matches r repeated 0 to m times (a non‐standard option).
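
       How these operators combine can be tried out with a sketch like the following.  The patterns without parentheses show that repetition binds tighter than concatenation, which in turn binds tighter than alternation.  The function name compose_demo and the test strings are assumptions made for this example, and interval expressions are left out because support for them varies between AWK implementations.

```shell
# compose_demo is a hypothetical helper name used only in this sketch.
compose_demo() {
    awk 'BEGIN {
        # + applies to b alone, not to the concatenation ab.
        print "ab+ on abbb:", ("abbb" ~ /^ab+$/)          # 1
        print "ab+ on abab:", ("abab" ~ /^ab+$/)          # 0
        # Grouping makes + apply to the whole of ab.
        print "(ab)+ on abab:", ("abab" ~ /^(ab)+$/)      # 1
        # ? and * allow the preceding item to be absent.
        print "colou?r on color:", ("color" ~ /^colou?r$/) # 1
        print "ab* on a:", ("a" ~ /^ab*$/)                # 1
        # ab|cd means (ab)|(cd), which differs from a(b|c)d.
        print "ab|cd on ab:", ("ab" ~ /ab|cd/)            # 1
        print "a(b|c)d on ab:", ("ab" ~ /a(b|c)d/)        # 0
        print "a(b|c)d on acd:", ("acd" ~ /a(b|c)d/)      # 1
    }'
}

compose_demo
```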

       The increasing precedence of operators is:

       alternation concatenation repetition grouping

       For example,

            /^[_a-zA-Z][_a-zA-Z0-9]*$/  and
            /^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/

       are matched by AWK identifiers and AWK numeric constants respectively.  Note that “.” has to be escaped to be recognized as a decimal point, and that metacharacters are not special inside character classes.
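
       The two expressions above can be wrapped in small shell helpers, as in the sketch below.  The function names is_identifier and is_number and the sample words are hypothetical.  Each helper reads its argument as a single record, so only the first line of a multi‐line argument is checked, and LC_ALL=C is set on the assumption that byte‐wise ranges such as a-z are wanted.

```shell
# Exit status 0 if the argument matches the identifier pattern.
is_identifier() {
    printf '%s\n' "$1" |
        LC_ALL=C awk '{ exit !($0 ~ /^[_a-zA-Z][_a-zA-Z0-9]*$/) }'
}

# Exit status 0 if the argument matches the numeric constant pattern.
is_number() {
    printf '%s\n' "$1" |
        LC_ALL=C awk '{ exit !($0 ~ /^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/) }'
}

# Classify a few sample words.
for s in total_1 1total 3.14 -.5e10 1e; do
    if is_identifier "$s"; then
        kind=identifier
    elif is_number "$s"; then
        kind=number
    else
        kind=neither
    fi
    printf '%s: %s\n' "$s" "$kind"
done
```

       Of the sample words, total_1 is reported as an identifier, 3.14 and -.5e10 as numbers, and 1total and 1e as neither, the last because an exponent needs at least one digit.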

       Any expression can be used on the right hand side of the ~ or !~ operators

Title: Regular Expressions in AWK (Continued)
Summary
This section continues the discussion of regular expressions in AWK, covering the evaluation of boolean expressions, how regular expressions are used for pattern matching with the `~` and `!~` operators, and the meaning of metacharacters. It also describes how regular expressions are built from characters and other regular expressions using concatenation, alternation, repetition, and grouping, along with operator precedence.