11th chunk of `awk.man`
       Since mawk interprets RS as a regular expression, multi‐line records are easy.  Setting RS = "\n\n+" makes one or more blank lines separate records.  If FS = " " (the default), then single newlines, by the rules for
       <SPACE> above, are treated as spaces, so single newlines are field separators.

            For example, if

            •   a file is "a b\nc\n\n",

            •   RS = "\n\n+" and

            •   FS = " ",

            then there is one record “a b\nc” with three fields “a”, “b” and “c”:

            •   using FS = "\n" gives two fields “a b” and “c”;

            •   using FS = "" gives one field identical to the record.
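       The field counts in this example can be checked with a short shell sketch.  It assumes that the awk on your PATH is mawk, or another implementation that treats a multi‐character RS as a regular expression.  The FS = "" case is left out because other awks may not follow mawk there.

```shell
# Assumption: `awk` is mawk or another awk that accepts a regex RS.

# FS = " " (the default): newlines count as <SPACE>, so "a b\nc" has three fields.
nf_space=$(printf 'a b\nc\n\n' | awk 'BEGIN { RS = "\n\n+" } { print NF }')

# FS = "\n": only the newline splits the record, so there are two fields.
nf_newline=$(printf 'a b\nc\n\n' | awk 'BEGIN { RS = "\n\n+"; FS = "\n" } { print NF }')

printf 'default FS: %s fields\n' "$nf_space"
printf 'FS = newline: %s fields\n' "$nf_newline"
```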

       If you want lines with spaces or tabs to be considered blank, set RS = "\n([ \t]*\n)+".  For compatibility with other awks, setting RS = "" has the same effect as if blank lines were stripped from the front and back of
       files and records were then determined as if RS = "\n\n+".  POSIX requires that "\n" always separates fields when RS = "", regardless of the value of FS.  mawk does not support this convention, because defining "\n" as
       <SPACE> makes it unnecessary.

       Most of the time when you change RS for multi‐line records, you will also want to change ORS to “\n\n” so the record spacing is preserved on output.
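       As a minimal sketch of that advice, the following reads two blank‐line‐separated records and writes them back with the blank line intact.  It uses RS = "" rather than a regular expression so that it also runs on awks without regex RS support.  The sample input and the NR prefix are illustrative only.

```shell
# Two multi-line records separated by one blank line.
out=$(printf 'a b\nc\n\nd e\nf\n' |
  awk 'BEGIN { RS = ""; ORS = "\n\n" } { print NR ": " $0 }')

# Each record keeps its inner newline, and ORS puts the blank line back.
printf '%s\n' "$out"
```

       With the default ORS = "\n", the two records would be printed back to back and the record boundary would be lost.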

   13. Program execution
       This section describes the order of program execution.  First ARGC is set to the total number of command line arguments passed to the execution phase of the program.

       •   ARGV[0] is set to the name of the AWK interpreter and

       •   ARGV[1] ...  ARGV[ARGC-1] hold the remaining command line arguments, exclusive of options and program source.

       For example, with

            mawk  -f  prog  v=1  A  t=hello  B

       ARGC = 5 with
              ARGV[0] = "mawk",
              ARGV[1] = "v=1",
              ARGV[2] = "A",
              ARGV[3] = "t=hello" and
              ARGV[4] = "B".
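       The same values can be inspected from a BEGIN block.  The sketch below passes the program inline instead of with -f prog, which leaves ARGV unchanged because program source is excluded either way.  ARGV[0] is not printed, since it depends on the name of the interpreter.  The files A and B need not exist, because a program made only of BEGIN blocks never opens input.

```shell
# Print ARGV[1] .. ARGV[ARGC-1] and ARGC for the arguments used in the example.
out=$(awk 'BEGIN {
  for (i = 1; i < ARGC; i++)
    printf "ARGV[%d] = \"%s\"\n", i, ARGV[i]
  print "ARGC = " ARGC
}' v=1 A t=hello B)

printf '%s\n' "$out"
```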

       Next, each BEGIN block is executed in order.  If the program consists entirely of BEGIN blocks, then execution terminates; otherwise an input stream is opened and execution continues.  If ARGC equals 1, the input stream is
       set to stdin; otherwise the command line arguments ARGV[1] ...  ARGV[ARGC-1] are examined for a file argument.

       The command line arguments divide into three sets: file arguments, assignment arguments and empty strings "".  An assignment has the form var=string.  When an ARGV[i] is examined as a possible file argument, if it is
       empty it is skipped; if it is an assignment argument, the assignment to var takes place and i skips to the next argument; else ARGV[i] is opened for input.  If it fails to open, execution terminates with exit code 2.
       If no command line argument is a file argument, then input comes from stdin.  Getline in a BEGIN action opens input.  "-" as a file argument denotes stdin.

       Once an input stream is open, each input record is tested against each pattern, and if it matches, the associated action is executed.  An expression pattern matches if it is boolean true (see the end of section 2).  A
       BEGIN pattern matches before any input has been read, and an END pattern matches after all input has been read.  A range pattern, expr1,expr2, matches every record between the match of expr1 and the match of expr2,
       inclusively.
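       A small sketch of the three pattern kinds follows.  The input words and the regular expressions /start/ and /stop/ are illustrative only.

```shell
# BEGIN runs before input, the range pattern selects records 2 through 4
# inclusively, and END runs after the last record has been read.
out=$(printf 'one\nstart\ntwo\nstop\nthree\n' | awk '
  BEGIN         { print "begin" }
  /start/,/stop/ { print NR ": " $0 }
  END           { print "end after " NR " records" }
')

printf '%s\n' "$out"
```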

       When  end of file occurs on the input stream, the remaining command line arguments are examined for a file argument, and if there is one it is opened, else the END pattern is considered matched and all END actions are
       executed.

       In the example, the assignment v=1 takes place after the BEGIN actions are executed, and the data placed in v is typed number and string.  Input is then read from file A.  On end of file A, t is set to the string
       "hello", and B is opened for input.  On end of file B, the END actions are executed.
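       The ordering described for the example can be observed with a sketch like the one below.  It creates two one‐line files named A and B in a temporary directory and prints v and t at each stage.  The file contents and the output format are assumptions made for illustration.

```shell
# Create throwaway input files A and B.
dir=$(mktemp -d)
printf 'line from A\n' > "$dir/A"
printf 'line from B\n' > "$dir/B"

# v is still unset in BEGIN, is set before A is read, and t is set only
# once A has reached end of file and before B is opened.
out=$(cd "$dir" && awk '
  BEGIN { print "BEGIN: v=[" v "] t=[" t "]" }
        { print FILENAME ": v=[" v "] t=[" t "]" }
  END   { print "END: v=[" v "] t=[" t "]" }
' v=1 A t=hello B)

printf '%s\n' "$out"
rm -rf "$dir"
```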

       Program flow at the pattern {action} level can be changed with the

Title: AWK Multi-line Records and Program Execution Flow
Summary
This section elaborates on handling multi-line records in AWK by interpreting RS as a regular expression, and explains how to set RS and ORS for proper record spacing. Then, it describes the AWK program execution flow, including setting ARGC and ARGV, processing BEGIN blocks, handling command-line arguments (files, assignments, empty strings), opening input streams, matching patterns against input records, executing actions, and processing END blocks. An example is provided to illustrate the order in which assignments and file processing occur.