Functions are passed expressions by value and arrays by reference. Extra parameters, those for which the caller supplies no argument, serve as local variables and are initialized to null. For example, csplit(s,A) puts each character of s into array A and returns the length of s.
    function csplit(s, A,    n, i)
    {
        n = length(s)
        for( i = 1 ; i <= n ; i++ ) A[i] = substr(s, i, 1)
        return n
    }
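The csplit() example can be tried as-is. The sketch below wraps it in a POSIX shell function; it assumes the awk on your PATH behaves like mawk (the AWK variable is only an illustrative convention for pointing at mawk explicitly), and csplit_demo is a hypothetical name. Because i is one of csplit's extra parameters, the global i set before the call is unchanged afterwards, while the array C is filled through the reference.

```sh
#!/bin/sh
# Assumption: "awk" behaves like mawk here; run with AWK=mawk to force mawk.
AWK=${AWK:-awk}

# csplit_demo STRING (hypothetical helper): prints the returned length and
# the global i on one line, then each character of STRING on its own line.
csplit_demo() {
    "$AWK" -v str="$1" '
    function csplit(s, A,    n, i)
    {
        n = length(s)
        for( i = 1 ; i <= n ; i++ ) A[i] = substr(s, i, 1)
        return n
    }
    BEGIN {
        i = "untouched"        # global i; csplit uses its own local i
        n = csplit(str, C)     # C is filled through the array reference
        print n, i
        for (k = 1; k <= n; k++) print C[k]
    }'
}

csplit_demo abc
```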
Putting extra space between the passed arguments and the local variables is conventional. Functions can be referenced before they are defined, but in a call the function name and the '(' of the argument list must touch, to avoid confusion with concatenation.
A function parameter is normally a scalar value (number or string). If a function is referenced before its definition with an array as an argument, the function's corresponding parameter is treated as an array.
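A small sketch of the two passing rules and of a forward reference, under the same assumptions (an mawk-compatible awk, a hypothetical pass_demo helper). bump() is called before it is defined; its scalar parameter receives a copy of x, while its array parameter refers to the caller's arr.

```sh
#!/bin/sh
# Assumption: "awk" behaves like mawk here; run with AWK=mawk to force mawk.
AWK=${AWK:-awk}

# pass_demo (hypothetical helper): prints the caller's scalar and array
# element after a function has tried to modify both.
pass_demo() {
    "$AWK" '
    BEGIN {
        x = 1
        arr["k"] = 1
        bump(x, arr)           # forward reference: bump is defined below
        print x, arr["k"]
    }
    function bump(v, a)
    {
        v = v + 10             # changes only the local copy of x
        a["k"] = a["k"] + 10   # changes the caller array element
    }'
}

pass_demo
```

The output should be "1 11": x keeps its value, while arr["k"] was changed by the callee.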
11. Splitting strings, records and files
Awk programs use the same algorithm to split strings into arrays with split(), and records into fields on FS. mawk uses essentially the same algorithm to split files into records on RS.
split(expr, A, sep) works as follows:
(1) If sep is omitted, it is replaced by FS. The separator sep can be an expression or a regular expression. If it is an expression of non-string type, it is converted to a string.
(2) If sep = " " (a single space), then <SPACE> is trimmed from the front and back of expr, and sep becomes <SPACE>. mawk defines <SPACE> as the regular expression /[ \t\n]+/. Otherwise sep is treated as a regular expression, except that meta-characters are ignored for a string of length 1; e.g., split(x, A, "*") and split(x, A, /\*/) are the same.
(3) If expr is not a string, it is converted to a string. If expr is then the empty string "", split() returns 0 and A is set empty. Otherwise, all non-overlapping, non-null and longest matches of sep in expr separate expr into fields, which are loaded into A. The fields are placed in A[1], A[2], ..., A[n], and split() returns n, the number of fields, which is the number of matches plus one. Data placed in A that looks numeric is typed as both number and string.
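The three steps above can be checked one case at a time. This sketch makes the same assumptions (an mawk-compatible awk; split_demo and show are hypothetical helpers). Each show line prints the count split() returned followed by every element of A in brackets, so an empty field appears as []. The last line compares "10" and "9" taken from split(): as numbers 10 > 9 is true, whereas as strings it would be false.

```sh
#!/bin/sh
# Assumption: "awk" behaves like mawk here; run with AWK=mawk to force mawk.
AWK=${AWK:-awk}

# split_demo (hypothetical helper): exercises rules (1)-(3) of split().
split_demo() {
    "$AWK" '
    function show(n, A,    i, out)
    {
        out = n
        for (i = 1; i <= n; i++) out = out " [" A[i] "]"
        print out
    }
    BEGIN {
        FS = ","
        show(split("x,y", A), A)              # (1) sep omitted: FS is used
        show(split("  a b\tc\n", A, " "), A)  # (2) " ": trim, then split on <SPACE>
        show(split("a*b*c", A, "*"), A)       # (2) one-character sep is literal
        show(split("a::b:", A, /:+/), A)      # (3) trailing match: empty last field
        show(split("", A, ","), A)            # (3) empty expr: returns 0
        split("10 9", A, " ")
        print (A[1] > A[2])                   # numeric-looking: compared as numbers
    }'
}

split_demo
```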
Splitting records into fields works the same way, except that the pieces are loaded into $1, $2, ..., $NF. If $0 is empty, NF is set to 0 and all $i to "".
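A quick check of the empty-record rule, assuming an mawk-compatible awk; fields_demo is a hypothetical helper. The second input line is empty, so for it NF is 0 and $1 is the empty string.

```sh
#!/bin/sh
# Assumption: "awk" behaves like mawk here; run with AWK=mawk to force mawk.
AWK=${AWK:-awk}

# fields_demo (hypothetical helper): feeds one non-empty and one empty
# record and prints NF and $1 for each.
fields_demo() {
    printf 'x y z\n\n' | "$AWK" '{ printf "NF=%d $1=[%s]\n", NF, $1 }'
}

fields_demo
```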
mawk splits files into records by the same algorithm, with the slight difference that RS is really a terminator instead of a separator. (ORS is really a terminator too.)
For example, if FS = ":+" and $0 = "a::b:", then NF = 3 and $1 = "a", $2 = "b" and $3 = "", but if "a::b:" is the contents of an input file and RS = ":+", then there are two records, "a" and "b".
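The "a::b:" example can be reproduced by reading the same text once as a single record split on FS and once as a whole file split on RS. A sketch under the same assumptions (fs_demo and rs_demo are hypothetical names); a regular-expression RS is an mawk feature that gawk shares, so other awks may differ. The input to rs_demo has no trailing newline, so the file is exactly "a::b:".

```sh
#!/bin/sh
# Assumption: "awk" behaves like mawk here; run with AWK=mawk to force mawk.
AWK=${AWK:-awk}

# fs_demo (hypothetical helper): "a::b:" as one record, split on FS = ":+".
fs_demo() {
    printf 'a::b:\n' | "$AWK" 'BEGIN { FS = ":+" }
        { printf "NF=%d", NF; for (i = 1; i <= NF; i++) printf " [%s]", $i; print "" }'
}

# rs_demo (hypothetical helper): the same text as a file, split on RS = ":+".
# The final ":" terminates "b" instead of starting an empty third record.
rs_demo() {
    printf 'a::b:' | "$AWK" 'BEGIN { RS = ":+" } { printf "record %d: [%s]\n", NR, $0 }'
}

fs_demo
rs_demo
```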
RS = " " is not special: unlike FS = " ", it does not trigger the <SPACE> rule above.
If FS = "", then mawk breaks the record into individual characters, and, similarly, split(s,A,"") places the individual characters of s into A.
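A sketch of the last two points, under the same assumptions (rs_space_demo and chars_demo are hypothetical helpers). With RS = " ", the two adjacent spaces in "a  b" delimit an empty record, which FS = " " would never produce for fields. FS = "" and split(s, A, "") are not required by POSIX, so other awks may behave differently.

```sh
#!/bin/sh
# Assumption: "awk" behaves like mawk here; run with AWK=mawk to force mawk.
AWK=${AWK:-awk}

# rs_space_demo (hypothetical helper): each single space ends a record,
# so each record is printed in brackets on one line.
rs_space_demo() {
    printf 'a  b' | "$AWK" 'BEGIN { RS = " " } { printf "[%s]", $0 } END { print "" }'
}

# chars_demo (hypothetical helper): FS = "" and split(s, A, "") both break
# text into characters; prints NF, the split count, $2 and A[3].
chars_demo() {
    printf 'abc\n' | "$AWK" 'BEGIN { FS = "" } { n = split($0, A, ""); print NF, n, $2, A[3] }'
}

rs_space_demo
chars_demo
```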
12. Multi‐line records
Since mawk interprets RS as a regular expression, multi-line records are easy. Setting RS = "\n\n+" makes one or more blank lines separate records. If FS = " " (the default), then by the rules for <SPACE> above, single newlines are treated like spaces and so become field separators.
For example, if
• a file is "a b\nc\n\n",
• RS = "\n\n+" and
• FS = " ",
then there is one record "a b\nc" with three fields, "a", "b" and "c":
• using FS = "\n" gives two fields, "a b" and "c";
• using FS = "", by the rule in section 11, breaks the record into its individual characters.
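The example above can be run directly. This sketch assumes an mawk-compatible awk with regular-expression RS; multiline_demo is a hypothetical helper that takes the FS value as its argument (passed with -v, so "\n" is turned into a newline) and prints each record's fields followed by the record count.

```sh
#!/bin/sh
# Assumption: "awk" behaves like mawk here; run with AWK=mawk to force mawk.
AWK=${AWK:-awk}

# multiline_demo FS (hypothetical helper): reads the example file
# "a b\nc\n\n" with RS = "\n\n+" and the given FS.
multiline_demo() {
    printf 'a b\nc\n\n' | "$AWK" -v fs="$1" '
    BEGIN { RS = "\n\n+"; FS = fs }
    {
        printf "NF=%d", NF
        for (i = 1; i <= NF; i++) printf " [%s]", $i
        print ""
    }
    END { print "records=" NR }'
}

multiline_demo ' '
multiline_demo '\n'
```

With FS = " " it should report one record with three fields; with FS = "\n", one record with the two fields "a b" and "c".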
If you want lines with spaces or tabs to be considered blank, set RS = "\n([ \t]*\n)+". For compatibility with other awks, setting RS = "" has the same effect as if blank lines were stripped from the front and back of files and records were then determined as if RS = "\n\n+". POSIX requires that "\n" always separates records when