awk.1 - 9base - revived minimalist port of Plan 9 userland to Unix
       awk.1 (10645B)
       ---
            1 .TH AWK 1
            2 .SH NAME
            3 awk \- pattern-directed scanning and processing language
            4 .SH SYNOPSIS
            5 .B awk
            6 [
            7 .BI -F fs
            8 ]
            9 [
           10 .BI -v
           11 .I var=value
           12 ]
           13 [
           14 .BI -mr n
           15 ]
           16 [
           17 .BI -mf n
           18 ]
           19 [
           20 .B -f
           21 .I prog
           22 |
           23 .I prog
           24 ]
           25 [
           26 .I file ...
           27 ]
           28 .SH DESCRIPTION
           29 .I Awk
           30 scans each input
           31 .I file
           32 for lines that match any of a set of patterns specified literally in
           33 .IR prog
           34 or in one or more files
           35 specified as
           36 .B -f
           37 .IR file .
           38 With each pattern
           39 there can be an associated action that will be performed
           40 when a line of a
           41 .I file
           42 matches the pattern.
           43 Each line is matched against the
           44 pattern portion of every pattern-action statement;
           45 the associated action is performed for each matched pattern.
           46 The file name 
           47 .L -
           48 means the standard input.
           49 Any
           50 .IR file
           51 of the form
           52 .I var=value
           53 is treated as an assignment, not a file name,
           54 and is executed at the time it would have been opened if it were a file name.
           55 The option
           56 .B -v
           57 followed by
           58 .I var=value
           59 is an assignment to be done before
           60 .I prog
           61 is executed;
           62 any number of
           63 .B -v
           64 options may be present.
           65 The \fB\-F\fP
           66 .IR fs
           67 option defines the input field separator to be the regular expression
           68 .IR fs .
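A minimal sketch of the -F and -v options described above, run from a POSIX shell. The sample records, the variable name min, and the presence of an awk binary on the path are assumptions made for illustration, not part of this page.

```shell
# Hypothetical input: colon-separated records.
input='root:x:0
glenda:x:1000'

# -F: makes ":" the field separator; -v min=1000 is assigned before prog runs.
out=$(printf '%s\n' "$input" | awk -F: -v min=1000 '$3 >= min { print $1 }')
echo "$out"    # prints: glenda
```

Only the second record passes the test, because its third field is not less than min.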
           69 .PP
           70 An input line is normally made up of fields separated by white space,
           71 or by regular expression
           72 .BR FS .
           73 The fields are denoted
           74 .BR $1 ,
           75 .BR $2 ,
           76 \&..., while
           77 .B $0
           78 refers to the entire line.
           79 If
           80 .BR FS
           81 is null, the input line is split into one field per character.
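The field rules above can be sketched as follows, assuming a POSIX shell and an awk on the path. The input strings are hypothetical. The null-FS case is left out because not every awk outside this port supports it.

```shell
# Default splitting: runs of white space separate fields.
last=$(printf 'alpha  beta gamma\n' | awk '{ print $3, NF }')
# $0 is the whole line, with the original spacing kept.
whole=$(printf 'alpha  beta gamma\n' | awk '{ print $0 }')
# FS as a regular expression: one or more digits separate the fields.
split_re=$(printf 'a1b22c\n' | awk 'BEGIN { FS = "[0-9]+" } { print $1, $2, $3 }')

echo "$last"       # prints: gamma 3
echo "$whole"      # prints: alpha  beta gamma
echo "$split_re"   # prints: a b c
```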
           82 .PP
           83 To compensate for inadequate implementation of storage management,
           84 the 
           85 .B \-mr
           86 option can be used to set the maximum size of the input record,
           87 and the
           88 .B \-mf
           89 option to set the maximum number of fields.
           90 .PP
           91 A pattern-action statement has the form
           92 .IP
           93 .IB pattern " { " action " }"
           94 .PP
           95 A missing 
           96 .BI { " action " }
           97 means print the line;
           98 a missing pattern always matches.
           99 Pattern-action statements are separated by newlines or semicolons.
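As an illustration of the three rules above (missing action, missing pattern, semicolon as separator), here is a short sketch. The three-line input is hypothetical and a POSIX shell with awk on the path is assumed.

```shell
input='one
two
three'

# Pattern with no action: matching lines are printed.
only_t=$(printf '%s\n' "$input" | awk '/^t/')
# Action with no pattern: runs for every line.
lens=$(printf '%s\n' "$input" | awk '{ print length($0) }')
# Two pattern-action statements separated by a semicolon.
both=$(printf '%s\n' "$input" | awk '/one/ { print "first" }; /three/ { print "last" }')

printf '%s\n' "$only_t" "$lens" "$both"
```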
          100 .PP
          101 An action is a sequence of statements.
          102 A statement can be one of the following:
          103 .PP
          104 .EX
          105 .ta \w'\fLdelete array[expression]'u
          106 if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
          107 while(\fI expression \fP)\fI statement\fP
          108 for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
          109 for(\fI var \fPin\fI array \fP)\fI statement\fP
          110 do\fI statement \fPwhile(\fI expression \fP)
          111 break
          112 continue
          113 {\fR [\fP\fI statement ... \fP\fR] \fP}
          114 \fIexpression\fP        #\fR commonly\fP\fI var = expression\fP
          115 print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
          116 printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
          117 return\fR [ \fP\fIexpression \fP\fR]\fP
          118 next        #\fR skip remaining patterns on this input line\fP
          119 nextfile        #\fR skip rest of this file, open next, start at top\fP
          120 delete\fI array\fP[\fI expression \fP]        #\fR delete an array element\fP
          121 delete\fI array\fP        #\fR delete all elements of array\fP
          122 exit\fR [ \fP\fIexpression \fP\fR]\fP        #\fR exit immediately; status is \fP\fIexpression\fP
          123 .EE
          124 .DT
          125 .PP
          126 Statements are terminated by
          127 semicolons, newlines or right braces.
          128 An empty
          129 .I expression-list
          130 stands for
          131 .BR $0 .
          132 String constants are quoted \&\fL"\ "\fR,
          133 with the usual C escapes recognized within.
          134 Expressions take on string or numeric values as appropriate,
          135 and are built using the operators
          136 .B + \- * / % ^
          137 (exponentiation), and concatenation (indicated by white space).
          138 The operators
          139 .B
          140 ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
          141 are also available in expressions.
          142 Variables may be scalars, array elements
          143 (denoted
          144 .IB x  [ i ] )
          145 or fields.
          146 Variables are initialized to the null string.
          147 Array subscripts may be any string,
          148 not necessarily numeric;
          149 this allows for a form of associative memory.
          150 Multiple subscripts such as
          151 .B [i,j,k]
          152 are permitted; the constituents are concatenated,
          153 separated by the value of
          154 .BR SUBSEP .
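The following sketch shows string subscripts and a multiple subscript built with SUBSEP. The array names sum and grid and the input are hypothetical; a POSIX shell and awk are assumed.

```shell
out=$(printf 'a 1\nb 2\na 3\n' | awk '
    { sum[$1] += $2 }               # subscripts are strings; elements start out null
    END {
        print sum["a"], sum["b"]
        grid[1,2] = "x"             # stored under the key 1 SUBSEP 2
        hit = ((1,2) in grid)
        miss = ((2,1) in grid)
        print hit, miss
        key = 1 SUBSEP 2            # the same key, built by hand
        same = (key in grid)
        print same
    }')
echo "$out"
```

The three output lines are "4 2", "1 0" and "1".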
          155 .PP
          156 The
          157 .B print
          158 statement prints its arguments on the standard output
          159 (or on a file if
          160 .BI > file
          161 or
          162 .BI >> file
          163 is present or on a pipe if
          164 .BI | cmd
          165 is present), separated by the current output field separator,
          166 and terminated by the output record separator.
          167 .I file
          168 and
          169 .I cmd
          170 may be literal names or parenthesized expressions;
          171 identical string values in different statements denote
          172 the same open file.
          173 The
          174 .B printf
          175 statement formats its expression list according to the format
          176 (see
          177 .IR fprintf (2)).
          178 The built-in function
          179 .BI close( expr )
          180 closes the file or pipe
          181 .IR expr .
          182 The built-in function
          183 .BI fflush( expr )
          184 flushes any buffered output for the file or pipe
          185 .IR expr .
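A small sketch of print to a pipe, close, and printf as described above. It assumes a POSIX shell with awk and sort(1) available; the input lines are hypothetical.

```shell
out=$(printf '3 c\n1 a\n2 b\n' | awk '
    { print $1, $2 | "sort" }       # the same string names the same open pipe
    END {
        close("sort")               # finish the pipe so sort writes its output now
        printf "%s %5.2f\n", "pi", 3.14159
    }')
echo "$out"
```

Closing the pipe before the final printf is what puts the sorted lines ahead of the last line in the output.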
          186 .PP
          187 The mathematical functions
          188 .BR exp ,
          189 .BR log ,
          190 .BR sqrt ,
          191 .BR sin ,
          192 .BR cos ,
          193 and
          194 .BR atan2 
          195 are built in.
          196 Other built-in functions:
          197 .TF length
          198 .TP
          199 .B length
          200 the length of its argument
          201 taken as a string,
          202 or of
          203 .B $0
          204 if no argument.
          205 .TP
          206 .B rand
          207 random number on [0,1)
          208 .TP
          209 .B srand
          210 sets seed for
          211 .B rand
          212 and returns the previous seed.
          213 .TP
          214 .B int
          215 truncates to an integer value
          216 .TP
          217 .B utf
          218 converts its numerical argument, a character number, to a
          219 .SM UTF
          220 string
          221 .TP
          222 .BI substr( s , " m" , " n\fL)
          223 the
          224 .IR n -character
          225 substring of
          226 .I s
          227 that begins at position
          228 .IR m 
          229 counted from 1.
          230 .TP
          231 .BI index( s , " t" )
          232 the position in
          233 .I s
          234 where the string
          235 .I t
          236 occurs, or 0 if it does not.
          237 .TP
          238 .BI match( s , " r" )
          239 the position in
          240 .I s
          241 where the regular expression
          242 .I r
          243 occurs, or 0 if it does not.
          244 The variables
          245 .B RSTART
          246 and
          247 .B RLENGTH
          248 are set to the position and length of the matched string.
          249 .TP
          250 .BI split( s , " a" , " fs\fL)
          251 splits the string
          252 .I s
          253 into array elements
          254 .IB a [1]\f1,
          255 .IB a [2]\f1,
          256 \&...,
          257 .IB a [ n ]\f1,
          258 and returns
          259 .IR n .
          260 The separation is done with the regular expression
          261 .I fs
          262 or with the field separator
          263 .B FS
          264 if
          265 .I fs
          266 is not given.
          267 An empty string as field separator splits the string
          268 into one array element per character.
          269 .TP
          270 .BI sub( r , " t" , " s\fL)
          271 substitutes
          272 .I t
          273 for the first occurrence of the regular expression
          274 .I r
          275 in the string
          276 .IR s .
          277 If
          278 .I s
          279 is not given,
          280 .B $0
          281 is used.
          282 .TP
          283 .B gsub
          284 same as
          285 .B sub
          286 except that all occurrences of the regular expression
          287 are replaced;
          288 .B sub
          289 and
          290 .B gsub
          291 return the number of replacements.
          292 .TP
          293 .BI sprintf( fmt , " expr" , " ...\fL)
          294 the string resulting from formatting
          295 .I expr ...
          296 according to the
          297 .I printf
          298 format
          299 .I fmt
          300 .TP
          301 .BI system( cmd )
          302 executes
          303 .I cmd
          304 and returns its exit status
          305 .TP
          306 .BI tolower( str )
          307 returns a copy of
          308 .I str
          309 with all upper-case characters translated to their
          310 corresponding lower-case equivalents.
          311 .TP
          312 .BI toupper( str )
          313 returns a copy of
          314 .I str
          315 with all lower-case characters translated to their
          316 corresponding upper-case equivalents.
          317 .PD
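The string functions listed above can be tried with a sketch like this one. The strings and variable names are hypothetical; a POSIX shell and awk are assumed. The utf function is left out because it is specific to this port.

```shell
out=$(awk 'BEGIN {
    s = "hello, world"
    print substr(s, 1, 5)                  # hello
    print index(s, "world")                # 8
    print match(s, /o+/), RSTART, RLENGTH  # 5 5 1
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]            # 3 a c
    t = s
    count = gsub(/o/, "0", t)              # replaces every o in t
    print count, t                         # 2 hell0, w0rld
    sub(/hello/, "goodbye", s)             # replaces the first match only
    print toupper(s)                       # GOODBYE, WORLD
}')
echo "$out"
```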
          318 .PP
          319 The ``function''
          320 .B getline
          321 sets
          322 .B $0
          323 to the next input record from the current input file;
          324 .B getline
          325 .BI < file
          326 sets
          327 .B $0
          328 to the next record from
          329 .IR file .
          330 .B getline
          331 .I x
          332 sets variable
          333 .I x
          334 instead.
          335 Finally,
          336 .IB cmd " | getline
          337 pipes the output of
          338 .I cmd
          339 into
          340 .BR getline ;
          341 each call of
          342 .B getline
          343 returns the next line of output from
          344 .IR cmd .
          345 In all cases,
          346 .B getline
          347 returns 1 for a successful input,
          348 0 for end of file, and \-1 for an error.
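A sketch of the cmd | getline form with a variable, looping until getline stops returning 1. The command string and the names cmd, line and n are hypothetical; a POSIX shell and awk are assumed.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)    # 1 on success, 0 at end of file, -1 on error
        n++
    close(cmd)
    print n, line                       # line still holds the last record read
}')
echo "$out"    # prints: 2 two
```

Testing for > 0 rather than for a nonzero value keeps the loop from spinning forever when getline returns -1.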
          349 .PP
          350 Patterns are arbitrary Boolean combinations
          351 (with
          352 .BR "! || &&" )
          353 of regular expressions and
          354 relational expressions.
          355 Regular expressions are as in
          356 .IR regexp (6).
          357 Isolated regular expressions
          358 in a pattern apply to the entire line.
          359 Regular expressions may also occur in
          360 relational expressions, using the operators
          361 .BR ~
          362 and
          363 .BR !~ .
          364 .BI / re /
          365 is a constant regular expression;
          366 any string (constant or variable) may be used
          367 as a regular expression, except in the position of an isolated regular expression
          368 in a pattern.
          369 .PP
          370 A pattern may consist of two patterns separated by a comma;
          371 in this case, the action is performed for all lines
          372 from an occurrence of the first pattern
          373 through an occurrence of the second.
          374 .PP
          375 A relational expression is one of the following:
          376 .IP
          377 .I expression matchop regular-expression
          378 .br
          379 .I expression relop expression
          380 .br
          381 .IB expression " in " array-name
          382 .br
          383 .BI ( expr , expr,... ") in " array-name
          384 .PP
          385 where a
          386 .I relop
          387 is any of the six relational operators in C,
          388 and a
          389 .I matchop
          390 is either
          391 .B ~
          392 (matches)
          393 or
          394 .B !~
          395 (does not match).
          396 A conditional is an arithmetic expression,
          397 a relational expression,
          398 or a Boolean combination
          399 of these.
          400 .PP
          401 The special patterns
          402 .B BEGIN
          403 and
          404 .B END
          405 may be used to capture control before the first input line is read
          406 and after the last.
          407 .B BEGIN
          408 and
          409 .B END
          410 do not combine with other patterns.
          411 .PP
          412 Variable names with special meanings:
          413 .TF FILENAME
          414 .TP
          415 .B CONVFMT
          416 conversion format used when converting numbers
          417 (default
          418 .BR "%.6g" )
          419 .TP
          420 .B FS
          421 regular expression used to separate fields; also settable
          422 by option
          423 .BI \-F fs\f1.
          424 .TP
          425 .BR NF
          426 number of fields in the current record
          427 .TP
          428 .B NR
          429 ordinal number of the current record
          430 .TP
          431 .B FNR
          432 ordinal number of the current record in the current file
          433 .TP
          434 .B FILENAME
          435 the name of the current input file
          436 .TP
          437 .B RS
          438 input record separator (default newline)
          439 .TP
          440 .B OFS
          441 output field separator (default blank)
          442 .TP
          443 .B ORS
          444 output record separator (default newline)
          445 .TP
          446 .B OFMT
          447 output format for numbers (default
          448 .BR "%.6g" )
          449 .TP
          450 .B SUBSEP
          451 separates multiple subscripts (default 034)
          452 .TP
          453 .B ARGC
          454 argument count, assignable
          455 .TP
          456 .B ARGV
          457 argument array, assignable;
          458 non-null members are taken as file names
          459 .TP
          460 .B ENVIRON
          461 array of environment variables; subscripts are names.
          462 .PD
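To see several of these variables at once, a sketch along these lines may help. The file names f1 and f2 and their contents are hypothetical, and a POSIX shell with awk and mktemp(1) is assumed.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c d\n' > "$dir/f2"

# OFS joins the print arguments; FNR restarts for each file, NR does not.
out=$(cd "$dir" && awk 'BEGIN { OFS = ":" } { print FILENAME, FNR, NR, NF }' f1 f2)
rm -r "$dir"
echo "$out"
```

The output is "f1:1:1:1", "f1:2:2:1" and "f2:1:3:2", one per line.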
          463 .PP
          464 Functions may be defined (at the position of a pattern-action statement) thus:
          465 .IP
          466 .L
          467 function foo(a, b, c) { ...; return x }
          468 .PP
          469 Parameters are passed by value if scalar and by reference if array name;
          470 functions may be called recursively.
          471 Parameters are local to the function; all other variables are global.
          472 Thus local variables may be created by providing excess parameters in
          473 the function definition.
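The parameter rules above can be sketched as follows. The function names fact and fill and all variable names are hypothetical; a POSIX shell and awk are assumed. The extra spaces before i and acc are only a convention that marks them as locals.

```shell
out=$(awk '
    # n is the real parameter; i and acc are excess parameters used as locals.
    function fact(n,    i, acc) {
        acc = 1
        for (i = 2; i <= n; i++) acc *= i
        return acc
    }
    # An array name is passed by reference, so the caller sees the new element.
    function fill(arr) { arr["k"] = "set" }
    BEGIN {
        i = "global"                # not touched by the local i inside fact
        a["x"] = 1
        fill(a)
        print fact(5), i, a["k"]
    }')
echo "$out"    # prints: 120 global set
```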
          474 .SH EXAMPLES
          475 .TP
          476 .L
          477 length($0) > 72
          478 Print lines longer than 72 characters.
          479 .TP
          480 .L
          481 { print $2, $1 }
          482 Print first two fields in opposite order.
          483 .PP
          484 .EX
          485 BEGIN { FS = ",[ \et]*|[ \et]+" }
          486       { print $2, $1 }
          487 .EE
          488 .ns
          489 .IP
          490 Same, with input fields separated by comma and/or blanks and tabs.
          491 .PP
          492 .EX
          493         { s += $1 }
          494 END        { print "sum is", s, " average is", s/NR }
          495 .EE
          496 .ns
          497 .IP
          498 Add up first column, print sum and average.
          499 .TP
          500 .L
          501 /start/, /stop/
          502 Print all lines between start/stop pairs.
          503 .PP
          504 .EX
          505 BEGIN        {        # Simulate echo(1)
          506         for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
          507         printf "\en"
          508         exit }
          509 .EE
          510 .SH SOURCE
          511 .B /sys/src/cmd/awk
          512 .SH SEE ALSO
          513 .IR sed (1),
          514 .IR regexp (6),
          515 .br
          516 A. V. Aho, B. W. Kernighan, P. J. Weinberger,
          517 .I
          518 The AWK Programming Language,
          519 Addison-Wesley, 1988.  ISBN 0-201-07981-X
          520 .SH BUGS
          521 There are no explicit conversions between numbers and strings.
          522 To force an expression to be treated as a number add 0 to it;
          523 to force it to be treated as a string concatenate
          524 \&\fL""\fP to it.
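A brief sketch of the two idioms. It assumes a POSIX shell and an awk that compares numeric-looking fields as numbers, as POSIX specifies; the input and the variable x are hypothetical.

```shell
out=$(printf '10 9\n' | awk '{
    print ($1 < $2)               # fields that look numeric compare as numbers: 0
    print (($1 "") < ($2 ""))     # forced to strings, "10" sorts before "9": 1
    x = "3x"
    print x + 0                   # adding 0 forces a number, using the leading digits: 3
}')
echo "$out"
```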
          525 .br
          526 The scope rules for variables in functions are a botch;
          527 the syntax is worse.