tawk.1 - plan9port - [fork] Plan 9 from user space
       ---
       tawk.1 (11268B)
       ---
            1 .TH AWK 1
            2 .SH NAME
            3 awk \- pattern-directed scanning and processing language
            4 .SH SYNOPSIS
            5 .B awk
            6 [
            7 .B -F
            8 .I fs
            9 ]
           10 [
           11 .B -d
           12 ]
           13 [
           14 .B -mf
           15 .I n
           16 ]
           17 [
           18 .B -mr
           19 .I n
           20 ]
           21 [
           22 .B -safe
           23 ]
           24 [
           25 .B -v
           26 .I var=value
           27 ]
           28 [
           29 .B -f
           30 .I progfile
           31 |
           32 .I prog
           33 ]
           34 [
           35 .I file ...
           36 ]
           37 .SH DESCRIPTION
           38 .I Awk
           39 scans each input
           40 .I file
           41 for lines that match any of a set of patterns specified literally in
           42 .I prog
           43 or in one or more files
           44 specified as
           45 .B -f
           46 .IR progfile .
           47 With each pattern
           48 there can be an associated action that will be performed
           49 when a line of a
           50 .I file
           51 matches the pattern.
           52 Each line is matched against the
           53 pattern portion of every pattern-action statement;
           54 the associated action is performed for each matched pattern.
           55 The file name 
           56 .L -
           57 means the standard input.
           58 Any
           59 .IR file
           60 of the form
           61 .I var=value
           62 is treated as an assignment, not a file name,
           63 and is executed at the time it would have been opened if it were a file name.
           64 The option
           65 .B -v
           66 followed by
           67 .I var=value
           68 is an assignment to be done before the program
           69 is executed;
           70 any number of
           71 .B -v
           72 options may be present.
           73 The option
           74 .BI -F " fs"
           75 defines the input field separator to be the regular expression
           76 .IR fs .
           77 .PP
           78 An input line is normally made up of fields separated by white space,
           79 or by regular expression
           80 .BR FS .
           81 The fields are denoted
           82 .BR $1 ,
           83 .BR $2 ,
           84 \&..., while
           85 .B $0
           86 refers to the entire line.
           87 If
           88 .BR FS
           89 is null, the input line is split into one field per character.
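A minimal sketch of the field-splitting rules above, run from the shell. The sample input and the variable names out1 and out2 are made up for illustration, and only behavior described on this page is assumed.

```shell
# Default splitting: fields are separated by runs of white space.
out1=$(printf 'alpha  beta\tgamma\n' | awk '{ print $2; print NF }')
echo "$out1"

# -F sets the field separator to a regular expression:
# here a comma followed by optional blanks.
out2=$(printf 'x, y,z\n' | awk -F ',[ ]*' '{ print $3, $1 }')
echo "$out2"
```

The first command should print the second field and the field count; the second should print the third and first fields of the comma-separated line.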
           90 .PP
           91 To compensate for inadequate implementation of storage management,
           92 the 
           93 .B -mr
           94 option can be used to set the maximum size of the input record,
           95 and the
           96 .B -mf
           97 option to set the maximum number of fields.
           98 .PP
           99 The
          100 .B -safe
          101 option causes
          102 .I awk
          103 to run in 
          104 ``safe mode,''
          105 in which it is not allowed to 
          106 run shell commands or open files
          107 and the environment is not made available
          108 in the 
          109 .B ENVIRON
          110 variable.
          111 .PP
          112 A pattern-action statement has the form
          113 .IP
          114 .IB pattern " { " action " }
          115 .PP
          116 A missing 
          117 .BI { " action " }
          118 means print the line;
          119 a missing pattern always matches.
          120 Pattern-action statements are separated by newlines or semicolons.
          121 .PP
          122 An action is a sequence of statements.
          123 A statement can be one of the following:
          124 .PP
          125 .EX
          126 .ta \w'\fLdelete array[expression]'u
          127 if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
          128 while(\fI expression \fP)\fI statement\fP
          129 for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
          130 for(\fI var \fPin\fI array \fP)\fI statement\fP
          131 do\fI statement \fPwhile(\fI expression \fP)
          132 break
          133 continue
          134 {\fR [\fP\fI statement ... \fP\fR] \fP}
          135 \fIexpression\fP        #\fR commonly\fP\fI var = expression\fP
          136 print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
          137 printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
          138 return\fR [ \fP\fIexpression \fP\fR]\fP
          139 next        #\fR skip remaining patterns on this input line\fP
          140 nextfile        #\fR skip rest of this file, open next, start at top\fP
          141 delete\fI array\fP[\fI expression \fP]        #\fR delete an array element\fP
          142 delete\fI array\fP        #\fR delete all elements of array\fP
          143 exit\fR [ \fP\fIexpression \fP\fR]\fP        #\fR exit immediately; status is \fP\fIexpression\fP
          144 .EE
          145 .DT
          146 .PP
          147 Statements are terminated by
          148 semicolons, newlines or right braces.
          149 An empty
          150 .I expression-list
          151 stands for
          152 .BR $0 .
          153 String constants are quoted \&\fL"\ "\fR,
          154 with the usual C escapes recognized within.
          155 Expressions take on string or numeric values as appropriate,
          156 and are built using the operators
          157 .B + \- * / % ^
          158 (exponentiation), and concatenation (indicated by white space).
          159 The operators
          160 .B
          161 ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
          162 are also available in expressions.
          163 Variables may be scalars, array elements
          164 (denoted
          165 .IB x  [ i ] )
          166 or fields.
          167 Variables are initialized to the null string.
          168 Array subscripts may be any string,
          169 not necessarily numeric;
          170 this allows for a form of associative memory.
          171 Multiple subscripts such as
          172 .B [i,j,k]
          173 are permitted; the constituents are concatenated,
          174 separated by the value of
          175 .BR SUBSEP .
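As an illustration of string subscripts and multiple subscripts, the sketch below counts words and then checks that a subscript written [1,2] is the same key as the concatenation of 1, SUBSEP and 2. The input data and the names n, m and k are hypothetical.

```shell
# Count occurrences of each word using string subscripts.
out1=$(printf 'a\nb\na\n' | awk '{ n[$1]++ } END { print n["a"], n["b"] }')
echo "$out1"

# A multiple subscript [1,2] is stored under the string 1 SUBSEP 2.
out2=$(awk 'BEGIN { m[1,2] = "x"; k = 1 SUBSEP 2; if (k in m) print "same key" }')
echo "$out2"
```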
          176 .PP
          177 The
          178 .B print
          179 statement prints its arguments on the standard output
          180 (or on a file if
          181 .BI > file
          182 or
          183 .BI >> file
          184 is present or on a pipe if
          185 .BI | cmd
          186 is present), separated by the current output field separator,
          187 and terminated by the output record separator.
          188 .I file
          189 and
          190 .I cmd
          191 may be literal names or parenthesized expressions;
          192 identical string values in different statements denote
          193 the same open file.
          194 The
          195 .B printf
          196 statement formats its expression list according to the format
          197 (see
          198 .IR fprintf (3)) .
          199 The built-in function
          200 .BI close( expr )
          201 closes the file or pipe
          202 .IR expr .
          203 The built-in function
          204 .BI fflush( expr )
          205 flushes any buffered output for the file or pipe
          206 .IR expr .
          207 If
          208 .IR expr
          209 is omitted or is a null string, all open files are flushed.
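A short sketch of the output rules above, assuming a sort command is available on the system. The first command shows the output field separator being placed between print arguments; the second sends output through a pipe and closes it.

```shell
# Commas in print insert OFS between the arguments.
out1=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } { print $2, $1 }')
echo "$out1"

# Output piped to a command; close() waits for sort to finish.
out2=$(printf 'b\na\n' | awk '{ print | "sort" } END { close("sort") }')
echo "$out2"
```

Because the same string names the pipe in both places, close("sort") refers to the pipe opened by the print statement.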
          210 .PP
          211 The mathematical functions
          212 .BR exp ,
          213 .BR log ,
          214 .BR sqrt ,
          215 .BR sin ,
          216 .BR cos ,
          217 and
          218 .BR atan2 
          219 are built in.
          220 Other built-in functions:
          221 .TF length
          222 .TP
          223 .B length
          224 If its argument is a string, the string's length is returned.
          225 If its argument is an array, the number of subscripts in the array is returned.
           226 With no argument, the length of
          227 .B $0
          228 is returned.
          229 .TP
          230 .B rand
           231 random number on [0,1)
          232 .TP
          233 .B srand
          234 sets seed for
          235 .B rand
          236 and returns the previous seed.
          237 .TP
          238 .B int
          239 truncates to an integer value
          240 .TP
          241 .B utf
          242 converts its numerical argument, a character number, to a
          243 .SM UTF
          244 string
          245 .TP
          246 .BI substr( s , " m" , " n\fL)
          247 the
          248 .IR n -character
          249 substring of
          250 .I s
          251 that begins at position
          252 .IR m 
          253 counted from 1.
          254 .TP
          255 .BI index( s , " t" )
          256 the position in
          257 .I s
          258 where the string
          259 .I t
          260 occurs, or 0 if it does not.
          261 .TP
          262 .BI match( s , " r" )
          263 the position in
          264 .I s
          265 where the regular expression
          266 .I r
          267 occurs, or 0 if it does not.
          268 The variables
          269 .B RSTART
          270 and
          271 .B RLENGTH
          272 are set to the position and length of the matched string.
          273 .TP
          274 .BI split( s , " a" , " fs\fL)
          275 splits the string
          276 .I s
          277 into array elements
          278 .IB a [1]\f1,
          279 .IB a [2]\f1,
          280 \&...,
          281 .IB a [ n ]\f1,
          282 and returns
          283 .IR n .
          284 The separation is done with the regular expression
          285 .I fs
          286 or with the field separator
          287 .B FS
          288 if
          289 .I fs
          290 is not given.
          291 An empty string as field separator splits the string
          292 into one array element per character.
          293 .TP
          294 .BI sub( r , " t" , " s\fL)
          295 substitutes
          296 .I t
          297 for the first occurrence of the regular expression
          298 .I r
          299 in the string
          300 .IR s .
          301 If
          302 .I s
          303 is not given,
          304 .B $0
          305 is used.
          306 .TP
          307 .B gsub
          308 same as
          309 .B sub
          310 except that all occurrences of the regular expression
          311 are replaced;
          312 .B sub
          313 and
          314 .B gsub
          315 return the number of replacements.
          316 .TP
          317 .BI sprintf( fmt , " expr" , " ...\fL)
          318 the string resulting from formatting
          319 .I expr ...
          320 according to the
          321 .I printf
          322 format
          323 .I fmt
          324 .TP
          325 .BI system( cmd )
          326 executes
          327 .I cmd
          328 and returns its exit status
          329 .TP
          330 .BI tolower( str )
          331 returns a copy of
          332 .I str
          333 with all upper-case characters translated to their
          334 corresponding lower-case equivalents.
          335 .TP
          336 .BI toupper( str )
          337 returns a copy of
          338 .I str
          339 with all lower-case characters translated to their
          340 corresponding upper-case equivalents.
          341 .PD
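The string functions listed above can be tried with a small sketch such as the following. The test string and the variable names are made up, and the plan9port-specific utf function is deliberately left out so that the example should also run under other awks.

```shell
out=$(awk 'BEGIN {
    s = "hello, world"
    print substr(s, 1, 5)            # 5 characters starting at position 1
    print index(s, "world")          # position of a literal string
    print match(s, /o+/), RSTART, RLENGTH
    n = split("a:b:c", parts, ":")   # fills parts[1] .. parts[n]
    print n, parts[3]
    t = s
    c = gsub(/o/, "0", t)            # replace every o in t, return the count
    print c, t
    print toupper(substr(s, 1, 1)) substr(s, 2)
}')
echo "$out"
```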
          342 .PP
          343 The ``function''
          344 .B getline
          345 sets
          346 .B $0
          347 to the next input record from the current input file;
          348 .B getline
          349 .BI < file
          350 sets
          351 .B $0
          352 to the next record from
          353 .IR file .
          354 .B getline
          355 .I x
          356 sets variable
          357 .I x
          358 instead.
          359 Finally,
          360 .IB cmd " | getline
          361 pipes the output of
          362 .I cmd
          363 into
          364 .BR getline ;
          365 each call of
          366 .B getline
          367 returns the next line of output from
          368 .IR cmd .
          369 In all cases,
          370 .B getline
          371 returns 1 for a successful input,
          372 0 for end of file, and \-1 for an error.
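A minimal sketch of reading command output with getline, assuming an echo command is available. The names cmd, line and n are illustrative. The loop relies on the return values described above: it continues while getline returns 1 and stops at end of file or on error. It cannot work in safe mode, since it runs a shell command.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 on success, 0 at end of file
        n++
    close(cmd)
    print n, line                      # line keeps the last record read
}')
echo "$out"
```

Testing for > 0 rather than for a nonzero value matters: an error return of \-1 would otherwise keep the loop running.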
          373 .PP
          374 Patterns are arbitrary Boolean combinations
          375 (with
          376 .BR "! || &&" )
          377 of regular expressions and
          378 relational expressions.
          379 Regular expressions are as in
          380 .MR regexp (7) .
          381 Isolated regular expressions
          382 in a pattern apply to the entire line.
          383 Regular expressions may also occur in
          384 relational expressions, using the operators
          385 .BR ~
          386 and
          387 .BR !~ .
          388 .BI / re /
          389 is a constant regular expression;
          390 any string (constant or variable) may be used
          391 as a regular expression, except in the position of an isolated regular expression
          392 in a pattern.
          393 .PP
          394 A pattern may consist of two patterns separated by a comma;
          395 in this case, the action is performed for all lines
          396 from an occurrence of the first pattern
           397 through an occurrence of the second.
          398 .PP
          399 A relational expression is one of the following:
          400 .IP
          401 .I expression matchop regular-expression
          402 .br
          403 .I expression relop expression
          404 .br
          405 .IB expression " in " array-name
          406 .br
          407 .BI ( expr , expr,... ") in " array-name
          408 .PP
          409 where a
          410 .I relop
          411 is any of the six relational operators in C,
          412 and a
          413 .I matchop
          414 is either
          415 .B ~
          416 (matches)
          417 or
          418 .B !~
          419 (does not match).
          420 A conditional is an arithmetic expression,
          421 a relational expression,
          422 or a Boolean combination
          423 of these.
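As an illustration, the pattern in the sketch below is a Boolean combination of a relop expression and a matchop expression. The fruit data is hypothetical.

```shell
# Select records whose second field exceeds 5 and whose first field
# does not begin with b.
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' |
    awk '$2 > 5 && $1 !~ /^b/ { print $1 }')
echo "$out"
```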
          424 .PP
          425 The special patterns
          426 .B BEGIN
          427 and
          428 .B END
          429 may be used to capture control before the first input line is read
          430 and after the last.
          431 .B BEGIN
          432 and
          433 .B END
          434 do not combine with other patterns.
          435 .PP
          436 Variable names with special meanings:
          437 .TF FILENAME
          438 .TP
          439 .B CONVFMT
          440 conversion format used when converting numbers
          441 (default
          442 .BR "%.6g" )
          443 .TP
          444 .B FS
          445 regular expression used to separate fields; also settable
          446 by option
          447 .BI \-F fs\f1.
          448 .TP
          449 .BR NF
          450 number of fields in the current record
          451 .TP
          452 .B NR
          453 ordinal number of the current record
          454 .TP
          455 .B FNR
          456 ordinal number of the current record in the current file
          457 .TP
          458 .B FILENAME
          459 the name of the current input file
          460 .TP
          461 .B RS
          462 input record separator (default newline)
          463 .TP
          464 .B OFS
          465 output field separator (default blank)
          466 .TP
          467 .B ORS
          468 output record separator (default newline)
          469 .TP
          470 .B OFMT
          471 output format for numbers (default
          472 .BR "%.6g" )
          473 .TP
          474 .B SUBSEP
           475 separates multiple subscripts (default octal 034)
          476 .TP
          477 .B ARGC
          478 argument count, assignable
          479 .TP
          480 .B ARGV
          481 argument array, assignable;
          482 non-null members are taken as file names
          483 .TP
          484 .B ENVIRON
          485 array of environment variables; subscripts are names.
          486 .PD
          487 .PP
          488 Functions may be defined (at the position of a pattern-action statement) thus:
          489 .IP
          490 .L
          491 function foo(a, b, c) { ...; return x }
          492 .PP
          493 Parameters are passed by value if scalar and by reference if array name;
          494 functions may be called recursively.
          495 Parameters are local to the function; all other variables are global.
          496 Thus local variables may be created by providing excess parameters in
          497 the function definition.
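The parameter rules above can be sketched as follows. The function fill and the names sq, count and i are hypothetical. The extra parameter i, which no caller supplies, acts as a local variable.

```shell
out=$(awk '
# The array argument is passed by reference, so the caller sees the
# new elements.  i is an excess parameter used as a local variable.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i
    n = 0          # scalar passed by value: the caller is unaffected
}
BEGIN {
    i = 99; count = 3
    fill(sq, count)
    print sq[3], count, i
}')
echo "$out"
```

The global i keeps its value because the loop inside fill uses the local i. Separating the extra parameters with additional spaces is only a common convention.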
          498 .SH EXAMPLES
          499 .TP
          500 .L
          501 length($0) > 72
          502 Print lines longer than 72 characters.
          503 .TP
          504 .L
          505 { print $2, $1 }
          506 Print first two fields in opposite order.
          507 .PP
          508 .EX
          509 BEGIN { FS = ",[ \et]*|[ \et]+" }
          510       { print $2, $1 }
          511 .EE
          512 .ns
          513 .IP
          514 Same, with input fields separated by comma and/or blanks and tabs.
          515 .PP
          516 .EX
          517         { s += $1 }
          518 END        { print "sum is", s, " average is", s/NR }
          519 .EE
          520 .ns
          521 .IP
          522 Add up first column, print sum and average.
          523 .TP
          524 .L
          525 /start/, /stop/
          526 Print all lines between start/stop pairs.
          527 .PP
          528 .EX
          529 BEGIN        {        # Simulate echo(1)
          530         for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
          531         printf "\en"
          532         exit }
          533 .EE
          534 .SH SOURCE
          535 .B \*9/src/cmd/awk
          536 .SH SEE ALSO
          537 .MR sed (1) ,
          538 .MR regexp (7) ,
          539 .br
          540 A. V. Aho, B. W. Kernighan, P. J. Weinberger,
          541 .I
          542 The AWK Programming Language,
          543 Addison-Wesley, 1988.  ISBN 0-201-07981-X
          544 .SH BUGS
          545 There are no explicit conversions between numbers and strings.
          546 To force an expression to be treated as a number add 0 to it;
          547 to force it to be treated as a string concatenate
          548 \&\fL""\fP to it.
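A small sketch of the two conversion idioms, with made-up values. The variables a and b hold string constants, so they compare as strings until 0 is added.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"
    if (a < b) print "string comparison"     # "10" sorts before "9"
    if (a + 0 > b + 0) print "numeric comparison"
    n = 3
    print length(n "" n)                     # concatenation yields "33"
}')
echo "$out"
```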
          549 .PP
          550 The scope rules for variables in functions are a botch;
          551 the syntax is worse.
          552 .PP
          553 UTF is not always dealt with correctly,
          554 though
          555 .I awk
          556 does make an attempt to do so.
          557 The
          558 .I split
          559 function with an empty string as final argument now copes
          560 with UTF in the string being split.