awk: add awk(1) - plan9port - [fork] Plan 9 from user space
git clone git://src.adamsgaard.dk/plan9port
       ---
       commit e7c5e5ed94e02e969d45d8ab74876cac53694195
       parent 0829f75bba8a5eb60a7f46e21aa266736b4c4bab
       Author: Michael Teichgräber <mt4swm@googlemail.com>
       Date:   Tue, 18 Aug 2009 02:40:26 -0400
       
       awk: add awk(1)
       
       http://codereview.appspot.com/104086
       
       Diffstat:
         A man/man1/awk.1                      |     560 +++++++++++++++++++++++++++++++
       
       1 file changed, 560 insertions(+), 0 deletions(-)
       ---
       diff --git a/man/man1/awk.1 b/man/man1/awk.1
       @@ -0,0 +1,560 @@
       +.TH AWK 1
       +.SH NAME
       +awk \- pattern-directed scanning and processing language
       +.SH SYNOPSIS
       +.B awk
       +[
       +.B -F
       +.I fs
       +]
       +[
       +.B -d
       +]
       +[
       +.B -mf
       +.I n
       +]
       +[
       +.B -mr
       +.I n
       +]
       +[
       +.B -safe
       +]
       +[
       +.B -v
       +.I var=value
       +]
       +[
       +.B -f
       +.I progfile
       +|
       +.I prog
       +]
       +[
       +.I file ...
       +]
       +.SH DESCRIPTION
       +.I Awk
       +scans each input
       +.I file
       +for lines that match any of a set of patterns specified literally in
       +.I prog
       +or in one or more files
       +specified as
       +.B -f
       +.IR progfile .
       +With each pattern
       +there can be an associated action that will be performed
       +when a line of a
       +.I file
       +matches the pattern.
       +Each line is matched against the
       +pattern portion of every pattern-action statement;
       +the associated action is performed for each matched pattern.
       +The file name 
       +.L -
       +means the standard input.
       +Any
       +.IR file
       +of the form
       +.I var=value
       +is treated as an assignment, not a file name,
       +and is executed at the time it would have been opened if it were a file name.
       +The option
       +.B -v
       +followed by
       +.I var=value
       +is an assignment to be done before the program
       +is executed;
       +any number of
       +.B -v
       +options may be present.
       +The
       +.B -F
       +.I fs
       +option defines the input field separator to be the regular expression
       +.IR fs .
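       +.PP
       +For instance, assuming the colon-separated layout of a Unix
       +.BR /etc/passwd ,
       +the first command below prints the login names whose numeric user id
       +(field 3) is at least 1000.
       +The second assumes two hypothetical input files
       +.B a
       +and
       +.BR b ;
       +.B n
       +is 1 while
       +.B a
       +is read and 2 while
       +.B b
       +is read.
       +.EX
       +awk -F: -v min=1000 '$3 >= min { print $1 }' /etc/passwd
       +awk '{ print n, $0 }' n=1 a n=2 b
       +.EE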
       +.PP
       +An input line is normally made up of fields separated by white space,
       +or by the regular expression
       +.BR FS .
       +The fields are denoted
       +.BR $1 ,
       +.BR $2 ,
       +\&..., while
       +.B $0
       +refers to the entire line.
       +If
       +.BR FS
       +is null, the input line is split into one field per character.
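       +.PP
       +For example, with
       +.B FS
       +set to the null string, the following should print
       +\fL3 b\fP,
       +the number of fields followed by the second character:
       +.EX
       +echo abc | awk 'BEGIN { FS = "" } { print NF, $2 }'
       +.EE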
       +.PP
       +To compensate for the inadequate implementation of storage management,
       +the 
       +.B -mr
       +option can be used to set the maximum size of the input record,
       +and the
       +.B -mf
       +option to set the maximum number of fields.
       +.PP
       +The
       +.B -safe
       +option causes
       +.I awk
       +to run in 
       +``safe mode,''
       +in which it is not allowed to 
       +run shell commands or open files,
       +and the environment is not made available
       +in the 
       +.B ENVIRON
       +variable.
       +.PP
       +A pattern-action statement has the form
       +.IP
       +.IB pattern " { " action " }
       +.PP
       +A missing 
       +.BI { " action " }
       +means print the line;
       +a missing pattern always matches.
       +Pattern-action statements are separated by newlines or semicolons.
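       +.PP
       +For example, in the first program below the action is missing, so every
       +odd-numbered line is printed; in the second the pattern is missing, so every
       +line is printed with its number
       +.RI ( file
       +stands for any input file):
       +.EX
       +awk 'NR % 2' file
       +awk '{ print NR ": " $0 }' file
       +.EE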
       +.PP
       +An action is a sequence of statements.
       +A statement can be one of the following:
       +.PP
       +.EX
       +.ta \w'\fLdelete array[expression]'u
       +if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
       +while(\fI expression \fP)\fI statement\fP
       +for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
       +for(\fI var \fPin\fI array \fP)\fI statement\fP
       +do\fI statement \fPwhile(\fI expression \fP)
       +break
       +continue
       +{\fR [\fP\fI statement ... \fP\fR] \fP}
       +\fIexpression\fP        #\fR commonly\fP\fI var = expression\fP
       +print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
       +printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
       +return\fR [ \fP\fIexpression \fP\fR]\fP
       +next        #\fR skip remaining patterns on this input line\fP
       +nextfile        #\fR skip rest of this file, open next, start at top\fP
       +delete\fI array\fP[\fI expression \fP]        #\fR delete an array element\fP
       +delete\fI array\fP        #\fR delete all elements of array\fP
       +exit\fR [ \fP\fIexpression \fP\fR]\fP        #\fR exit immediately; status is \fP\fIexpression\fP
       +.EE
       +.DT
       +.PP
       +Statements are terminated by
       +semicolons, newlines or right braces.
       +An empty
       +.I expression-list
       +stands for
       +.BR $0 .
       +String constants are quoted \&\fL"\ "\fR,
       +with the usual C escapes recognized within.
       +Expressions take on string or numeric values as appropriate,
       +and are built using the operators
       +.B + \- * / % ^
       +(exponentiation), and concatenation (indicated by white space).
       +The operators
       +.B
       +! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
       +are also available in expressions.
       +Variables may be scalars, array elements
       +(denoted
       +.IB x  [ i ] )
       +or fields.
       +Variables are initialized to the null string.
       +Array subscripts may be any string,
       +not necessarily numeric;
       +this allows for a form of associative memory.
       +Multiple subscripts such as
       +.B [i,j,k]
       +are permitted; the constituents are concatenated,
       +separated by the value of
       +.BR SUBSEP .
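       +.PP
       +As a sketch of this associative memory, the following counts how often each
       +field value (word) occurs in the input and prints each word with its count,
       +in no particular order:
       +.EX
       +	{ for (i = 1; i <= NF; i++) count[$i]++ }
       +END	{ for (w in count) print w, count[w] }
       +.EE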
       +.PP
       +The
       +.B print
       +statement prints its arguments on the standard output
       +(or on a file if
       +.BI > file
       +or
       +.BI >> file
       +is present, or on a pipe if
       +.BI | cmd
       +is present), separated by the current output field separator,
       +and terminated by the output record separator.
       +.I file
       +and
       +.I cmd
       +may be literal names or parenthesized expressions;
       +identical string values in different statements denote
       +the same open file.
       +The
       +.B printf
       +statement formats its expression list according to the format
       +(see
       +.IR fprintf (3)) .
       +The built-in function
       +.BI close( expr )
       +closes the file or pipe
       +.IR expr .
       +The built-in function
       +.BI fflush( expr )
       +flushes any buffered output for the file or pipe
       +.IR expr .
       +If
       +.IR expr
       +is omitted or is a null string, all open files are flushed.
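       +.PP
       +For example, this sketch pipes the first field of each line into
       +.IR sort (1).
       +Because the same string \fL"sort"\fP names the pipe in both statements,
       +.B close
       +ends that pipe and waits for it, so the sorted output should appear
       +before the final message:
       +.EX
       +	{ print $1 | "sort" }
       +END	{ close("sort"); print "sorted", NR, "lines" }
       +.EE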
       +.PP
       +The mathematical functions
       +.BR exp ,
       +.BR log ,
       +.BR sqrt ,
       +.BR sin ,
       +.BR cos ,
       +and
       +.BR atan2 
       +are built in.
       +Other built-in functions:
       +.TF length
       +.TP
       +.B length
       +If its argument is a string, the string's length is returned.
       +If its argument is an array, the number of subscripts in the array is returned.
       +With no argument, the length of
       +.B $0
       +is returned.
       +.TP
       +.B rand
       +random number on (0,1)
       +.TP
       +.B srand
       +sets seed for
       +.B rand
       +and returns the previous seed.
       +.TP
       +.B int
       +truncates to an integer value
       +.TP
       +.B utf
       +converts its numerical argument, a character number, to a
       +.SM UTF
       +string
       +.TP
       +.BI substr( s , " m" , " n\fL)
       +the
       +.IR n -character
       +substring of
       +.I s
       +that begins at position
       +.IR m 
       +counted from 1.
       +.TP
       +.BI index( s , " t" )
       +the position in
       +.I s
       +where the string
       +.I t
       +occurs, or 0 if it does not.
       +.TP
       +.BI match( s , " r" )
       +the position in
       +.I s
       +where the regular expression
       +.I r
       +occurs, or 0 if it does not.
       +The variables
       +.B RSTART
       +and
       +.B RLENGTH
       +are set to the position and length of the matched string.
       +.TP
       +.BI split( s , " a" , " fs\fL)
       +splits the string
       +.I s
       +into array elements
       +.IB a [1]\f1,
       +.IB a [2]\f1,
       +\&...,
       +.IB a [ n ]\f1,
       +and returns
       +.IR n .
       +The separation is done with the regular expression
       +.I fs
       +or with the field separator
       +.B FS
       +if
       +.I fs
       +is not given.
       +An empty string as field separator splits the string
       +into one array element per character.
       +.TP
       +.BI sub( r , " t" , " s\fL)
       +substitutes
       +.I t
       +for the first occurrence of the regular expression
       +.I r
       +in the string
       +.IR s .
       +If
       +.I s
       +is not given,
       +.B $0
       +is used.
       +.TP
       +.B gsub
       +same as
       +.B sub
       +except that all occurrences of the regular expression
       +are replaced;
       +.B sub
       +and
       +.B gsub
       +return the number of replacements.
       +.TP
       +.BI sprintf( fmt , " expr" , " ...\fL)
       +the string resulting from formatting
       +.I expr ...
       +according to the
       +.I printf
       +format
       +.I fmt
       +.TP
       +.BI system( cmd )
       +executes
       +.I cmd
       +and returns its exit status
       +.TP
       +.BI tolower( str )
       +returns a copy of
       +.I str
       +with all upper-case characters translated to their
       +corresponding lower-case equivalents.
       +.TP
       +.BI toupper( str )
       +returns a copy of
       +.I str
       +with all lower-case characters translated to their
       +corresponding upper-case equivalents.
       +.PD
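       +.PP
       +The following sketch exercises several of these functions;
       +the comment on each line gives the output that line should produce:
       +.EX
       +BEGIN {
       +	s = "hello, world"
       +	print substr(s, 1, 5)	# hello
       +	print index(s, "world")	# 8
       +	if (match(s, /o+/)) print RSTART, RLENGTH	# 5 1
       +	n = split("a:b:c", parts, ":"); print n, parts[3]	# 3 c
       +	t = s; n = gsub(/o/, "0", t); print n, t	# 2 hell0, w0rld
       +	print toupper(s)	# HELLO, WORLD
       +}
       +.EE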
       +.PP
       +The ``function''
       +.B getline
       +sets
       +.B $0
       +to the next input record from the current input file;
       +.B getline
       +.BI < file
       +sets
       +.B $0
       +to the next record from
       +.IR file .
       +.B getline
       +.I x
       +sets variable
       +.I x
       +instead.
       +Finally,
       +.IB cmd " | getline
       +pipes the output of
       +.I cmd
       +into
       +.BR getline ;
       +each call of
       +.B getline
       +returns the next line of output from
       +.IR cmd .
       +In all cases,
       +.B getline
       +returns 1 for a successful input,
       +0 for end of file, and \-1 for an error.
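       +.PP
       +For example, the following sketch counts the lines of a file and then reads
       +one line of output from a command
       +(the file
       +.B /etc/hosts
       +and the command
       +.IR date (1)
       +are only placeholders).
       +Testing for a result greater than 0 ends the loop on an error as well as at end of file:
       +.EX
       +BEGIN {
       +	while ((getline line < "/etc/hosts") > 0)
       +		n++
       +	close("/etc/hosts")
       +	print n, "lines"
       +	"date" | getline now
       +	close("date")
       +	print "it is now", now
       +}
       +.EE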
       +.PP
       +Patterns are arbitrary Boolean combinations
       +(with
       +.BR "! || &&" )
       +of regular expressions and
       +relational expressions.
       +Regular expressions are as in
       +.IR regexp (7).
       +Isolated regular expressions
       +in a pattern apply to the entire line.
       +Regular expressions may also occur in
       +relational expressions, using the operators
       +.BR ~
       +and
       +.BR !~ .
       +.BI / re /
       +is a constant regular expression;
       +any string (constant or variable) may be used
       +as a regular expression, except in the position of an isolated regular expression
       +in a pattern.
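       +.PP
       +For example, in the following the regular expression is held in the variable
       +.BR re ,
       +set with
       +.BR -v ,
       +and combined with an isolated constant regular expression;
       +the program prints the second field of each line whose first field consists
       +only of digits and which does not contain \fLskip\fP
       +.RI ( file
       +is a placeholder):
       +.EX
       +awk -v re='^[0-9]+$' '$1 ~ re && !/skip/ { print $2 }' file
       +.EE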
       +.PP
       +A pattern may consist of two patterns separated by a comma;
       +in this case, the action is performed for all lines
       +from an occurrence of the first pattern
       +through an occurrence of the second.
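       +.PP
       +For example, the following prints lines 10 through 20 of
       +.I file
       +(a placeholder):
       +.EX
       +awk 'NR == 10, NR == 20' file
       +.EE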
       +.PP
       +A relational expression is one of the following:
       +.IP
       +.I expression matchop regular-expression
       +.br
       +.I expression relop expression
       +.br
       +.IB expression " in " array-name
       +.br
       +.BI ( expr , expr,... ") in " array-name
       +.PP
       +where a
       +.I relop
       +is any of the six relational operators in C,
       +and a
       +.I matchop
       +is either
       +.B ~
       +(matches)
       +or
       +.B !~
       +(does not match).
       +A conditional is an arithmetic expression,
       +a relational expression,
       +or a Boolean combination
       +of these.
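       +.PP
       +For example, the following sketch should print
       +\fLset\fP and then \fLunset\fP; testing membership with
       +.B in
       +does not itself create an element:
       +.EX
       +BEGIN {
       +	grid[1, 2] = "x"
       +	if ((1, 2) in grid) print "set"
       +	if (!((2, 1) in grid)) print "unset"
       +}
       +.EE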
       +.PP
       +The special patterns
       +.B BEGIN
       +and
       +.B END
       +may be used to capture control before the first input line is read
       +and after the last.
       +.B BEGIN
       +and
       +.B END
       +do not combine with other patterns.
       +.PP
       +Variable names with special meanings:
       +.TF FILENAME
       +.TP
       +.B CONVFMT
       +conversion format used when converting numbers to strings
       +(default
       +.BR "%.6g" )
       +.TP
       +.B FS
       +regular expression used to separate fields; also settable
       +by option
       +.BI \-F fs\f1.
       +.TP
       +.BR NF
       +number of fields in the current record
       +.TP
       +.B NR
       +ordinal number of the current record
       +.TP
       +.B FNR
       +ordinal number of the current record in the current file
       +.TP
       +.B FILENAME
       +the name of the current input file
       +.TP
       +.B RS
       +input record separator (default newline)
       +.TP
       +.B OFS
       +output field separator (default blank)
       +.TP
       +.B ORS
       +output record separator (default newline)
       +.TP
       +.B OFMT
       +output format for numbers (default
       +.BR "%.6g" )
       +.TP
       +.B SUBSEP
       +separates multiple subscripts (default 034)
       +.TP
       +.B ARGC
       +argument count, assignable
       +.TP
       +.B ARGV
       +argument array, assignable;
       +non-null members are taken as file names
       +.TP
       +.B ENVIRON
       +array of environment variables; subscripts are names.
       +.PD
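       +.PP
       +For example, when given several input files, the following sketch prints, at
       +the first record of each file, the file's name and the overall record number
       +separated by a tab, and finally the total number of records:
       +.EX
       +BEGIN	{ OFS = "\et" }
       +FNR == 1	{ print FILENAME, NR }
       +END	{ print "total", NR }
       +.EE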
       +.PP
       +Functions may be defined (at the position of a pattern-action statement) thus:
       +.IP
       +.L
       +function foo(a, b, c) { ...; return x }
       +.PP
       +Parameters are passed by value if scalar and by reference if array name;
       +functions may be called recursively.
       +Parameters are local to the function; all other variables are global.
       +Thus local variables may be created by providing excess parameters in
       +the function definition.
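       +.PP
       +For example, in the following sketch (the names
       +.I max
       +and
       +.I join
       +are only illustrative)
       +.I arr
       +is passed by reference, while the extra parameters
       +.I i
       +and
       +.IR s ,
       +which the caller does not supply, serve as local variables;
       +it should print
       +\fL7 c-b-a\fP:
       +.EX
       +function max(a, b) { return a > b ? a : b }
       +function join(arr, n, sep,    i, s) {
       +	s = arr[1]
       +	for (i = 2; i <= n; i++)
       +		s = s sep arr[i]
       +	return s
       +}
       +BEGIN {
       +	n = split("c b a", parts, " ")
       +	print max(3, 7), join(parts, n, "-")
       +}
       +.EE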
       +.SH EXAMPLES
       +.TP
       +.L
       +length($0) > 72
       +Print lines longer than 72 characters.
       +.TP
       +.L
       +{ print $2, $1 }
       +Print first two fields in opposite order.
       +.PP
       +.EX
       +BEGIN { FS = ",[ \et]*|[ \et]+" }
       +      { print $2, $1 }
       +.EE
       +.ns
       +.IP
       +Same, with input fields separated by comma and/or blanks and tabs.
       +.PP
       +.EX
       +        { s += $1 }
       +END        { print "sum is", s, " average is", s/NR }
       +.EE
       +.ns
       +.IP
       +Add up first column, print sum and average.
       +.TP
       +.L
       +/start/, /stop/
       +Print all lines between start/stop pairs.
       +.PP
       +.EX
       +BEGIN        {        # Simulate echo(1)
       +        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
       +        printf "\en"
       +        exit }
       +.EE
       +.SH SOURCE
       +.B \*9/src/cmd/awk
       +.SH SEE ALSO
       +.IR sed (1),
       +.IR regexp (7),
       +.br
       +A. V. Aho, B. W. Kernighan, P. J. Weinberger,
       +.I
       +The AWK Programming Language,
       +Addison-Wesley, 1988.  ISBN 0-201-07981-X
       +.SH BUGS
       +There are no explicit conversions between numbers and strings.
       +To force an expression to be treated as a number add 0 to it;
       +to force it to be treated as a string concatenate
       +\&\fL""\fP to it.
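       +.PP
       +For example, string constants compare as strings, so in the following sketch
       +\fL"10"\fP is less than \fL"9"\fP until both are forced to numbers;
       +it should print
       +\fL1 0\fP:
       +.EX
       +BEGIN {
       +	x = "10"; y = "9"
       +	a = (x < y); b = (x + 0 < y + 0)
       +	print a, b
       +}
       +.EE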
       +.br
       +The scope rules for variables in functions are a botch;
       +the syntax is worse.
       +.br
       +UTF is not always dealt with correctly,
       +though
       +.I awk
       +does make an attempt to do so.
       +The
       +.I split
       +function with an empty string as final argument now copes
       +with UTF in the string being split.