'\"macro stdmacro
.if n .pH g1.lex @(#)lex	40.17 of 10/30/89
.\" Copyright 1989 AT&T
.nr X
.if \nX=0 .ds x} lex 1 "Extended Software Generation System Utilities" "\&"
.if \nX=1 .ds x} lex 1 "Extended Software Generation System Utilities"
.if \nX=2 .ds x} lex 1 "" "\&"
.if \nX=3 .ds x} lex "" "" "\&"
.TH \*(x}
.SH NAME
\f4lex\f1 \- generate programs for simple lexical tasks
.SH SYNOPSIS
\f4lex\f1
[\f4\-ctvn\f1] [\f4\-V\f1] [\f4\-Q\f1[\f4y\f1|\f4n\f1]] [\f2file\f1 .\|.\|.]
.SH DESCRIPTION
The
\f4lex\fP
command generates programs to be used in simple lexical analysis of text.
.PP
The input
\f2file\f1s
(standard input by default)
contain strings and expressions
to be searched for and C text to be executed when these
strings are found.
.PP
\f4lex\fP
generates a file named
\f4lex.yy.c\f1.
When
\f4lex.yy.c\f1
is compiled and linked with the \f4lex\fP library
(normally with \f4\-ll\f1),
the resulting program copies its input to its output
except where it finds a string matching one of the
expressions in the file; for each such string it executes
the corresponding program text instead of copying the string.
The actual string matched is left in
\f4yytext\f1,
an external character array.
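If more than one expression matches, the one matching the
longest string is chosen; among expressions that match strings
of the same length, the one appearing first in the \f2file\f1 is used.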
The patterns
may contain square brackets to indicate character classes,
as in
\f4[abx\-z]\f1
to indicate
\f4a\f1,\f4 b\f1,\f4 x\f1,
\f4y\f1, and \f4z\f1;
and the operators
\f4\(**\f1, \f4+\f1, and \f4?\f1
mean, respectively,
any non-negative number of, any positive number of, and either
zero or one occurrence of, the previous character or character class.
Thus,
\f4[a\-zA\-Z]+\f1
matches a string of letters.
The character
\f4\&.\f1
is the class of all
.SM ASCII
characters except new-line.
Parentheses for grouping and vertical bar for alternation are
also supported.
The notation
\f2r\f4{\f2d\f4,\f2e\f4}\f1
in a rule indicates between
.I d\^
and
.I e\^
instances of regular expression
.IR r .
It has higher precedence than
.IR |\| ","
but lower than \(**, ?, +,
and concatenation.
The character
\f4^\f1
at the beginning of an expression
permits a
successful match only immediately after a new-line, and the character
\f4$\f1
at the end of an expression requires a trailing new-line.
The character
\f4/\f1
in an expression indicates trailing context;
only the part of the expression up to the slash
is returned in
\f4yytext\f1,
but the remainder of the expression must follow in the input stream.
An operator character may be used as an ordinary symbol
if it is enclosed in double quotes (\f4"\f1)
or preceded by
\f4\e\f1.
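.PP
As an illustration only, the following rules section (the patterns
and messages are invented for this sketch) combines several of the
operators described above.
Because the last rule matches any single character or new-line,
input that no other rule matches is discarded instead of being
copied to the output.
.PP
.ta 16n
.nf
\f4%%
^#.\(**	printf("line beginning with #: %s\en", yytext);
[0\-9]{2,4}	printf("two to four digits: %s\en", yytext);
[a\-z]+/"("	printf("name followed by (: %s\en", yytext);
end$	printf("end at end of line\en");
"\(**\(**"	printf("two asterisks\en");
\e.	printf("period\en");
\&.|\en	;  /\(** discard anything else \(**/
%%\f1
.fi
.PP
Given the input line \f4f(x)\f1, for instance, the third rule matches
with \f4yytext\f1 set to \f4f\f1; the \f4(\f1 is left in the input,
where the last rule discards it.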
.PP
Three macros are expected:
\f4input()\f1
to read a character;
\f4unput(\f2c\f4)\f1
to push a character back onto the input; and
\f4output(\f2c\f4)\f1
to write a character to the output.
They are defined in terms
of the standard I/O streams,
but you can override them.
The generated scanning function is named
\f4yylex()\f1,
and the lex library contains a
\f4main()\f1
that calls it.
The action
\f4REJECT\fP
on the right side of a rule causes the current
match to be rejected and the next-best alternative match to be used;
the function
\f4yymore()\f1
causes the next match to be appended to the current contents of
\f4yytext\f1
instead of replacing them;
and the function
\f4yyless(\f2n\f4)\f1
keeps the first
\f2n\f1
characters of
\f4yytext\f1
and pushes the remaining
\f4yyleng\-\f2n\f1
characters back into the input stream.
(\f4yyleng\f1 is an external \f4int\fP variable
giving the length of \f4yytext\f1.)
The macros
\f4input\fP
and
\f4output\fP
read from the stream
\f4yyin\f1
and write to the stream
\f4yyout\f1,
which default to
\f4stdin\f1
and
\f4stdout\f1,
respectively.
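.PP
The following fragment is a sketch of how these facilities are
typically used; the variable names \f4nshe\f1 and \f4nhe\f1
and the printed messages are illustrative only.
\f4REJECT\fP lets overlapping words be counted, since
``she'' also contains ``he'';
\f4yyless\f1 returns the letter after \f4=\-\f1 to the input;
and \f4yymore\f1 extends a quoted string past an escaped quote.
A \f4main()\f1 is supplied in the last section so that the counts
can be printed; it is linked in place of the one in the
\f4lex\fP library.
.PP
.ta +8n +8n +8n +8n
.nf
\f4%{
int nshe, nhe;	/\(** occurrence counts \(**/
%}
%%
she	{ nshe++; REJECT; }
he	{ nhe++; REJECT; }
=\-[a\-z]	{ yyless(yyleng\-1); printf("operator =\-\en"); }
\e"[^"]\(**	{
	if (yytext[yyleng\-1] =\|= '\e\e')
		yymore();	/\(** escaped quote: keep going \(**/
	else {
		input();	/\(** consume the closing quote \(**/
		printf("string %s\e"\en", yytext);
	}
	}
\&.|\en	;
%%
int
main(void)
{
	yylex();
	printf("she %d, he %d\en", nshe, nhe);
	return 0;
}\f1
.fi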
.PP
Any line beginning with a blank or tab is assumed
to contain only C text and is copied unchanged; if it precedes
\f4%%\f1,
it is copied into the external definition area of the
\f4lex.yy.c\f1
file.
All rules must follow a
\f4%%\f1
line, as in \f4yacc\fP.
Lines preceding
\f4%%\f1
that begin with a non-blank character define
the name at the start of the line to stand for the remainder of
the line; the name can be used later in a rule by enclosing it in
\f4{}\f1,
as in \f4{D}\f1 in the example below.
In this definitions section,
C code (including preprocessor statements)
can also be enclosed between \f4%{\fP and \f4%}\fP.
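Note that curly brackets do not imply parentheses;
only string substitution is done.
For example, if \f4L\f1 is defined as \f4a|b\f1,
then \f4x{L}\f1 is equivalent to \f4xa|b\f1, not \f4x(a|b)\f1;
define \f4L\f1 as \f4(a|b)\f1 to get the grouped form.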
.SH EXAMPLE
.ta +8n +8n +8n +8n
.nf
\f4	D	[0\-9]
	%{
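	void
	skipcommnts(void)
	{
		int c;
		/\(** input() returns 0 at end of file \(**/
		for(;;)
		{
			while((c = input()) != '*' && c != 0)
				;
			if(c =\|= 0 || (c = input()) =\|= 0 || c =\|= '/')
				return;
			unput(c);	/\(** c may be another asterisk \(**/
		}
	}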
	%}
	%%
	if	printf("IF statement\en");
	[a\-z]+	printf("tag, value %s\en",yytext);
	0{D}+	printf("octal number %s\en",yytext);
	{D}+	printf("decimal number %s\en",yytext);
	"++"	printf("unary op\en");
	"+"	printf("binary op\en");
	"\en"	;/*no action */
	"/\(**"	  skipcommnts();
	%%   \f1
.fi
.PP
The external names generated by
\f4lex\fP
all begin with the prefix
\f4yy\f1 or \f4YY\f1.
.PP
The flags must appear before any files.
.TP 9
\f4\-c\f1
Indicates C actions and is the default.
.TP 9
\f4\-t\f1
Writes the generated program to standard output
instead of to the file
\f4lex.yy.c\f1.
.TP 9
\f4\-v\f1
Provides a two-line summary of statistics.
.TP 9
\f4\-n\f1
Suppresses the
\f4\-v\f1
summary.
.TP 9
\f4\-V\f1
Prints version
information on standard error.
.TP 9
\f4\-Q[y|n]\f1
With \f4\-Qy\f1, writes version information into the output
file \f4lex.yy.c\f1.
With \f4\-Qn\f1, the default, no version information is written.
.PP
Multiple files are treated as a single file.
If no files are specified,
standard input is used.
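.PP
A typical sequence of commands is sketched below; the file
name \f4scan.l\f1 is a placeholder, and \f4\-ll\f1 is the usual
way to link the \f4lex\fP library, although the exact compiler
command may vary between systems.
.PP
.nf
\f4lex \-v scan.l
cc lex.yy.c \-ll
lex \-t scan.l > scanner.c\f1
.fi
.PP
The first command writes \f4lex.yy.c\f1 and prints the statistics
summary; the second compiles it together with the library
\f4main()\f1, which calls \f4yylex()\f1; the third uses \f4\-t\f1
to send the generated program to standard output, here redirected
to \f4scanner.c\f1.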
.PP
The default table sizes may be too small for large specifications.
The table sizes for the resulting finite state machine
can be set in the definitions section:
.RS
.TP
\f4%p\f2 n\^\f1
number of positions is
.I n\^
(default 2500)
.ns
.TP
\f4%n\f2 n\^\f1
number of states is
.I n\^
(500)
.ns
.TP
\f4%e\f2 n\^\f1
number of parse tree nodes is
.I n\^
(1000)
.ns
.TP
\f4%a\f2 n\^\f1
number of transitions is
.I n\^
(2000)
.ns
.TP
\f4%k\f2 n\^\f1
number of packed character classes is
.I n\^
(2500)
.ns
.TP
\f4%o\f2 n\^\f1
size of output array is
.I n\^
(3000)
.RE
.PP
The use of one or more of the above automatically implies the
\f4\-v\f1
option,
unless the
\f4\-n\f1
option is used.
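.PP
For example, a definitions section might raise two of the limits
as follows; the values shown are arbitrary and should be chosen
to suit the specification being processed.
.PP
.ta +8n +8n +8n +8n
.nf
\f4%p 5000
%n 1000
D	[0\-9]
%%\f1
.fi
.PP
Because these settings are present, the \f4\-v\f1 statistics are
printed unless \f4\-n\f1 is also given.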
.SH SEE ALSO
\f4yacc\fP(1).
.br
The ``\f4lex\f1'' chapter in the
\f2Programmer's Guide: ANSI C and Programming Support Tools\f1.
