'\"macro stdmacro
.if n .pH g3g.regexpr @(#)regexpr	40.8 of 10/27/89
.\" Copyright 1989 AT&T
'\"macro stdmacro
.nr X
.if \nX=0 .ds x} regexpr 3G "" "\&"
.if \nX=1 .ds x} regexpr 3G ""
.if \nX=2 .ds x} regexpr 3G "" "\&"
.if \nX=3 .ds x} regexpr "" "" "\&"
.TH \*(x}
.SH "NAME"
\f4regexpr\f1: \f4compile\f1, \f4step\f1, \f4advance\f1 \- regular expression compile and match routines
.SH SYNOPSIS
\f4cc\f1
[\f2flag\fP \|.\|.\|.] \f2file\fP \|.\|.\|.
\f4\-lgen\f1
[\f2library\fP \|.\|.\|.]
.PP
\f4#include <regexpr.h>\f1
.PP
\f4char \(**compile (const char \(**instring, char \(**expbuf, char \(**endbuf);\f1
.PP
\f4int step (const char \(**string, char \(**expbuf);\f1
.PP
\f4int advance (const char \(**string, char \(**expbuf);\f1
.PP
\f4extern char \(**loc1, \(**loc2, \(**locs;\f1
.PP
\f4extern int nbra, regerrno, reglength;\f1
.PP
\f4extern char \(**braslist[], \(**braelist[];\f1
.SH DESCRIPTION
These routines are used to compile regular expressions and match the compiled 
expressions against lines.  The regular expressions compiled are in the form used by
\f4ed\fP.
.PP
The syntax of the
\f4compile\fP
routine is as follows:
.PP
.RS
.ft 4
compile (instring, expbuf, endbuf)
.ft 1
.RE
.PP
The parameter
.I instring\^
is a null-terminated string representing the regular expression.
.PP
The parameter
.I expbuf\^
points to the place where the compiled
regular expression is to be placed.  If \f2expbuf\f1 is \f4NULL\fP,
\f4compile\fP uses \f4malloc\fP to allocate the space for
the compiled regular expression.  If an error occurs, this space is 
freed.  It is the user's responsibility to free unneeded space after
the compiled regular expression is no longer needed.
.PP
The parameter
.I endbuf\^
is one more than the highest address where the compiled regular
expression may be placed.  This argument is ignored if \f2expbuf\f1
is \f4NULL\fP.  If the compiled expression cannot fit in
.RI ( endbuf \- expbuf )
bytes, \f4compile\fP returns \f4NULL\fP and \f4regerrno\f1
(see below) is set to 50.
.PP
If \f4compile\fP succeeds, it returns a non-\f4NULL\fP
pointer whose value depends on \f2expbuf\f1.  If \f2expbuf\f1
is non-\f4NULL\fP, \f4compile\fP returns a pointer to 
the byte after the last byte in the compiled regular expression.  
The length of the compiled regular expression is stored in
\f4reglength\f1.  Otherwise, \f4compile\fP 
returns a pointer to the space allocated by \f4malloc\fP.
.PP
If an error is detected when compiling the regular expression,
a \f4NULL\fP pointer is returned from \f4compile\fP and
\f4regerrno\f1 is set to one of the non-zero error numbers
indicated below:
.PP
.TS
center ;
c c
n l .
ERROR	MEANING
_
11	Range endpoint too large.
16	Bad number.
25	``\f4\e\fPdigit'' out of range.
36	Illegal or missing delimiter.
41	No remembered search string.
42	\f4\e(\|~\e)\fP imbalance.
43	Too many \f4\e(\fP.
44	More than 2 numbers given in \f4\e{\|~\e}\fP.
45	\f4}\fP expected after \f4\e\fP.
46	First number exceeds second in \f4\e{\|~\e}\fP.
49	\f4[ ]\fP imbalance.
50	Regular expression overflow.
.TE
.PP
The call
to
\f4step\fP
is as follows:
.PP
.RS
.ft 4
step (string, expbuf)
.ft 1
.RE
.PP
The first parameter to
\f4step\fP
is a pointer to a string of characters to be checked for a match.
This string should be null-terminated.
.PP
The parameter
.I expbuf
is the compiled regular expression obtained
by a call of the function
\f4compile\fP.
.PP
The function
\f4step\fP
returns non-zero if the given string matches the regular expression,
and zero if the expressions do not match.  If there is a match,
two external character pointers are set as a side effect to the call to
\f4step\fP.
The variable set in
\f4step\fP
is
\f4loc1\fP.
\f4loc1\fP
is a pointer to the first character that matched the regular expression.
The variable
\f4loc2\fP
points to the character after the last character that matches
the regular expression.  Thus if the regular expression matches
the entire line,
\f4loc1\fP
points to the first character of
.I string\^
and
\f4loc2\fP
points to the null at the end of
.IR string .
.PP
The purpose of
\f4step\fP
is to step through the
.I string\^
argument until a match is found or until the end of
.I string\^
is reached.  If the regular expression begins with
\f4^\f1,
\f4step\fP
tries to match the regular expression at the beginning
of the string only.
.PP
The function
\f4advance\fP
has the same arguments and side effects as
\f4step\fP, 
but it always restricts 
matches to the beginning of the string.
.PP
If one is looking for successive matches in the same string
of characters, \f4locs\f1 should be set equal to \f4loc2\f1,
and \f4step\fP should be called with
\f2string\f1 equal to \f4loc2\f1.  \f4locs\f1 is
used by commands like
\f4ed\fP
and
\f4sed\fP
so that global substitutions like
\f4s/y\(**//g\f1
do not loop forever, and
is \f4NULL\fP by default.
.PP
The external variable
\f4nbra\fP
is used to determine the number of subexpressions in the
compiled regular expression.  \f4braslist\f1 and \f4braelist\f1
are arrays of character pointers that point to the start
and end of the \f4nbra\f1 subexpressions in the matched string.  
For example, after calling \f4step\f1 or \f4advance\f1
with string \f4sabcdefg\f1 and
regular expression \f4\\(abcdef\\\)\fP, \f4braslist[0]\f1 
will point at \f4a\f1 and \f4braelist[0]\f1 will point at \f4g\f1.
These arrays are used by commands like \f4ed\f1 
and \f4sed\f1 for substitute replacement patterns
that contain the \f4\\\f2n\f1 notation for subexpressions. 
.PP
Note that it isn't necessary to use the external variables
\f4regerrno\f1, \f4nbra\f1, \f4loc1\f1, \f4loc2\f1
\f4locs\f1, \f4braelist\f1, and \f4braslist\f1
if one is only checking whether or not a string matches
a regular expression.
.SH EXAMPLES
The following is similar to the regular expression code
from \f4grep\fP:
.PP
.RS
.ft 4
.nf
#include <regexpr.h>
.sp 0.5
\&\. \. \.
if(compile(\(**argv, (char \(**)0, (char \(**)0) =\^= (char \(**)0)
	regerr(regerrno);
\&\. \. \.
if (step(linebuf, expbuf))
	succeed();
.fi
.ft 1
.RE
.SH "SEE ALSO"
\f4regexp\fP(5).
.br
\f4ed\fP(1), \f4grep\fP(1), \f4sed\fP(1) in the \f2User's Reference Manual\f1.
