Newsgroups: comp.editors
Path: utzoo!sq!lee
From: lee@sq.sq.com (Liam R. E. Quin)
Subject: Re: Multiple line regexps
Message-ID: <1991Jun4.215913.25633@sq.sq.com>
Keywords: regexp, multiple lines
Organization: SoftQuad Inc., Toronto, Canada
X-Feet: bare
References: <1991Jun2.231351.10229@trl.oz.au>
Distribution: comp
Date: Tue, 4 Jun 91 21:59:13 GMT
Lines: 67

soh@andromeda.trl.OZ.AU (kam hung soh) writes:
>I would like to write a regular expression which can look for patterns
>longer than one line.  For example, I want to find the first line of
>each paragraph.  If I try this regexp in grep or awk, /^$^.+$/, nothing
>happens.

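Although you can't match across a newline with /^$^.+$/ in most Unix
software, you can get what you want.  Tools like grep and awk test the pattern
against one line at a time, so ^ and $ only ever anchor within a single line.
You _could_ do it in lex, by the way, and that would be sensible if you were
going to do the same thing often.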

You can do this in sed or awk, and also in ex or vi, with a little cleverness.
Here's how in ex or vi....

First, we could print all blank (empty) lines with
	:g/^$/p
The command
	g reg-exp command
tells the editor (vi, ex, ed) to run the command on every line that matches
the pattern.  The command is pretty unrestricted, although it can't be another
global (g) command...

Well, that prints all the blank lines.
We could print all lines after a blank line:
	:g/^$/+1p
but that isn't quite right, because it goes wrong if there are two blank
lines in a row.  Ah! that's why you had /^$^.+$/ and not /^$^.*$/.  I see...
OK, we could do this:
	:g/^$/+1s/./&/p
This says that on the line after each blank line, try to substitute a single
character for itself (&), and if that worked print the line.
This is OK except that if the last line in the file is blank the +1 is wrong,
so we must omit the last line, and do the command on 1,$-1:
	:1,$-1g/^$/.+1s/./&/p
Wow!  Well, that's plausible.

In sed, we could use the Hold space.  I won't do that here, as it's a little
confusing to describe...
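
For the curious, here is roughly how a hold-space version might look.  This
is only a sketch: the function name first_lines_sed is made up for
illustration, and it assumes "blank" means completely empty, as in the ex
commands above.  The hold space remembers the previous line, so each
non-empty line can check whether the line before it was empty.

```shell
# Hypothetical helper (the name is invented for this example): reads text
# on stdin and prints the first line of each paragraph.
first_lines_sed() {
    sed -n '
# Non-empty line: swap, so the pattern space holds the previous line
# and the hold space holds the current one.
/./{
    x
# The previous line was empty (or this is line 1, when the hold space
# starts out empty): copy the current line back and print it.
    /^$/{
        g
        p
    }
# Start the next cycle; the hold space still has the current line.
    d
}
# Empty line: remember it as the previous line.
h
'
}
```

You would use it as a filter, for example: first_lines_sed < file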

In awk, though, we could do this:
	awk '
	/^./ {
	    if (last == "") print
	}

	{
	    last = $0
	}'

You can be terser with some versions of awk:
	awk '/^./{ if (last == "") print} { last = $0 }'
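
To see what this does, here is a small trial run.  The wrapper function
first_lines_awk and the sample text are invented for the example; the awk
program is the one-liner above.  Note the two blank lines in a row, which
was the case that tripped up the simple +1p approach in ex.

```shell
# Hypothetical wrapper around the awk one-liner; reads stdin.
first_lines_awk() {
    # "last" starts out unset, which compares equal to "", so a
    # non-empty first line of the file is printed too.
    awk '/^./ { if (last == "") print } { last = $0 }'
}

# Three paragraphs, with two blank lines after the first one.
printf 'one\ntwo\n\n\nthree\nfour\n\nfive\n' | first_lines_awk
# prints:
#   one
#   three
#   five
```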

If you have mgrep or GNU grep, you could also grep for blank lines,
with one line of context, and grep for . on the result.
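
With GNU grep that might look something like the sketch below.  It assumes a
GNU grep that accepts -A (lines of trailing context) and
--no-group-separator; without the latter, the "--" lines grep puts between
groups of context would survive the second grep.  The function name
first_lines_grep is made up, and the extra empty line fed in at the front is
my own trick so that the first paragraph is not missed when the file does
not begin with a blank line.

```shell
# Hypothetical helper; reads text on stdin.  Assumes GNU grep.
first_lines_grep() {
    # Put one empty line in front of the input, print every empty line
    # plus the line after it, then keep only the non-empty lines.
    { echo; cat; } | grep -A 1 --no-group-separator '^$' | grep .
}
```

Use it as a filter: first_lines_grep < file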

So none of these answer your real, fundamental, can-regexp-do-this question,
but they do address what you're trying to solve.

Lex can do multi-line patterns, and in Dougherty & O'Reilly's Unix Text
Processing (the big blue one) there is an example of a multi-line grep
using sed, as I recall.

Liam

-- 
Liam Quin, lee@sq.com, SoftQuad, Toronto, +1 416 963 8337
the barefoot programmer

