Newsgroups: comp.lang.c
Path: utzoo!utgpu!jarvis.csri.toronto.edu!csri.toronto.edu!norvell
From: norvell@csri.toronto.edu (Theodore Stevens Norvell)
Subject: Re: regex for C comments
Message-ID: <1989Jul11.232547.13488@jarvis.csri.toronto.edu>
Organization: University of Toronto, CSRI
References: <19365@paris.ics.uci.edu> <502@chem.ucsd.EDU>
Distribution: na

In article <502@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>In article <19365@paris.ics.uci.edu> schmidt@zola.ics.uci.edu (Doug Schmidt) writes:
>>In their book ``Introduction to Compiler Construction with UNIX,''
>>Schreiner and Friedman provide the following LEX regular expression
>>for recognizing C comments:
>>"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
>
>This expression fails on each of the following:
>
>	/*****//hello world */
Is that really a C comment?  I think only the first 7 characters, /*****/,
are one, since a comment ends at the first */ after its opening /*.
>
>
>So, who has the shortest single LEX expression that correctly
>matches C comments --
>ignoring string and character constants,
>and disallowing start conditions?
>
>Mine is
>
>	"/*"\/*([^/]|{[^*/]\/+})*"*/"

        \/\*([^*]*\*+[^*/])*[^*]*\*+\/

or more legibly

	"/*" ( [^*]* "*"+ [^*/] )* [^*]* "*"+ "/"

Though I haven't proved it.
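
Short of a proof, an expression like this can at least be exercised on a
few inputs.  The following is only a sketch, in C with the POSIX
regcomp()/regexec() calls rather than lex.  The names COMMENT_ERE,
comment_len and check_examples are made up for the illustration, and the
pattern string is my own ERE transcription (anchored with ^, and with a
[^*]* ahead of the closing run of stars so that a comment such as
/* hello */ is accepted); treat it as an assumption, not as something lex
itself was run on.

```c
#include <assert.h>
#include <regex.h>

// ERE transcription assumed for this sketch (not lex syntax):
// slash-star, any number of (non-stars, stars, one char that is neither
// star nor slash), then non-stars, a closing run of stars, and a slash.
static const char *COMMENT_ERE = "^/\\*([^*]*\\*+[^*/])*[^*]*\\*+/";

// Length of the comment at the start of s, or -1 if none starts there.
long comment_len(const char *s)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, COMMENT_ERE, REG_EXTENDED) != 0)
        return -1;                      // pattern failed to compile
    if (regexec(&re, s, 1, &m, 0) == 0)
        len = (long)m.rm_eo;            // match is anchored at offset 0
    regfree(&re);
    return len;
}

// A few spot checks; hypothetical helper, meant to be called from main().
void check_examples(void)
{
    assert(comment_len("/* hello */") == 11);
    assert(comment_len("/**/") == 4);
    assert(comment_len("/*****//hello world */") == 7);
    assert(comment_len("/* a * b */ x") == 11);
    assert(comment_len("/*/") == -1);
    assert(comment_len("/* unterminated *") == -1);
}
```

Calling check_examples() from a main() should return silently if the
assertions hold.  Passing a handful of cases is of course not a proof.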

Theo Norvell     U of Toronto
