[HN Gopher] SuperC: Parsing All of C by Taming the Preprocessor ...
___________________________________________________________________
SuperC: Parsing All of C by Taming the Preprocessor [pdf] (2012)
Author : g0xA52A2A
Score : 75 points
Date : 2024-03-09 10:36 UTC (12 hours ago)
(HTM) web link (paulgazzillo.com)
(TXT) w3m dump (paulgazzillo.com)
| ksherlock wrote:
| The source code: https://github.com/appleseedlab/superc/
| xniclb wrote:
| Has anyone already integrated it as a vscode extension?
| dzdt wrote:
| This is (2012). I don't see that it has been discussed before
| here though. I guess it didn't make much of a splash.
| lacraig2 wrote:
| This looks really useful, but it seems like an uphill battle even
| to reproduce, given the lack of updates for almost a decade.
| mdaniel wrote:
| Do you mean getting it to run on modern JVMs or that the C used
| in the kernel has drifted such that the technique would no
| longer apply?
| evanjrowley wrote:
| This is way over my head, but I was reminded of _The C language
| is purely functional_ by Conal Elliott:
| http://conal.net/blog/posts/the-c-language-is-purely-functio...
| mdaniel wrote:
| I am obviously not able to understand what specific problem
| this is solving based on the title of "parsing all of C" when the
| preprocessor is apparently left intact by design:
|     static int mousedev_open(struct inode *inode, struct file *file)
|     {
|       int i;
|     #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX
|       if (imajor(inode) == 10)
|         i = 31;
|       else
|     #endif
|       i = iminor(inode) - 32;
|       return 0;
|     }
|
|     (b) The preprocessed source preserving all configurations
|
| and my experience with C is that there are an untold number of
| "unbound" tokens that are designed to be injected in by -D or
| auto-generated config.h files, so presumably this works closer to
| the "ready for compilation" phase versus something one could use
| to make tree-sitter better (as an example)
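|
| As a rough illustration of what "preserving all configurations"
| means for the snippet above, the sketch below folds the #ifdef
| into a runtime flag so that both variants sit in one function.
| The name mousedev_index and the psaux_enabled parameter are
| assumptions made up for this example; this is not SuperC's
| output or internal representation, only a way to see the two
| programs hiding in one source text.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the quoted function: the #ifdef is modelled
   as a runtime flag so both configurations can be compared directly. */
int mousedev_index(bool psaux_enabled, int major, int minor)
{
    int i;
    if (psaux_enabled && major == 10)
        i = 31;          /* exists only when CONFIG_INPUT_MOUSEDEV_PSAUX is defined */
    else
        i = minor - 32;  /* the only statement left when the macro is undefined */
    return i;
}
```

| With the flag off, the if/else vanishes and only the final
| assignment remains, which is why the conditional cannot be
| parsed as a self-contained statement on its own.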
| kazinator wrote:
| By the way, GNU Bison implements general LR (GLR) parsing by
| something that can be called "fork merge LR". The documentation
| states that Bison's GLR algorithm resolves ambiguities by forking
| parallel parses, which then merge. It's not the same as forking
| due to a preprocessor conditional, but worth mentioning.
| kazinator wrote:
| > _In exploring configuration-preserving parsing, we focus on
| performance._
|
| Why, because this goose is so thoroughly cooked that all that is
| left is optimizing for speed?
|
| There is a lot of misplaced focus on performance in CS academia,
| and also in software.
|
| Suppose we have some accurate tool that does something useful
| with a C program, but it takes 5 minutes to run instead of 5
| seconds. So what? Someone still wants to use it. Suppose the
| program is used by millions of people, and that 5 minute run only
| has to be repeated half a dozen times during development.
|
| Get it right, and get it in people's hands should be the
| priorities, and not necessarily in that order.
| DriftRegion wrote:
| Figure 1 spoke to me. It's an expanded syntax tree that branches
| depending on the value of a preprocessor definition
| "CONFIG...X". I've often found myself doing the kind of code
| archeology that this paper seems to be trying to automate:
| exploring all the configuration possibilities implied by the
| codebase / build system. A C program that makes heavy use of the
| preprocessor is generally harder to grok for both humans and
| static analysis because 1. the C preprocessor syntax is different
| from C, 2. the inputs are not necessarily bounded by what appears
| in the source files alone ("-DCONFIG...X=foo" passed in from the
| build system), and 3. the resulting program and its control flow
| may be quite different depending on preprocessor options. As a
| simple example, embedded systems often define an "ASSERT(X)" macro
| as either a no-op, an infinite loop, a print statement or the like.
|
| This is definitely a niche space but I see clear use for large,
| portable and configurable C codebases (e.g. Linux kernel,
| FreeRTOS) for providing better visibility into the configuration
| system.
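|
| The ASSERT(X) case can be sketched as below. ASSERT_MODE, its
| three values, assert_failures and checked_div are all
| hypothetical names chosen for this example and are not taken
| from any particular codebase; the mode would typically arrive
| from the build system as something like -DASSERT_MODE=0.

```c
#include <stdio.h>

/* ASSERT_MODE is a hypothetical switch, normally set by the build system. */
#ifndef ASSERT_MODE
#define ASSERT_MODE 1
#endif

static int assert_failures = 0;  /* counts failed checks in the reporting mode */

#if ASSERT_MODE == 0
/* Release-style build: the check disappears and x is never evaluated. */
#define ASSERT(x) ((void)0)
#elif ASSERT_MODE == 1
/* Debug-style build: report the failure and keep running. */
#define ASSERT(x) \
    do { \
        if (!(x)) { \
            assert_failures++; \
            fprintf(stderr, "ASSERT failed: %s (%s:%d)\n", #x, __FILE__, __LINE__); \
        } \
    } while (0)
#else
/* Halt-style build: spin forever so a debugger or watchdog can take over. */
#define ASSERT(x) \
    do { \
        if (!(x)) { \
            for (;;) { } \
        } \
    } while (0)
#endif

/* One source line, three different control flows depending on ASSERT_MODE. */
int checked_div(int a, int b)
{
    ASSERT(b != 0);
    return b != 0 ? a / b : 0;
}
```

| A tool that looks at only one setting of ASSERT_MODE sees only
| one of these three bodies for checked_div.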
| senkora wrote:
| You may be interested in unifdef, which selectively evaluates
| and removes ifdefs.
|
| https://dotat.at/prog/unifdef/
|
| I used it once at work for a niche use case. Its main use case
| seems to be making it easier to simplify platform-specific code
| when you remove support for old platforms in legacy codebases.
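|
| To make "selectively evaluates and removes ifdefs" concrete,
| here is a toy sketch of the idea. It is not unifdef's
| implementation: drop_undefined and its helpers are invented
| for this example, it handles only "#ifdef SYM" blocks (with an
| optional #else) for one symbol assumed to be undefined, and it
| ignores #elif, "# ifdef" spacing and line continuations.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if the line, after leading blanks, starts with the given text. */
static int starts_with(const char *line, const char *text)
{
    while (*line == ' ' || *line == '\t') line++;
    return strncmp(line, text, strlen(text)) == 0;
}

/* True if the line is "#ifdef" followed by sym as a whole identifier. */
static int is_ifdef_of(const char *line, const char *sym)
{
    size_t n = strlen(sym);
    while (*line == ' ' || *line == '\t') line++;
    if (strncmp(line, "#ifdef", 6) != 0) return 0;
    line += 6;
    if (*line != ' ' && *line != '\t') return 0;
    while (*line == ' ' || *line == '\t') line++;
    if (strncmp(line, sym, n) != 0) return 0;
    return !(isalnum((unsigned char)line[n]) || line[n] == '_');
}

/* Copy src into out (capacity cap), resolving each top-level "#ifdef sym"
   block as if sym were undefined. Other conditionals pass through as-is.
   Lines that do not fit in out are skipped. */
void drop_undefined(const char *src, const char *sym, char *out, size_t cap)
{
    size_t used = 0;
    int depth = 0;  /* nesting depth inside a targeted block; 0 means outside */
    int keep = 1;   /* whether ordinary lines are currently copied */

    if (cap == 0) return;
    out[0] = '\0';
    while (*src != '\0') {
        const char *nl = strchr(src, '\n');
        size_t len = nl ? (size_t)(nl - src) + 1 : strlen(src);
        int emit = keep;

        if (depth == 0) {
            if (is_ifdef_of(src, sym)) { depth = 1; keep = 0; emit = 0; }
        } else if (starts_with(src, "#if")) {
            depth++;                                   /* nested conditional */
        } else if (starts_with(src, "#endif")) {
            if (--depth == 0) { keep = 1; emit = 0; }  /* drop the matching #endif */
        } else if (depth == 1 && starts_with(src, "#else")) {
            keep = 1; emit = 0;                        /* the #else branch is live */
        }

        if (emit && used + len < cap) {
            memcpy(out + used, src, len);
            used += len;
            out[used] = '\0';
        }
        src += len;
    }
}
```

| Given a buffer holding an "#ifdef OLD / #else / #endif" block
| and the symbol "OLD", the output keeps only the #else branch
| and the surrounding unconditional lines.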
| peter_d_sherman wrote:
| It seems that the use of macros/IFDEF (in any language, not
| just C) -- bifurcates into two distinct use-cases:
|
| 1) Platform/Processor/OS configuration/build use-cases.
|
| (and)
|
| 2) All other use-cases that are not directly related to #1.
|
| In other words, if you're a future language designer and you
| design a macro system for your language, you might wish to
| distinguish between configuration/platform/build related
| macros -- and other macros not directly related to build and
| configuration...
|
| Doing that would allow one set and/or the other set to be
| selectively and easily evaluated back into the non-macro
| source of the base language -- depending on what is desired
| by the language user...
|
| Anyway, an excellent link!
| mncharity wrote:
| Fwiw, ~20 years ago my experience was that preprocessor use in
| open-source C code was _very_ idiomatic, and iirc, a simple
| backtracking parser with idioms was sufficient to parse all I
| tried it against, including the Linux kernel.
___________________________________________________________________
(page generated 2024-03-09 23:01 UTC)