[HN Gopher] SuperC: Parsing All of C by Taming the Preprocessor ...
       ___________________________________________________________________
        
       SuperC: Parsing All of C by Taming the Preprocessor [pdf] (2012)
        
       Author : g0xA52A2A
       Score  : 75 points
       Date   : 2024-03-09 10:36 UTC (12 hours ago)
        
 (HTM) web link (paulgazzillo.com)
 (TXT) w3m dump (paulgazzillo.com)
        
       | ksherlock wrote:
       | The source code: https://github.com/appleseedlab/superc/
        
         | xniclb wrote:
         | Has anyone already integrated it as a vscode extension?
        
       | dzdt wrote:
       | This is (2012). I don't see that it has been discussed before
       | here though. I guess it didn't make much of a splash.
        
       | lacraig2 wrote:
       | This looks really useful, but even reproducing it seems like an
       | uphill battle, given the lack of updates for almost a decade.
        
         | mdaniel wrote:
         | Do you mean getting it to run on modern JVMs or that the C used
         | in the kernel has drifted such that the technique would no
         | longer apply?
        
       | evanjrowley wrote:
       | This is way over my head, but I was reminded of _The C language
       | is purely functional_ by Conal Elliott:
       | http://conal.net/blog/posts/the-c-language-is-purely-functio...
        
       | mdaniel wrote:
       | I am obviously not able to understand what specific problem
       | this is solving based on the title of "parsing all of C" when the
       | preprocessor is apparently left intact by design
       | 
       |     static int mousedev_open(struct inode *inode, struct file *file)
       |     {
       |         int i;
       | 
       |     #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX
       |         if (imajor(inode) == 10)
       |             i = 31;
       |         else
       |     #endif
       |             i = iminor(inode) - 32;
       | 
       |         return 0;
       |     }
       | 
       | (b) The preprocessed source preserving all configurations
       | 
       | and my experience with C is that there are an untold number of
       | "unbound" tokens that are designed to be injected in by -D or
       | auto-generated config.h files, so presumably this works closer to
       | the "ready for compilation" phase versus something one could use
       | to make tree-sitter better (as an example)
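
To make the quoted snippet concrete, here is a minimal sketch of the two programs that the single preserved source stands for, one per setting of CONFIG_INPUT_MOUSEDEV_PSAUX. The function names and the plain `int` parameters are hypothetical stand-ins for the kernel's `struct inode` accessors, and the functions return `i` (the quoted code returns 0) only so the difference between the two configurations is observable.

```c
/* Sketch only: hypothetical stand-ins for the quoted mousedev_open().
 * major/minor replace imajor(inode)/iminor(inode) so this compiles
 * outside the kernel. */

/* What the preprocessor emits when CONFIG_INPUT_MOUSEDEV_PSAUX is defined:
 * the if/else survives and guards the assignment. */
static int slot_with_psaux(int major, int minor)
{
    int i;

    if (major == 10)
        i = 31;
    else
        i = minor - 32;

    return i;
}

/* What it emits when the macro is undefined: the "if ... else" lines
 * vanish and only the unconditional assignment is left. */
static int slot_without_psaux(int major, int minor)
{
    int i;

    (void)major; /* unused in this configuration */
    i = minor - 32;

    return i;
}
```

A parser that sees only one of these variants misses the other, which is why the conditional cuts across the statement structure rather than wrapping a whole statement.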
        
       | kazinator wrote:
       | By the way, GNU Bison implements general LR (GLR) parsing by
       | something that can be called "fork merge LR". The documentation
       | states that Bison's GLR algorithm resolves ambiguities by forking
       | parallel parses, which then merge. It's not the same as forking
       | due to a preprocessor conditional, but worth mentioning.
        
       | kazinator wrote:
       | > _In exploring configuration-preserving parsing, we focus on
       | performance._
       | 
       | Why, because this goose is so thoroughly cooked that all that is
       | left is optimizing for speed?
       | 
       | There is a lot of misplaced focus on performance in CS academia,
       | and also in software.
       | 
       | Suppose we have some accurate tool that does something useful
       | with a C program, but it takes 5 minutes to run instead of 5
       | seconds. So what? Someone still wants to use it. Suppose the
       | program is used by millions of people, and that 5 minute run only
       | has to be repeated half a dozen times during development.
       | 
       | Get it right, and get it in people's hands should be the
       | priorities, and not necessarily in that order.
        
       | DriftRegion wrote:
       | Figure 1 spoke to me. It's an expanded syntax tree that branches
       | depending on the value of a preprocessor definition
       | "CONFIG...X". I've often found myself doing the kind of code
       | archeology that this paper seems to be trying to automate:
       | exploring all the configuration possibilities implied by the
       | codebase / build system. A C program that makes heavy use of the
       | preprocessor is generally harder to grok for both humans and
       | static analysis because 1. the C preprocessor syntax is different
       | from C, 2. the inputs are not necessarily bounded by what appears
       | in the source files alone ("-DCONFIG...X=foo" passed in from the
       | build system), and 3. the resulting program and its control flow
       | may be quite different depending on preprocessor options. As a
       | simple example, embedded systems often define an "ASSERT(X)" macro
       | as either a no-op, an infinite loop, a print statement or the like.
       | 
       | This is definitely a niche space but I see clear use for large,
       | portable and configurable C codebases (e.g. Linux kernel,
       | FreeRTOS) for providing better visibility into the configuration
       | system.
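
The ASSERT(X) case mentioned above might look roughly like the sketch below. The `ASSERT_MODE` knob, the `assert_failures` counter, and `checked_div` are all hypothetical names invented for illustration; a real project would pick its own and would typically set the mode with something like `-DASSERT_MODE=1` from the build system.

```c
#include <stdio.h>

/* Hypothetical configuration knob, normally supplied by the build system. */
#ifndef ASSERT_MODE
#define ASSERT_MODE 2
#endif

/* Hypothetical counter so the print variant has an observable effect. */
static int assert_failures = 0;

#if ASSERT_MODE == 0
/* Variant 1: no-op; the condition is not even evaluated. */
#define ASSERT(x) ((void)0)
#elif ASSERT_MODE == 1
/* Variant 2: spin forever, e.g. until a watchdog or debugger steps in. */
#define ASSERT(x) do { if (!(x)) { for (;;) { } } } while (0)
#else
/* Variant 3: report the failure and keep going. */
#define ASSERT(x)                                              \
    do {                                                       \
        if (!(x)) {                                            \
            assert_failures++;                                 \
            fprintf(stderr, "ASSERT failed: %s (%s:%d)\n",     \
                    #x, __FILE__, __LINE__);                   \
        }                                                      \
    } while (0)
#endif

/* The same call site has three different control flows depending on
 * ASSERT_MODE: straight-line, possibly non-terminating, or logging. */
static int checked_div(int a, int b)
{
    ASSERT(b != 0);
    return b != 0 ? a / b : 0;
}
```

Nothing in `checked_div` itself reveals which of the three behaviors applies, so a reader or a static analyzer has to know the build configuration to reason about it.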
        
         | senkora wrote:
         | You may be interested in unifdef, which selectively evaluates
         | and removes ifdefs.
         | 
         | https://dotat.at/prog/unifdef/
         | 
         | I used it once at work for a niche use case. Its main use case
         | seems to be making it easier to simplify platform-specific code
         | when you remove support for old platforms in legacy codebases.
        
           | peter_d_sherman wrote:
           | It seems that the use of macros/IFDEF (in any language, not
           | just C) -- bifurcates into two distinct use-cases:
           | 
           | 1) Platform/Processor/OS configuration/build use-cases.
           | 
           | (and)
           | 
           | 2) All other use-cases that are not directly related to #1.
           | 
           | In other words, if you're a future language designer and you
           | design a macro system for your language, you might wish to
           | distinguish between configuration/platform/build related
           | macros -- and other macros not directly related to build and
           | configuration...
           | 
           | Doing that would allow one set and/or the other set to be
           | selectively and easily evaluated back into the non-macro
           | source of the base language -- depending on what is desired
           | by the language user...
           | 
           | Anyway, an excellent link!
        
       | mncharity wrote:
       | Fwiw, ~20 years ago my experience was that preprocessor use in
       | open-source C code was _very_ idiomatic, and iirc, a simple
       | backtracking parser with idioms was sufficient to parse all I
       | tried it against, including the Linux kernel.
        
       ___________________________________________________________________
       (page generated 2024-03-09 23:01 UTC)