[HN Gopher] C Posix-compliant argument parsing in 42 LoC, inspir...
___________________________________________________________________
C Posix-compliant argument parsing in 42 LoC, inspired by Duff's
device
Author : camel-cdr
Score : 70 points
Date : 2023-01-04 11:41 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| camel-cdr wrote:
| This uses switch abuse to handle short and long arguments in one
| place:
|
|     ...
|     else if (ARG_LONG("option")) case 'o': {
|     }
|     ...
|
| The idea was to leave all the decisions to the library user, so
| there are no automatic help pages and error messages. The library
| essentially just gives you a new language primitive to work with.
|
| Example usage (also included in the file):
|
|     // assumes argv and argc exist
|     ARG_BEGIN {
|         if (0) {
|             case 'a': a = 1; ARG_FLAG(); break;
|             case 'b': b = 1; ARG_FLAG(); break;
|             case 'c': c = 1; ARG_FLAG(); break;
|             case '\0': readstdin = 1; break;
|         } else if (ARG_LONG("reverse")) case 'r': {
|             reverse = 1;
|             ARG_FLAG();
|         } else if (ARG_LONG("input")) case 'i': {
|             input = ARG_VAL();
|         } else if (ARG_LONG("output")) case 'o': {
|             output = ARG_VAL();
|         } else if (ARG_LONG("help")) case 'h': case '?': {
|             printf("Usage: %s [OPTION...] [STRING...]\n", argv0);
|             puts("Example usage of arg.h\n");
|             puts("Options:");
|             puts("  -a,                 set a to true");
|             puts("  -b,                 set b to true");
|             puts("  -c,                 set c to true");
|             puts("  -r, --reverse       set reverse to true");
|             puts("  -i, --input=STR     set input string to STR");
|             puts("  -o, --output=STR    set output string to STR");
|             puts("  -h, --help          display this help and exit");
|             return EXIT_SUCCESS;
|         } else { default:
|             fprintf(stderr,
|                     "%s: invalid option '%s'\n"
|                     "Try '%s --help' for more information.\n",
|                     argv0, *argv, argv0);
|             return EXIT_FAILURE;
|         }
|     } ARG_END;
|     // argv and argc now hold the non-option arguments
| planede wrote:
| I appreciate the macromancy, but it's not very readable. I'm sure
| it's very efficient though.
| [deleted]
| joosters wrote:
| Perfect for the all-too-common situation where your performance
| bottleneck is in command line argument parsing...
| zokier wrote:
| What does posix compliant mean in this context?
| klyrs wrote:
| It conforms (edit: claims to conform) to the POSIX spec for
| command line arguments
|
| https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...
| planede wrote:
| That document does not describe long options. I guess that
| can be treated as a conforming extension.
|
| The C implementation does not seem to have support for
| optional option arguments.
|
| The C implementation also does not seem to support "--" as a
| delimiter for option arguments, although that's only a
| guideline.
|
| So I guess it can be used to implement POSIX compliant
| interfaces that don't use optional option arguments.
|
| edit: supporting "--" as delimiter might not be a guideline:
|
| _The utilities in the Shell and Utilities volume of
| POSIX.1-2017 that claim conformance to these guidelines shall
| conform completely to these guidelines as if these guidelines
| contained the term "shall" instead of "should"._
|
| edit2: Seems like I was completely wrong about supporting
| "--"; when I tried it out, it does seem to support it. One weird
| corner case is the call `<prog> -- -`, where "-" isn't
| interpreted as enabling stdin, but treated as a regular
| positional argument. `echo asd | cat -- -` reads from stdin
| with GNU cat.
| mtlmtlmtlmtl wrote:
| "--" is a GNU originated thing afaik.
| planede wrote:
| Maybe, but it's part of POSIX now. Anyway, I misread the
| code, and it's actually supported.
| camel-cdr wrote:
| > So I guess it can be used to implement POSIX compliant
| interfaces that don't use optional option arguments.
|
| Yes, that was my intention.
| klyrs wrote:
| On a closer read, the comment mentions plan9's arg(3):
|
| https://9fans.github.io/plan9port/man/man3/arg.html
|
| Generally, I think it's pretty normal to interpret "POSIX
| compliant" as a minimum requirement, and then extend that
| to your own whims as long as you don't break the POSIX
| part.
| cperciva wrote:
| Another option with similar levels of macro (ab)use is my "magic
| getopt" which lets you write code like this:
|
|     const char * ch;
|     while ((ch = GETOPT(argc, argv)) != NULL) {
|         GETOPT_SWITCH(ch) {
|         GETOPT_OPT("-a"):
|             aflag = 1;
|             break;
|         GETOPT_OPTARG("--bar"):
|             printf("bar: %s\n", optarg);
|             break;
|         GETOPT_OPTARG("-f"):
|             printf("foo: %s\n", optarg);
|             break;
|         GETOPT_MISSING_ARG:
|             printf("missing argument to %s\n", ch);
|             /* FALLTHROUGH */
|         GETOPT_DEFAULT:
|             usage();
|         }
|     }
| ufo wrote:
| I'm intrigued. Is there somewhere where we could find the full
| definition of these macros?
| cperciva wrote:
| https://github.com/Tarsnap/libcperciva/blob/master/util/geto...
|
| https://github.com/Tarsnap/libcperciva/blob/master/util/geto...
|
| BSD licensed, of course.
| ZephyrP wrote:
| Very neat. Do you implement option parsing in "magic getopt" or
| can you (somehow?) handle setting up the option string
| arguments used by the more familiar variants in getopt(3)?
| cperciva wrote:
| All the option parsing is done in functions called by the
| macros. The "getopt string" isn't needed since the
| GETOPT_OPT() and GETOPT_OPTARG() "case statements" convey the
| same information (the set of valid options).
| chungy wrote:
| switch options falling through doesn't seem like magic to me.
| Rather, a common practice.
| cperciva wrote:
| That's not the magic part...
| asveikau wrote:
| This is off topic, but your stretchy buffer does one of my pet
| peeves:
|
|     (a)->at = realloc((a)->at, (a)->_cap * sizeof *(a)->at))
|
| When realloc fails, the old value of (a)->at will be leaked.
|
| And memory allocation failure also leads to null pointer
| dereference in this code.
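
A minimal sketch of the usual fix for the leak described above: store the result of realloc in a temporary and overwrite the original pointer only on success. The `struct buf` type and the `buf_grow` name are hypothetical and only mirror the field names in the quoted snippet; they are not taken from the library under discussion.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stretchy buffer; field names mirror the snippet above. */
struct buf {
    int *at;
    size_t _cap;
};

/* Grow the buffer to new_cap elements.
 * Returns 0 on success and -1 on failure. On failure b->at and b->_cap
 * are left untouched, so the old block is neither leaked nor replaced
 * by NULL. */
int buf_grow(struct buf *b, size_t new_cap)
{
    int *tmp;

    /* Reject zero-size requests and overflow in new_cap * sizeof. */
    if (new_cap == 0 || new_cap > SIZE_MAX / sizeof *b->at)
        return -1;

    tmp = realloc(b->at, new_cap * sizeof *b->at);
    if (tmp == NULL)
        return -1;      /* old block is still valid and still owned by b */

    b->at = tmp;
    b->_cap = new_cap;
    return 0;
}
```

The caller decides what a failure means: report it, retry with a smaller size, or exit.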
| camel-cdr wrote:
| Ah, my way of dealing with memory allocation failure is to
| write an xmalloc/xcalloc/xrealloc that exits on error. So the
| idea would be that the library users can just:
|
|     #undef malloc
|     #define malloc xmalloc
|     ...
|
| I suppose I could make this the default behavior, and make it
| overridable.
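
A rough sketch of the exit-on-failure wrappers described above, assuming the simplest possible policy (print a message and exit). The bodies of `xmalloc` and `xrealloc` are illustrative, not the author's code. Redefining a standard library name as a macro is a common trick, but it is formally outside what the C standard guarantees, so treat the last four lines as a convention rather than a portable contract.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate size bytes or terminate the program; never returns NULL
 * for a non-zero size. */
void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Resize ptr to size bytes or terminate the program. */
void *xrealloc(void *ptr, size_t size)
{
    void *p = realloc(ptr, size);
    if (p == NULL && size != 0) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Placed after the wrappers so that they still call the real functions;
 * code included below this point picks up the checked versions. */
#undef malloc
#undef realloc
#define malloc  xmalloc
#define realloc xrealloc
```

With this in place before the library header is included, a failed allocation ends the program instead of producing a null pointer dereference later.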
| mtlmtlmtlmtl wrote:
| Doesn't this break things like Valgrind?
| asveikau wrote:
| Firstly, an xmalloc will call malloc and add error checks.
| So malloc is still called.
|
| But more fundamentally, Valgrind hooks into the allocator
| (and the program at large) much more deeply. You could
| write your own allocator (many people deploy something
| other than libc's) and it would still figure it out.
| [deleted]
| xfennec wrote:
| How can this code enter the "if (0)" part?
| mtlmtlmtlmtl wrote:
| ARG_BEGIN opens a switch statement. So it can enter any of the
| cases, or it'll jump to the else if part.
| kreetx wrote:
| Is there any reason this is not written as `switch(..) {
| <case statements>; default: <else statements> }`? I.e., is
| there a reason for `if(0)`?
| jsmith45 wrote:
| The whole `if` portion is for when the second character of
| the argument (after the initial `-`) is a second `-` [0].
|
| In that case, the switch jumps to the `case '-':` hidden
| within the macro, and the if statement is about processing
| long arguments (like `--help`). Arguments only available as
| a short flag should never get executed as part of a long
| flag, so putting them inside the if(0) case is an easy
| option.
|
| An alternative without if(0) of any variety would be:
|
|     ARG_BEGIN {
|         if (ARG_LONG("reverse")) case 'r': {
|             reverse = 1;
|             ARG_FLAG();
|         } else if (ARG_LONG("input")) case 'i': {
|             input = ARG_VAL();
|         } else if (ARG_LONG("output")) case 'o': {
|             output = ARG_VAL();
|         } else if (ARG_LONG("help")) case 'h': case '?': {
|             printf("Usage: %s [OPTION...] [STRING...]\n", argv0);
|             puts("Example usage of arg.h\n");
|             puts("Options:");
|             puts("  -a,                 set a to true");
|             puts("  -b,                 set b to true");
|             puts("  -c,                 set c to true");
|             puts("  -r, --reverse       set reverse to true");
|             puts("  -i, --input=STR     set input string to STR");
|             puts("  -o, --output=STR    set output string to STR");
|             puts("  -h, --help          display this help and exit");
|             return EXIT_SUCCESS;
|         } else { default:
|             fprintf(stderr,
|                     "%s: invalid option '%s'\n"
|                     "Try '%s --help' for more information.\n",
|                     argv0, *argv, argv0);
|             return EXIT_FAILURE;
|         }
|         break;
|         case 'a': a = 1; ARG_FLAG(); break;
|         case 'b': b = 1; ARG_FLAG(); break;
|         case 'c': c = 1; ARG_FLAG(); break;
|         case '\0': readstdin = 1; break;
|     } ARG_END;
|
| The downside is that those few short flag only arguments no
| longer line up nicely with the others, and they come after
| the default, which could be a little bit confusing.
|
| Footnote: [0] Except if that second dash is the last
| character of the argument, in which case -- is the end of
| flags marker, meaning any further arguments that begin with
| `-` are just funky positional parameters.
| tempodox wrote:
| The `switch` statement in `ARG_BEGIN`.
| cesarb wrote:
| The way "switch" works in C is very weird. It behaves more like
| a "goto", where each "case ...:" is a label to which it can
| jump.
|
| So when the character is '-', it starts just before the "if
| (0)" (this part is hidden within the ARG_BEGIN macro), and as
| you noted, it will never enter that block. However, when the
| character is 'a', 'b', 'c', or nothing (the end-of-string
| marker), it will jump directly to the corresponding "case ...:"
| label, even though it's within that "unreachable" block.
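
The jump-into-a-dead-block behavior described above can be tried in isolation. This is a stripped-down sketch, not the arg.h macros: the function name `classify` and the returned strings are made up for illustration, and the `c == '-'` test merely stands in for an `ARG_LONG(...)` check.

```c
/* Returns a description of which branch handled the character c. */
const char *classify(int c)
{
    switch (c) {
    case '-':
        /* Entered from the top: the condition is evaluated, so the
         * body of if (0) is skipped and the else-if chain runs. */
        if (0) {
    case 'a':
            /* Reached only by the switch jumping straight to the label. */
            return "short flag a";
    case '\0':
            return "lone dash (stdin)";
        } else if (c == '-') {
            return "long option";
        } else {
    default:
            /* The switch can also land inside the else body. */
            return "invalid";
        }
    }
    return "unreachable";
}
```

Calling `classify('a')` never evaluates `if (0)`, because control arrives at the `case 'a':` label directly.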
| pcwalton wrote:
| A switch can jump inside an if body that contains case labels,
| causing the condition not to be evaluated.
|
| C switch/case semantics are... not the most obvious.
| mark_undoio wrote:
| Because there's really a switch statement (hidden behind a
| macro) that will jump to labels within that block.
|
| The fact that the if condition is false means it won't just run
| the whole block straight through but you can still jump to a
| label in it. A goto statement would also allow you to jump into
| an otherwise-unreachable block.
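
The goto comparison above can be shown with a few lines. A small sketch, with a hypothetical function name, in which the body of `if (0)` runs only when it is entered through a label:

```c
/* Returns 1 if the body of the if (0) block ran, 0 otherwise. */
int jump_into_dead_block(int take_jump)
{
    int ran = 0;

    if (take_jump)
        goto inside;    /* bypasses the if (0) condition entirely */

    if (0) {
inside:
        ran = 1;
    }
    return ran;
}
```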
| Arch-TK wrote:
| If you don't need long options and don't want to use getopt (for
| whatever misguided reason) then I wrote a (rare) blog post about
| doing this without any macro abuse: https://the-
| tk.com/post/2021/07/29/option-parsing-on-a-budge... .
| metadat wrote:
| For the initiated, Duff's Device is a technique for manually
| implementing loop unrolling.
|
| https://en.wikipedia.org/wiki/Duff%27s_device
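
For readers who have not seen it, here is a sketch of the technique. It is the memory-to-memory variant that is commonly shown, not Duff's original, which wrote every byte to a single output register. The name `duff_copy` is made up for this example.

```c
#include <stddef.h>

/* Copy count bytes from src to dst, with the loop body unrolled eight
 * times. The switch jumps into the middle of the do-while to handle
 * the count % 8 leftover bytes on the first pass. */
void duff_copy(char *dst, const char *src, size_t count)
{
    size_t n;

    if (count == 0)
        return;

    n = (count + 7) / 8;    /* total passes, including the partial one */
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

The argument parser in this thread borrows the same property: case labels may sit inside another statement nested in the switch body.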
| Arnavion wrote:
| ( _un_ initiated. The initiated would already know what it is.)
| metadat wrote:
| Thanks for the typo correction.
| gabrielsroka wrote:
| Compliant
| [deleted]
| metadat wrote:
| I clicked through to the link before reading any comments:
|
| "In C? I wonder if it's English only?"
|
| Then proceeded to become extremely confused:
|
| "How is this going to help me parse a complaint? It looks like
| arg parsing..."
|
| ^^
| classified wrote:
| Still, complaint argument parsing is an important and
| undervalued skill, POSIX or not.
| camel-cdr wrote:
| Whoops, is it possible to edit the title?
| kens wrote:
| I realized that I essentially never use a "switch" statement. It
| seems like a control-flow construct that made sense in the 1970s
| to help the compiler generate jump tables, but doesn't seem
| particularly useful now. Moreover, it seems error-prone with
| accidental fall-through. And doing "fancy" things with it makes
| code very hard to read. Does anyone else find "switch" kind of
| redundant?
| aidenn0 wrote:
| I use switch statements all the time for state-machines. There
| are linters that will warn on fallthrough; you should use one
| of them.
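
As an illustration of the state-machine use mentioned above, here is a small sketch: a two-state word counter where each case ends in an explicit `break`. The names are invented for this example. Compilers such as GCC and Clang offer a `-Wimplicit-fallthrough` warning that flags a missing `break`.

```c
#include <ctype.h>
#include <stddef.h>

enum state { OUTSIDE, INSIDE };

/* Count whitespace-separated words in s with a two-state machine. */
size_t count_words(const char *s)
{
    enum state st = OUTSIDE;
    size_t words = 0;

    for (; *s != '\0'; s++) {
        int space = isspace((unsigned char)*s);

        switch (st) {
        case OUTSIDE:
            if (!space) {       /* first character of a new word */
                st = INSIDE;
                words++;
            }
            break;
        case INSIDE:
            if (space)          /* the current word has ended */
                st = OUTSIDE;
            break;
        }
    }
    return words;
}
```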
| oweiler wrote:
| I still find switch statements easier on the eye than a chain
| of if-else-if-...else.
___________________________________________________________________
(page generated 2023-01-05 23:01 UTC)