[HN Gopher] Decoded: GNU coreutils (2019)
___________________________________________________________________
Decoded: GNU coreutils (2019)
Author : pcr910303
Score : 211 points
Date : 2021-03-10 15:04 UTC (7 hours ago)
(HTM) web link (maizure.org)
(TXT) w3m dump (maizure.org)
| gautamcgoel wrote:
| I'm blown away by this project. What a great way to learn about
| the coreutils, and also see how C is written in the real world!
| I'm curious how the author made the diagrams explaining each
| utility - did he use Inkscape?
| mraza007 wrote:
| OMG, I'm so surprised! I was going to post a question on HN
| yesterday asking how I could learn about GNU coreutils, and today
| I wake up and see this.
|
| What a coincidence!!!
|
| Truly an amazing resource on GNU coreutils
| ufo wrote:
| The biggest takeaway for me is that I learned about the existence
| of some utilities that I had never known were there, especially
| "factor" and "tsort".
| rwmj wrote:
| There's also "moreutils"[1] which is a set of useful additional
| tools. "errno" is indispensable if you're a Linux programmer.
|
| [1] https://joeyh.name/code/moreutils/
| vram22 wrote:
| There are many cool / useful lesser-known utilities in GNU /
| Linux.
|
| Check man7.org for good, though brief, info on many of them.
|
| I explored many of them a while ago.
|
| Maintained by Michael Kerrisk, author of The Linux Programming
| Interface, a kind of reference bible for Linux APIs and system
| calls.
|
| Edit: many of those APIs and system calls are used in building
| such utilities.
| rustyminnow wrote:
| Here's the list of coreutils pages for anybody else
| interested:
| https://man7.org/linux/man-pages/dir_by_project.html#coreuti...
| MaxBarraclough wrote:
| Don't forget _recutils_.
|
| https://www.gnu.org/software/recutils/manual/A-Little-Exampl...
|
| https://en.wikipedia.org/wiki/Recfiles
| dang wrote:
| Discussed at the time:
|
| _Decoded: GNU Coreutils_ -
| https://news.ycombinator.com/item?id=20328650 - July 2019 (55
| comments)
| ojnabieoot wrote:
| Very nice work and much easier than trawling through the
| repository.
|
| Some ignorant and probably cliched musing: when I look at small
| utilities like these I am always struck by a seeming distinction
| between best practices for little C programs versus best
| practices for large C applications (the author of the post
| touches on this as well).
|
| In particular, the explicit flow (including goto) and "pedantic"
| style is actually quite appropriate for something < 1000 lines
| and where the expected behavior is extremely well understood. In
| cases like pwd, mkdir, etc, trying to abstract too much is
| arguably a mistake for maintainability and understanding.
|
| I say all this as an immutable functional-first dev who hasn't
| done much native code :) And I think the various type-safe /
| memory-safe / etc versions of these tools are worth developing.
| But there's something to be said about well-optimized native code
| that clearly "does what it says on the box" in a way that's
| accessible to anyone who understands basic Linux programming -
| even if they can only contextually read C code.
|
| (My only real gripe is typographic / linting related, mostly due
| to being a whippersnapper).
| kiwidrew wrote:
| This is in keeping with the style of the original Unix
| utilities.
|
| Having a handful of global variables reduces the amount of
| stuff being passed around from function to function; utilities
| don't need to worry too much about free()ing dynamic
| allocations, since that gets cleaned up on exit anyways; none
| of the code has to be re-entrant, because each invocation of
| the utility is running in its own process.
| setpatchaddress wrote:
| Could not disagree more about goto. Small programs always turn
| into larger ones. And if you don't use practices appropriate
| for larger programs from the beginning, what you end up with is
| spaghetti code.
|
| I'm not criticizing it in context -- a lot of this code dates
| back to the mid-'80s if I'm not mistaken. But always write new
| code using scalable idioms.
| overboard2 wrote:
| If this program has remained small for 40 years, then maybe
| not all small programs turn into larger ones.
| ojnabieoot wrote:
| I agree with you in general. But I think in this specific
| case it's a bit more complicated: the downsides aren't as bad
| as they normally would be, and the use of primitive flow
| constructs arguably has an advantage in this domain:
|
| POSIX and similarly stuffy requirements (even if "soft")
| mean that this code is fairly static. While there is some
| bloat in the pragmas, etc., these applications are
| necessarily slow to change and I think it's reasonable to say
| that they won't suffer from _feature_ bloat anytime soon. So
| the normal software risk considerations are a bit different
| here. Further, any changes to the code will be fiercely
| reviewed, and the individual programs are small enough that
| increases in complexity will be quickly spotted. Relatedly,
| these programs are small enough that, if a refactor to more
| structured code were necessary, the work would be quite
| feasible. So while the risks of goto are real in any C
| program, in practice I think they're quite minimal here.
|
| And I do think you're missing an advantage. These are core
| userspace functions that perform safety- and security-
| critical kernel interactions. So I definitely agree there
| is a strong argument to use safe code, modern abstractions,
| and so on. This is especially true for modern PCs that really
| can afford to spend a few extra cycles creating a folder.
|
| But a modern code construct, correctly applied, is only as
| safe as the compiler. This is not guaranteed! A common
| "gotcha" with buggy C compilers is inappropriately pruning
| instructions because the compiler optimizes away a loop or
| else statement. It is hardly a frequent issue but similar
| bugs have shown up in recent gcc/clang releases. And in
| particular core developers who are working on operating
| systems are more likely to be using shaky C compilers.
|
| Using gotos and ugly global state has the distinct advantage
| that generated assembly tends to have fewer "surprises." If
| there is a bug in the compiler it will be less well-hidden;
| if there is a bug in the program then there is less mental
| work between analyzing the C and analyzing the disassembly
| for debugging.
|
| Again, in general I think you're correct and that my argument
| is ultimately more of a judgment call.
|
| EDIT: I didn't really want to address any _structural_
| advantages of goto for, e.g., exception handling by breaking
| out of loops early. I am not enough of a domain expert to
| comment appropriately, but it does seem there are cases where
| properly abstracted cleanup code in C is more spaghettified
| than a goto: https://lkml.org/lkml/2003/1/12/203
| not2b wrote:
| If the flow graph doesn't have a clean nested structure,
| this impedes compiler optimization. It can be possible to
| normalize it, but this may require the compiler to clone
| the code. Compilers are pretty good these days; if you've
| experienced a C compiler "inappropriately" optimizing
| something away the most likely cause these days is not a
| compiler bug, but a software developer who doesn't
| understand rules related to aliasing or undefined behavior.
|
| I do agree that the specific use of goto to jump cleanly
| out of several loops is appropriate: the problem is that C
| lacks clean constructs for exiting named blocks. Such a
| construct would be preferable to a general goto and wouldn't
| harm optimization: the flow graph is still easy to analyze,
| convert to SSA form, and the like.
| monocasa wrote:
| I'd like to see the use of goto.
|
| There are two 'allowed' uses of goto in C that are common and
| represent good code even today: goto error cleanup stubs, and
| goto in virtual machine dispatch loops.
|
| The size of the codebase doesn't really matter for those
| cases; they're largely considered the idiomatic way to go
| about the problems they're trying to solve.
| not2b wrote:
| The error cleanup role is handled in a number of other
| languages (Ada, VHDL, Perl) by letting the programmer name
| a block and having a statement that terminates that block
| or (for a loop) goes to the next iteration, even if this
| terminates multiple loops. The effect is similar to the C
| goto way of doing that, but it's more controlled and easy
| for compilers to deal with.
| monocasa wrote:
| Oh, for sure, other languages have different idiomatic
| constructs that don't require such a heavy hammer as goto
| to achieve a similar effect.
|
| Even in C, if you're writing Microsoft-only code, SEH is
| probably a better mechanism than goto error.
|
| I'd argue that the defer statement in Go (and its
| surprising side effects, like being function-scoped
| rather than block-scoped as you might otherwise expect)
| ultimately comes from trying to wrap this idiom in a
| construct that's better supported by the language.
|
| My point though is that in relatively standard, portable
| C, there are valid, idiomatic use cases of goto, and it's
| not quite so easy to say 'eww goto' in those very
| specific circumstances.
| robocat wrote:
| I skimmed some Linux code the other day and noticed that
| goto is used for more than those two situations. Maybe just
| cruft...?
|
| Search for retry: or handle_itb: in
| https://github.com/torvalds/linux/blob/master/fs/ext4/resize...
|
| Or fixleft: or copy: in
| https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...
| monocasa wrote:
| Yeah, the retry piece is a bit more controversial. Some
| people think that it's cleaner for code that's probably
| already nesting loops, but I tend to break it apart in
| different ways. That one I generally don't push too hard
| in review, but require more tests to shore up confidence.
|
| And frankly, the fix_left style code you see just isn't
| modern idiomatic C, IMO. In a code review I'd have them
| either write a block comment explaining why it needs to
| be weird, plus a lot of test cases for when someone
| inevitably tries to rewrite it, or just rewrite it in
| the first place.
|
| Some of the areas of the Linux kernel aren't exactly
| known for being the best written C (unfortunate as that
| is) and you're seeing some of that.
| psychoslave wrote:
| "How is GNU `yes` so fast?" was already discussed on this topic:
| https://news.ycombinator.com/item?id=14542938
| gbajson wrote:
| I have just spent 5 minutes trying to find any useful use cases
| for 'vdir'. Does anyone have an idea why another 'ls' was
| created?
| mshockwave wrote:
| Very interesting way to visualize some of the most important
| cornerstones in *nix systems
| tyingq wrote:
| Really lovely work. I'm curious if the png flowcharts are
| generated from data, or hand drawn.
|
| Edit: Also, some easter egg looking thing at the bottom right of
| the page:
|
| <div class="copyright col-md-6">
| ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*##*
|
| </div>
|
| Edit: Fixed asterisks, I think.
| jedimastert wrote:
| ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*##*
|
| HN's formatter is having a _time_ with that many asterisks
|
| I don't recognize the format, can someone help me out?
| tyingq wrote:
| It's 64 characters. I'd guess binary if that /\ wasn't in the
| middle.
| bluesign wrote:
| Seems like Morse code.
|
| The first part is "maizure".
|
| PS: HN messed up the stars :)
| tyingq wrote:
| Ah, good call. HN ate your asterisks, but yeah, the first
| bit before the / is "maizure" in Morse. Though having no
| spaces between the letters makes for some ambiguity; the
| rest is hard to decode.
|
|     ## *# ** ##** **# *#* * == maizure
|
| I can find some long words with a dictionary approach in
| there, like:
|
|     ARRIVE  -> .-.-..-......-.
|     CLEVER  -> -.-..-......-..-.
|     DESTINY -> -......-..-.-.--
|     FENCED  -> ..-..-.-.-..-..
|     MEMBER  -> --.---.....-.
|     MISTER  -> --.....-..-.
|     (etc)
|
| But there are too many variations in that direction, too.
| JdeBP wrote:
| You have certainly got further than people did in
| https://news.ycombinator.com/item?id=17116855 .
| tyingq wrote:
| update...pretty sure the ending is "FSF.ORG"
|
| Maybe an email address?
| bluesign wrote:
| Yeah, millions of combinations. I tried but was not
| patient enough.
|
| for the curious: https://www.jbowman.com/remorse/
___________________________________________________________________
(page generated 2021-03-10 23:00 UTC)