[HN Gopher] Parallel ./configure
___________________________________________________________________
Parallel ./configure
Author : brooke2k
Score : 195 points
Date : 2025-04-25 23:19 UTC (23 hours ago)
(HTM) web link (tavianator.com)
(TXT) w3m dump (tavianator.com)
| epistasis wrote:
| I've spent a fair amount of time over the past decades making
| autotools work on my projects, and I've never felt like it was a
| good use of time.
|
| It's likely that C will continue to be used by everyone for
| decades to come, but I know that I'll personally never start a
| new project in C again.
|
| I'm still glad that there's some sort of push to make autotools
| suck less for legacy projects.
| tidwall wrote:
| I've stopped using autotools for new projects. Just a Makefile,
| and the -j flag for concurrency.
| monkeyelite wrote:
| You can use make without configure. If needed, you can also
| write your own configure instead of using autotools.
|
| Creating a Makefile is about 10 lines, and it's the lowest-
| friction way for me to start programming in any environment.
| Familiarity is part of that.
| edoceo wrote:
| Write your own configure? For an internal project, where much
| is under domain control, sure. But for the 1000s of projects
| trying to be multi-platform and/or support flavours/versions -
| oh gosh.
| monkeyelite wrote:
| It depends on how much platform specific stuff you are
| trying to use. Also in 2025 most packages are tailored for
| the operating system by packagers - not the original
| authors.
|
| Autotools is going to check every config from the past 50
| years.
| charcircuit wrote:
| >Also in 2025 most packages are tailored for the
| operating system by packagers - not the original authors.
|
| No? Most operating systems don't have a separate
| packager. They have the developer package the
| application.
| monkeyelite wrote:
| Yes? Each operating system is very different and almost
| every package has patches or separate install scripts.
| viraptor wrote:
| It's a bit of a balance once you get bigger dependencies. A
| generic autoconf is annoying to write, but rarely an issue
| when packaging for a distro. Most issues I've had to fix in
| nixpkgs were for custom builds unfortunately.
|
| But if you don't plan to distribute things widely (or have no
| deps).. Whatever, just do what works for you.
| psyclobe wrote:
| cmake ftw
| aldanor wrote:
| You mean cargo build
| yjftsjthsd-h wrote:
| ... _can_ cargo build things that aren't rust? If yes,
| that's really cool. If no, then it's not really in the same
| problem domain.
| kouteiheika wrote:
| No it can't.
|
| It can build a Rust program (build.rs) which builds
| things that aren't Rust, but that's an entirely different
| use case (building a non-Rust library to use inside of Rust
| programs).
| crabbone wrote:
| There's GprBuild (Ada tool) that can build C (not sure
| about C++). It also has a more elaborate configuration
| structure, but I haven't used it extensively enough to tell
| what exactly it does and how. In combination
| with Alire it can also manage dependencies Cargo-style.
| touisteur wrote:
| Got it to build C++, CUDA and IIRC SYCL too.
| malkia wrote:
| cmake uses a configure step, or something configure-like, too!
| ahartmetz wrote:
| Same concept, but completely different implementation.
| JCWasmx86 wrote:
| Or meson is a serious alternative to cmake (Even better than
| cmake imho)
| torarnv wrote:
| CMake also does sequential configuration AFAIK. Is there any
| work to improve on that somewhere?
| OskarS wrote:
| Meson and cmake in my experience are both MUCH faster
| though. It's much less of an issue with these systems than
| with autotools.
| tavianator wrote:
| Just tried reconfiguring LLVM:
|
|     27.24s user 8.71s system 99% cpu 36.218 total
|
| Admittedly the LLVM build time dwarfs the configuration
| time, but still. If you're only building a smaller
| component then the config time dominates:
|
|     ninja libc  268.82s user 26.16s system 3246% cpu 9.086 total
| eqvinox wrote:
| To extend on sibling comments:
|
| autoconf is in no way, shape or form an "official" build system
| associated with C. It is a GNU creation and certainly popular,
| but not to a "monopoly" degree, and its share is declining.
| (plain make & meson & cmake being popular alternatives)
| blibble wrote:
| is this really a big deal given you run ./configure once?
|
| it's like systemd trading off non-determinism for boot speed,
| when it takes 5 minutes to get through the POST
| LegionMammal978 wrote:
| If you do a lot of bisecting, or bootstrapping, or building
| compatibility matrices, or really anything that needs you to
| compile lots of old versions, the repeated ./configure steps
| really start feeling like a drag.
| kazinator wrote:
| In a "reasonably well-behaved program", if you have the
| artifacts from a current configure, like a "config.h" header,
| they are compatible with older commits, even if
| configurations changed, as long as the configuration changes
| were additive: introducing some new test, along with a new
| symbol in "config.h".
|
| It's possible to skip some of the ./configure steps.
| Especially for someone who knows the program very well.
| LegionMammal978 wrote:
| Perhaps you can get away with that for small, young, or
| self-contained projects. But for medium-to-large projects
| running more than a few years, the (different versions of)
| external or vendored dependencies tend to come and go, and
| they all have their own configurations. Long-running
| projects are also prone to internal reorganizations and
| overhauls to the build system. (Go back far enough, and
| you're having to wrangle patchsets for every few months'
| worth of versions since -fpermissive is no longer
| permissive enough to get it to build.)
| kazinator wrote:
| [delayed]
| csdvrx wrote:
| > it's like systemd trading off non-determinism for boot speed,
| when it takes 5 minutes to get through the POST
|
| That's a bad analogy: if a given deterministic service ordering
| is needed for a service to start correctly (say because it
| doesn't start with the systemd unit), it means the non-
| deterministic systemd service units are not properly encoding
| the dependency tree in their Before= and After= directives.
|
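| A minimal sketch (hypothetical unit) of that encoding: the
| dependency is declared against the thing actually needed,
| not against a position in a boot sequence:
|
|     # foo.service
|     [Unit]
|     Wants=network-online.target
|     After=network-online.target
|
|     [Service]
|     ExecStart=/usr/bin/foo
|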
| When done properly, both solutions should work the same.
| However, the solution properly encoding the dependency graph
| (instead of just projecting it onto a 1-dimensional sequence
| of numbers) is the better one, because it gives you more
| speed but also more flexibility: you can see the branches any
| leaf depends on, remove leaves as needed, then cull the
| useless branches. You could add determinism if you want, but
| why bother?
|
| It's like using the dependencies of linux packages, and leaving
| the job of resolving them to package managers (apt, pacman...):
| you can then remove the useless packages which are no longer
| required.
|
| Compare that to doing a `make install` of everything to
| /usr/local in a specific order, as specified by a script: when
| done properly, both solutions will work, but one solution is
| clearly better than the other as it encodes more finely the
| existing dependencies instead of projecting them to a sequence.
|
| You can add determinism if you want to follow a sequence (ex:
| `apt-get install make` before adding gcc, then add cuda...), or
| you can use a meta-package like build-essentials, but being
| restricted to a sequence gains you nothing.
| blibble wrote:
| I don't think it is a bad analogy
|
| given how complicated the boot process is ([1]), and it
| occurs once a month, I'd rather it was as deterministic as
| possible
|
| vs. shaving 1% off the boot time
|
| [1]: distros continue to ship subtly broken unit files,
| because the model is too complicated
| Aurornis wrote:
| Most systems do not have 5 minute POST times. That's an
| extreme outlier.
|
| Linux runs all over, including embedded systems where boot
| time is important.
|
| Optimizing for edge cases on outliers isn't a priority. If
| you need specific boot ordering, configure it that way. It
| doesn't make sense for the entire Linux world to sacrifice
| boot speed.
| timcobb wrote:
| I don't even think my Pentium 166 took 5 minutes to POST.
| Did computers ever take that long to POST??
| BobbyTables2 wrote:
| Look at enterprise servers.
|
| Completing POST in under 2 minutes is not guaranteed.
|
| Especially the 4 socket beasts with lots of DIMMs.
| yjftsjthsd-h wrote:
| Old machines probably didn't, no, but I have absolutely
| seen machines (Enterprise(tm) Servers) that took longer
| than that to get to the bootloader. IIRC it was mostly a
| combination of hardware RAID controllers and RAM...
| something. Testing?
| lazide wrote:
| It takes a while to enumerate a couple TB worth of RAM
| DIMMs and 20+ disks.
| yjftsjthsd-h wrote:
| Yeah, it was somewhat understandable. I _also_ suspect
| the firmware was... let's say _underoptimized_, but I
| agree that the task is truly not trivial.
| lazide wrote:
| One thing I ran across when trying to figure this out
| previously - while some firmware is undoubtedly dumb, a
| decent amount of it was doing a lot more than
| typical PC firmware.
|
| For instance, the slow RAM check POST I was experiencing
| was because it was also doing a quick single-pass memory
| test. Consumer firmware goes 'meh, whatever'.
|
| Disk spin-up, it was also staggering the disk power-ups
| so that it didn't kill the PSU - not a concern if you
| have 3-4 drives. But definitely a concern if you have 20.
|
| Also, the raid controller was running basic SMART tests
| and the like. Which consumer stuff typically doesn't.
|
| Now how much any of this is worthwhile depends on the use
| case of course. In 'farm of cheap PCs' type cloud hosting
| environments, most of these types of conditions get handled
| by software, and it doesn't matter much if any single box
| is half broken.
|
| If you have one big box serving a bunch of key infra, and
| reboot it periodically as part of 'scheduled maintenance'
| (aka old school on prem), then it does.
| Twirrim wrote:
| Physical servers do. It's always astounding to me how
| long it takes to initialise all that hardware.
| kcexn wrote:
| Oh? What's an example of a common way for unit files to be
| subtly broken?
| juped wrote:
| See: the comment above and its folkloric concept of systemd
| as some kind of constraint solver
|
| Unfortunately no one has actually bothered to write down
| how systemd really works; the closest to a real writeup out
| there is
| https://blog.darknedgy.net/technology/2020/05/02/0/
| Aurornis wrote:
| > is this really a big deal given you run ./configure once
|
| I end up running it dozens of times when changing versions,
| checking out different branches, chasing dependencies.
|
| It's a big deal.
|
| > it's like systemd trading off non-determinism for boot speed,
| when it takes 5 minutes to get through the POST
|
| 5 minute POST time is a bad analogy. systemd is used in many
| places, from desktops (that POST quickly) to embedded systems
| where boot time is critical.
|
| If deterministic boot is important then you would specify it
| explicitly. Relying on emergent behavior for consistent boot
| order is bad design.
|
| The number of systems that have 5 minute POST times and need
| deterministic boot is an edge case of an edge case.
| mschuster91 wrote:
| > I end up running it dozens of times when changing versions,
| checking out different branches, chasing dependencies.
|
| Yeah... but neither of that is going to change stuff like the
| size of a data type, the endianness of the architecture
| you're running on, or the features / build configuration of
| some library the project depends on.
|
| Parallelization is a bandaid (although a sorely needed one!).
| IMHO, C/C++ libraries desperately need to develop some sort
| of standard that doesn't require a full gcc build for each
| tiny test. I'd envision something like nodejs's package.json,
| just with more specific information about the build details
| themselves. And for the stuff like datatype sizes, that
| should be provided by gcc/llvm in a fast-parseable way so
| that autotools can pick it up.
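|
| (GCC and Clang already expose much of this: `cc -dM -E - </dev/null`
| dumps the predefined macros one per line. A minimal sketch of
| using them in place of configure probes, assuming a GCC/Clang
| toolchain:)
|
|     /* Type sizes and endianness from predefined macros,
|        no compile-and-run probe needed (GCC/Clang): */
|     #include <stdio.h>
|
|     int main(void) {
|         printf("long is %d bytes\n", (int)__SIZEOF_LONG__);
|     #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
|         printf("little-endian\n");
|     #else
|         printf("big-endian\n");
|     #endif
|         return 0;
|     }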
| o11c wrote:
| There is the `-C` option of course. It's supposedly good
| for the standard tests that waste all the time, but not so
| much for the ad-hoc tests various projects use, which have
| an unfortunate chance of being buggy or varying across
| time.
|
| ... I wonder if it's possible to manually seed a cache file
| with only known-safe test results and let it still perform
| the unsafe tests? Be sure to copy the cache file to a
| temporary name ...
|
| ---
|
| I've thought about rewriting `./configure` in C (I did it
| in Python once but Python's portability turned out to be
| poor - Python2 was bug-free but killed; Python3 was
| unfixably buggy for a decade or so). Still have a stub
| shell script that reads HOSTCC etc. then quickly builds and
| executes `./configure.bin`.
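|
| A minimal sketch of that configure-in-C idea (names and
| layout hypothetical, error handling mostly elided):
|
|     #include <stdio.h>
|     #include <stdlib.h>
|
|     /* Try to compile a snippet with $HOSTCC; 1 on success. */
|     static int try_compile(const char *cc, const char *src)
|     {
|         char cmd[512];
|         snprintf(cmd, sizeof cmd,
|                  "printf '%%s' '%s' | %s -xc - -o /dev/null"
|                  " 2>/dev/null", src, cc);
|         return system(cmd) == 0;
|     }
|
|     int main(void)
|     {
|         const char *cc = getenv("HOSTCC");
|         if (!cc) cc = "cc";
|         FILE *out = fopen("config.h", "w");
|         if (!out) return 1;
|         if (try_compile(cc,
|                 "#include <wchar.h>\nint main(void){return 0;}"))
|             fprintf(out, "#define HAVE_WCHAR_H 1\n");
|         fclose(out);
|         return 0;
|     }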
| blibble wrote:
| > to embedded systems where boot time is critical.
|
| if it's critical on an embedded system then you're not
| running systemd at all
|
| > The number of systems that have 5 minute POST times and
| need deterministic boot is an edge case of an edge case.
|
| desktop machines are the edge case, there's a LOT more
| servers running Linux than people using Linux desktops
|
| > Relying on emergent behavior for consistent boot order is
| bad design.
|
| tell that to the distro authors who 10 years in can't tell
| the difference between network-online.target, network-
| pre.target, network.target
| MindSpunk wrote:
| And a very large number of those Linux servers are running
| Linux VMs, which don't POST, use systemd, and have their
| boot time dominated by the guest OS. Those servers are
| probably hosting dozens of VMs too. Boot time makes a lot
| of difference here.
| blibble wrote:
| seabios/tianocore still takes longer than /etc/rc on a
| BSD
|
| amdahl's law's a bitch
| 0x457 wrote:
| > from desktops (that POST quickly)
|
| I take it you don't run DDR5?
| Twirrim wrote:
| >chasing dependencies.
|
| This aspect of configure, in particular, drives me nuts.
| Obviously I'd like it to be faster, but it's not the end of
| the world. I forget what I was trying to build the other
| week, but I had to make 18 separate runs of configure to find
| all the things I was missing. When I dug into things it
| looked like it could probably have done it in 2 runs, each
| presenting a batch of things that were missing. Instead I got
| stuck with "configure, install missing package" over and over
| again.
| PinkSheep wrote:
| Exactly. Multiply this by the time it takes for one run
| on a slow machine. Back in the day, I ran a compilation on
| my phone as it was the target device. Besides the
| compilation taking 40 minutes (and configure _had_ missed a
| thing or two), the configure step itself took a minute or
| so. Because I don't know all the moving parts, I prefer to
| start from scratch rather than run into obscure problems
| later on.
|
| Arguing against parallelization of configure is like
| arguing against faster OS updates. "It's only once a
| week/whatever, come on!" Except it's spread over a billion
| people, time and time again.
| asah wrote:
| For postgresql development, you run configure over and over...
| fishgoesblub wrote:
| Very nice! I always get annoyed when my fancy 16 thread CPU is
| left barely used as one thread is burning away with the rest
| sitting and waiting. Bookmarking this for later to play around
| with whatever projects I use that still use configure.
|
| Also, I was surprised when the animated text at the top of the
| article wasn't a gif, but actual text. So cool!
| fmajid wrote:
| And on macOS, the notarization checks for all the conftest
| binaries generated by configure add even more latency. Apple
| reneged on their former promise to give an opt-out for this.
| SuperV1234 wrote:
| CMake also needs this, badly...
| torarnv wrote:
| Agreed! The CMake Xcode generator is extremely slow because not
| only is it running the configure tests sequentially, but it
| generates a new Xcode project for each of them.
| redleader55 wrote:
| Why do we need to even run most of the things in ./configure? Why
| not just have a file in /etc which is updated when you install
| various packages which ./configure can read to learn various
| stats about the environment? Obviously it will still allow
| setting various things with parameters and create a Makefile, but
| much faster.
| o11c wrote:
| Keep in mind that the build intentionally depends on
| environment variables, people often install non-packaged
| dependencies in bad ways, and cross-compiling is a thing, so
| it's not that simple.
| wolfgang42 wrote:
| Some relevant discussion/pointers to other notes on this sort
| of proposal can be found here:
| https://utcc.utoronto.ca/~cks/space/blog/sysadmin/AutoconfVa...
|
| (The conclusion I distilled out of reading that at the time, I
| think, was that this is actually sort of happening, but slowly,
| and autoconf is likely to stick around for a while, if only as
| a compatibility layer during the transition.)
| pabs3 wrote:
| Not every OS is going to have such a file, and you also don't
| know if it matches the actual system ./configure runs on.
| 1718627440 wrote:
| That does already exist in GNU Autoconf:
| https://www.gnu.org/software/autoconf/manual/autoconf-2.65/h...
| creatonez wrote:
| Noticed an easter egg in this article. The text below "I'm sorry,
| but in the year 2025, this is ridiculous:" is animated entirely
| without Javascript or .gif files. It's pure CSS.
|
| This is how it was done:
| https://github.com/tavianator/tavianator.com/blob/cf0e4ef26d...
| o11c wrote:
| Unfortunately it forgets to HTML-escape the <wchar.h> etc.
| tavianator wrote:
| Whoops! Forgot to do that when I switched from a ``` block to
| raw html
| tavianator wrote:
| Fixed! https://github.com/tavianator/tavianator.com/commit/
| aa9d99d5...
| moralestapia wrote:
| >The purpose of a ./configure script is basically to run the
| compiler a bunch of times and check which runs succeeded.
|
| Wait is this true? (!)
| Am4TIfIsER0ppos wrote:
| Yes.
| klysm wrote:
| The closer and deeper you look into the C toolchains the more
| grossed out you'll be
| acuozzo wrote:
| Hands have to get dirty somewhere. "As deep as The Worker's
| City lay underground, so high above towered the City of
| Metropolis."
|
| The choices are:
|
| 1. Restrict the freedom of CPU designers to some
| approximation of the PDP11. No funky DSP chips. No crazy
| vector processors.
|
| 2. Restrict the freedom of OS designers to some approximation
| of Unix. No bespoke realtime OSes. No research OSes.
|
| 3. Insist programmers use a new programming language for
| these chips and OSes. (This was the case prior to C and
| Unix.)
|
| 4. Insist programmers write in assembly and/or machine code.
| Perhaps a macro-assembler is acceptable here, but this is
| inching toward C.
|
| The cost of this flexibility is gross tooling to make it
| manageable. Can it be done without years and years of accrued
| M4 and sh? Perhaps, but that's just CMake and CMake is
| nowhere near as capable as Autotools & friends are when
| working with legacy platforms.
| klysm wrote:
| There is no real technical justification for the absolute
| shit show that is the modern C toolchain
| moralestapia wrote:
| I like C/C++ a lot, A LOT, and I agree with your comment.
|
| Man, if this got fixed it would be one of the best
| languages to develop for.
|
| My wishlist:
|
| * Quick compilation times (obv.) or some sort of tool
| that makes it feel like an interpreted language, at least
| when you're developing, then do the usual compile step to
| get an optimized binary.
|
| * A F...... CLEAR AND CONSISTENT WAY TO TELL THE
| TOOLCHAIN THIS LIBRARY IS HERE AND THIS ONE IS OVER THERE
| (sorry but, come on ...).
|
| * A single command line argument to output a static
| binary.
|
| * Anything that gets us closer to the "build-once run-
| anywhere" philosophy of "Cosmopolitan Libc". Even if an
| intermediate execution layer is needed. One could say,
| "oh, but this is C, not Java", but it is already _de
| facto_ a broken Java, because you still need an execution
| layer, call it stdlib, GLIB, whatever, if those shared
| libraries are not on your system with their exact version
| matching, your program breaks ... Just stop pretending
| and ship the "C virtual machine", lmao.
| gdwatson wrote:
| Historically, different Unixes varied a lot more than they do
| today. Say you want your program to use the C library function
| foo on platforms where it's available and the function bar
| where it isn't: You can write both versions and choose between
| them based on a C preprocessor macro, and the program will use
| the best option available for the platform where it was
| compiled.
|
| But now the user has to set the preprocessor macro
| appropriately when he builds your program. Nobody wants to give
| the user a pop quiz on the intricacies of his C library every
| time he goes to install new software. So instead the developer
| writes a shell script that tries to compile a trivial program
| that uses function foo. If the script succeeds, it defines the
| preprocessor macro FOO_AVAILABLE, and the program will use foo;
| if it fails, it doesn't define that macro, and the program will
| fall back to bar.
|
| That shell script grew into configure. A configure script for
| an old and widely ported piece of software can check for _a
| lot_ of platform features.
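|
| A minimal sketch of that pattern (FOO_AVAILABLE and the
| function names are hypothetical):
|
|     /* conftest.c: the trivial probe configure tries to
|        compile and link; on success it appends
|        "#define FOO_AVAILABLE 1" to config.h. */
|     int foo(int);
|     int main(void) { return foo(0); }
|
|     /* The program then picks the best option at build time: */
|     #include "config.h"
|     #ifdef FOO_AVAILABLE
|     #  define best_foo(x) foo(x)
|     #else
|     #  define best_foo(x) bar(x)  /* portable fallback */
|     #endif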
| im3w1l wrote:
| I'm not saying we should send everyone a docker container
| with a full copy of ubuntu, electron and foo.js whether they
| have foo in their c library or not, but maybe there is a
| middle ground?
| moralestapia wrote:
| I think this is a gigantic point in favor of interpreted
| languages.
|
| JS and Python wouldn't be what they are today if you had to
| `./configure` every website you want to visit, lmao.
| cesarb wrote:
| > JS and Python wouldn't be what they are today if you
| had to `./configure` every website you want to visit,
| lmao.
|
| You just gave me a flashback to the IE6 days. Yes, that's
| precisely what we did. On every page load.
|
| It's called "feature detection", and was the recommended
| way of doing things (the bad alternative was user agent
| sniffing, in which you read the user agent string to
| guess the browser, and then assumed that browser X always
| had feature Y; the worst alternative was to simply
| require browser X).
| BobbyTables2 wrote:
| I was really hoping he worked some autoreconf/macro magic to
| transform existing configure.ac files into a parallelized result.
|
| Nice writeup though.
| psyclobe wrote:
| (Luckily?) With C++ your build will nearly always take longer
| than the configuration step.
| bitbasher wrote:
| rust: " _hold my leggings_ "
| LoganDark wrote:
| Since when? I far more often run into CMake taking ages than
| Cargo.
| klysm wrote:
| autotools is a complete disaster. It's mind boggling to think
| that everything we build is usually on top of this arcane system
| gorgoiler wrote:
| On the topic* of having 24 cores and wanting to put them to work:
| when I were a lad the promise was that pure functional
| programming would trivially allow for parallel execution of
| functions. Has this future ever materialized in a modern
| language / runtime?
|
|     x = 2 + 2
|     y = 2 * 2
|     z = f(x, y)
|     print(z)
|
| ...where x and y evaluate in parallel _without me having to do
| anything_. Clojure, perhaps?
|
| *And superficially off the topic of this thread, but possibly
| not.
| speed_spread wrote:
| I believe it's not the language preventing it but the nature of
| parallel computing. The overhead of splitting up things and
| then reuniting them again is high enough to make trivial cases
| not worth it. OTOH we now have pretty good compiler
| autovectorization which does a lot of parallel magic if you set
| things right. But it's not handled at the language level
| either.
| deepsun wrote:
| Sure, Tensorflow and Pytorch, here ya go :)
| chubot wrote:
| That looks more like a SIMD problem than a multi-core problem
|
| You want bigger units of work for multiple cores, otherwise the
| coordination overhead will outweigh the work the application is
| doing
|
| I think the Erlang runtime is probably the best use of
| functional programming and multiple cores. Since Erlang
| processes are shared nothing, I think they will scale to 64 or
| 128 cores just fine
|
| Whereas the GC will be a bottleneck in most languages with
| shared memory ... you will stop scaling before using all your
| cores
|
| But I don't think Erlang is as fine-grained as your example ...
|
| Some related threads:
|
| https://news.ycombinator.com/item?id=40130079
|
| https://news.ycombinator.com/item?id=31176264
|
| AFAIU Erlang is not that fast an interpreter; I thought the
| Pony Language was doing something similar (shared nothing?)
| with compiled code, but I haven't heard about it in a while
| juped wrote:
| There's some sharing used to avoid heavy copies, though GC
| runs at the process level. The implementation is tilted
| towards copying between isolated heaps over sharing, but it's
| also had performance work done over the years. (In fact, if I
| really _want_ to cause a global GC pause bottleneck in
| Erlang, I can abuse persistent_term to do this.)
| fmajid wrote:
| Yes, Erlang's zero-sharing model is what I think Rust should
| have gone for in its concurrency model. Sadly too few people
| have even heard of it.
| chubot wrote:
| That would be an odd choice for a low-level language ...
| languages like C, C++, and Rust let you use whatever the OS
| has, and the OS has threads
|
| A higher level language can be more opinionated, but a low
| level one shouldn't straitjacket you.
|
| i.e. Rust can be used to IMPLEMENT an Erlang runtime
|
| If you couldn't use threads, then you could not implement
| an Erlang runtime.
| steveklabnik wrote:
| Very early on, Rust was like this! But as the language
| changed over time, it became less appropriate.
| gdwatson wrote:
| Superscalar processors (which include all mainstream ones these
| days) do this within a single core, provided there are no data
| dependencies between the assignment statements. They have
| multiple arithmetic logic units, and they can start a second
| operation while the first is executing.
|
| But yeah, I agree that we were promised a lot more automatic
| multithreading than we got. History has proven that we should
| be wary of any promises that depend on a Sufficiently Smart
| Compiler.
| lazide wrote:
| Eh, in this case _not_ splitting them up to compute them in
| parallel _is_ the smartest thing to do. Locking overhead
| alone is going to dwarf every other cost involved in that
| computation.
| gdwatson wrote:
| Yeah, I think the dream was more like, "The compiler looks
| at a map or filter operation and figures out whether it's
| worth the overhead to parallelize it automatically." And
| that turns out to be pretty hard, with potentially painful
| (and nondeterministic!) consequences for failure.
|
| Maybe it would have been easier if CPU performance didn't
| end up outstripping memory performance so much, or if cache
| coherency between cores weren't so difficult.
| lazide wrote:
| I think it has shaken out the way it has because
| compile-time optimizations to this extent require knowing
| runtime constraints/data at compile time. Which for non-
| trivial situations is impossible, as the code will be run
| with too many different types of input data, with too
| many different cache sizes, etc.
|
| The CPU has better visibility into the actual runtime
| situation, so can do runtime optimization better.
|
| In some ways, it's like a bytecode/JVM type situation.
| PinkSheep wrote:
| If we can write code to dispatch different code paths
| (like has been used for decades for SSE, later AVX
| support within one binary), then we can write code to
| parallelize large array execution based on heuristics.
| Not much different from busy spins falling back to
| sleep/other mechanisms when the fast path fails after ca.
| 100-1000 attempts to secure a lock.
|
| For the trivial example of 2+2 like above, of course,
| this is a moot discussion. The commenter should've led
| with a better example.
| lazide wrote:
| Sure, but it's a rare situation (by code path) where it
| will beat the CPU's auto optimization, eh?
|
| And when that happens, almost always the developer knows
| it is that type of situation and will want to tune things
| themselves anyway.
| eptcyka wrote:
| Spawning threads or using a thread pool implicitly would
| be pretty bad - it would be difficult to reason about
| performance if the compiler were to make these choices for
| you.
| maccard wrote:
| I think you're fixating on the very specific example.
| Imagine if instead of 2 + 2 it was multiplying arrays of
| large matrices. The compiler or runtime would be smart
| enough to figure out if it's worth dispatching the
| parallelism or not for you. Basically auto vectorisation
| but for parallelism
| lazide wrote:
| Notably - in most cases, there is no way the _compiler_
| can know which of these scenarios are going to happen at
| compile time.
|
| At runtime, the CPU can figure it out though, eh?
| maccard wrote:
| I mean, theoretically it's possible. A super basic
| example would be if the data is known at compile time, it
| could be auto-parallelized, e.g.
|
|     int buf_size = 10000000;
|     auto vec = make_large_array(buf_size);
|     for (const auto& val : vec) {
|         do_expensive_thing(val);
|     }
|
| this could clearly be parallelised. In a C++ world that
| doesn't exist, we can see that it's valid.
|
| If I replace it with
|
|     int buf_size = 10000000;
|     cin >> buf_size;
|     auto vec = make_large_array(buf_size);
|     for (const auto& val : vec) {
|         do_expensive_thing(val);
|     }
|
| the compiler could generate some code that looks like:
|
|     if buf_size >= SOME_LARGE_THRESHOLD { DO_IN_PARALLEL }
|     else { DO_SERIAL }
|
| With some background logic for managing threads, etc. In
| a C++-style world where "control" is important it likely
| wouldn't fly, but if this was python...
|
|     arr_size = 10000000
|     buf = [None] * arr_size
|     for x in buf:
|         do_expensive_thing(x)
|
| could be parallelised at compile time.
| lazide wrote:
| Which no one really does (data is generally provided at
| runtime). Which is why 'super smart' compilers kinda went
| nowhere, eh?
| maccard wrote:
| I dunno. I was promised the same things when I started
| programming and it never materialised.
|
| It doesn't matter what people do or don't do because this
| is a hypothetical feature of a hypothetical language that
| doesn't exist.
| que-encrypt wrote:
| Jax: https://docs.jax.dev/en/latest/_autosummary/jax.jit.html
| colechristensen wrote:
| There have been Fortran compilers which have done auto-
| parallelization for decades. I think NVIDIA released a
| compiler that will take your code and do its best to run
| it on a GPU.
|
| This works best for scientific computing things that run
| through very big loops where there is very little interaction
| between iterations.
| inejge wrote:
| > ...where x and y evaluate in parallel _without me having to
| do anything_.
|
| I understand that yours is a very simple example, but a) such
| things are already parallelized even on a single thread thanks
| to all the internal CPU parallelism, b) one should always be
| mindful of Amdahl's law, c) truly parallel solutions to various
| problems tend to be structurally different from serial ones _in
| unpredictable ways_, so there's no single transformation, not
| even a single family of transformations.
| snackbroken wrote:
| Bend[1] and Vine[2] are two experimental programming languages
| that take similar approaches to automatically parallelizing
| programs: interaction nets[3]. IIUC, they basically turn the
| whole program into one big dependency graph, then the runtime
| figures out what can run in parallel and distributes the work
| to however many threads you can throw at it. It's also my
| understanding that they are currently both quite slow, which
| makes sense as the focus has been on making `write
| embarrassingly parallelizable program -> get highly
| parallelized execution` work at all until recently. Time will
| tell if they can manage enough optimizations that the approach
| enables you to get reasonably performing parallel functional
| programs 'for free'.
|
| [1] https://github.com/HigherOrderCO/Bend [2]
| https://github.com/VineLang/vine [3]
| https://en.wikipedia.org/wiki/Interaction_nets
| fweimer wrote:
| There have been experimental parallel graph reduction machines.
| Excel has a parallel evaluator these days.
|
| Oddly enough, functional programming seems to be a poor fit for
| this because the fanout tends to be fairly low: individual
| operations have few inputs, and single-linked lists and trees
| are more common than arrays.
| malkia wrote:
| "./configure" has always been the wrong thing for a very long
| long time. Also slow...
| codys wrote:
| I did something like the system described in this article a few
| years back. [1]
|
| Instead of splitting the "configure" and "make" steps though, I
| chose to instead fold much of the "configure" step into the
| "make".
|
| To clarify, this article describes a system where `./configure`
| runs a bunch of compilations in parallel, then `make` does stuff
| depending on those compilations.
|
| If one is willing to restrict what the configure can detect/do to
| writing to header files (rather than affecting variables
| examined/used in a Makefile), then instead one can have
| `./configure` generate a `Makefile` (or in my case, a ninja
| file), and then the "run the compiler to see what defines to
| set" and "run the compiler to build the executable" steps can
| both run in a single `make` or `ninja` invocation.
|
| The simple way here results in _almost_ the same behavior: all
| the "configure"-like stuff running and then all the "build" stuff
| running. But if one is a bit more careful/clever and doesn't
| depend on the entire "config.h" for every "<real source>.c"
| compilation, then one can start to interleave the work perceived
| as "configuration" with that seen as "build". (I did not get that
| fancy)
|
| [1]: https://github.com/codyps/cninja/tree/master/config_h
| tavianator wrote:
| Nice! I used to do something similar, don't remember exactly
| why I had to switch but the two step process did become
| necessary at some point.
|
| Just from a quick peek at that repo, nowadays you can write
|
| #if __has_attribute(cold)
|
| and avoid the configure test entirely. Probably wasn't a thing
| 10 years ago though :)
| codys wrote:
| yep. C's really come a long way with the special operators
| for checking if attributes exist, if builtins exist, if
| headers exist, etc.
|
| Covers a very large part of what is needed, making fewer and
| fewer things need to end up in configure scripts. I think
| most of what's left is checking for items (types, functions)
| existence and their shape, as you were doing :). I can dream
| about getting a nice special operator to check for
| fields/functions, would let us remove even more from
| configure time, but I suspect we won't because that requires
| type resolution and none of the existing special operators do
| that.
| mikepurvis wrote:
| You still need a configure step for the "where are my deps"
| part of it, though both autotools and CMake would be way
| faster if all they were doing was finding, and not any
| testing.
| o11c wrote:
| The problem is that the various `__has_foo` aren't actually
| reliable in practice - they don't tell you if the attribute,
| builtin, include, etc. actually _works_ the way it's
| supposed to without bugs, or if it includes a particular
| feature (accepts a new optional argument, or allows new
| values for an existing argument, etc.).
| aaronmdjones wrote:
| #if __has_attribute(cold)
|
| You should use double underscores on attribute names to avoid
| conflicts with macros (user-defined macros beginning with
| double underscores are forbidden, as identifiers beginning
| with double underscores are reserved).
|
|     #if __has_attribute(__cold__)
|     # warning "This works too"
|     #endif
|
|     static void __attribute__((__cold__)) foo(void)
|     {
|         // This works too
|     }
| throwaway81523 wrote:
| GNU Parallel seems like another convenient approach.
| fmajid wrote:
| It has no concept of dependencies between tasks, or doing a
| topological sort prior to running the task queue. GNU Make's
| parallel mode (-j) has that.
| andreyv wrote:
| Autoconf can use cache files [1], which can greatly speed up
| repeated configures. With cache, a test is run at most once.
|
| [1] https://www.gnu.org/savannah-
| checkouts/gnu/autoconf/manual/a...
| fanf2 wrote:
| Sadly the cache files don't record enough about the environment
| to be usable if you change configure options. They are
| generally unreliable.
| iforgotpassword wrote:
| The other issue is that people seem to just copy
| configure/autotools scripts over from older or other projects
| because either they are lazy or don't understand them enough to
| do it themselves. The result is that even with relatively modern
| code bases that only target something like x86, arm and maybe
| mips and only gcc/clang, you still get checks for the size of an
| int, or which header is needed for printf, or whether long long
| exists.... And then the entire code base never checks the
| generated macros in a single place, uses int64_t and never checks
| for stdint.h in the configure script...
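|
| (Most of those legacy checks collapse into standard headers on
| any C99-era toolchain; a small sketch of the modern
| equivalent:)
|
|     #include <stdint.h>    /* int64_t: standard since C99 */
|     #include <inttypes.h>  /* PRId64: its printf format */
|     #include <stdio.h>
|
|     int main(void) {
|         int64_t x = INT64_C(1) << 40;
|         printf("%" PRId64 "\n", x);  /* no configure test */
|         return 0;
|     }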
| rbanffy wrote:
| It's always wise to be specific about the sizes you want for
| your variables. You don't want your ancient 64-bit code to act
| differently on your grandkids' 128-bit laptops. Unless, of
| course, you want to let the compiler decide whether to leverage
| higher precision types that become available after you retire.
| IshKebab wrote:
| I don't think it's fair to say "because they are lazy or don't
| understand". Who would _want_ to understand that mess? It isn
| 't a virtue.
|
| A fairer criticism would be that they have no sense to use a
| more sane build system. CMake is a mess but even that is
| faaaaar saner than autotools, and probably more popular at this
| point.
| xiaoyu2006 wrote:
| Autotools use M4 to meta-program a bash script that meta-
| programs a bunch of C(++) sources and generates C(++) sources
| that utilizes meta-programming for different configurations;
| after which the meta-programmed script, again, meta-programs
| monolithic makefiles.
|
| This is peak engineering.
| krior wrote:
| Sounds like a headache. Is there a nice Python lib to
| generate all this M4-mumbo-jumbo?
| lionkor wrote:
| "Sounds complicated. I want it to throw exceptions and
| have significant whitespace on top of all that
| complexity!"
| IshKebab wrote:
| It was obviously a joke.
| knorker wrote:
| autotools is the worst, except for all the others.
|
| I'd like to think of myself as reasonable, so I'll just say
| that reasonable people may disagree with your assertion that
| cmake is in any way at all better than autotools.
| IshKebab wrote:
| Nope, autotools is _actually_ the worst.
|
| There is no way in hell anyone reasonable could say that
| Autotools is better than CMake.
| tpoacher wrote:
| And presumably the measure by which they are judged to be
| reasonable or not is if they prefer CMake over Autotools,
| correct? :D
| ordu wrote:
| Correct. I avoid autotools and cmake as much as I can.
| I'd rather write Makefiles by hand. But when I need to
| deal with them, I'd prefer cmake. I can modify
| CMakeLists.txt in a meaningful way and get the results I
| want. I wouldn't touch an autotools build system because I
| was never able to figure out which of the files is the
| configuration that is meant to be edited by hand and not
| generated by scripts in other files. I tried to dig into the
| documentation but I never made it through.
| pletnes wrote:
| Configure-make is easier to use for someone else.
| Configuring a cmake based project is slightly harder. In
| every other conceivable way I agree 100% (until someone
| can convince me otherwise)
| jeroenhd wrote:
| I've seen programs replicate autotools in their
| Makefiles. That's actually worse. I've also used the old
| Visual Studio build tooling.
|
| Autotools is terrible, but it's not the worst.
| knorker wrote:
| My experience with cmake, though dated, is that it's
| simpler because it simply cannot do what autotools can
| do.
|
| It really smelled of "oh I can do this better", and you
| rewrite it, and as part of rewriting it you realise oh,
| this is why the previous solution was complicated. It's
| because the problem is actually more complex than I
| thought.
|
| And then of course there's the problem where you need to
| install on an old release. But the thing you want to
| install requires a newer cmake (autotools doesn't have
| this problem because it's self-contained). But this is an
| old system that you cannot upgrade, because the vendor
| support contract for what the server runs would be
| invalidated. So now you're down a rabbit hole of trying
| to get a new version of cmake to build on an unsupported
| system. Sigh. It's less work to just try to construct
| `gcc` commands yourself, even for a medium sized project.
| Either way, this is now your whole day, or whole week.
|
| If only the project had used autotools.
| IshKebab wrote:
| No, CMake can do everything Autotools does, but a hell of
| a lot simpler and without checking for a gazillion flags
| and files that you don't actually need, but you're
| checking them anyway because you copied the script from
| someone who copied the script from... all the way back to
| the 90s when C compilers actually existed that didn't
| have stdint.h or whatever.
|
| CMake is easy to upgrade. There are binary downloads. You
| can even install it with pip (although recently the
| Python people in their usual wisdom have broken that).
| atq2119 wrote:
| CMake can't do everything autotools does, but the stuff
| autotools does which CMake doesn't isn't relevant anymore
| in today's world.
|
| The fundamental curse of build systems is that they are
| inherently complex beasts that hardly anybody has to work
| with full-time, and so hardly anybody learns them to the
| necessary level of detail.
|
| The only way out of this is to simplify the problem
| space. Sometimes for real (by reducing the number of
| operating systems and CPU architectures that are relevant
| -- e.g. CMake vs. Autotools) and sometimes by somewhat
| artificially restricting yourself to a specific niche
| (e.g. Cargo).
| smartmic wrote:
| I took the trouble (and even spent the money) to get to grips
| with autotools in a structured and detailed way by buying a
| book [1] about it and reading as much as possible. Yes, it's
| not trivial, but autotools is not witchcraft either; as
| written elsewhere, it's a masterpiece of engineering. I have dealt
| with it without prejudice and since then I have been more of
| a fan of autotools than a hater. Anyway, I highly recommend
| the book and yes, after reading it, I think autotools is
| better than its reputation.
|
| [1] https://nostarch.com/autotools2e
| NekkoDroid wrote:
| > CMake is a mess but even that is faaaaar saner than
| autotools, and probably more popular at this point.
|
| Having done a deep dive into CMake I actually kinda like it
| (really modern cmake is actually very nice, except the DSL
| but that probably isn't changing any time soon), but that is
| also the problem: I had to do a deep dive into learning it.
| epcoa wrote:
| > either they are lazy or don't understand them enough to do it
| themselves.
|
| Meh, I used to keep printed copies of autotools manuals. I
| sympathize with all of these people and acknowledge they are
| likely the sane ones.
| Levitating wrote:
| I've had projects where I spent more time configuring
| autoconf than actually writing code.
|
| That's what you get for wanting to use a glib function.
| rollcat wrote:
| This.
|
| Simple projects: just use plain C. This is dwm, _the_ window
| manager that spawned a thousand forks. No ./configure in
| sight: <https://git.suckless.org/dwm/files.html>
|
| If you run into platform-specific stuff, just write a
| ./configure in simple and plain shell:
| <https://git.suckless.org/utmp/file/configure.html>. Even if
| you keep adding more stuff, it shouldn't take more than 100ms.
|
| If you're doing something really complex (like say, writing a
| compiler), take the approach from Plan 9 / Go. Make a
| conditionally included header file that takes care of platform
| differences for you. Check the $GOARCH/u.h files here:
|
| <https://go.googlesource.com/go/+/refs/heads/release-
| branch.g...>
|
| (There are also some simple OS-specific checks:
| <https://go.googlesource.com/go/+/refs/heads/release-
| branch.g...>)
|
| This is the reference Go compiler; it can target any platform,
| from any host (modulo CGO); later versions are also self-
| hosting and reproducible.
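|
| A small sketch of that style (file name and type choices
| hypothetical): the build includes exactly one such header
| per target, so the code itself never probes:
|
|     /* u_linux_amd64.h: selected by the build, not detected. */
|     typedef int                i32;
|     typedef long long          i64;
|     typedef unsigned int       u32;
|     typedef unsigned long long u64;
|     typedef u64                uintptr;  /* pointer-sized here */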
| knorker wrote:
| Interesting that you would bring up Go. Go is probably the
| most head-desk language of all for writing portable code. Go
| will fight you the whole way.
|
| Even plain C is easier.
|
| You can have a whole file just for OpenBSD, to work around
| the fact that some standard library parts have different
| types on different platforms.
|
| So now you need one file for all platforms and architectures
| where Timeval.Usec is int32, and another file for where it is
| int64. And you need to enumerate in your code all GOOS/GOARCH
| combinations that Go supports or will ever support.
|
| You need a file for Linux 32 bit ARM (int32/int32), one
| for Linux 64 bit ARM (int64,int64), one for OpenBSD 32 bit
| ARM (int64/int32), etc.... Maybe you can group them, but this
| is just one difference, so in the end you'll have to do one
| file per combination of OS and Arch. And all you wanted was
| pluggable "what's a Timeval?". Something that all build
| systems solved a long time ago.
|
| And then maybe the next release of OpenBSD they've changed
| it, so now you cannot use Go's way to write portable code at
| all.
|
| So between autotools, cmake, and the Go method, the Go method
| is by far the worst option for writing portable code.
| rollcat wrote:
| I have specifically given an example of u.h defining types
| such as i32, u64, etc to avoid running a hundred silly
| tests like "how long is long", "how long is long long",
| etc.
|
| > So now you need one file for all platforms and
| architectures where Timeval.Usec is int32, and another file
| for where it is int64. And you need to enumerate in your
| code all GOOS/GOARCH combinations that Go supports or will
| ever support.
|
| I assume you mean [syscall.Timeval]?
|
|     $ go doc syscall
|     [...]
|     Package syscall contains an interface to the low-level
|     operating system primitives. The details vary depending
|     on the underlying system [...].
|
| Do you have a specific use case for [syscall], where you
| cannot use [time]?
| knorker wrote:
| Yeah I've had specific use cases when I need to use
| syscall. I mean... if there weren't use cases for syscall
| then it wouldn't exist.
|
| But not only is syscall an example of portability done
| wrong for APIs, as I said it's also an example of it
| being implemented in a dumb way causing needless work and
| breakage.
|
| Syscall as implementation leads by bad example because
| it's the only method Go supports.
|
| Checking for GOARCH+GOOS tuple equality for portable code
| is a known anti pattern, for reasons I've said and other
| ones, that Go still decided to go with.
|
| But yeah, autotools scripts often check for way more
| things than actually matter. Often because people copy
| paste configure.ac from another project without trimming.
| Levitating wrote:
| I want to agree with you, but as someone who regularly
| packages software for multiple distributions I really would
| prefer people using autoconf.
|
| Software with custom configure scripts is especially dreaded
| amongst packagers.
| Joker_vD wrote:
| Why, again, software in the Linux world has to be packaged
| for multiple distributions? On the Windows side, if you
| make installer for Windows 7, it will still work on Windows
| 11. And to boot, you don't have to go through some
| Microsoft-approved package distribution platform and its
| approval process: you _can_, of course, but you don't have
| to, you can distribute your software by yourself.
| amelius wrote:
| What I'd like to see is a configure with guarantees that if the
| configure succeeds, then the build will succeed too.
| Chocimier wrote:
| It is possible in theory to speed up existing configure scripts
| by switching the interpreter from /bin/sh to something that
| scans the file, splits it into independent blocks, and runs
| them in parallel.
|
| Is there any such previous work?
| tavianator wrote:
| I was looking for a relevant paper and found this one, which
| isn't what I was looking for but is related:
| https://sigops.org/s/conferences/hotos/2023/papers/liargkova...
| rbanffy wrote:
| I get the impression configure not only runs sequentially, but
| incrementally, where previous results can change the results of
| tests run later. Were it just sequential, running multiple tests
| as separate processes would be relatively simple.
|
| Also, you shouldn't need to run ./configure every time you run
| make.
| fmajid wrote:
| No, but if you are doing something like rebuilding a distro's
| worth of packages from source from scratch, the configure step
| starts to dominate. I build around 550, and it takes around 6
| hours on a single node.
|
| Most checks are common, so what can help is having a shared
| cache for all configure scripts so if you have 400 packages to
| rebuild, it doesn't check 400 times if you should use flock or
| fcntl. This approach is described here:
| https://jmmv.dev/2022/06/autoconf-caching.html
|
| It doesn't help that autoconf is basically abandonware, with
| one forlorn maintainer trying to resuscitate it, but creating
| major regressions with new releases:
| https://lwn.net/Articles/834682/
| rbanffy wrote:
| > It doesn't help that autoconf is basically abandonware
|
| A far too common tragedy of our age.
| pdimitar wrote:
| I don't disagree with that general premise but IMO
| autotools being (gradually?) abandoned is logical. It
| served its purpose. Not saying it's still not very useful
| in the darker shadows of technology but for a lot of stuff
| people choose Zig, Rust, Golang etc. today, with fairly
| good reasons too, and those PLs usually have fairly good
| packaging and dependency management and building subsystems
| built-in.
|
| Furthermore, there really _has_ to be a better way to do
| what autotools is doing, no? Sure, there are some
| situations where you only have some bare sh shell and
| nothing much else but I'd venture to say that in no less
| than 90% of all cases you can very easily have much more
| stuff installed -- like the `just` task runner tool, for
| example, that solves most of the problems that `make`
| usually did.
|
| If we are talking in terms of our age, we also have to take
| into account that there's too much software everywhere! I
| believe some convergence has to start happening. There is
| such a thing as too much freedom. We are dispersing so much
| creative energy for almost no benefit of humankind...
| tmtvl wrote:
| As a user I highly appreciate ./configure for the _--help_ flag,
| which usually tells me how to build a program with or without
| particular functionalities which may or may not be applicable to
| my use-case.
| saagarjha wrote:
| I actually think this is possible to improve if you have the
| autoconf files. You could parse it to find all the checks you
| know can run in parallel and run those.
| mrrogot69 wrote:
| Good idea!
| tekknolagi wrote:
| This doesn't mention another use of configure which is manually
| enabling or disabling features via --with-X -- I might send in a
| PR for that
| tavianator wrote:
| My actual "production" implementation of this concept does
| support that:
| https://github.com/tavianator/bfs/blob/main/configure
|
| But I wanted the blog post sized version to be simpler for
| exposition.
| gitroom wrote:
| Man, I've spent way too many hours wrestling with build systems
| like autotools and cmake and they both make me want to just toss
| my laptop sometimes - feels like it's way harder than it needs to
| be each time. You ever think we'll actually get to a point where
| building stuff just works, no endless config scripts or chasing
| weird cross-platform bugs?
| pdimitar wrote:
| I asked myself that probably no less than 200 times.
|
| Thinking in terms of technological problems, that should be a
| 100% solved problem at this point! Build a DAG of all tasks and
| just solve it and invoke stuff, right? Well, not exactly. A lot
| of the build systems don't allow you to specify if something is
| safe to execute in parallel. And that's important because
| sometimes even though it _seems_ three separate tasks are
| completely independent and can be executed in parallel they'd
| still share f.ex. a filesystem-level cache and would compete
| for it and likely corrupt it, so... not so fast. :(
|
| But I feel the entire tech sector is collectively dragging
| their feet on this. We should be able to devise a good build
| system that allows us to encode more constraints and
| requirements than the current ones, and then simply build a
| single solver for its DAG and be done with it. How frakkin
| difficult can that be?
|
| Of course the problem is actually social, not technical.
| Meaning that most programmers wouldn't ever migrate if the
| decision was left to them. I still would not care about them
| though; if I had the free time and energy then I would
| absolutely work on it. It's something that might swallow a lot
| of energy but if solved properly (meaning it has to be
| reasonably extensive without falling into the trap that no two
| projects would even use the same implementation -- let us NOT
| do that!) then it will be solved only once and never again.
|
| We can dream.
|
| But again, it very much does seem like a very solvable problem
| to me.
___________________________________________________________________
(page generated 2025-04-26 23:01 UTC)