[HN Gopher] Proposed futex2 allows Linux to mimic the NT kernel ...
       ___________________________________________________________________
        
       Proposed futex2 allows Linux to mimic the NT kernel for better wine
       performance
        
       Author : marcodiego
       Score  : 309 points
       Date   : 2021-04-28 12:44 UTC (10 hours ago)
        
 (HTM) web link (lkml.org)
 (TXT) w3m dump (lkml.org)
        
       | Shadonototro wrote:
        | if linux people focused on offering a compelling ecosystem
        | for app/game distribution, they wouldn't need to bloat
        | their os with slow and bloated emulators like wine
        | 
        | the reason why linux isn't gaining share on steam anymore
        | is because valve openly said linux has no ecosystem and is
        | forced to emulate windows -- proton
        
         | ijlx wrote:
         | Wine is not an emulator, it is a compatibility layer for
         | Windows system calls on a Unix-like platform. If something runs
         | on wine, the performance overhead should be unnoticeable/very
         | small for the typical user.
        
           | chungy wrote:
           | Ironically, it seems quite often that Wine runs apps/games
           | faster than Windows does.
        
           | encryptluks2 wrote:
           | A compatibility layer wouldn't require that you essentially
           | emulate the same file system layout, registry and
           | functionality. It is basically emulating a Windows
           | environment.
        
             | anthk wrote:
             | So Windows 10 emulates Windows XP/2000/98 too?
        
               | encryptluks2 wrote:
               | Pretty much... they took a pile of crap and kept adding
               | onto it.
        
               | Dah00n wrote:
               | That has nothing in common with emulating though. Running
               | ARM code in x64 is emulating. If adding code were
                | emulating then Linux is emulating Linux v0.1. Wine isn't
               | an emulator. PCSX2 is an emulator.
        
               | chungy wrote:
               | The word "emulate" is overloaded, but because there's
               | often an assumption of ISA emulation, Wine for a time
               | called itself "WINE Is Not an Emulator" to prevent said
               | assumption.
               | 
               | Taking "emulate" at its dictionary-definition, which is
               | to imitate or copy, it's easy to argue that Wine is an
               | emulator, because it copies and imitates the Windows ABI
               | and API (the earliest versions of Wine even called itself
               | a "WINdows Emulator"). They just happen to draw a hard
               | line against emulating x86.
        
             | samus wrote:
             | In many cases it is indeed not so simple:
             | 
              | * Most nontrivial Windows programs would get very
              | confused if exposed to a trivially mapped Unix
              | filesystem. It wouldn't be so bad if every program
              | used environment variables to locate stuff, but you
              | can't discount the possibility that some of them
              | hardcode paths. Also, application data in
             | Windows is usually kept in a common folder, while in Unix
             | filesystems it is usually split across `/usr/bin`,
             | `/usr/share`, `/usr/lib` and so on.
             | 
             | * Many Windows applications store their configuration in
             | the registry. That part is not too bad though; it should be
             | completely doable in userspace.
             | 
             | * The application might issue system calls that have
              | different semantics, or that don't even exist on Linux.
              | TFA is significant because it paves the way to an
              | efficient implementation of one of these system calls.
        
               | encryptluks2 wrote:
                | > Most nontrivial Windows programs would get very
                | confused if exposed to a trivially mapped Unix
                | filesystem. It wouldn't be so bad if every program
                | used environment variables to locate stuff, but you
                | can't discount the possibility that some of them
                | hardcode paths. Also, application data in Windows is
                | usually kept in a common folder, while in Unix
                | filesystems it is usually split across `/usr/bin`,
                | `/usr/share`, `/usr/lib` and so on.
               | 
                | The application data in Windows would not be the same as
               | `/usr/` paths. It would be the equivalent of `.config`
               | and `.local`. The last thing anyone in Linux would want
               | is for Windows to start populating their system folders,
               | let alone probably creating hundreds of top-level folders
               | in their `.config` or `.local` directory.
               | 
               | > Many Windows applications store their configuration in
               | the registry. That part is not too bad though; it should
               | be completely doable in userspace.
               | 
               | The Windows registry is an atrocious monster that should
               | have died already. You can rarely ever truly delete an
               | app in Windows, as there will be sometimes hundreds of
               | registry keys leftover.
               | 
                | > TFA is significant because it paves the way to an
               | efficient implementation of one of these system calls.
               | 
               | If there is mutual benefit outside of Windows, then sure
               | why not. If it is built just for better Wine
               | compatibility with Windows then I hope to see a rant by
               | Linus telling them how ridiculous they are.
        
         | fnord77 wrote:
         | w.i.n.e = Wine Is Not an Emulator
        
       | a-dub wrote:
       | what's the nt api call called? WaitForMultipleObjects? takes me
       | back 20 years just thinking about it...
        
         | dboreham wrote:
          | The NT API would begin NtXxxx; WaitForMultipleObjects is
          | a Win32 function.
        
           | ziml77 wrote:
           | If we're being pedantic here, it's not Win32.
           | 
           | > (Note that this was formerly called the Win32 API. The name
           | Windows API more accurately reflects its roots in 16-bit
           | Windows and its support on 64-bit Windows.)
           | 
           | Though Win32 is going to stick around for a while as can be
           | seen in the URL for the page that's quoted from:
           | https://docs.microsoft.com/en-
           | us/windows/win32/apiindex/wind...
        
             | a-dub wrote:
             | i think it's still called win32 today because win32 was a
             | massive overhaul. 16 bit windows was loaded with that far
             | and near pointer gunk to stretch those 16 bit pointers
             | which win32 was able to do away with thanks to larger
             | pointer widths.
             | 
              | regarding win32 vs. the nt api: i'm not super familiar
              | with the pedantic distinctions there, but i do know,
              | much to the programmer's chagrin, that the thousands of
              | windows api functions have varying levels of kernel
              | entry. i've always appreciated the (relatively) small
              | number of unix-style syscalls and the clear delineation
              | between kernel and userspace api. (although i suppose
              | futexes blur that, in name at least!)
        
               | Joker_vD wrote:
                | Well, since the introduction of vsyscall and the
                | vDSO, the Linux syscalls too have varying levels of
                | kernel entry.
        
           | beatrobot wrote:
           | NtXxxx or ZwXxxx
        
           | poizan42 wrote:
           | I don't know why you are being downvoted, you are completely
           | right. The NT api function is NtWaitForMultipleObjects. The
           | Win32 apis WaitForMultipleObjects and
           | WaitForMultipleObjectsEx are thin wrappers over this.
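            | 
            | For anyone who hasn't touched it in a while, a rough
            | sketch of the Win32 call from C (illustrative only; the
            | Nt* function underneath takes a similar argument list):
            | 
            |     #include <windows.h>
            |     #include <stdio.h>
            | 
            |     int main(void)
            |     {
            |       /* two events; any waitable handle type works */
            |       HANDLE h[2];
            |       h[0] = CreateEvent(NULL, TRUE, FALSE, NULL);
            |       h[1] = CreateEvent(NULL, TRUE, FALSE, NULL);
            |       SetEvent(h[1]);   /* signal the second one */
            | 
            |       /* bWaitAll = FALSE: return when any is signaled */
            |       DWORD r = WaitForMultipleObjects(2, h, FALSE,
            |                                        INFINITE);
            |       if (r < WAIT_OBJECT_0 + 2)
            |         printf("handle %lu signaled\n",
            |                r - WAIT_OBJECT_0);
            | 
            |       CloseHandle(h[0]);
            |       CloseHandle(h[1]);
            |       return 0;
            |     }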
        
             | AndriyKunitsyn wrote:
             | Is this NT API actually possible to be called from
             | userspace? Or even from drivers?
        
               | poizan42 wrote:
               | Yes of course, they are all exported from ntdll.dll (in
               | userspace) and ntoskrnl.exe (in kernelspace). You may
               | need to link dynamically or make your own import
               | libraries for many of them though.
        
               | a-dub wrote:
               | think of it as like select and pthread_cond_wait rolled
               | into one generic api for waiting on one or more things
               | from the kernel.
               | 
               | it's actually pretty nice how generic they made it.
        
               | andrealmeid wrote:
                | Sure, here's a userspace example:
               | https://docs.microsoft.com/en-
               | us/windows/win32/sync/waiting-...
        
               | leeter wrote:
               | Based on the docs kernel mode drivers would call
               | KeWaitForMultipleObjects instead it appears. User space
               | should just call WaitForMultipleObjects. While
               | NtWaitForMultipleObjects and ZwWaitForMultipleObjects
                | exist, they seem to be undocumented, which means they
                | probably have funky behavior. Anything in
                | KernelBase.dll is a pretty thin wrapper that I'd be
                | hesitant to call Win32 so much as a "really thin
                | syscall wrapper".
        
         | tobias3 wrote:
         | WaitOnAddress and co. But that one was added with Windows 8, so
         | it can't take you back 20 years :)
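          | 
          | It's roughly the Windows analogue of a futex wait. A
          | sketch of the pattern (Windows 8+, and I believe you link
          | against Synchronization.lib):
          | 
          |     #include <windows.h>
          | 
          |     /* sleep while *addr still holds 'undesired' */
          |     void wait_for_change(volatile LONG *addr,
          |                          LONG undesired)
          |     {
          |         while (*addr == undesired)
          |             WaitOnAddress(addr, &undesired,
          |                           sizeof(LONG), INFINITE);
          |     }
          | 
          |     /* the other side publishes a value and wakes one */
          |     void publish(volatile LONG *addr, LONG value)
          |     {
          |         *addr = value;
          |         WakeByAddressSingle((PVOID)addr);
          |     }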
        
           | gpderetta wrote:
           | WFMO is indeed the api that the new futex_waitv is supposed
           | to help accelerate under wine.
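            | 
            | (For reference, the shape of the interface that
            | eventually landed in mainline as futex_waitv() in Linux
            | 5.16 is sketched below; the patch set discussed here may
            | differ in details, so treat this as illustrative only.)
            | 
            |     #include <linux/futex.h>
            |     #include <sys/syscall.h>
            |     #include <unistd.h>
            |     #include <stdint.h>
            |     #include <time.h>
            | 
            |     /* wait until either of two 32-bit futex words
            |        changes away from its expected value */
            |     static long wait_any(uint32_t *u1, uint32_t v1,
            |                          uint32_t *u2, uint32_t v2)
            |     {
            |       struct futex_waitv w[2] = {
            |         { .val = v1, .uaddr = (uintptr_t)u1,
            |           .flags = FUTEX_32 },
            |         { .val = v2, .uaddr = (uintptr_t)u2,
            |           .flags = FUTEX_32 },
            |       };
            |       /* >= 0: index of the woken futex; -1: error */
            |       return syscall(SYS_futex_waitv, w, 2, 0,
            |                      NULL, CLOCK_MONOTONIC);
            |     }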
        
         | Joker_vD wrote:
          | By the way, hearing about futex implementations and their
          | API considerations should also take everyone about 20
          | years back.
        
       | marcodiego wrote:
       | Artificial benchmarks show a 4% improvement in some cases, which
       | is huge. There aren't many tests to demonstrate the improvements
       | on common use cases but there are already reports of subjective
       | improvements with games having less stutter:
       | https://www.phoronix.com/forums/forum/phoronix/latest-phoron...
        
         | savant_penguin wrote:
         | Really curious, why is 4% improvement huge?
        
           | chalst wrote:
           | "Huge" means different things in different contexts. A
           | compiler optimisation that delivered a 4% increase in speed
           | for a whole class of well-tuned C programs would be regarded
           | as huge by people who follow compiler technology, but would
           | not strike most users of computers as a big deal. I take it
           | "huge" is meant in this sense of extraordinary within a class
           | of potential improvements.
           | 
           | If you can accumulate a series of 4% improvements, then even
           | the computer consumer will start to see the significance.
        
             | Someone wrote:
             | I can't find where the 4% number comes from, but I expect
             | that it comes from a micro benchmark that does little more
             | than futex calls. Reason: if it is from a benchmark that
             | represents real-world use, real-world use must spend at
             | least 4% of its time in futex calls. If that were the case,
             | somebody would have seen that, and work would have been
             | spent on this.
             | 
             | So no, this won't give you 4% in general.
        
               | jstanley wrote:
               | > If that were the case, somebody would have seen that,
               | and work would have been spent on this.
               | 
               | I would think that the linked patch is good evidence that
               | work is being spent on this.
        
               | xxs wrote:
               | The microbenchmark part is part of the article, no idea
               | why the downvotes here.
        
               | Dah00n wrote:
                | Because the benchmark isn't doing what GP says it
                | does.
        
               | chalst wrote:
               | From the lkml post, it's neither from a real-world case
               | nor from an especially constructed futex benchmark, but
               | from an existing kernel benchmark where it was thought it
               | would be interesting to see how it performed with the new
               | call.
        
             | jhgb wrote:
             | A 4% improvement of the 100m dash world record would have
             | been considered phenomenal, I imagine.
        
               | alexeldeib wrote:
               | This reminded me of the 4 minute mile. The first 4 minute
               | mile was a 3:59 run by Roger Bannister in 1954. Prior to
               | that, the record was a 4:01 run in 1945.
               | 
               | A 4% improvement on the 4:01 would be a 3:51. A 3:51
               | wasn't achieved until 1966, about 20 years later.
               | 
               | So 4% can be quite a jump!
        
               | dbpatterson wrote:
               | Extending your point, 4% more would go down to just under
               | 3:42.... which has not yet happened, over a half century
               | later.
        
               | zimpenfish wrote:
               | To put that into context, Bolt's 9.58s (current WR,
               | considered phenomenal in and of itself, I think?) is
               | about 1.7% faster than Powell's 9.74s (last pre-Bolt WR).
               | 
               | A 4% improvement on Bolt's 9.58s would be about 9.19s -
               | probably not something we'll see for several decades, if
               | ever.
        
               | jamiek88 wrote:
               | I wonder if I'll ever see a Bolt again in my lifetime.
               | 
               | Such dominance and grace.
               | 
               | He jogged over the line with that record too.
        
           | fefe23 wrote:
           | Intel Xeons have huge price differences at the top tier for
           | relatively minor performance differences.
           | 
           | So if you get 4% for free here, that could mean hundreds of
           | dollars savings compared to having to buy the next tier
           | hardware. Multiplied by the number of Intel customers.
           | 
           | 4% could mean the difference between "I can still use my old
           | computer" and "I need to buy an upgrade".
        
             | ltbarcly3 wrote:
             | The price difference is due to artificial market
             | segmentation, for example only xeons support ecc. It is not
             | due to performance.
             | 
             | (consumer chips are usually basically identical, just with
             | functionality disabled)
        
           | k1rcher wrote:
           | This is interesting. Does anyone know of previous figures on
           | Windows virtualization improvement?
        
             | delroth wrote:
             | It's not 4% improvement for the wine use case, it's 4%
             | improvement in existing synthetic benchmarks for the futex
             | syscall. And a 21% regression in performance in another
             | synthetic benchmark.
             | 
             | It's likely the wine WMO implementation is getting a much
             | more significant increase in performance than 4%, but the
             | patch description doesn't provide numbers.
        
           | tombert wrote:
            | I don't know much about Wine, so I'm speaking in
            | generalities, but I've worked jobs where we spent weeks
            | to get <10% performance increases. Very often that last
            | bit of optimization is the hardest part.
        
           | [deleted]
        
           | marcodiego wrote:
            | Because you don't have to change one line of your
            | previously running software, or replace your hardware,
            | to get this benefit. Of course, you will need a new
            | version of the kernel and wine, but it's only a matter
            | of time before those hit distro repositories. Also,
            | consider that this will accumulate when combined with
            | new user space libs, servers, compilers, drivers...
            | Software will improve without any change just by
            | keeping your distro updated.
           | 
           | Yes, a 4% improvement is huge.
        
           | sp332 wrote:
           | Well, it's a relative measure. There's been a ton of
           | optimization already. All the low-hanging fruit is gone. It's
           | probably been a long time since a single optimization gained
           | 4%.
        
         | xxs wrote:
         | > 4% performance improvement for up to 800 threads.
         | 
          | 800 threads on something like WaitForMultipleObjects is
          | already a horrific design if you even remotely care about
          | performance. The only viable case for 800 threads (and
          | wine) would be blocking IO, which is an extremely boring
          | one.
        
           | asveikau wrote:
           | IIRC, WaitForMultipleObjects can only block on 63 handles.
           | 800 threads blocked on 63 handles seems pretty weird.
        
             | xxs wrote:
             | >IIRC, WaitForMultipleObjects can only block on 63 handles
             | 
              | Yup (64 though), it's an extra annoying limitation. It
              | has been there since the 90s, since WinNT (can't recall
              | the win95 version as I never used it with win95). It's
              | limited by MAXIMUM_WAIT_OBJECTS, which is 64.
              | 
              | For instance, Java implements non-blocking IO
              | (java.nio.Selector) by using multiple threads and
              | syncing between them to effectively achieve the
              | equivalent of Linux's epoll.
             | 
             | ----
             | 
              | my point was mostly that the 'huge' 4% happens with
              | massive amounts of threads, and WaitForMultipleObjects
              | won't see even that... so kind of click-baity. Side
              | note: sync between threads can be done better than
              | relying on OS primitives, aside from real wait/notify
              | stuff (which is indeed implemented via futex)
        
           | usefulcat wrote:
           | Sure, but that's irrelevant for wine, which doesn't get to
           | choose how the things that it's running (e.g. games) are
           | implemented.
        
             | xxs wrote:
              | Of course. Yet, I would be massively surprised to see
              | such heavily multithreaded games with massive
              | contention on WaitForMultipleObjects. No one programs
              | that way (again, save for blocking IO... and then it
              | totally doesn't matter)
        
               | monocasa wrote:
               | Unfortunately, people do program that way, and I've seen
               | it way more in closed source Windows code than in the
               | open source world where someone will (rightly) call them
               | out.
               | 
               | For instance, in a previous job, a C++ codebase existed
               | where the architect had horrendously misunderstood the
               | actor model and had instated a "all classes must extend
               | company::Thread" rule in the coding standard. So close,
               | yet so incredibly far away.
               | 
               | At the end of the day, wine (like windows) still has to
                | run these programs and do its best with terrible choices
               | that were made in the past.
        
               | xxs wrote:
                | Having a policy that forces you to extend something
                | is properly bad already. For actors you likely need a
                | lock-free queue, and much preferably a bounded one -
                | so each message is a CAS (very likely uncontended)
                | and there is an upper limit on how much stuff can be
                | enqueued.
                | 
                | I do believe there is a ton of incredibly horrid code
                | out there (esp. when it comes to concurrency) - but
                | most games tend to use already established engines,
                | which are less likely to have glaring concurrency
                | design blunders.
        
               | monocasa wrote:
               | I mean, yeah, you don't make a choice like that without
               | making a lot of other terrible choices too. Like I said
               | "horrendously misunderstood the actor model". And as you
               | seem to be picking up on, they had a very C++98 view at
               | best of how inheritance should work.
               | 
               | In case it's not clear to anyone reading along: if you
               | find yourself doing anything like this, take a long, deep
               | look in the mirror; you're almost certainly making the
                | wrong choice. The best I can say is that, in the
                | years since, the architect has learned from the
                | consequences of some of those choices.
               | 
               | However, that code is still out there. The engineers have
               | tried to rewrite since then, but it turns out that
               | rewriting isn't adding new features. I don't agree with
               | the company's calculus on the matter, but that's a tale
               | as old as time.
               | 
               | And you'd be surprised how many games aren't written
               | well. Writing a game engine is a bit of a rite of passage
               | among young coders, and a surprising subset of those
               | actually ship. In my experience, a lack of maturity in
               | the field and (we'll be courteous and say) 'questionable'
               | threading models (and as you point out inheritance models
               | too) go hand in hand.
               | 
               | The point at the end of the day is that Wine users (like
               | Windows users) expect the runtime to do the best it can
               | even against terribly written applications. The users
               | want to run that code, it isn't going to be fixed, so
               | anything Wine can do better to paper over the (incredibly
               | obvious to someone skilled in the art) deficiencies of
               | that code is in their users' benefit.
        
         | phkahler wrote:
         | - For requeue, I measured an average of 21% decrease in
         | performance compared to the original futex implementation. This
         | is expected given the new design with individual flags. The
         | performance trade-offs are explained at patch 4 ("futex2:
         | Implement requeue operation").
         | 
         | It's not clear that this is a win overall. But hey, if Windows
         | games run better under wine...
        
           | Denvercoder9 wrote:
           | The old interface will remain available, so if requeue
           | performance is important, you can keep using that. Also at
           | least one kernel maintainer considers requeue to be
           | historical ballast [1], so it's probably not used much
           | anymore.
           | 
           | [1] https://lwn.net/ml/linux-
           | kernel/87o8thg031.fsf@nanos.tec.lin...
        
           | derefr wrote:
           | Better yet, since Wine is there to serve as a "smart" shim,
            | it can offer a launch config parameter for the app that will
           | get Wine to point its futex symbols at its previous userland
           | impl. (Sort of like the various Windows compatibility-mode
           | flags.) So even Windows apps that do tons of requeues (and so
           | presumably would be slowed down by unilaterally being made to
           | do kernel futexes) would be able to be tuned back up to
           | speed. And distros of Wine that provide pre-tuned installer
           | scripts for various apps, like CrossOver, can add that flag
           | to their scripts, so users don't even have to know.
        
             | [deleted]
        
           | gpderetta wrote:
            | IIRC glibc stopped using requeue a while ago for
            | condition variable wait morphing, so maybe requeue is
            | not a hugely
           | important use case anymore.
        
       | alexhutcheson wrote:
       | Bummer that this doesn't seem to support or explicitly plan for
       | the proposed FUTEX_SWAP operation[1] (unless I'm missing it).
       | 
       | [1] https://lwn.net/Articles/824409/
        
         | dundarious wrote:
          | I haven't read anything about FUTEX_SWAP since Aug 2020. Do
          | you know if there is ongoing progress?
        
       | Cloudef wrote:
       | What happened to NTSYNC?
       | https://lore.kernel.org/lkml/f4cc1a38-1441-62f8-47e4-0c67f5a...
        
         | elynl wrote:
         | It's still a thing! kernel:
         | https://repo.or.cz/linux/zf.git/shortlog/refs/heads/winesync
         | wine:
         | https://repo.or.cz/wine/zf.git/shortlog/refs/heads/fastsync2
         | 
         | They decided to rename it since the kernel maintainers don't
         | want NT interfaces and having NT in the name doesn't help
        
           | Cloudef wrote:
            | That's good to hear. The Darling devs also went with a
            | kernel module approach. It's weird the wine devs did not
            | do the same until now. Wineserver is nice but, as we see,
            | it's sometimes impossible to do accurate emulation of
            | some apis there.
           | 
           | https://docs.darlinghq.org/internals/basics/system-call-
           | emul...
        
       | aasasd wrote:
       | I lowkey have to wonder why Linux seemingly goes all-in on
       | monster functions that implement a variety of operations
       | depending on the arguments (as far as I can see from casual
       | encounters). In my experience, such code often both reads poorly
       | on the calling side and looks very messy inside. I came to prefer
       | making distinct functions that spell out their intended action in
       | the names. Windows apparently went the same path.
       | 
       | E.g.: syscall doesn't accept multiple handles and other bells and
       | whistles? Well duh, it's not supposed to--just make a new one
       | that does.
       | 
       | Of course, it's probably too late to change this design choice
       | now...
        
         | whatshisface wrote:
         | I can only offer speculation, but it could be that the
         | alternative, a complicated API that requires several subsequent
         | calls to use, is seen as more of a risk for bad programming and
         | state mis-tracking. We definitely don't want programs leaving
         | the kernel in a bad state by calling the first half of a
         | sequence of syscalls then later on getting distracted or
         | crashing and never finishing it. A single call gives atomicity
         | guarantees that are otherwise hard to come by in languages
         | without state tracking primitives like futures or borrow
         | checked references.
         | 
          | Another thing to remember is that in C, it's easier to make
          | a call more specific (by hardcoding values) than it is to
          | make it more general (by picking between variants in a
          | branch). I can
         | see programmers wishing the API was more dynamic while writing
         | a giant if-else tree to select the appropriate version of the
         | syscall, but the opposite problem, when the same invocation is
         | always needed but the API takes a lot of dynamic options, is
         | not as bad because it is easy to write a bunch of literals in
         | the line that calls the function. (If C had named arguments
         | that approach would be almost flawless.)
        
           | aasasd wrote:
           | I don't see how `an_op_syscall(param)` is 'more complicated'
           | or requires more calls or state than `a_syscall(OP_NAME,
           | param)`.
        
             | whatshisface wrote:
             | Since C doesn't have first-class functions, if you needed
             | to dynamically pick between values of OP_NAME you'd be glad
             | it was a parameter. Imagine ten options parameters, which
             | your program needs to choose between depending on the
             | circumstances. That's a huge branch tree. Imagine choosing
             | between two options - you'd need to keep track of function
             | pointers, or at least write your own enum. C makes it far
             | nicer to handle enum values than it is to handle functions,
             | because ints are first-class. I agree that there's no
             | difference between a symbol in the name and a value in the
             | args when your program always needs the same value, but
             | sometimes it needs dynamism.
        
               | delusional wrote:
               | I don't see how function pointers are any harder to
               | manage than an enum. Just slap those function pointers
               | into an array or a switch, and away you go.
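                | 
                | e.g. (hypothetical names, just to show the dispatch):
                | 
                |     #include <stdio.h>
                | 
                |     /* hypothetical per-operation functions */
                |     static void do_wait(int v) { printf("wait %d\n", v); }
                |     static void do_wake(int v) { printf("wake %d\n", v); }
                | 
                |     typedef void (*op_fn)(int);
                |     enum op { OP_WAIT, OP_WAKE, OP_COUNT };
                |     static const op_fn ops[OP_COUNT] = { do_wait, do_wake };
                | 
                |     int main(void)
                |     {
                |       enum op chosen = OP_WAKE;  /* picked at runtime */
                |       ops[chosen](42);           /* no if-else tree */
                |       return 0;
                |     }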
        
               | aasasd wrote:
               | You're talking here about the second half of your
               | previous comment. This reasoning is ok by me, though I
                | have to wonder how often, and why, programmers need
                | to decide dynamically in mid-flight whether they want
                | to wait instead of wake (or whatever choices futex()
                | offers), or whether they need one handle instead of a
               | vector.
               | 
               | I'm talking above about the first half of your previous
               | comment, where you say that having operation names in the
               | function names would somehow require multiple calls and
               | break atomicity. I can only understand this if Linux
               | syscalls are parameterized to do one thing and then
               | another thing and something else, in one call--which
               | sounds horrible, but still can be expressed in function
               | names.
        
               | gpderetta wrote:
                | There are no direct function calls into the kernel.
                | 
                | At the end of the day, a system call is an interrupt
                | with the syscall parameters in a bunch of registers.
                | As you do not want to use a different interrupt (as
                | there are only so many interrupts [1]) for each
                | system call, the syscall id is passed as a parameter
                | in a register. On the kernel side the id is used to
                | index into a function pointer table to find and call
                | the actual system function.
               | 
               | Turns out this table is actually limited in size so some
               | related syscalls use another level of indirection with
               | additional private tables (I think this is no longer a
               | limitation and these syscall multiplexers are frowned
               | upon for newer syscalls).
               | 
               | All of this to say that FUTEX_WAIT and FUTEX_WAKE are
               | just implementation details. A proper syscall wrapper
               | would just expose distinct functions for each operation.
               | 
               | But for a very long time glibc did not expose any wrapper
               | for futexes as they were considered linux specific
                | implementation details and one was supposed to use the
               | portable posix functions. So the futex man pages
               | documented the raw kernel interface. That didn't stop
               | anybody from using futexes of course and I think glibc
               | now has proper wrappers.
               | 
               | [1] yes, these days you would use sysenter or similar,
               | but same difference.
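                | 
                | e.g., nothing stops you from wrapping the multiplexed
                | syscall yourself (a sketch, not glibc's wrappers):
                | 
                |     #include <linux/futex.h>
                |     #include <sys/syscall.h>
                |     #include <unistd.h>
                |     #include <stdint.h>
                | 
                |     /* one function per op, op code hidden inside */
                |     static long futex_wait(uint32_t *uaddr,
                |                            uint32_t expected)
                |     {
                |       return syscall(SYS_futex, uaddr, FUTEX_WAIT,
                |                      expected, NULL, NULL, 0);
                |     }
                | 
                |     static long futex_wake(uint32_t *uaddr, int n)
                |     {
                |       return syscall(SYS_futex, uaddr, FUTEX_WAKE,
                |                      n, NULL, NULL, 0);
                |     }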
        
               | aasasd wrote:
               | Thanks, this actually answers my original question in
               | regard to the kernel's part of it. I'll take a look at
               | what names and signatures libc has, and if they're still
               | like that then the question is readdressed to libc--
               | though I guess the answer then would be "because that's
               | how it is in the kernel".
               | 
               | P.S. Come to think of it, I sorta knew about the
               | interrupt business, and vaguely about the later but
               | fundamentally similar method (but not that there are many
               | interrupts for syscalls, though thinking about eight-bit
               | registers clears this up quickly). Guess I just forgot
               | that libc is kept separate from the kernel, in the Linux
               | world.
        
               | whatshisface wrote:
                | > _I'm talking above about the first half of your
               | previous comment, where you say that having operation
               | names in the function names_
               | 
               | I was addressing operation names in the function names in
               | the second half of my comment, in the first half I was
               | addressing breaking up specification into a state machine
               | process like OpenGL does. State machines are another way
               | of breaking up big functions with lots of options. They
               | have pseudo default argument support as an advantage, and
               | ease of implementation via static globals as a historical
               | (before multithreading) advantage. State machines are
               | kind of like builder pattern as a way to get named
               | arguments and default arguments in a language that
               | doesn't have either, except instead of the builder being
               | a class it's a singleton (with all the attendant
               | disadvantages.)
        
               | aasasd wrote:
               | Ah, no, that's also contrary to my taste. I love clear
               | function names and I hate state. Immutable return values
               | are way better than builders, in my experience, even if
               | that means many arguments. And static globals are a whole
               | other level of bad.
        
               | whatshisface wrote:
               | Builders can be a huge help when you get up to a dozen
               | arguments and above. Seriously, I never want to see
               | foo(13,15,91,39,LIB_YES,LIB_NO,5.0,183.5) ever again
               | haha. Good engineers would comment each argument, good
               | languages support named arguments, but when you have
               | neither it can be a mess unless the library author used
               | builders to force annotation of the values. A close
               | relative of builders are settings struct arguments, which
               | have their own pros and cons. I think Vulkan uses
               | settings structs.
        
               | aasasd wrote:
               | I think OOP builders can work without extra state if they
               | implement `->addThis()->addThat()` by actually doing
               | functional `addThat(addThis(defaultObj))` and creating
               | immutable values inside. But ultimately there's no good
               | answer to the absence of named arguments, and by now not
               | having them is a pure atrocity.
               | 
               | Of course, I'm guessing C programmers would cringe at the
               | prospect of going full COW for such use-cases.
        
               | xiphias2 wrote:
               | Efficient OpenGL programming was always hard for me, as I
               | don't just need to learn the API, but also try to figure
               | out how it's implemented, which is often undocumented and
               | unspecified.
               | 
                | It's actually hard for me to understand that an API that
               | was created for efficient computing doesn't
               | specify/guarantee the relative speed/latency of
               | operations.
        
               | whatshisface wrote:
               | Yeah, there are some really interesting unexpected
               | performance implications - for instance, it's actually a
               | great idea to send updates to buffers in a way that
               | implies new allocation, because most drivers will
               | silently and automatically set up a multiple-buffering
               | system that reuses resources efficiently and doesn't
               | stall the pipeline. Sending partial updates looks better
               | on the surface, but because it can stall while the driver
               | waits for the resource you're updating to fall out of
               | use, it can be worse unless you set up the triple
               | buffering yourself!
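                | 
                | For reference, the classic "orphaning" idiom looks
                | roughly like this (assuming a GL context and loader
                | are already set up):
                | 
                |     #include <GL/gl.h>
                | 
                |     /* re-specify the store so the driver can hand
                |        back fresh memory instead of stalling on a
                |        copy still in flight, then upload the data */
                |     void update_vbo(GLuint vbo, const void *data,
                |                     GLsizeiptr size)
                |     {
                |       glBindBuffer(GL_ARRAY_BUFFER, vbo);
                |       glBufferData(GL_ARRAY_BUFFER, size, NULL,
                |                    GL_STREAM_DRAW);   /* orphan */
                |       glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);
                |     }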
        
       | maxfurman wrote:
       | Can someone please ELI5 what is a futex? Is it like a mutex?
        
         | whoisburbansky wrote:
         | It's a primitive that lets you build mutexes; you've got an
         | atomic integer in userspace, attached to a wait queue in kernel
         | space. Eli Bendersky has a pretty good rundown:
         | https://eli.thegreenplace.net/2018/basics-of-futexes/
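          | 
          | To make that concrete, here's a stripped-down (and
          | deliberately naive) sketch of a lock built on that idea:
          | the atomic int lives in your process and the kernel only
          | gets involved when someone actually has to sleep.
          | 
          |     #include <linux/futex.h>
          |     #include <sys/syscall.h>
          |     #include <unistd.h>
          |     #include <stdatomic.h>
          | 
          |     /* 0 = unlocked, 1 = locked. Skips the "contended"
          |        state trick from Drepper's "Futexes are Tricky",
          |        so unlock always issues a wake. */
          |     static void lock(atomic_int *m)
          |     {
          |       int e = 0;  /* expected value */
          |       while (!atomic_compare_exchange_strong(m, &e, 1)) {
          |         /* sleep only while the word still reads 1 */
          |         syscall(SYS_futex, m, FUTEX_WAIT, 1, NULL, NULL, 0);
          |         e = 0;
          |       }
          |     }
          | 
          |     static void unlock(atomic_int *m)
          |     {
          |       atomic_store(m, 0);
          |       syscall(SYS_futex, m, FUTEX_WAKE, 1, NULL, NULL, 0);
          |     }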
        
           | bogomipz wrote:
            | This is a good post, thanks. One thing that confuses me
            | is that these two sentences seem to contradict each
            | other:
           | 
           | >"Before the introduction of futexes, system calls were
           | required for locking and unlocking shared resources (for
           | example semop)."
           | 
           | >"A user-space program employs the futex() system call only
           | when it is likely that the program has to block for a longer
           | time until the condition becomes true"
           | 
            | The first indicates futexes are an improvement because
            | they avoid the overhead of a system call, but the second
            | states that futex() is a system call.
           | 
           | Is this similar to how a vDSO or vsyscall works where a page
            | from kernel address space is mapped into all user processes?
           | In other words is it similar to how gettimeofday works?[1]
           | 
           | [1] https://0xax.gitbooks.io/linux-
           | insides/content/SysCall/linux...
        
             | kccqzy wrote:
             | It's not a contradiction. The two sentences are perfectly
             | clear to me. Of course futex() is itself a system call, but
             | in case of no contention you don't need to invoke the
             | system call. That's why futex() is an improvement. The key
             | phrase is "only when" in the second sentence.
        
               | bogomipz wrote:
               | >'The key phrase is "only when" in the second sentence'
               | 
               | Thank you for pointing this out. That makes sense. This
               | is sort of like the idea that the fastest system call is
                | the one that is never made. Can you say how
                | something in userland knows if something "would" block
               | without actually crossing the user/kernel boundary? Does
               | the kernel expose a queue length counter or similar via a
               | read-only page?
        
               | tialaramex wrote:
               | The trick is, the memory address futex cares about _is_
               | how you signal. Normally, in the  "uncontended" case,
               | everybody involved uses atomic memory access to
               | manipulate this memory location (for example they set it
               | to 1 as a flag to say somebody is in the mutually
               | excluded code and then back to zero to say now nobody is
               | in there) and they pay no kernel cost at all for this
               | feature. The kernel has no idea the memory is even
               | special.
               | 
               |  _But_ in the contended case the flag is already set by
               | somebody else, so you leave a sign (maybe you set it to
               | 42) about the contention in the memory address, and you
                | tell the kernel that you're going to sleep and it should
               | wake you back up once whoever is blocking you releases.
               | When whoever was in there is done, they try to put things
               | back to zero, but they discover you've left that sign
                | that you were waiting, so _they_ tell the kernel they're
               | done with the address and it's time to release you.
               | 
               | None of the conventional semantics are enforced by the
               | operating system. If you write buggy code which just
               | scribbles nonsense on the magic flag location, your
               | program doesn't work properly. Too bad. The kernel does
               | contain some code that helps do some tricks lots of
               | people want to do using futexes, and so for that reason
               | you should stick to the conventions, but nobody will
               | force you to.
        
               | kccqzy wrote:
               | The most common implementation of mutexes using futex()
               | is that the userland simply spins a while waiting to
               | acquire the mutex (just atomic operations, no system
               | call). After a while the userland gives up spinning and
               | makes the system call.
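                | 
                | Very roughly, the shape of that (a sketch, not any
                | particular library's implementation):
                | 
                |     #include <linux/futex.h>
                |     #include <sys/syscall.h>
                |     #include <unistd.h>
                |     #include <stdatomic.h>
                | 
                |     /* 0 = free, 1 = held */
                |     static void adaptive_lock(atomic_int *m)
                |     {
                |       for (;;) {
                |         /* spin a bit: try to grab it w/o a syscall */
                |         for (int i = 0; i < 100; i++) {
                |           int e = 0;
                |           if (atomic_compare_exchange_weak(m, &e, 1))
                |             return;
                |         }
                |         /* still contended: park in the kernel */
                |         syscall(SYS_futex, m, FUTEX_WAIT, 1,
                |                 NULL, NULL, 0);
                |       }
                |     }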
        
               | gpderetta wrote:
                | IMHO the nice thing about futexes is not that they
                | allow you to fast-path uncontended synchronization.
               | 
               | The userspace fast path can in principle be implemented
               | on top of any kernel only synchronization primitive (a
               | pipe for example).
               | 
               | The nice thing about futexes is that they do not consume
                | any kernel resources when uncontended, i.e. there are
                | no open or close futex syscalls. Any kernel state is
                | created lazily when needed (i.e. on the first wait)
                | and destroyed automatically when not needed (i.e.
                | when the last waiter is woken up). Coupled with the
                | minimal userspace space requirements (just a 32-bit
                | word), it means that futexes
               | can be embedded very cheaply in a lot of places.
        
         | Iwan-Zotow wrote:
          | It is the thing behind/inside a mutex. It makes
          | uncontested mutex acquire/release ops a pure userspace
          | thingy.
        
         | ROARosen wrote:
         | Can someone please ELI5 what is a mutex?
        
           | hyperman1 wrote:
           | A mutex is the software equivalent of the lock on your toilet
           | door. There is a zone in your code where only 1 thread at a
           | time is allowed. So before entering, you lock the mutex. Now
           | either you are allowed in or are added to a set of waiters.
           | When you leave, you unlock the mutex, and 1 other waiter is
           | allowed to enter.
           | 
            | This sounds easy, but in practice it is fraught with
            | peril. You need high performance, to not waste too much
            | cpu time on sleepers, some notion of fairness when you
            | have lots of waiters, minimal memory usage, etc...
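            | 
            | In code, with POSIX threads, the toilet-door lock looks
            | roughly like this (a sketch):
            | 
            |     #include <pthread.h>
            | 
            |     pthread_mutex_t door = PTHREAD_MUTEX_INITIALIZER;
            |     int occupants = 0;   /* shared state */
            | 
            |     void visit(void)
            |     {
            |       pthread_mutex_lock(&door);   /* wait if occupied */
            |       occupants++;                 /* only 1 thread here */
            |       occupants--;
            |       pthread_mutex_unlock(&door); /* let the next one in */
            |     }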
        
           | __init wrote:
           | (Edited because I transposed "mutex" and "semaphore" in my
           | mind.)
           | 
           | A mutex is a lock -- if multiple users attempt to access a
           | resource simultaneously, they each attempt to acquire the
           | lock serially, and those not first are excluded (blocked)
           | until the current holder releases the lock.
           | 
           | Mutexes are typically implemented with semaphores:
           | 
           | First a more developer-oriented explanation (because I
           | imagine that's what you really want):
           | 
           | A semaphore locks a shared resource by implementing two
           | operations, "up" and "down". First, the semaphore is
           | initialized to some positive integer, which represents how
           | many concurrent "users" (typically threads) can use that
           | resource. Then, each time a user wants to take advantage of
            | that resource, it "down"s the semaphore, which atomically
           | subtracts one from the current value -- but the key is that
           | the semaphore's value can _never_ be negative, so if the
           | value is currently zero, "down"ing a semaphore implies
           | blocking until it's non-zero again. "Up"ing is just what
           | you'd imagine: atomically increment the semaphore value, and
           | unblock someone waiting, if necessary.
           | 
           | Semaphores are generally seen as a fundamental primitive (one
           | reason being that locks/mutexes can be implemented trivially
           | as a semaphore initialized to one), but they also have broad
           | use as a general synchronization mechanism between threads.
           | 
           | For a true ELI5 of semaphores (I enjoy writing these):
           | 
           | Imagine everyone in class needs to write something, but there
           | are only three pencils available, so you devise a system (a
           | system of mutual exclusion, per se) for everyone to follow
           | while the pencils are used: First, you note how many pencils
           | are available -- three. Then, each time someone asks you for
           | one, (which we call "down"ing the semaphore) you subtract one
           | from the number in your head and give them a pencil. But if
           | that number is zero, you don't have any pencils left, so you
           | ask the person to wait. When someone returns and is done with
           | their pencil, you hand it off to someone waiting, or, if
           | nobody is waiting, you add one to that number in your head
           | and take the pencil back ("up"ing the semaphore). It's
           | important that you decide to only handle one request at a
           | time (atomicity) -- if you tried to do too many things at
           | once, you might lose track of your pencil count and
           | accidentally promise more pencils than you have.
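            | 
            | The pencil scheme maps almost one-to-one onto POSIX
            | semaphores (which glibc, incidentally, builds on top of
            | futexes). A sketch:
            | 
            |     #include <semaphore.h>
            |     #include <pthread.h>
            |     #include <stdio.h>
            | 
            |     static sem_t pencils;
            | 
            |     static void *student(void *arg)
            |     {
            |       sem_wait(&pencils);   /* "down": grab a pencil */
            |       printf("student %ld writing\n", (long)arg);
            |       sem_post(&pencils);   /* "up": hand it back */
            |       return NULL;
            |     }
            | 
            |     int main(void)          /* build with -pthread */
            |     {
            |       pthread_t t[5];
            |       sem_init(&pencils, 0, 3);   /* three pencils */
            |       for (long i = 0; i < 5; i++)
            |         pthread_create(&t[i], NULL, student, (void *)i);
            |       for (int i = 0; i < 5; i++)
            |         pthread_join(t[i], NULL);
            |       sem_destroy(&pencils);
            |       return 0;
            |     }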
        
             | MayeulC wrote:
             | > First, you note how many pencils are available -- three.
             | Then, each time someone asks you for one, (which we call
             | "down"ing the mutex) you subtract one from the number in
             | your head and give them a pencil
             | 
             | Isn't that a semaphore rather than a mutex?
             | 
              | For an ELI5 on semaphores, it's like going to the
              | train station, airport, or McDonald's: There is a
              | single queue
             | for multiple counters. Everyone is given a paper with a
             | number, and once a counter is free, the one with the lowest
             | number can walk to it. A semaphore is just the number of
             | free counters, it decreases when someone goes to one, and
             | increases when one leaves. If the semaphore is zero, there
             | is no free spot, and people will have to queue.
             | 
             | It's quite close to another synchronization primitive,
             | barriers: if you have to wait for _n_ tasks to finish, you
             | make one with _n_ spots, and release everyone when all the
             | spots are filled.
             | 
              | To me, mutexes are semaphores, but with a count of one:
              | there is exactly one or zero resources available. If
             | you take the mutex, others will have to wait for you to be
             | done before they can take it. They can queue in any
             | fashion, depending on the implementation: first come, first
             | served; fastest first; most important first; random; etc.
        
               | __init wrote:
               | Oh no, you're absolutely right. My sleep-deprived brain
               | has confused the names -- thanks for the heads up. I'll
               | edit my post.
        
             | e_proxus wrote:
             | But it can't be turtles all the way down? Isn't a semaphore
             | counter also shared state that needs to be protected if
             | several threads want to "down" the value? Can it become
             | negative?
        
               | gpderetta wrote:
                | Well, at the bottom of the turtle stack you have
                | hardware primitives that guarantee atomicity of a
                | bunch of word-sized arithmetic operations (at the
                | minimum CAS or LL/SC).
        
             | ROARosen wrote:
              | Thanks, great analogy there, it really helped it 'click'!
             | 
             | Please excuse my ignorance...
        
           | marcodiego wrote:
            | Mutex is named after "mutual exclusion", a primitive that
            | is used to guarantee that some parts of code will run
            | exclusively, never at the same time. This is important
            | for good performance and synchronization in multithreaded
            | programs.
        
           | xxs wrote:
            | My ELI5 version of that locking mechanism, e.g. a mutex,
            | uses a toilet and the disaster if multiple people try
            | taking a dump at the same time.
        
         | brandmeyer wrote:
         | The best introduction is probably the original presentation
         | paper from 2002.
         | 
          | Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux
         | 
         | https://www.kernel.org/doc/ols/2002/ols2002-pages-479-495.pd...
         | 
         | Another great paper that came after some additional experience
         | by glibc developers:
         | 
         | Futexes are Tricky, by Ulrich Drepper
         | 
         | https://www.akkadia.org/drepper/futex.pdf
        
           | MaxBarraclough wrote:
           | I see Drepper resisted the urge to pluralise _futex_ to
           | _futices_.
        
         | DSingularity wrote:
         | Okay, consider a multi-threaded program. These programs have
         | multiple threads running on multiple cores. Sometimes you want
         | to coordinate the execution of multiple threads in such
         | programs. A common example is when a thread is in a "critical
         | section" where it is only safe for one thread to access/modify
         | state. Sometimes threads will have to wait for each other. This
          | is where the futex comes in. Instead of the thread waiting
          | while still using the processor, it can tell the OS that it
          | needs to wait for this futex to become free. The OS will
          | then suspend
         | the thread and find another one to schedule in its place.
        
           | bogomipz wrote:
            | Ah, interesting. So is the concept of futex() similar to
            | how a process in kernel mode will yield the CPU by
            | putting itself on a wait_queue and calling the schedule()
            | function before it issues an I/O request, then?
           | 
           | I realize that one is userspace and the other kernel but
           | would the above be the right mental model?
        
             | gpderetta wrote:
             | Yes, a futex is basically a waitqueue abstraction coupled
             | with a TOCTOU guard.
        
         | hulitu wrote:
         | As far as i remember a futex is a fast mutex
        
         | corbet wrote:
         | See https://lwn.net/Articles/823513/ for a look at both the old
         | and proposed new futex interfaces.
        
         | achernya wrote:
          | futex is a Fast Userspace muTEX. It's the syscall that
          | helps implement a mutex when two or more threads are
          | waiting on the lock, letting other processes/threads get
          | scheduled and do useful work during the wait.
        
         | hyperman1 wrote:
         | Futex stands for fast userspace mutex. It is a basic building
         | block for implementing a mutex (amongst others).
         | 
         | Afaik, if locking a mutex succeeds, futexes avoid a context
         | switch to and from the kernel. The fast path happens completely
          | in userspace. The kernel is only needed when a mutex lock
          | causes a thread to sleep. All of this improves mutex
          | performance.
         | 
         | There are other locking primitives than mutexes, and the
         | current futex interface is not optimal for implementing common
         | windows ones. Hence this proposal.
        
           | ece wrote:
            | ESYNC is the way this is currently done; it's hit and
            | miss in that, for some games, it doesn't work unless the
            | open-file limit is set high enough. It's also slower than
            | this proposal.
        
           | xxs wrote:
           | >"context switch to and from the kernel"
           | 
           | This is a common misconception, a "context" switch is a lot
           | more expensive than a "mode" switch that goes from user mode
           | to the kernel mode (and back).
           | 
           | Of course, if the futex has to wait, it'd be a context switch
           | for real.
        
         | gpderetta wrote:
         | A futex is an ephemeral list of waiting threads or processes
         | keyed on an address. It can be used to implement any
         | synchronization or signaling primitive, like mutexes, condition
         | variables, semaphores, etc.
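          | 
          | For example, a one-shot "event" (the kind of thing
          | condition variables and Windows events boil down to) is
          | just a word plus that wait list. A rough sketch:
          | 
          |     #include <linux/futex.h>
          |     #include <sys/syscall.h>
          |     #include <unistd.h>
          |     #include <stdatomic.h>
          |     #include <limits.h>
          | 
          |     /* block until another thread sets *flag nonzero */
          |     static void event_wait(atomic_int *flag)
          |     {
          |       while (atomic_load(flag) == 0)
          |         /* sleeps only if the word is still 0 when the
          |            kernel checks, so a concurrent event_set
          |            can't be missed */
          |         syscall(SYS_futex, flag, FUTEX_WAIT, 0,
          |                 NULL, NULL, 0);
          |     }
          | 
          |     static void event_set(atomic_int *flag)
          |     {
          |       atomic_store(flag, 1);
          |       syscall(SYS_futex, flag, FUTEX_WAKE, INT_MAX,
          |               NULL, NULL, 0);
          |     }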
        
           | tialaramex wrote:
           | This is a really nice insight into the kernel's model of
           | what's going on. It's probably not the best way for an
            | absolute beginner to think about why they _want_ this,
            | but it's much better than some of the other explanations
            | if you
           | have trickier technical or performance questions, and it
           | makes it obvious why it's clever and why you need the OS
           | kernel to actually implement it.
        
       | twoodfin wrote:
       | Unclear to me from the writeup: Would it be possible to implement
       | the original futex() API on top of futex2() or is Linux doomed to
       | preserve most of two complete futex implementations?
        
         | guipsp wrote:
         | Would it be possible? Probably yes, but not performant.
        
       | c7DJTLrn wrote:
       | This might get me some angry replies, but does the feature set of
       | Linux ever end? Will there ever be a day where someone says "you
       | know what, Linux isn't built for this and your code adds too much
       | complexity, let's not". It really just seems like Linux has
       | become a dumping ground for anyone to introduce whatever they
       | feel like as long as it passes code review, with nobody looking
       | at the big picture.
        
         | DSMan195276 wrote:
         | Features get rejected all the time, even some pretty
         | substantial ones. That said I think there's two things to keep
         | in mind:
         | 
          | 1. If someone is willing to take the time to develop and
          | submit something to the kernel, it probably has _some_
          | redeeming factors. The original idea or code might not get
          | merged, but something similar that meets those needs likely
          | can be, if it's really something the kernel can't do today.
         | 
         | 2. The modularity of the Linux Kernel build system means that
         | adding new features doesn't have to be that big of a deal,
         | since anybody who doesn't want them can just choose not to
         | compile them in. It's not _always_ that simple, but for a lot
         | of the added features it is.
        
         | marcodiego wrote:
          | Getting a new syscall into Linux is hard. You'll often need
          | userspace users before it gets accepted; that alone is hard.
          | The Linux syscall set is very small considering the breadth
          | of cases it serves today. Compared to NT, it is small and
          | well thought out.
        
           | formerly_proven wrote:
           | Windows has around 500 syscalls, Linux around 300.
           | 
           | Both have several escape-hatches for arbitrary driver-
           | userspace communication, which makes the number of syscalls
           | pretty irrelevant. (E.g. Linux has ioctl, netlink, procfs,
           | sysfs, ...)
        
             | monocasa wrote:
             | Well, there's also win32k.sys which provides an additional
             | 1000 or so syscalls.
        
             | Ericson2314 wrote:
             | > Both have several escape-hatches for arbitrary driver-
             | userspace communication, which makes the number of syscalls
             | pretty irrelevant. (E.g. Linux has ioctl, netlink, procfs,
             | sysfs, ...)
             | 
              | Yes, we need a better metric for accounting for those
              | things, one which would paint a much more sobering picture
              | of the complexity, and the growth in complexity, of the
              | userland <-> kernel interface.
        
           | GoblinSlayer wrote:
            | The Linux syscall set is very small and POSIX-based. No
            | matter what you think, the first version is always a
            | prototype.
        
         | pram wrote:
         | That's the beauty of open source! Just fork your own Linux and
         | then refuse any further patches or PRs. True freedom.
        
         | dooglius wrote:
         | No, because the Linux kernel is designed explicitly as a
         | monolithic kernel, so as long as there is anything anyone wants
         | to do that is "OS-like", it expects to own that.
        
         | rkangel wrote:
         | > with nobody looking at the big picture
         | 
         | As I understand it, this is exactly what Linus does. He's the
         | final arbiter of what makes it in, and has a history of setting
         | high level direction for how stuff should be (usually prefixed
         | with "I'm rejecting this because").
        
         | gpderetta wrote:
          | This patch both cleans up existing complexity and adds
          | functionality needed for a significant use case.
        
         | anoncake wrote:
         | What purpose is a general-purpose kernel not built for?
        
         | vondur wrote:
         | I'm assuming if you build the kernel yourself, you may be able
         | to strip out features you don't want.
        
         | penagwin wrote:
         | There are pieces of linux - say the networking - that already
         | don't meet many people's use cases. People seem to like BSD for
         | certain networking stuff due to differences in the respective
         | networking stacks.
        
           | zokula wrote:
           | > People seem to like BSD for certain networking stuff due to
           | differences in the respective networking stacks.
           | 
           | Is that why *BSD has lower market share for servers and other
           | networking equipment than it did 15 years ago?
        
         | thedracle wrote:
         | I mean, the futex2 proposal has been around for over a year.
         | 
         | A lot of the history and discussion can be found here:
         | https://lkml.org/lkml/2021/4/27/1208
         | 
          | I personally feel like the kernel maintainers have been very
          | conservative over the years regarding the introduction of new
          | system calls, but then again maybe I've just gotten used to
          | the trauma of being familiar with Windows internals.
        
         | slaymaker1907 wrote:
         | I think that will be the day somebody comes and creates a new
         | FOSS kernel which gets rid of mostly unused legacy code. The
         | problem with this sort of stuff is that it is hard to know what
         | abstractions will actually be useful before said abstractions
         | are available.
         | 
         | I don't think anyone in their right mind would design the
         | IndexedDB APIs the way they are with the background of modern
         | JS. It would certainly be promise based and I think there would
         | be a better distinction between an actual error with IndexedDB
          | vs just an aborted transaction. It's also extremely confusing
          | how long you can keep a transaction open. I suspect they
          | would replace the current nonsensical autocommit behavior of
          | transactions with some sort of timeout mechanism.
         | There is way too much asynchronous code out there these days to
         | rely on the event loop for transactions committing.
         | 
         | Linux is honestly pretty clean as far as outdated abstractions
          | go, but Windows is definitely very guilty of keeping around
          | outdated abstractions. Having execution policies for
          | PowerShell while batch scripts get off scot-free is really
          | strange.
         | These restrictions would only make sense if Windows didn't let
         | users run VBScript, Batch, etc. Can anyone tell me how running
         | a PowerShell script is inherently more dangerous than VBScript
         | or Batch scripts?
        
         | stjohnswarts wrote:
         | That's a fair criticism but the kernel is very modular and the
         | legacy stuff that no one will maintain gets cut all the time,
          | so I think it's all a fair tradeoff. I'm not angry, just
          | pragmatic: if you don't evolve, you die. Look at how much
          | market share BSD has lost over the past 15 years.
        
         | Ericson2314 wrote:
         | This shouldn't be getting down-votes.
         | 
          | Consider the slogan "composition not configuration": Linux is
          | squarely the latter. All the hardware, all the different use-
          | cases are just added to what's mostly a monolith. You can
          | disable them, but not really pull things apart.
         | 
         | Now, Linux also has some of the most stringent code review in
         | the industry, and they are well aware of issues of
         | maintainability, but fundamentally C is not a sufficiently
         | expressive language for them to architect the code base in a
         | compositional manner.
         | 
          | What really worries me is that as the totality of userland
          | uses an increasingly large interface to the kernel, it will
          | be very hard to ever replace Linux with something else when
          | we _do_ have something more compositional. I am therefore
          | interested in, and somewhat working on, stuff that would
          | hopefully end up with a multi-kernel NixOS (think Debian
          | GNU/kFreeBSD) to try to get some kernel competition going
          | again and incentivize some userland portability.
         | 
         | Note that I don't think portability <===> just use old system
         | calls. I am no Unix purist. I would really like a kernel that
         | does add new interfaces but also dispenses with the old cruft;
         | think Rust vs C++.
        
           | fabianhjr wrote:
           | > Consider the slogan "composition not configuration":
           | Linux is squarely the latter. All the hardware, all the
           | different use-cases are just added to what's mostly a
           | monolith. You can disable them, but not really pull things
           | apart.
           | 
           | Linux is a Monolithic Kernel, sounds like you are more
           | interested in Microkernels.
           | https://www.wikiwand.com/en/Microkernel
           | 
           | One microkernel alternative could be Redox-OS:
           | https://www.wikiwand.com/en/Redox_(operating_system)
        
             | bruckie wrote:
             | You could write a monolithic kernel using composition for
             | your code structure. It would just be composition at build
             | time rather than at runtime.
             | 
             | It seems pretty hard to write a microkernel without using
             | composition, though, since the runtime requirements kind of
             | dictate your code structure.
        
           | oalae5niMiel7qu wrote:
           | > but fundamentally C is not a sufficiently expressive
           | language for them to architect the code base in a
           | compositional manner.
           | 
           | Plan 9 would like to have a word with you.
        
         | joppy wrote:
         | I don't think this is an achievable goal for a kernel, since it
         | has strictly more permissions than user code. There are some
         | parts of user code that simply cannot be made faster by
         | changing how they're written: no amount of rewriting in
         | Rust/C/C++ will help if the relevant kernel primitives are not
         | there. The kernel must move with the times, and incorporate new
         | features as they are (tastefully) decided upon.
        
         | fullstop wrote:
         | The BSDs are always an option.
        
           | zokula wrote:
            | The BSDs were an option, but they are becoming more
            | moribund and less used every year.
        
         | bombcar wrote:
          | Linus has done exactly that a few times, though usually he
          | defines what would be needed for it to be merged.
         | 
         | Some of the changes aren't compatible with the kernel direction
         | as a whole and so they're a separate branch. Given enough time
         | (and if the change is major AND ends up being widely used) it
         | may get merged in.
         | 
         | Those who want a specific kernel with a defined feature set
         | often will look to one of the BSDs as a base (as often they
         | also have reduced hardware requirements, think routers).
        
         | yarg wrote:
         | As a relative layman, this all seemed reasonable and desirable
         | - and the fact that he integrated it with pthreads seems
         | particularly cool to me.
         | 
         | NUMA awareness especially stood out as highly significant.
         | 
          | The new supported sizes seem nice and shiny - if well
          | utilised, they could improve performance and capacity in
          | general, capabilities at the low end and scaling at the high
          | end.
        
         | lizknope wrote:
         | Linus Torvalds looks at the big picture all the time. That is
         | basically his job as he doesn't write a lot of low level code
         | anymore.
         | 
         | Do you have an example of something that you consider part of
         | the "dumping ground?"
        
         | trillic wrote:
          | So what? This code (and most drivers, modules, additional
          | functionality, etc.) will only be compiled in if necessary and
         | will only actually be run if you use the feature.
        
           | c7DJTLrn wrote:
           | My concern is not about what _my_ flavor of kernel includes.
            | It's about the size of the codebase, and whether the limited
           | number of maintainers can keep up with the countless features
           | being introduced.
        
             | coolreader18 wrote:
              | I mean, I think in terms of maintainer resources Linux is
              | one of the projects with the least to worry about; there
              | are always companies willing to have a team work on
              | stuff.
        
               | Ericson2314 wrote:
               | Those megacorps also accumulate endless complexity
               | internally, as does the industry as a whole. It's a
               | problem larger than the discipline of reviewers of C code
               | could ever hope to fix.
               | 
                | Even the larger world outside the field is somewhat
                | aware of the looming complexity problem: https://www.theatl
               | antic.com/technology/archive/2017/09/savin...
               | 
                | Basically, on our current trajectory we will end up
                | with something like the biosphere: able to do stuff,
                | but too complex to fully understand --- and probably
                | far more fragile than the biosphere itself?
        
               | kaba0 wrote:
                | That's what testing is for. We can't really reason
                | globally about much smaller codebases either. Software
                | is complex.
        
             | cbmuser wrote:
             | The kernel is regularly shedding off unmaintained code.
             | 
             | So the codebase is actually not growing as much as it
             | could.
        
               | c7DJTLrn wrote:
               | The code they're shedding off is drivers. They won't ever
               | get rid of old syscalls because Linus refuses to break
               | userspace, ever.
        
       | zamalek wrote:
       | > The use case lies in the Wine implementation of the Windows NT
       | interface WaitMultipleObjects.
       | 
       | This can't only be a win for emulating Windows; surely (future)
       | native Linux code could benefit from this too?
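       | 
       | For instance, a native program juggling several queues could
       | block on all of them with a single call. A purely hypothetical
       | sketch, assuming a vectorized futex_waitv()-style call roughly
       | along the lines of the patchset -- the struct layout, flag
       | value and syscall number below are guesses for illustration,
       | not the actual proposed ABI:
       | 
       |     #include <stdint.h>
       |     #include <sys/syscall.h>
       |     #include <unistd.h>
       | 
       |     /* Hypothetical: modeled loosely on the futex2 series; the real
       |      * ABI may differ in layout, flags and numbering. */
       |     struct waitv_entry {
       |         uint64_t val;      /* value the futex word is expected to hold */
       |         uint64_t uaddr;    /* userspace address of the futex word */
       |         uint32_t flags;    /* e.g. size of the futex word */
       |         uint32_t reserved;
       |     };
       | 
       |     #define HYPOTHETICAL_FUTEX_32       0x02  /* assumed "32-bit word" flag */
       |     #define HYPOTHETICAL_NR_futex_waitv 449   /* assumed syscall number */
       | 
       |     /* Block until any one of the given futex words changes, and get
       |      * back the index of the entry that woke us -- much like what
       |      * WaitForMultipleObjects returns on Windows. */
       |     static long wait_any(uint32_t *words[], uint32_t expected[],
       |                          unsigned int n)
       |     {
       |         struct waitv_entry wv[n];
       |         for (unsigned int i = 0; i < n; i++) {
       |             wv[i].val = expected[i];
       |             wv[i].uaddr = (uint64_t)(uintptr_t)words[i];
       |             wv[i].flags = HYPOTHETICAL_FUTEX_32;
       |             wv[i].reserved = 0;
       |         }
       |         return syscall(HYPOTHETICAL_NR_futex_waitv, wv, n, 0,
       |                        /* timeout */ NULL);
       |     }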
        
         | CoolGuySteve wrote:
         | Linux's answer is io_uring with eventfd. I've noticed it has
         | higher latency than WaitForMultipleObjects though. I've had a
         | hard time getting the 99th percentile latency below 5usec.
         | 
         | It would be nice if io_uring polling had a futex-like feature
          | that spun in userspace rather than having to syscall
         | io_uring_wait every time. I feel like that feature would be
         | more universal than futex2.
         | 
          | Adding an atomic int somewhere in userspace would add ~100ns
          | in the kernel path but save many microseconds in the
          | userspace-spin path under high contention, so it seems like a
          | win to me.
        
           | vardump wrote:
           | Spinning consumes a lot of power, so I'd guess one would only
           | want to do that when very low latency is truly required. At
           | least I'd certainly wouldn't want it to be the default for
           | the generic case.
           | 
            | Of course, a low-latency option should also exist for the
            | cases that require it, and currently that pretty much means
            | spinning.
        
           | the8472 wrote:
           | My understanding of io_uring is that you can already poll the
           | CQ with atomics without any syscall. io_uring_enter() is only
           | needed to inform the kernel that new SQ entries exist and
           | that too can be replaced by polling via IORING_SETUP_SQPOLL
           | in privileged processes.
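            | 
            | For example, with liburing the spin itself never enters the
            | kernel, because io_uring_peek_cqe() just reads the shared
            | CQ ring indices (a rough sketch; setup, error handling and
            | backoff omitted):
            | 
            |     #include <liburing.h>
            | 
            |     /* Busy-poll the completion queue in userspace.  Only
            |      * submitting new SQEs (or giving up and calling the
            |      * blocking io_uring_wait_cqe()) enters the kernel. */
            |     static int poll_completions(struct io_uring *ring)
            |     {
            |         struct io_uring_cqe *cqe;
            | 
            |         for (;;) {
            |             if (io_uring_peek_cqe(ring, &cqe) == 0) {
            |                 int res = cqe->res;            /* completion result */
            |                 io_uring_cqe_seen(ring, cqe);  /* advance the CQ head */
            |                 return res;
            |             }
            |             /* -EAGAIN: nothing completed yet; keep spinning.  A
            |              * real implementation would bound the spin and fall
            |              * back to a blocking wait. */
            |         }
            |     }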
        
         | vardump wrote:
         | > This can't only be a win for emulating Windows, surely
         | (future) Linux native code could benefit from this too?
         | 
         | That would certainly help porting Windows software to Linux.
        
         | [deleted]
        
       | monocasa wrote:
       | Too bad bpf isn't allowed for non root yet. A bpf futex program
       | like xok's wake programs would be more general, and way nicer.
        
         | fefe23 wrote:
         | What are you talking about?
         | 
         | I have been using bpf in seccomp for non-root for years.
         | 
         | Maybe your distro patched that out. The stock kernel allows bpf
         | for non-root.
        
           | monocasa wrote:
            | Sort of... there's been pushback in the post-Spectre days
            | against expanding non-root usage of BPF. Right now, a
            | subset of trace programs and seccomp are allowed for
            | non-root, but that set hasn't expanded since 2016.
            | Everything else requires CAP_SYS_ADMIN.
           | 
            | They would probably remove those too if Linux, on balance,
            | didn't put "don't break user space" higher than "defense in
            | depth" security.
        
       ___________________________________________________________________
       (page generated 2021-04-28 23:00 UTC)