[HN Gopher] PostgreSQL reconsiders its process-based model
       ___________________________________________________________________
        
       PostgreSQL reconsiders its process-based model
        
       Author : todsacerdoti
       Score  : 459 points
       Date   : 2023-06-19 16:33 UTC (6 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | mihaic wrote:
       | I'm honestly surprised it took them so long to reach this
       | conclusion.
       | 
       | > That idea quickly loses its appeal, though, when one considers
       | trying to create and maintain a 2,000-member structure, so the
       | project is unlikely to go this way.
       | 
       | As repulsive as this might sound at first, I've seen structures
       | of hundreds of fields work fine if the hierarchy inside them is
       | well organized and they're not just flat. Still, I have no real
       | knowledge of the complexity of the code and wish the Postgres
       | devs all the luck in the world to get this working smoothly.
        
         | rsaxvc wrote:
         | This is how I made my fork of libtcc lock-free.
         | 
         | Mainline has a lock so that all backends can use global
         | variables, but only one instance can do codegen at a time.
         | 
          | It was a giant refactoring. Especially fun was when multiple
          | compilation units used the same static variable name, but it
          | all worked in the end.
        
         | paulddraper wrote:
         | > I'm honestly surprised it took them so long to reach this
         | conclusion.
         | 
         | On the contrary, it's been discussed for ages. But it's a huge
         | change, with only modest advantages.
         | 
          | I'm skeptical of the ROI, to be honest. Not that it doesn't
          | have value, but whether that value exceeds the effort.
        
           | 36364949thrw wrote:
           | > it's a huge change, with only modest advantages
           | 
           | +significant and unknown set of new problems, including new
           | bugs.
           | 
           | This reminds me of the time they lifted entire streets in
           | Chicago by 14 feet to address new urban requirements.
           | Chicago, we can safely assume, did not have the option of
           | just starting a brand new city a few miles away.
           | 
            | The interesting question here is whether a system design
            | that works quite well up to a certain scale should be
            | abandoned in order to extend its market reach.
        
           | datavirtue wrote:
            | Yeah, and you will run headlong into other unforeseen real-
            | world issues. You may never reach the performance goals.
        
         | loeg wrote:
         | Yeah. I think as a straightforward, easily correct transition
         | from 2000 globals, a giant structure isn't an awful idea. It's
         | not like the globals were organized before! You're just making
         | the ambient state (awful as it is) explicit.
        
           | stingraycharles wrote:
            | Yes, it's the most pragmatic option, and it's only "awful"
            | because it makes the actual problem visible. It would
            | likely encourage slowly refactoring the code to handle its
            | state in a saner way, until you're only left with the
            | really gnarly stuff, which shouldn't be much anymore and
            | can go into individual thread-local storage.
            | 
            | It's an easy transition path.
        
           | mihaic wrote:
            | Exactly: if you're now forced to put everything in one
            | place, you're forced to acknowledge and understand the
            | complexity of your state, and you might have an incentive
            | to simplify it.
        
             | Sesse__ wrote:
             | Here's MySQL's all-session-globals-in-one-place-class:
             | https://github.com/mysql/mysql-
             | server/blob/8.0/sql/sql_class...
             | 
             | I believe I can safely say that nobody acknowledges and
             | understands the complexity of all state within that class,
             | and that whatever incentives there may be to simplify it
             | are not enough for that to actually happen.
             | 
             | (It ends on line 4692)
        
               | IshKebab wrote:
               | Right but that would still be true if they were globals
               | instead. Putting all the globals in a class doesn't make
               | any difference to how much state you have.
        
           | cakoose wrote:
           | > I think as a straightforward, easily correct transition
           | from 2000 globals, a giant structure isn't an awful idea.
           | 
           | Agree.
           | 
           | > It's not like the globals were organized before!
           | 
           | Using a struct with 2000 fields loses some encapsulation.
           | 
           | When a global is defined in a ".c" file (and not exported via
           | a ".h" file), it can only be accessed in that one ".c" file,
           | sort of like a "private" field in a class.
           | 
           | Switching to a single struct would mean that all globals can
           | be accessed by all code.
           | 
           | There's probably a way to define things that allows you to
           | regain some encapsulation, though. For example, some spin on
           | the opaque type pattern:
           | https://stackoverflow.com/a/29121847/163832
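            | 
            | Roughly, with hypothetical names, the opaque-type version
            | would look something like this (a sketch, not Postgres
            | code):
            | 
            |   /* session_state.h - public, struct stays opaque */
            |   typedef struct SessionState SessionState;
            |   int  session_get_xid(const SessionState *s);
            |   void session_set_xid(SessionState *s, int xid);
            | 
            |   /* session_state.c - only file that sees the fields */
            |   struct SessionState {
            |       int xid;          /* formerly a static global */
            |       int snapshot_id;  /* formerly another one     */
            |   };
            | 
            |   int session_get_xid(const SessionState *s) {
            |       return s->xid;
            |   }
            | 
            |   void session_set_xid(SessionState *s, int xid) {
            |       s->xid = xid;
            |   }
            | 
            | Other .c files only ever hold a SessionState pointer, so
            | you get back roughly the "private field" property that
            | file-scope statics gave you.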
        
             | pasc1878 wrote:
              | No, that is what a static in a .c file is for.
              | 
              | A plain global can be accessed from other compilation
              | units - agreed, with no .h entry it is much more error
              | prone, e.g. you don't know the type, but the variable's
              | name is still exposed to other objects.
        
               | remexre wrote:
               | Wouldn't those statics also be slated for removal with
               | this change?
        
           | cogman10 wrote:
           | I think my bigger fear is around security. A process per
           | connection keeps things pretty secure for that connection
           | regardless of what the global variables are doing (somewhat
           | hard to mess that up with no concurrency going on in a
           | process).
           | 
           | Merge all that into one process with many threads and it
           | becomes a nightmare problem to ensure some random addon
           | didn't decide to change a global var mid processing (which
           | causes wrong data to be read).
        
             | dfox wrote:
             | All postgres processes run under the same system user and
             | all the access checking happens completely in userspace.
        
               | fdr wrote:
               | Access checking, yes, but the scope of memory corruption
               | does increase unavoidably, given the main thing the
               | pgsql-hackers investigating threads want: one virtual
               | memory context when toggling between concurrent work.
               | 
               | Of course, there's a huge amount of shared space already,
               | so a willful corruption can already do virtually
               | anything. But, more is more.
        
           | magicalhippo wrote:
           | We did this with a project I worked on. I came on after the
           | code was mature.
           | 
           | While we didn't have 2000 globals, we did have a non-trivial
           | amount, spread over about 300kLOC of C++.
           | 
           | We started by just stuffing them into a "context" struct, and
           | every function that accessed a global thus needed to take a
           | context instance as a new parameter. This was tedious but
           | easy.
           | 
           | However the upside was that this highlighted poor
           | architecture. Over time we refactored those bits and the main
           | context struct shrunk significantly.
           | 
           | The result was better and more modular code, and overall well
           | worth the effort in our case, in my opinion.
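            | 
            | As a tiny made-up illustration of that first mechanical
            | step (hypothetical names, C rather than our C++):
            | 
            |   #include <stdio.h>
            | 
            |   /* before: implicit file-scope global state */
            |   /* static int verbose;                      */
            | 
            |   /* after: the same state, made explicit */
            |   typedef struct AppContext {
            |       int verbose;   /* ...plus everything else */
            |   } AppContext;
            | 
            |   void log_msg(AppContext *ctx, const char *msg) {
            |       if (ctx->verbose)   /* was: if (verbose) */
            |           puts(msg);
            |   }
            | 
            | Tedious, as I said, but each call-site change is trivial,
            | and the struct makes the hidden coupling visible.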
        
         | MuffinFlavored wrote:
         | > if the hierarchy inside them is well organized
         | 
         | is this another way to say "in a 2000 member structure, only 10
         | have significant voting power"?
        
           | Ankhers wrote:
           | This statement is not about people, it is about a C struct.
        
         | FooBarWidget wrote:
         | I don't get it. How is a 2000-member structure any different
         | from having 2000 global variables? How is maintaining the
         | struct possibly harder than maintaining the globals?
         | Refactoring globals to struct members is semantically nearly
         | identical, it may as well just be a mechanical, cosmetic
         | change, while also giving the possibility to move to a threaded
         | architecture.
        
           | ComputerGuru wrote:
            | Because global variables can be confined to individual cpp
            | files, exclusively visible in that compilation unit. That
            | makes them far easier to reason about than hoisting them
            | into the "global and globally visible" bucket of one
            | gargantuan struct. Which is why a more invasive refactor
            | might be required.
        
             | imtringued wrote:
             | Just use thread local variables.
             | 
             | I abuse them for ridiculous things.
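              | 
              | For anyone unfamiliar, a minimal sketch (C11
              | _Thread_local; gcc/clang also accept __thread) - each
              | thread quietly gets its own copy:
              | 
              |   #include <pthread.h>
              |   #include <stdint.h>
              |   #include <stdio.h>
              | 
              |   static _Thread_local int counter; /* per thread */
              | 
              |   static void *worker(void *arg) {
              |       for (int i = 0; i < 3; i++)
              |           counter++;
              |       printf("thread %d: counter = %d\n",
              |              (int)(intptr_t)arg, counter); /* 3 */
              |       return NULL;
              |   }
              | 
              |   int main(void) {
              |       pthread_t a, b;
              |       pthread_create(&a, NULL, worker, (void *)1);
              |       pthread_create(&b, NULL, worker, (void *)2);
              |       pthread_join(a, NULL);
              |       pthread_join(b, NULL);
              |       return 0;
              |   }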
        
         | comboy wrote:
          | I've never really been limited by CPU when running postgres
          | (instances of a few TB). The bottleneck is always IO. Do
          | others have a different experience? Plus there's elegance and
          | a feeling of being in control when you know a query is
          | associated with a specific process which you can deal with
          | and monitor just like any other process.
         | 
         | But I'm very much clueless about internals, so this is a
         | question rather than an opinion.
        
           | hyperman1 wrote:
            | I see postgres become CPU bound regularly: lots of hash
            | joins, COPY from or to CSV, index or materialized view
            | rebuilds. PostGIS eats CPU. tds_fdw tends to spend a lot of
            | time doing charset conversion, more than actually talking
            | to MSSQL over the network.
            | 
            | I was surprised when starting with postgres. Then again, I
            | have smaller databases (a few TB) and the cache hit ratio
            | tends to be about 95%. Combine that with SSDs, and it
            | becomes understandable.
            | 
            | Even so, I am wary of this change. Postgres is very
            | reliable, and I have no problem throwing some extra
            | hardware at it in return. But these people have proven they
            | know what they are doing, so I'll go with their opinion.
        
             | aetherson wrote:
              | I've also definitely seen postgres become CPU-bound a
              | lot.
        
           | sargun wrote:
            | With modern SSDs that can push 1M+ IOPS, you can get into a
            | situation where I/O latency starts to become a problem, but
            | in my experience they far outpace what the CPU can do. The
            | I/O stack can be optimized further in some of these cases,
            | but often that comes with the trade-off of shifting more
            | work onto the CPU.
        
           | ilyt wrote:
           | >I've never really been limited by CPU when running postgres
           | (few TB instances). The bottleneck is always IO.
           | 
            | Throw a few NVMe drives at it and CPU might well become the
            | bottleneck.
        
             | dfox wrote:
              | Throwing a ridiculous amount of RAM at it is the more
              | correct assessment. NVMe reads are still "I/O", and that
              | is slow. And for at least 10 years, buying enough RAM to
              | have all of the interesting parts of an OLTP psql
              | database either in shared_buffers or in the OS-level
              | buffer cache has been completely feasible.
        
               | ilyt wrote:
               | > NVMe reads are still an "I/O" and that is slow
               | 
                | It's orders of magnitude faster than SAS/SATA SSDs, and
                | you can throw 10 of them into a 1U server. It's nowhere
                | near "slow", and it's still easy enough to become CPU-
                | bottlenecked before you get I/O-bottlenecked.
                | 
                | But yes, a pair of 1TB-RAM servers will cost you less
                | than half a year's worth of developer salary.
        
           | phamilton wrote:
            | I've generally had buffer-cache hit rates in the 99.9%
            | range, which ends up being minimal read I/O. (This is on
            | AWS Aurora, where there is no disk cache and so
            | shared_buffers is the primary cache, but an equivalent
            | measure for vanilla postgres exists.)
            | 
            | In those scenarios, there's very little read I/O. CPU is
            | the primary bottleneck. That's why we run up to as many as
            | 10 Aurora readers (autoscaled with traffic).
        
           | paulddraper wrote:
           | Depends on your queries.
           | 
            | If you push a lot of work into the database (including
            | JSON) and have a lot of buffer memory... CPU can easily be
            | the limiting factor.
        
           | Diggsey wrote:
            | It's not just CPU - memory usage is also higher. In
            | particular, idle connections still consume significant
            | memory, and this is why PostgreSQL has much lower
            | connection limits than e.g. MySQL. Pooling can help in some
            | cases, but pooling also breaks some important PostgreSQL
            | features (like prepared statements...) since poolers
            | generally can't preserve session state. Other features
            | (e.g. notify) are just incompatible with pooling. And
            | pooling cannot help with connections that are idle but
            | inside a transaction.
           | 
           | That said, many of these things are solvable without a full
           | switch to a threaded model (eg. by having pooling built-in
           | and session-state-aware).
        
             | ComputerGuru wrote:
             | > solvable without a full switch to a threaded model (eg.
             | by having pooling built-in and session-state-aware).
             | 
              | Yeeeeesssss, but solving that is solving the hardest part
              | of switching to a threaded model. It requires the team to
              | come to terms with the global state and encapsulate
              | session state in a non-global struct.
        
         | saulrh wrote:
         | Also, even if a 2k-member structure is obnoxious, consider the
         | alternative - having to think about and manage 2k global
         | variables is probably even worse!
        
           | megous wrote:
           | Each set of globals is in a module it relates to, not in some
           | central file where everything has to be in one struct.
           | 
           | If anything, it's probably easier to understand.
        
         | hans_castorp wrote:
         | > I'm honestly surprised it took them so long to reach this
         | conclusion.
         | 
          | Oracle also uses a process model on Linux. At some point (I
          | think starting with 12.x), it became possible to configure it
          | on Linux to use a threaded model, but the default is still a
          | process-per-connection model.
          | 
          | Why does everybody think it's a bad thing in Postgres, but
          | nobody thinks it's a bad thing in Oracle?
        
         | topspin wrote:
         | > I'm honestly surprised it took them so long to reach this
         | conclusion.
         | 
         | I'm not. You can get a long way with conventional IPC, and OS
         | processes provide a lot of value. For most PostgreSQL instances
         | the TLB flush penalty is _at least_ 3rd or 4th on the list of
         | performance concerns, _far_ below prevailing storage and
         | network bottlenecks.
         | 
         | I share the concerns cited in this LWN story. Reworking this
         | massive code base around multithreading carries a large amount
         | of risk. PostgreSQL developers will have to level up
         | substantially to pull it off.
         | 
         | A PostgreSQL endorsed "second-system" with the (likely
         | impossible, but close enough that it wouldn't matter) goal of
         | 100% client compatibility could be a better approach. Adopting
         | a memory safe language would make this both tractable and
         | attractive (to both developers and users.) The home truth is
         | that any "new process model" effort would actually play out
         | exactly this way, so why not be deliberate about it?
        
           | atonse wrote:
           | Would this basically be a new front end? Like the part that
           | handles sockets and input?
           | 
            | Or more of a rewrite of subsystems? Like the query planner or
           | storage engine etc?
        
             | topspin wrote:
             | Both, I'd imagine.
             | 
             | With regard to client compatibility there are related
             | precedents for this already; the PostgreSQL wire protocol
             | has emerged as a de facto standard. Cockroachdb and
             | ClickHouse are two examples that come to mind.
        
           | nextaccountic wrote:
            | From what I gather, postgres isn't doing conventional IPC;
            | instead it uses shared memory, which is the same mechanism
            | threads use, but with way higher complexity.
        
             | mgaunard wrote:
             | What do you think IPC is?
        
             | topspin wrote:
             | As does Oracle, and others. I'm aware.
             | 
             | IPC, to me, includes the conventional shared memory
             | resources (memory segments, locks, semaphores, condition
              | variables, etc.) used by these systems: resources acquired
             | by processes for the purpose of communication with other
             | processes.
             | 
             | I get it though. The most general concept of shared memory
             | is not coupled to an OS "process." You made me question
              | whether my concept of the term IPC was valid, however. So
              | what does one do when a question appears? Stop thinking
              | immediately and consult a language model!
             | 
             | Q: Is shared memory considered a form of interprocess
             | communication?
             | 
             | GPT-4: Yes, shared memory is indeed considered a form of
             | interprocess communication (IPC). It's one of the several
             | mechanisms provided by an operating system to allow
             | processes to share and exchange data.
             | 
             | ...
             | 
             | Why does citing ChatGPT make me feel so ugly inside?
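              | 
              | Anyway, for concreteness, a minimal POSIX sketch of
              | shared memory used as IPC (made-up segment name): any
              | process that opens the same name sees the same bytes,
              | and locks/semaphores get layered on top in real use.
              | 
              |   #include <fcntl.h>
              |   #include <stdio.h>
              |   #include <string.h>
              |   #include <sys/mman.h>
              |   #include <unistd.h>
              | 
              |   int main(void) {  /* link with -lrt if needed */
              |       const char *name = "/ipc_demo";
              |       int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
              |       if (fd < 0) { perror("shm_open"); return 1; }
              |       if (ftruncate(fd, 4096) < 0) {
              |           perror("ftruncate"); return 1;
              |       }
              |       char *mem = mmap(NULL, 4096,
              |                        PROT_READ | PROT_WRITE,
              |                        MAP_SHARED, fd, 0);
              |       if (mem == MAP_FAILED) {
              |           perror("mmap"); return 1;
              |       }
              |       /* another process mapping "/ipc_demo" would
              |          see this write - that's the IPC channel */
              |       strcpy(mem, "hello from one process");
              |       printf("wrote: %s\n", mem);
              |       munmap(mem, 4096);
              |       close(fd);
              |       shm_unlink(name);
              |       return 0;
              |   }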
        
               | faangsticle wrote:
               | > Why does citing ChatGPT make me feel so ugly inside?
               | 
                | It's the modern "let me Google that for you". Just like
                | people don't care what the #1 result on Google is, they
                | also don't care what ChatGPT has to say about it. If
                | they did, they'd ask it themselves.
        
               | TeMPOraL wrote:
                | I always understood IPC, "interprocess communication",
                | in a general sense, as anything and everything that can
                | be used by processes to communicate with each other -
                | of course with a narrowing provision that common use of
                | the term refers to those means that are typically used
                | for that purpose, are relatively efficient, and that
                | the processes in question run on the same machine.
               | 
               | In that view, I always saw shared memory as IPC, in that
               | it is a tool commonly used to exchange data between
               | processes, but of course it is not strictly tied to any
               | process in particular. This is similar to files, which if
               | you squint are a form of IPC too, and are also not tied
               | to any specific process.
               | 
               | > _Why does citing ChatGPT make me feel so ugly inside?_
               | 
               | That's probably because, in cases like this, it's not
               | much different to stating it yourself, but is more noisy.
        
             | wbl wrote:
             | Not necessarily. Man 3 shmem if you want a journey back to
             | some bad ideas.
        
         | shepardrtc wrote:
         | I think this is a situation where a message-passing Actor-based
         | model would do well. Maybe pass variable updates to a single
         | writer process/thread through channels or a queue.
         | 
         | Years ago I wrote an algorithmic trader in Python (and Cython
         | for the hotspots) using Multiprocessing and I was able to get
         | away with a lot using that approach. I had one process
         | receiving websocket updates from the exchange, another process
         | writing them to an order book that used a custom data
         | structure, and multiple other processes reading from that data
         | structure. Ran well enough that trade decisions could be made
         | in a few thousand nanoseconds on an average EC2 instance. Not
         | sure what their latency requirements are, though I imagine they
         | may need to be faster.
         | 
          | Obviously mutexes are the bottleneck for them at this point,
          | and while my approach might be a bit slower in a low-load
          | situation, perhaps it would be faster once you start getting
          | to higher load.
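          | 
          | Roughly the shape I mean, as a hedged C sketch (toy types
          | and sizes, nothing like Postgres internals): producers only
          | enqueue updates, a single writer thread applies them, so the
          | shared state itself never needs a lock.
          | 
          |   #include <pthread.h>
          |   #include <stdio.h>
          | 
          |   #define QCAP 128
          |   static int queue[QCAP];
          |   static int qhead, qtail, qlen;
          |   static pthread_mutex_t qmu = PTHREAD_MUTEX_INITIALIZER;
          |   static pthread_cond_t  qcv = PTHREAD_COND_INITIALIZER;
          | 
          |   static long book_total;  /* owned by the writer only */
          | 
          |   static void enqueue(int delta) {
          |       pthread_mutex_lock(&qmu);
          |       while (qlen == QCAP)
          |           pthread_cond_wait(&qcv, &qmu);
          |       queue[qtail] = delta;
          |       qtail = (qtail + 1) % QCAP;
          |       qlen++;
          |       pthread_cond_broadcast(&qcv);
          |       pthread_mutex_unlock(&qmu);
          |   }
          | 
          |   static void *producer(void *arg) {
          |       for (int i = 0; i < 1000; i++)
          |           enqueue(1);            /* e.g. a price tick */
          |       return arg;
          |   }
          | 
          |   static void *writer(void *arg) {
          |       long remaining = 4 * 1000; /* items expected */
          |       (void)arg;
          |       while (remaining-- > 0) {
          |           pthread_mutex_lock(&qmu);
          |           while (qlen == 0)
          |               pthread_cond_wait(&qcv, &qmu);
          |           int delta = queue[qhead];
          |           qhead = (qhead + 1) % QCAP;
          |           qlen--;
          |           pthread_cond_broadcast(&qcv);
          |           pthread_mutex_unlock(&qmu);
          |           book_total += delta;   /* single writer */
          |       }
          |       return NULL;
          |   }
          | 
          |   int main(void) {
          |       pthread_t w, p[4];
          |       pthread_create(&w, NULL, writer, NULL);
          |       for (int i = 0; i < 4; i++)
          |           pthread_create(&p[i], NULL, producer, NULL);
          |       for (int i = 0; i < 4; i++)
          |           pthread_join(p[i], NULL);
          |       pthread_join(w, NULL);
          |       printf("book_total = %ld\n", book_total); /* 4000 */
          |       return 0;
          |   }
          | 
          | In the toy, the queue mutex is still there, but it's held
          | only for the enqueue/dequeue, not for the work itself.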
        
           | hamandcheese wrote:
           | I think the Actor model is fine if you start there, but I
           | can't imagine incrementally adopting it in a large,
           | preexisting code base.
        
           | ilyt wrote:
            | That would most likely be several times slower than the
            | current model.
        
       | bb88 wrote:
       | This reminds me of this poster: "You must be this tall..."
       | 
       | https://bholley.net/blog/2015/must-be-this-tall-to-write-mul...
       | 
       | Back about a decade ago I was "auditing" someone else's threaded
       | code. And couldn't figure it out. But he was the company's
       | "golden child" so by default it must be working code because he
       | wrote it.
       | 
       | And then it started causing deadlocks in prod.
       | 
       | "What do you want me to do about it? It's the golden child's
       | code. He's not even gonna show up til 2pm today."
        
         | wmf wrote:
         | The thing is... multi-process with a bespoke shared memory
         | system isn't better than multithreading; it's much worse.
        
       | thecopy wrote:
       | This feels like developers are bored and want a challenge.
        
         | dboreham wrote:
         | It's a multi-decade ask from many PG users and a serious pain
         | point for many deployments.
        
       | Icathian wrote:
       | Some interesting discussion on this here also:
       | https://news.ycombinator.com/item?id=36284487
        
       | [deleted]
        
       | levkk wrote:
       | Pretty sure Tom Lane said this will be a disaster in that same
       | pgsql-hackers thread. Not entirely sure what benefits the multi-
       | threaded model will have when you can easily saturate the entire
       | CPU with just 128 connections and a pooler. So I doubt there is
       | consensus or even strong desire from the community to undertake
       | this boil the ocean project.
       | 
       | On the other hand, having the ability to shut down and cleanup
       | the entire memory space of a single connection by just
       | disconnecting is really nice, especially if you have extensions
       | that do interesting things.
        
         | ilyt wrote:
         | >Not entirely sure what benefits the multi-threaded model will
         | have when you can easily saturate the entire CPU with just 128
         | connections and a pooler.
         | 
          | That all of those would work faster, because of the
          | performance benefits mentioned in the article.
        
         | BoardsOfCanada wrote:
         | From the article:
         | 
         | > Tom Lane said: "I think this will be a disaster. There is far
         | too much code that will get broken". He added later that the
         | cost of this change would be "enormous", it would create "more
         | than one security-grade bug", and that the benefits would not
         | justify the cost.
        
           | timcobb wrote:
           | You can think of this as an opportunity to rewrite in Rust.
        
             | tracker1 wrote:
             | AfterPostgres
        
       | pjmlp wrote:
        | So just as we collectively decided that multiprocessing is a
        | much better approach from a security and application-stability
        | point of view, they decide to go with threads?
        
         | blinkingled wrote:
          | Horses for courses, I guess - purely threaded vs. purely MP
          | have different sets of tradeoffs, and shoehorning one over
          | the other always fails some use cases. The article says they
          | are also considering the possibility of having to keep both
          | process and thread models indefinitely, for this and other
          | reasons.
          | 
          | I know nothing of PG internals, but I can see why a process-
          | per-connection model doesn't work for large machines and/or
          | high numbers of connections. One way to do it would be to
          | keep connection handling per thread and still keep a
          | multiprocess approach where it makes sense for security and
          | doesn't add linear overheads.
        
       | RcouF1uZ4gsC wrote:
        | This would be one of those places where a language like Rust
        | would be helpful. In C/C++, with undefined behavior and
        | crashes, process isolation makes a lot of sense to limit the
        | blast radius. Rust's borrow checker gives you, at compile time,
        | a lot of the safety that you would otherwise rely on process
        | isolation for.
        
         | mattashii wrote:
          | Yes, but note that the blast radius of a PostgreSQL process
          | crash is already "the whole system restarts", so there are
          | not a lot of differences between process- and thread-based
          | PostgreSQL written in C.
          | 
          | Rewriting in Rust would be interesting, but it would also
          | probably be too invasive to make it worthwhile at all - all
          | code in PostgreSQL is C, while not all code in PostgreSQL
          | interacts with the intrinsics of processes vs. threads. Any
          | rewrite to Rust would likely take several times more effort
          | than a port to threads.
        
           | megous wrote:
            | A PostgreSQL process crash may also just mean one query
            | fails.
        
       | ceeam wrote:
        | Is there any reason at all that people use intrinsically bug-
        | prone and broken multithreading instead of fork() and IPC,
        | apart from WinAPI having no proper fork?
        
         | adwn wrote:
         | Yes, and some of those reasons are even listed in the article.
        
           | ceeam wrote:
            | TLB misses? They are just a detail of a particular CPU
            | implementation, and architectures change. Also, aren't they
            | per core and not per process? What would switching to MT
            | solve, then?
        
             | adwn wrote:
              | > _TLB misses? They are just a detail of a particular CPU
              | implementation, and architectures change._
             | 
             | TLBs are "just a detail" of roughly 100% of server,
             | desktop, and mobile CPUs.
             | 
              | > _Also, aren't they per core and not per process? What
              | would switching to MT solve, then?_
             | 
             | TLB entries are per address space. Threads share an address
             | space, processes do not.
        
       | tragomaskhalos wrote:
        | Worked on a codebase which was separate processes, each of
        | which had a shedload of global variables. It was a nightmare
        | working out what was going on, not helped by the fact that
        | there was no naming convention for the globals, plus they were
        | not declared in a single place. I believe their use was a
        | performance move, i.e. having the linker pin a var to a
        | specific memory location rather than copying it to the stack
        | and referencing it by offset the whole time. Premature
        | optimisation? Optimisation at all? Who knows, but there's a
        | good reason coding standards typically militate against
        | globals.
        
         | ajkjk wrote:
         | There's something to be said for globals whose access is well-
         | managed, though.
         | 
         | IMO: if the variable is _truly_ global, i.e. code all over the
         | codebase cares about it, then it should just be global instead
         | of pretending like it's not with some fancy architecture.
         | 
         | The tricky part is reacting to changes to a global variable.
         | Writing a bunch of "on update" logic leads to madness. The
         | ideal solution is for there to be some sort of one-directional
         | flow for updates, like when a React component tree is re-
         | rendered... but that's very hard to build in an application
         | that doesn't start out using a library like React in the first
         | place.
        
       | shariat wrote:
       | this will be the beginning of the end of postgres
        
       | haburka wrote:
       | That sounds like really hard programming. I'm glad I write react
       | and get paid possibly much more.
        
         | paulddraper wrote:
         | We're glad you're writing React too :)
        
         | Exuma wrote:
          | I feel this sort of undertaking could only be done by those
          | programmers who truly value domain knowledge above all else
          | (money, etc). I'm more of the entrepreneurial mind, so I
          | generally only learn as much as needed to do some task (even
          | if it's very difficult), but just seeking information as a
          | means to an end doesn't feel fulfilling to me. Of course many
          | people DO find that, and it's upon those people's shoulders
          | that heroic things like this rest, and I'm very thankful to
          | them.
        
       | uudecoded wrote:
       | I am going to go ahead and trust Tom Lane on this one, over
       | someone who is working on "serverless Postgres". Godspeed to the
       | forthcoming fork.
        
         | aseipp wrote:
         | Heikki Linnakangas is one of the top Postgres contributors of
         | all time, he isn't just "someone." The fact he's working for a
         | startup on a fork (that already exists, which you can run right
         | now on your local machine) doesn't warrant any snide dismissal.
         | Robert Haas admitted that it would be a huge amount of work and
         | that it would only be achievable by a small few people anyway,
         | Heikki being among them.
         | 
         | Anyway, I think there are definitely limits that are starting
         | to appear with Postgres in some spots. This is probably one of
         | the most difficult possible solutions to some of those
         | problems, but even if they don't switch to a fully threaded
          | model, being more CPU efficient, having better connection
          | handling, etc. will all go a substantial way. Doing some of
          | the really hard work is better than none of it, probably.
        
           | selimnairb wrote:
           | I wonder what AWS's PostgreSQL-compatible Aurora looks like
           | under the hood. Does it use threading, processes, both?
        
       | rburhum wrote:
        | Sorry if I offend anybody, but this sounds like such a bad
        | idea. I have been running various versions of postgres in
        | production for 15 years with thousands of processes on super
        | beefy machines, and I can tell you without a doubt that
        | sometimes those processes crash - especially if you are running
        | any of the extensions. Nevertheless, Postgres has 99% of the
        | time proven to be resilient. The idea that a bad client can
        | bring the whole cluster down because it hit a bug sounds scary.
        | Ever try creating a spatial index on thousands/millions of
        | records that have nasty, overly complex or badly digitized
        | geometries? Sadly, crashes are part of that workflow, and
        | changing this from processes to threading would mean all the
        | other clients also crashing and cutting their connections.
        | Accepting this as a potential problem just because I want to
        | avoid context-switching overhead or cache misses? No thanks.
        
         | mike_hock wrote:
         | Also, reducing context switching overhead (or any other CPU
         | overhead) is probably not gonna fix the garbage I/O
         | performance.
        
         | dan-robertson wrote:
         | Is the actual number you got 99%? Seems low to me but I don't
         | really know about Postgres. That's 3 and a half days of
         | downtime per year, or an hour and a half per week.
        
           | dfox wrote:
            | Well, an hour and a half per week is the amount of downtime
            | that you need for a modestly sized database (units of TB)
            | accessed by legacy clients that have ridiculously long-
            | running transactions that interfere with autovacuum.
        
         | zeroimpl wrote:
          | However, it's already the case that if a postgres process
          | crashes, the whole cluster gets restarted. I've occasionally
          | seen this message:
          | 
          |   WARNING: terminating connection because of crash of
          |     another server process
          |   DETAIL: The postmaster has commanded this server process
          |     to roll back the current transaction and exit, because
          |     another server process exited abnormally and possibly
          |     corrupted shared memory.
          |   HINT: In a moment you should be able to reconnect to the
          |     database and repeat your command.
          |   LOG: all server processes terminated; reinitializing
        
           | niccl wrote:
            | Yes, but postmaster is still running to roll back the
            | transaction. If you crash a single multi-threaded process,
            | you may lose postmaster as well, and then sadness would
            | ensue.
        
             | jtc331 wrote:
             | If you read the thread you'd see the discussion includes
             | still having e.g. postmaster as a separate process.
        
             | mattashii wrote:
             | The threaded design wouldn't necessarily be single-process,
             | it would just not have 1 process for every connection.
             | Things like crash detection could still be handled in a
             | separate process. The reason to use threading in most cases
             | is to reduce communication and switching overhead, but for
             | low-traffic backends like a crash handler the overhead of
             | it being a process is quite limited - when it gets
             | triggered context switching overhead is the least of your
             | problems.
        
               | Yoric wrote:
               | Seconded. For instance, Firefox' crash reporter has
               | always been a separate process, even at the time Firefox
               | was mostly single-process, single-threaded. Last time I
               | checked, this was still the case.
        
             | cyberax wrote:
             | PostgreSQL can recover from abruptly aborted transactions
             | (think "pulled the power cord") by replaying the journal.
             | This is not going to change anyway.
        
             | cogman10 wrote:
             | Transaction roll back is a part of the WAL. Databases write
             | to the disk an intent to change things, what should be
             | changed, and a "commit" of the change when finished so that
             | all changes happen as a unit. If the DB process is
             | interrupted during that log write then all changes
             | associated with that transaction are rolled back.
             | 
             | Threaded vs process won't affect that.
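              | 
              | As a toy sketch of that ordering (nothing like the real
              | WAL record format, hypothetical layout): the intent and
              | the commit marker are made durable before the data file
              | itself is touched, so recovery can simply discard
              | anything that never reached a commit record.
              | 
              |   #include <fcntl.h>
              |   #include <stdio.h>
              |   #include <unistd.h>
              | 
              |   /* toy record: which page changes, old/new value */
              |   struct wal_rec { long page, old_val, new_val; };
              | 
              |   static int log_change(int fd, struct wal_rec *r) {
              |       if (write(fd, r, sizeof *r) != sizeof *r)
              |           return -1;
              |       if (write(fd, "COMMIT\n", 7) != 7)
              |           return -1;
              |       return fsync(fd);  /* durable before data */
              |   }
              | 
              |   int main(void) {
              |       int fd = open("toy.wal",
              |                     O_WRONLY | O_CREAT | O_APPEND,
              |                     0600);
              |       if (fd < 0) { perror("open"); return 1; }
              |       struct wal_rec r = { 7, 1, 2 };
              |       if (log_change(fd, &r) != 0) return 1;
              |       /* only now would the data page be updated;
              |          a crash before this point is recoverable
              |          from the log alone */
              |       puts("logged; safe to apply the change");
              |       close(fd);
              |       return 0;
              |   }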
        
               | dfox wrote:
                | Running the whole DBMS as a bunch of threads in a
                | single process changes how fast the recovery from some
                | kind of temporary inconsistency is. In an ideal world
                | this should not happen, but in reality it does, and you
                | do not want to bring the whole thing down because of
                | some superficial data corruption.
                | 
                | On the other hand, all cases of fixable corrupted data
                | in PostgreSQL I have seen were the result of somebody
                | doing something totally dumb (rsyncing a live cluster,
                | even between architectures), while with InnoDB it seems
                | to happen somewhat randomly, without any obvious case
                | of somebody doing stupid things.
        
             | anarazel wrote:
             | We would still have a separate process doing that part of
             | postmaster's work.
        
             | tracker1 wrote:
             | You can still have a master control process separate from
             | the client connections.
        
             | moonchrome wrote:
             | Restart on crash doesn't sound that difficult to do.
        
       | BoardsOfCanada wrote:
        | I recently looked through the source code of postgresql, and
        | every source file starts with a (really good) description of
        | what the file is supposed to do, which made it really easy to
        | get into the code compared to other open source projects I've
        | seen. So thanks for that.
        
         | Alekhine wrote:
         | I have no idea why that isn't standard practice in every
         | codebase. I should be able to figure out your code without
         | having to ask, or dig through issues or commit messages. Just
         | tell me what it's for!
        
           | nologic01 wrote:
            | the average programmer thinks they are writing significantly
            | above-average clean code, so no need to document it :-)
        
           | ComputerGuru wrote:
           | It kind of is in rust now, with module-level documentation
           | given its own specific AST representation instead of just
           | being a comment at the top of the file (a file is a module).
        
           | SoylentOrange wrote:
           | Because it takes a lot of time and because the comments can
           | get outdated. I also want this for all my code bases. But do
           | I always do this myself? No, especially on green field
           | projects. I will sometimes go back and annotate them later.
        
             | mhh__ wrote:
              | They can get outdated, but they usually don't. It's a
              | good litmus test: if a file's purpose is hard to nail
              | down, it's probably too big or too small.
        
             | withinboredom wrote:
             | Even outdated comments can tell you the original purpose of
             | the code, which helps if you're looking for a bug.
             | Especially if you're looking for a bug.
             | 
             | If someone didn't take the time to update the comments and
             | the reviewers didn't point it out, then you've probably
             | found the bug because someone was cowboying some shitty
             | code.
        
               | alex_reg wrote:
               | I have the opposite experience.
               | 
               | Outdated comments are often way worse than no comments,
               | because they can give you wrong ideas that aren't true
               | anymore, and send you off in the wrong direction before
               | you finally figure out the comment was wrong.
        
               | elteto wrote:
                | Indeed. I recently found this piece of code:
                | 
                |   if (X) assert(false); // we never do X, ever,
                |                         // anywhere.
                | 
                | Then I look over to the other pane, where I have a
                | different, but related file open:
                | 
                |   if (exact same X) { do_useful_stuff(); }
               | 
               | It got a chuckle out of me.
        
             | akira2501 wrote:
             | Trying to understand what I previously wrote and why I
             | wrote it takes more time than I ever care to spend. I'd
             | much rather have the comments, plus at this point, by
             | making them a "first class" part of my code, I find them
             | much easier to write and I find the narrative style I use
             | incredibly useful in laying out a new structure but also in
             | refactoring old ones.
        
       | cmrdporcupine wrote:
       | It sounds like the specific concerns here are actually around
       | buffer pool management performance in and around the TLB: _" Once
       | you have a significant number of connections we end up spending a
       | *lot* of time in TLB misses, and that's inherent to the process
       | model, because you can't share the TLB across processes. "_
       | 
       | Many of the comments here seem to be missing this and talking
       | about CPU-boundedness generally and thread-per-request vs process
       | etc models, but this seems orthogonal to that, and is actually
       | quite specific about the VM subsystem and seems like a legitimate
       | bottleneck with the approach Postgres has to take for buffer/page
       | mgmt with the process model it has now.
       | 
       | I'm no Postgres hacker (or a Linux kernel hacker), and I only did
       | a 6 month stint doing DB internals, but it _feels_ to me like
       | perhaps the right answer here is that instead of Postgres getting
       | deep down in the weeds refactoring and rewriting to a thread
       | based model -- with all the risks in that that people have
       | pointed out -- some assistance could be reached for by working on
       | specific targeted patches in the Linux kernel?
       | 
        | The addition of e.g. userfaultfd shows that there is room for
        | innovation and acceptance of changes in and around the kernel
        | re: page management. Some new flags for mmap, shm_open, etc.
        | to handle some specific targeted use cases and help Postgres
        | out?
       | 
       | Also wouldn't be the first time that people have done custom
       | kernel patches or tuning parameters to crank performance out of a
       | database.
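        | 
        | As one small example of a knob that already exists in that
        | area: on Linux, a shared mapping can be backed by huge pages,
        | which shrinks the number of TLB entries needed to cover a big
        | buffer pool (Postgres exposes this through its huge_pages
        | setting). A hedged sketch, assuming the admin has reserved
        | huge pages, with a fallback to normal pages:
        | 
        |   #define _GNU_SOURCE
        |   #include <stdio.h>
        |   #include <sys/mman.h>
        | 
        |   int main(void) {
        |       size_t len = 1UL << 30;   /* e.g. a 1 GiB region */
        | 
        |       /* fewer, larger pages -> fewer TLB entries */
        |       void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        |                      MAP_SHARED | MAP_ANONYMOUS |
        |                      MAP_HUGETLB, -1, 0);
        |       if (p == MAP_FAILED) {
        |           perror("mmap(MAP_HUGETLB)");  /* not reserved? */
        |           p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        |                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        |       }
        |       if (p == MAP_FAILED) { perror("mmap"); return 1; }
        | 
        |       printf("mapped %zu bytes at %p\n", len, p);
        |       munmap(p, len);
        |       return 0;
        |   }
        | 
        | That only helps the mapping itself, of course; the quoted
        | concern is about sharing TLB entries across backends at all.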
        
       | eclipticplane wrote:
        |   For the record, I think this will be a disaster.  There is
        |   far too much code that will get broken, largely silently,
        |   and much of it is not under our control.
        | 
        |           regards, tom lane
       | 
       | (via https://lwn.net/ml/pgsql-
       | hackers/4178104.1685978307@sss.pgh....)
       | 
       | If Tom Lane says it will be a disaster, I believe it will be a
       | disaster.
        
         | idiomaticrust wrote:
         | He is right. Such rewrites cause a lot of problems if your
         | compiler doesn't help you with avoiding data races.
         | 
         | But there is another way.
        
           | hnarn wrote:
           | > But there is another way.
           | 
           | Ok?
        
             | mycall wrote:
             | Microsoft SQL Server has SQLOS which is another way [0].
             | 
             | [0] https://www.thegeekdiary.com/what-is-sql-server-
             | operating-sy...
        
             | carstenhag wrote:
             | The person probably implied that Postgres should switch to
             | another toolchain that guarantees more things at compile
             | time, so probably Rust.
        
               | bb88 wrote:
               | You can take a chunk of code and just rewrite it in Rust.
               | You'll learn a lot quickly by this.
        
               | steve_adams_86 wrote:
               | It's sort of like the inverse of the Matrix when Neo
               | learns kung fu. You realize that you actually don't know
               | how to program :)
        
               | tylerhou wrote:
               | The boundaries within database code are not clear. There
               | are too many interlocking parts to take a nontrivial
                | chunk and rewrite it in Rust.
        
               | blincoln wrote:
               | If the existing code is old-school enough to use
               | thousands of global variables in a thread-unsafe way,
               | seems like changing it enough to compile as safe Rust
               | code would push the "non-trivial" envelope pretty far.
        
             | chc wrote:
             | I think it's meant to imply the solution given in their
             | username ("idiomatic Rust").
        
             | avgcorrection wrote:
             | Don't mind the gimmick gallery (username).
        
           | [deleted]
        
         | gremlinsinc wrote:
          | Maybe a better option would be finding a team to create
          | "nugres", aka a fork for this and other experiments, so that
          | mainline remains stable.
        
         | datavirtue wrote:
         | This should be considered a research effort, assuming it will
         | be a complete rewrite. In light of that, you should not draw
         | down resources from the established code base to work on it.
         | 
         | Ignoring the above, first state the explicit requirements
         | driving this change and let people weigh in on those. This
         | sounds like a geeky dev itch.
        
         | duped wrote:
         | That's an awful message with the only sensible reply.
        
       | jasonhansel wrote:
       | I'm rather surprised that their focus is on improving vertical
       | scalability, rather than on adding more features for scaling
       | Postgres horizontally.
        
         | tracker1 wrote:
          | If you're more interested in horizontal scaling, you may want
          | to look into CockroachDB, which has a Postgres-compatible
          | protocol but is still quite different. There are a lot more
          | limitations with CockroachDB than with Pg, though.
          | 
          | With the changes suggested, I'm not sure it's the best idea
          | from where Postgres is... it might be an opportunity to
          | rewrite bits in Rust, but even then, there is a _LOT_ that
          | can go wrong. The use of shared memory is apparently already
          | in place, and the separate processes and inter-process
          | communication aren't the most dangerous part... it's the
          | assumptions, variables and other contextual bits that are
          | currently process globals that wouldn't be in the "after"
          | version.
          | 
          | The overall surface is just massive... That doesn't even get
          | into plugin compatibility.
        
       | mynonameaccount wrote:
       | "the benefits would not justify the cost". PostgreSQL, like any
        | software, at some point in its life needs to be refactored. Why
        | not refactor with a threaded model? Of course there will be
        | bugs. Of course it will be difficult. But I think it is a
        | worthwhile endeavor. It doesn't sound like this will happen,
        | but a new project would be cool.
        
         | timtom39 wrote:
          | > like any software, at some point in its life needs to be
          | refactored.
         | 
         | This is simply not true for most software. Software has a
         | product life cycle like everything else and major
         | refactors/rewrites should be weighed carefully against
         | cost/risk of the refactor. Many traditional engineering fields
         | do much better at this analysis.
         | 
         | Although, because I run a contracting shop, I have personally
         | profited greatly by clients thinking this is true and being
         | unable to convince them otherwise.
        
         | smsm42 wrote:
         | "Difficult" doesn't even begin to do it justice. Making a code
          | base which has 2k global variables, and probably an order of
          | magnitude more underlying assumptions (the code now has to
          | know that every time you touch X you may be influenced by, or
          | influence, all other threads that may touch X), thread-safe
          | is a gargantuan task, and it will absolutely for sure involve
          | many iterations which any sane person would never let
          | anywhere near valuable data (and how long would it take until
          | you'd consider it safe enough?). And making this all
          | performant - given that shared-state code requires a
          | completely different approach to thinking about workload
          | distribution, something that performs well when running in
          | isolated processes may very well get bogged down in locking
          | or cache-race hell when sharing state - would be even harder.
          | I am not doubting Postgres has some very smart people - much,
          | much smarter than me, in any case - but I'd say it could be
          | more practical to write a new core from scratch than to try
          | to "refactor" a core that grew organically for decades with
          | the assumption of a share-nothing model.
        
         | djur wrote:
         | What you're talking about is a rewrite, not a refactor.
        
         | gremlinsinc wrote:
          | A better option would be to just create an experimental fork
          | that has a different name and is obviously a different
          | product, but based on the original source. That way pg gets
          | updates and remains stable, and if they fail, they fail and
          | it doesn't hurt all the pg in production.
        
       | wielebny wrote:
        | Having used and administered a lot of PostgreSQL servers, I
        | hope they don't lose any stability over this.
        | 
        | I've seen (and reported) bugs that caused panics/segfaults in
        | specific psql processes - not just connections, also processes
        | related to WAL writing or replication. The way it's built right
        | now, a child process can just be forced to quit and it does not
        | affect other processes. Hopefully switching to threads won't
        | force the whole PostgreSQL server to panic and shut down.
        
         | tracker1 wrote:
          | Most likely, the postmaster will remain a separate process,
          | much like today with pg, similar to Firefox or Chrome's
          | control process, which can catch the panicked process, clean
          | up and restart it. The WAL can be recovered as well if there
          | were broken transactions in flight.
        
         | jtc331 wrote:
         | Because of shared memory most panics and seg faults in a worker
         | process take down the entire server already (this wasn't always
         | the case, but not doing so was a bug).
        
         | vbezhenar wrote:
          | Of course it will. That's better than continuing to work with
          | damaged memory structures and unpredictable consequences. For
          | a database it's more important than ever. Imagine writing
          | corrupted data because another thread went crazy.
        
           | wizofaus wrote:
           | You're implying that only an OS can provide memory separation
           | between units of execution - at least in .NET AppDomains give
           | you the same protection within a single process, so why
           | couldn't postgres have its own such mechanism? I'd also think
           | with a database engine shared state is not just in-memory -
           | i.e. one process can potentially corrupt the behaviour of
           | another by what it writes to disk, so moving to a single-
           | process model doesn't necessarily introduce problems that
           | could never have existed previously (but, yes, would arguably
           | make them more likely)
        
             | vbezhenar wrote:
              | I don't know .NET well enough to comment here, but I'm
              | pretty sure that if you managed to run bare-metal C
              | inside your .NET app (which should be possible), it would
              | destroy all your domains easily. RAM is RAM. The only
              | memory protection that we have is across the process
              | boundary (and even that protection is not perfect with
              | shared memory, but at least it allows private memory to
              | be protected).
              | 
              | At least I'm not aware of any way to protect private
              | thread memory from other threads.
              | 
              | Postgres is C, and that's not going to change ever.
        
               | wizofaus wrote:
               | I certainly wasn't suggesting it would make sense to
               | rewrite Postgres to run on .NET (using any language, even
               | managed C++, assuming anyone still uses that). Yes, it's
               | inherent in the C/C++ language that it's able to randomly
               | access any memory that a process has access to, and
               | obviously on that basis OS-provided process-separation is
               | the "best" protection you can get, just pointing out that
               | it's not the only possibility.
        
             | SigmundA wrote:
              | No, AppDomains are not as good as processes. I have tried
              | to go that route before; you cannot stop unruly code
              | reliably in an app domain (you must use Thread.Abort(),
              | which is not good), and memory can still leak in any
              | native code used there.
              | 
              | The only reliable way to stop bad code, like say an
              | infinite loop, is to run it in another process, even in
              | .NET.
              | 
              | They also removed AppDomains in later versions of .NET
              | because they had little benefit and weak protections
              | compared to a full process.
        
               | wizofaus wrote:
                | Not claiming they're as good, just noting that there
                | are alternative ways to provide memory barriers, though
                | obviously if it's not enforced at the language/runtime
                | level, it requires either super strong developer
                | discipline or the use of some other tool to do so. I
                | can't find anything suggesting AppDomains have been
                | removed completely though, just that they're not fully
                | supported on non-Windows platforms, which is
                | interesting; I wonder if that means they do have OS-
                | level support.
        
               | SigmundA wrote:
               | https://learn.microsoft.com/en-
               | us/dotnet/api/system.appdomai...
               | 
               | "On .NET Core, the AppDomain implementation is limited by
               | design and does not provide isolation, unloading, or
               | security boundaries. For .NET Core, there is exactly one
               | AppDomain. Isolation and unloading are provided through
               | AssemblyLoadContext. Security boundaries should be
               | provided by process boundaries and appropriate remoting
               | techniques."
               | 
                | AppDomains pretty much only allowed you to load and
                | unload assemblies, and provided little else. If you
                | wanted to stop bad code you still used Thread.Abort,
                | which left your runtime in a potentially bad state due
                | to no isolation between threads.
               | 
                | The only way something like an AppDomain could replace
                | process isolation would be to rewrite the whole OS in a
                | memory-safe language, along the lines of
                | https://en.wikipedia.org/wiki/Midori_(operating_system) /
                | https://en.wikipedia.org/wiki/Singularity_(operating_system)
        
               | wizofaus wrote:
               | Is that saying global variables are shared between
               | AppDomains on .NET core then? Scary if so, we have a
               | bunch of .NET framework code we're looking at porting to
               | .NET core in the near future, and I know it relies on
               | AppDomain separation currently. It's not the first
                | framework->Core conversion I've done, but I don't
               | remember changes in AppDomain behaviour causing any
               | issues the first time.
               | 
                | As it happens, I already know there are bits of code
                | currently not working "as expected" precisely because
                | of AppDomain separation - e.g. attempting to use a
                | shared-memory cache to improve performance, and in one
                | or two cases attempting to share state. I got the
                | impression whoever wrote that code didn't realize there
                | were even two AppDomains involved, and used various ugly
                | hacks to "fall back" to alternative means of state-
                | sharing - but in fact the fall-back is the only thing
                | that actually ever works.
        
       | [deleted]
        
       | rbancroft wrote:
       | Changing something so fundamental seems like it should be a
       | rewrite.
        
       | papito wrote:
       | This has Python 3 vibes.
        
       | newaccount74 wrote:
       | A big advantage of the process-based model is its resilience
       | against many classes of errors.
       | 
       | If a bug in PostgreSQL (or in an extension) causes the server to
       | crash, then only that process will crash. Postmaster will detect
       | the child process termination, and send an error message to the
       | client. The connection will be lost, but other connections will
       | be unaffected.
       | 
       | It's not foolproof (there are ways to bring the whole server
       | down), but it does protect against many error conditions.
       | 
       | It is possible to trap on some exceptions in a threaded
       | environment, but cleaning up after eg. an attempted NULL pointer
       | dereference is going to be very difficult or impossible.
        
         | anarazel wrote:
          | We would still have a separate supervisor process if we moved
         | connections to threads.
        
       | baggy_trough wrote:
       | I hope they are conservative about this, because even the
       | smartest and best programmers in the world cannot create bug free
       | multithreaded code.
        
         | jerf wrote:
         | I mentally snarked to myself that "obviously they should
         | rewrite it in Rust first".
         | 
         | Then, after more thought, I'm not entirely sure that would be a
         | bad approach. I say this not to advocate for actually rewriting
         | it in Rust, but as a way of describing how difficult this is.
          | I'm not actually sure rewriting the relevant bits of the system
          | in Rust _wouldn't_ be easier in the end, and obviously, that's
          | really, really hard.
          | 
          | This is a _really_ hard transition.
          | 
          | I don't think multithreaded code quality should be measured in
          | absolutes. There are things that are so difficult as to be
          | effectively impossible - the lock-based approach that was
          | dominant in the 90s convinced developers that multithreading
          | is just impossibly difficult, but it's not multithreaded code
          | that's impossibly difficult, it's lock-based multithreading.
         | Other approaches range from doable to even not that hard once
         | you learn the relevant techniques (Haskell's full immutability
         | & Rust's borrow checker are both very solid), but of course
         | even "not that hard" becomes a lot of bugs when scaled up to
         | something like Postgres. But it's not like the current model is
         | immune to that either.
        
         | xwdv wrote:
            | Nonsense, multithreaded code can be written as bug-free as
         | regular code. No need to fear.
        
           | preordained wrote:
           | It _can_ be. Anything can be. It is far more treacherous,
           | though.
        
           | taeric wrote:
            | I think the point is that some mistakes in process-based
            | code never surface as bugs, but the same mistakes will in
            | threaded code?
        
           | dboreham wrote:
           | This is true. However, the blast radius may be smaller with a
           | process model. Also recovering from a fatal error in one
           | session could possibly be easier. I say this as a 30-year
           | threading proponent.
        
           | PhilipRoman wrote:
           | I'm assuming you're referring to formally proven programs. If
           | that's the case, do you have any pointers?
           | 
           | Aside from the trivial while(!transactionSucceeded){retry()}
           | loop, I have trouble proving the correctness of my programs
           | when the number of threads is not small and finite.
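            | 
            | Concretely, the trivial case is an optimistic compare-and-
            | swap retry on a single value (a minimal C11 sketch; the
            | counter and add_fee are made-up names):
            | 
            |     #include <stdatomic.h>
            |     
            |     static _Atomic long counter;
            |     
            |     /* Read, compute, and publish only if nobody else
            |      * changed the value in the meantime; otherwise retry. */
            |     void add_fee(long fee)
            |     {
            |         long old = atomic_load(&counter);
            |         while (!atomic_compare_exchange_weak(&counter,
            |                                              &old,
            |                                              old + fee))
            |             ;   /* on failure 'old' is refreshed; retry */
            |     }
            | 
            | Proving that one loop correct is easy; proving that many of
            | them stay mutually consistent is the hard part.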
        
           | baggy_trough wrote:
           | In theory, yes. In practice, no.
        
           | ajkjk wrote:
           | It is just harder.
        
         | mmphosis wrote:
         | _Concurrency isn't a "nice layer over pthreads" - the most
         | important thing is isolation - anything that mucks up isolation
         | is a mistake.
         | 
         | -- Joe Armstrong_
         | 
         | Threads are evil. https://www.sqlite.org/faq.html#q6
         | https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...
         | 
         | Nginx uses an asynchronous event-driven approach, rather than
         | threads, to handle requests.
         | https://aosabook.org/en/v2/nginx.html
         | http://www.kegel.com/c10k.html
        
         | jrott wrote:
         | The fact that they are planning on doing this across multiple
         | releases gives me hope that they'll be cautious with this.
        
         | johannes1234321 wrote:
          | The code is already multithreaded. They just have shared
          | state across multiple processes instead of threads within a
          | process.
         | 
         | They might even reduce complexity that way.
        
           | usefulcat wrote:
           | It's not the same at all for global variables, of which pgsql
           | apparently has around a couple thousand.
           | 
           | If every process is single threaded, you don't have to
           | consider the possibility of race conditions when accessing
           | any of those ~2000 global variables. And you can pretty much
           | guarantee that little if any of the existing code was written
           | with that possibility in mind.
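            | 
            | The failure mode in miniature (a made-up example, not
            | actual Postgres code):
            | 
            |     /* Harmless while every backend is its own process,
            |      * since nobody else can see it. With threads, this
            |      * read-modify-write becomes a data race. */
            |     static int temp_file_counter = 0;
            |     
            |     int next_temp_file_id(void)
            |     {
            |         return temp_file_counter++;
            |     }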
        
             | ants_a wrote:
             | Those global variables would be converted to thread locals
             | and most of the code would be oblivious of the change. This
             | is not the hard part of the change.
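              | 
              | Roughly (illustrative C, not actual Postgres source):
              | 
              |     /* Before: static int temp_file_counter = 0;
              |      * one copy per backend process.
              |      *
              |      * After: one copy per backend thread; callers are
              |      * unchanged. (_Thread_local is C11; GCC also
              |      * accepts __thread.) */
              |     static _Thread_local int temp_file_counter = 0;
              |     
              |     int next_temp_file_id(void)
              |     {
              |         return temp_file_counter++;
              |     }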
        
         | anarazel wrote:
         | Postgres is already concurrent today. There's a _lot_ of shared
         | state between the processes (via shared memory).
        
       | mastax wrote:
       | It would be interesting to have something between threads and
       | processes. I'll call them heavy-threads for sake of discussion.
       | 
       | Like light-threads, heavy-threads would share the same process-
       | security-boundary and therefore switching between them would be
       | cheap. No need to flush TLB, I$, D$.
       | 
       | Like processes, heavy-threads would have mostly-separate address
       | spaces by default. Similar to forking a process, they could share
       | read-only mappings for shared libraries, code, COW global
       | variables, and explicitly defined shared writable memory regions.
       | 
       | Like processes, heavy-threads would isolate failure states. A C++
       | exception, UNIX signal, segfault, etc. would kill only the heavy-
       | thread responsible.
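        | 
        | The memory side of that can be approximated today with fork()
        | plus an explicit MAP_SHARED region (a minimal Linux sketch,
        | error handling omitted); what it doesn't give you is the cheap
        | context switch:
        | 
        |     #define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS */
        |     #include <sys/mman.h>
        |     #include <sys/wait.h>
        |     #include <unistd.h>
        |     
        |     int main(void)
        |     {
        |         /* Explicitly shared, writable region. */
        |         int *shared = mmap(NULL, sizeof(int),
        |                            PROT_READ | PROT_WRITE,
        |                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        |         int private_counter = 0;   /* COW after fork() */
        |     
        |         if (fork() == 0) {         /* the "heavy-thread" */
        |             *shared = 42;          /* parent sees this */
        |             private_counter = 7;   /* parent does not */
        |             _exit(0);
        |         }
        |         wait(NULL);
        |         /* here: *shared == 42, private_counter == 0 */
        |         return 0;
        |     }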
        
         | mike_hearn wrote:
         | There are some problems.
         | 
         | 1. Mostly separate address spaces requires changing the TLB on
         | context switch (modern hw lets it be partial). You could use
         | MPKs to share a single address space with fast protection
         | switches.
         | 
         | 2. Threads share the global heap, but your heavy threads would
         | require explicitly defined shared writeable memory regions, so
         | presumably each one has its own heap. That's a fair bit of
         | overhead.
         | 
         | 3. Failure isolation is more complicated than deciding what to
         | kill.
         | 
          | To expand on the last point, Postgres _doesn't_ isolate
         | failures to a single process because they do share memory and
         | might corrupt those shared memory regions. But even if you
         | don't have shared memory failure recovery isn't always easy.
         | Software has to be written specifically to plan for it. You can
         | kill processes because everything in the OS is written around
         | allowing for that possibility, for example, shells know what to
         | do if a sub-process is killed unexpectedly. Killing a heavy
         | thread (=process) is no good if the parent process is going to
         | wait for a reply from it forever because it wasn't written to
         | handle the process going away.
        
         | dasyatidprime wrote:
         | So what would be different between those and forked processes?
        
         | ShroudedNight wrote:
          | I've been pondering / ruminating on this too; I've been
         | somewhat surprised that few operating systems have played with
         | reserving per-thread address space as thread-local storage, or
         | requiring something akin to a 'far' pointer to access commonly-
         | addressed shared memory.
        
         | wbl wrote:
          | You cannot COW and still share TLB state. The caches aren't
          | flushed on process changes either: it's that the data is
          | different, so evictions happen.
        
         | mattashii wrote:
         | > No need to flush TLB
         | 
         | TLB isn't "flushed" so much as it is useless across different
         | memory address spaces. Switching processes means switching
         | address spaces, which means you have to switch the contents of
         | the TLB to the new process' TLB entries, which eventually
         | indeed flushes the TLB, but that is only over time, not
         | necessarily the moment you switch processes.
         | 
         | > Like processes, heavy-threads would have mostly-separate
         | address spaces by default.
         | 
          | This conflicts with the goal of not flushing the TLB: you
          | can't avoid changing TLB contents when you switch address
          | spaces.
        
       | lukeschlather wrote:
       | This sounds like a problem that would border on the complexity of
       | replacing the GIL in Ruby or Python. The performance benefits are
       | obvious but it seems like the correctness problems would be
       | myriad and a constant source of (unpleasant) surprises.
        
         | narrator wrote:
         | The correctness problem should be handled by a suite of
         | automated tests which PostgreSQL has. If all tests pass, the
         | application must work correctly. The project is too big, and
         | has too many developers to make much progress without full test
         | coverage. Where else would up-to-date documentation regarding
          | the correct behavior of PostgreSQL exist? In some developer's
          | head? SQLite is pretty famous for their extreme approach to
          | testing, including out-of-memory conditions and other rare
         | circumstances: https://www.sqlite.org/testing.html
        
           | abalashov wrote:
           | > If all tests pass, the application must work correctly.
           | 
           | These are "famous last words" in many contexts, but when
           | talking about difficult-to-reproduce parallelism issues, I
           | just don't think it's a particularly applicable viewpoint at
           | all. No disrespect. :)
        
           | lukeschlather wrote:
           | Parallelism is often incredibly hard to write automated tests
           | for, and this will most likely create parallelism issues that
           | were not dreamed of by the authors of the test suite.
        
         | MuffinFlavored wrote:
         | Does GIL stand for Global Interpreter Lock?
        
           | Yujf wrote:
           | yes
        
         | cactusfrog wrote:
         | This is different because there isn't a whole ecosystem of
         | packages that depend on access to a thread unsafe C API.
         | Getting the GIL out of core Python isn't too challenging.
         | Getting all of the packages that depend on Python's C API
         | working is.
        
           | masklinn wrote:
            | Another component of the GIL story is that removing the GIL
            | requires adding fine-grained locks, which (aside from making
           | VM development more complicated) significantly increases lock
           | traffic and thus runtime costs, which noticeably impacts
           | single-threaded performance, which is of major import.
           | 
            | Postgres starts from a share-nothing architecture, so it's
            | quite a bit easier to evaluate the addition of sharing.
        
             | bsder wrote:
             | > which noticeably impacts single-threaded performance,
             | which is of major import.
             | 
             | 1) I don't buy this a priori. Almost everybody who removed
             | a gigantic lock suddenly realizes that there was more
             | contention than they thought and that atomizing it made
             | performance improve.
             | 
             | 2) Had Python bitten the bullet and removed the GIL back at
             | Python 3.0, the performance would likely already be back to
             | normal or better. You can't optimize hypothetically.
             | Optimization on something like Python is an accumulation of
             | lots of small wins.
        
             | anarazel wrote:
             | Postgres already shares a lot of state between processes
             | via shared memory. There's not a whole lot that would
             | initially change from a concurrency perspective.
        
             | ComputerGuru wrote:
             | > which (aside from making VM development more complicated)
             | significantly increases lock traffic and thus runtime
             | costs, which noticeably impacts single-threaded
             | performance, which is of major import.
             | 
             | I don't think that's a fair characterization of the trade
             | offs. Acquiring uncontended mutexes is basically free (and
             | fairly side-effect free) so single-threaded performance
             | will not be noticeably impacted.
             | 
             | Every large C project I'm aware of (read: kernels) that has
             | publicly switched from coarse locks to fine-grained locks
             | has considered it to be a huge win with little to no impact
             | on single-threaded performance. You can even gain
             | performance if you chop up objects or allocations into
             | finer-grained blobs to fit your finer-grained locking
             | strategy because it can play nicer with cache friendliness
             | (accessing one bit of code doesn't kick the other bits of
             | code out of the cache).
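              | 
              | The "chop it up" pattern in miniature (a sketch, names
              | made up): instead of one table-wide mutex, take only the
              | lock for the bucket you touch, so unrelated lookups never
              | contend.
              | 
              |     #include <pthread.h>
              |     
              |     #define NBUCKETS 64
              |     
              |     struct entry { struct entry *next; int key; };
              |     
              |     struct bucket {
              |         pthread_mutex_t lock;   /* one lock per bucket */
              |         struct entry   *head;
              |     };
              |     
              |     static struct bucket table[NBUCKETS];
              |     
              |     void table_init(void)
              |     {
              |         for (int i = 0; i < NBUCKETS; i++)
              |             pthread_mutex_init(&table[i].lock, NULL);
              |     }
              |     
              |     void insert(unsigned hash, struct entry *e)
              |     {
              |         struct bucket *b = &table[hash % NBUCKETS];
              |         pthread_mutex_lock(&b->lock);  /* usually free */
              |         e->next = b->head;
              |         b->head = e;
              |         pthread_mutex_unlock(&b->lock);
              |     }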
        
           | erikpukinskis wrote:
           | > there isn't a whole ecosystem of packages that depend on
           | access to a thread unsafe C API
           | 
           | They mentioned a similar issue for Postgres extensions, no?
           | 
           | > Haas, though, is not convinced that it would ever be
           | possible to remove support for the process-based mode.
           | Threads might not perform better for all use cases, or some
           | important extensions may never gain support for running in
           | threads.
        
             | scolby33 wrote:
             | I question how important an extension is if there's not
             | enough incentive to port it to the new paradigm, at least
             | eventually.
        
               | abalashov wrote:
               | Well. The thing with that is just that there are a lot of
               | extensions. Like, a lot!
        
       | gjvc wrote:
       | surely this is "Some guy reconsiders the process-based model of
       | PostgreSQL"
        
         | Icathian wrote:
         | Uh. Heikki is definitely not just "some guy". Dude is one of
         | the top contributors to Postgres.
        
           | gjvc wrote:
           | How does that make him immune to having dumb ideas? See, I'm
           | judging the idea on merit. You're just defending your hero.
        
       | sargun wrote:
        | I'm curious if they can take advantage of vfork / CLONE_VM to
        | get the benefits of sharing memory and lower-overhead context
        | switches, while still keeping the benefits of the scheduler and
        | sysadmin-friendliness.
       | 
       | The other thing that might be interesting is FUTEX_SWAP / UMCG.
       | Although it doesn't remove the overhead induced by context
       | switches entirely (specifically, you would still deal with TLB
       | misses), you can avoid dealing with things like speculative
       | execution exploit mitigations.
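        | 
        | Roughly what CLONE_VM buys (a minimal Linux-only sketch, error
        | handling omitted): a separate PID that the scheduler and the
        | sysadmin can still see, but one shared address space.
        | 
        |     #define _GNU_SOURCE
        |     #include <sched.h>
        |     #include <signal.h>
        |     #include <stdlib.h>
        |     #include <sys/wait.h>
        |     
        |     static int shared_counter;   /* same VM: child sees it */
        |     
        |     static int worker(void *arg)
        |     {
        |         (void)arg;
        |         shared_counter++;        /* lands in parent's memory */
        |         return 0;
        |     }
        |     
        |     int main(void)
        |     {
        |         char *stack = malloc(64 * 1024);
        |         pid_t pid = clone(worker, stack + 64 * 1024,
        |                           CLONE_VM | SIGCHLD, NULL);
        |         waitpid(pid, NULL, 0);
        |         /* shared_counter == 1 here */
        |         free(stack);
        |         return 0;
        |     }
        | 
        | The catch for Postgres (as the reply below points out) is that
        | every global variable then becomes shared state too.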
        
         | nneonneo wrote:
         | Per the article, Postgres has many, many global variables, many
         | of which track per-session state; much session state is "freed"
         | via process exit rather than being explicitly cleaned up.
         | Switching to CLONE_VM requires these problems to all be solved.
        
         | why-el wrote:
         | what about support for Windows?
        
       | jupp0r wrote:
       | Please don't use mutable global state in your work. Global
       | variables are universally bad and don't provide much of a
        | benefit. The number of desirable architectural refactorings
        | that I've witnessed turn into a muddy mess because of them is
        | daunting. This is one more example.
        
         | orthoxerox wrote:
          | You do know what a database is, don't you? It is the place
          | where you store your mutable global state. You can't kick the
          | can down the road forever, _someone_ has to tackle the
          | complexity of managing state.
        
         | slashdev wrote:
         | Thank you for sharing your ideological views, but this is not
         | the appropriate venue for that. If you want to have a software
         | _engineering_ discussion about the trade offs involved in
         | sharing global mutable state, this is a good venue for that.
         | All engineering is trade offs. As soon as you make blanket
         | statements that X is always bad, you've transitioned into the
         | realm of ideology. Now presumably you mean to say it's almost
         | always bad. But that really depends on the context. It may well
         | be almost always bad in average software projects, but
         | PostgreSQL is not your average software project. Databases are
         | a different realm.
        
           | refulgentis wrote:
           | Global mutable state being a poor choice in software
           | architecture isn't an ideology. There is no ideology that
           | argues it is awesome.
           | 
           | If you want to have a software _engineering_ discussion about
           | the trade offs involved in sharing global mutable state, this
           | is a good venue for that.
           | 
           | All engineering is trade offs. As soon as you start telling
           | people they're making blanket statements that X is always
           | bad, you've transitioned into the realm of nitpicking.
        
             | slashdev wrote:
             | It's awesome where performance considerations are
             | paramount. It's awesome in databases. It's awesome in
             | embedded software. It's awesome in operating system
             | kernels.
             | 
             | The fact is sometimes it's good. Saying it's universally
             | bad is going beyond the realm of logic and evidence and
             | into the realm of ideology.
        
             | megous wrote:
              | Using globals is simpler, and it's also pretty natural in
              | event-driven architectures. Passing everything via
              | function arguments is welcome for library code, but
              | there's little point to it in application code. It just
              | complicates things.
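              | 
              | The trade-off in miniature (made-up names):
              | 
              |     #include <stdio.h>
              |     
              |     struct session { FILE *log; };
              |     
              |     /* Global style: terse, but the dependency is
              |      * invisible at the call site and there is exactly
              |      * one mutable copy for everyone. */
              |     static struct session *current_session;
              |     
              |     void log_query_global(const char *sql)
              |     {
              |         fprintf(current_session->log, "%s\n", sql);
              |     }
              |     
              |     /* Explicit style: more typing, but ownership is
              |      * obvious and two sessions can run side by side. */
              |     void log_query(struct session *s, const char *sql)
              |     {
              |         fprintf(s->log, "%s\n", sql);
              |     }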
        
       | [deleted]
        
       | CodeWriter23 wrote:
       | "no objections" <> "consent"
        
         | ed25519FUUU wrote:
         | Have you ever tried to move a large organization forward in a
         | certain direction? It's really hard. At some point you have to
         | make a decision.
        
           | timcobb wrote:
           | Not in something like Postgres, I hope
        
       | formerly_proven wrote:
       | ngmi
        
       | chasil wrote:
       | Oracle has similar problems.
       | 
       | On UNIX systems, Oracle uses a multi-process model, and you can
        | see these:
        | 
        |     $ ps -ef | grep smon
        |     USER    PID    PPID  STARTED  TIME  %CPU %MEM COMMAND
        |     oracle  22131  1     Mar 28   3:09  0.0  4.0  ora_smon_yourdb
       | 
       | Windows forks processes about 100x slower than Linux, so Oracle
       | runs threaded on that platform in one great big PID.
       | 
       | Sybase was the first major database that fully adopted threads
       | from an architectural perspective, and Microsoft SQL Server has
       | certainly retained and improved on that model.
        
         | EvanAnderson wrote:
         | > Windows forks processes about 100x slower than Linux...
         | 
         | I work with a Windows-based COTS webapp that uses Postgres w/o
         | any connection pooling. It's nearly excruciating to use because
          | it spins up new Postgres processes for each page load. If not
         | for the fact that the Postgres install is "turnkey" with the
         | app I'd just move Postgres over to a Linux machine.
        
           | devit wrote:
           | Use pgbouncer
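            | 
            | A minimal pgbouncer.ini sketch (the appdb name and paths
            | are placeholders); point the app at port 6432 instead of
            | 5432 and each page load reuses a pooled server backend
            | instead of forking a new one:
            | 
            |     [databases]
            |     appdb = host=127.0.0.1 port=5432 dbname=appdb
            |     
            |     [pgbouncer]
            |     listen_addr = 127.0.0.1
            |     listen_port = 6432
            |     auth_type = md5
            |     auth_file = /etc/pgbouncer/userlist.txt
            |     pool_mode = transaction
            |     max_client_conn = 500
            |     default_pool_size = 20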
        
             | ethbr0 wrote:
             | Was curious about this as an architectural solution as
             | well.
             | 
             | We're really talking about X-per-client as the primary
             | reason to move away from processes, right?
             | 
             | So if you can get most of the benefit via pooling... why
             | inherit the pain of porting?
             | 
             | Presumably latency jitter would be a difficult problem with
             | pools, but it seems easier (and safer) than porting
             | processes -> threads.
             | 
             | Disclaimer: High performance / low latency DB code is
             | pretty far outside my wheelhouse.
        
               | ddorian43 wrote:
               | > We're really talking about X-per-client as the primary
               | reason to move away from processes, right?
               | 
               | Many other things too. Like better sharing of caches.
               | Lower overhead of thread instead of process. Etc. (read
               | the thread)
        
               | ilyt wrote:
                | The reasons are explained in the article. Read the article.
        
             | treis wrote:
             | That helps a lot but it's not a replacement for large
             | number of persistent connections. If you had that you could
             | simplify things in the application layer and do interesting
             | things with the DB.
        
           | ComputerGuru wrote:
           | If you run postgres under WSLv1 (now available on Server
           | Edition as well), the WSL subsystem handles processes and
           | virtual memory in a way that has been specifically designed
           | to optimize process initialization as compared to the
           | traditional Win32 approach.
        
           | chasil wrote:
           | It would not be difficult to simply "pg_dump" all the data to
           | Postgres on a Linux machine, then quietly set the clients to
           | use the new server.
        
         | blinkingled wrote:
         | Didn't Oracle switch to threaded model in 12c - at least on
         | Linux I remember there being a parameter to do that - it
         | dropped the number of processes significantly.
        
           | chasil wrote:
            | No, I ran that on v19.
            | 
            |     $ ps -ef | grep smon
            |     UID      PID    PPID  C STIME TTY  TIME     CMD
            |     oracle   22131  1     0 Mar28 ?    00:03:09 ora_smon_yourdb
            | 
            |     $ $ORACLE_HOME/bin/sqlplus -silent '/ as sysdba'
            |     select version_full from v$instance;
            | 
            |     VERSION_FULL
            |     -----------------
            |     19.18.0.0.0
        
             | blinkingled wrote:
             | https://oracle-base.com/articles/12c/multithreaded-model-
             | usi...
             | 
             | Probably still requires the parameter to be set.
        
               | chasil wrote:
                | Contrast this to Microsoft SQL Server:
                | 
                |     $ systemctl status mssql-server
                |     * mssql-server.service - Microsoft SQL Server Database Engine
                |       Loaded: loaded (/usr/lib/systemd/system/mssql-server.service;
                |               disabled; vendor preset: disabled)
                |       Active: active (running) since Mon 2023-06-19 15:48:05 CDT;
                |               1min 18s ago
                |         Docs: https://docs.microsoft.com/en-us/sql/linux
                |     Main PID: 2125 (sqlservr)
                |        Tasks: 123
                |       CGroup: /system.slice/mssql-server.service
                |               +-2125 /opt/mssql/bin/sqlservr
                |               +-2156 /opt/mssql/bin/sqlservr
        
               | blinkingled wrote:
               | Yeah multiprocess isn't Microsoft's style given how
               | expensive creating processes is on Windows.
               | 
               | Oracle - never had a scalability issue on very big Linux,
               | Solaris and HPUX systems though - they do it well in my
               | experience.
        
           | hans_castorp wrote:
           | > Didn't Oracle switch to threaded model in 12c
           | 
           | It's optional, and the default is still a process model on
           | Linux.
        
       | 0xbadcafebee wrote:
       | So compromise. Take the current process model, add threading and
       | shared memory, with feature flags to limit number of processes
       | and number of threads.
       | 
       | Want to run an extension that isn't threadsafe? Run with 10
        | processes, 1 thread. Want to run high-performance? Run with 1
       | process, 10 threads. Afraid of "stability issues"? Run with 1
       | process, 1 thread.
       | 
       | Will it be hard to do? Sure. Impossible? Not at all. Plan for it,
       | give a very long runway, throw all your new features into the
       | next major version branch, and tell people everything else is off
       | the table for the next few years. If you're _really sure_
       | threading is going to be increasingly necessary, better to start
        | now than to wait until it's too late. But this idea of "oh it's
       | hard", "oh it's dangerous", "too complicated", etc is bullshit.
       | We've built fucking spaceships that visit other planets. We can
       | make a database with threads that doesn't break. Otherwise we
       | admit that basic software development using practices from the
       | past 30 years is too much for us to figure out.
        
       | EGreg wrote:
       | I hope they don't do it.
       | 
       | I've had a similar situation with PHP, where we had written quite
       | a large engine (https://github.com/Qbix/Platform) with many
       | features (https://qbix.com/features.pdf) . It took advantage of
       | the fact that PHP isolated each script and gave it its own global
       | variables, etc. In fact, much of the request handling did stuff
        | like this:
        | 
        |     Q_Request::requireFields(['a', 'b', 'c']);
        |     $uri = Q_Dispatcher::uri();
        | 
        | instead of stuff like this:
        | 
        |     $this->getContext()->request()->requireFields(['a', 'b', 'c']);
        |     $this->getContext()->dispatcher()->uri();
        | 
        | Over the last few years, I have run across many compelling
        | things:
        | 
        |     amp
        |     reactPHP
        |     Swoole (native extension)
        |     Fibers (inside PHP itself)
       | 
       | It seemed so cool! PHP could behave like Node! It would have an
       | event loop and everything. Fibers were basically PHP's version of
       | Swoole's coroutines, etc. etc.
       | 
       | Then I realized... we would have to go through the entire code
       | and redo how it all works. We'd also no longer benefit from PHP's
       | process isolation. If one process crapped out or had a memory
       | leak, it could take down everything else.
       | 
       | There's a reason PHP still runs 80% of all web servers in the
       | world (https://kinsta.com/blog/is-php-dead/) ... and one of the
       | biggest is that commodity servers can host terrible PHP code and
       | it's mostly isolated in little processes that finish "quickly"
       | before they can wreak havoc on other processes or on long-running
       | stuff.
       | 
       | So now back to postgres. It's been praised for its rock-solid
       | reliability and security. It's got so many features and the MVCC
       | is very flexible. It seems to use a lot of global variables. They
       | can spend their time on many other things, like making it
       | byzantine-fault-tolerant, or something.
       | 
       | The clincher for me was when I learned that php-fpm (which spins
       | up processes which sleep when waiting for I/O) is only 50% slower
       | than all those fancy things above. Sure, PHP with Swoole can
       | outperform even Node.js, and can handle twice as many requests.
       | But we'd rather focus on soo many other things we need to do :)
        
         | zackmorris wrote:
         | I've been using PHP for decades and have found its isolated
         | process model to be about the best around, certainly for any
         | mainstream language. Also Symfony's Process component
         | encapsulates most of the errata around process management in a
         | cross-platform way:
         | 
         | https://symfony.com/doc/current/components/process.html
         | 
         | Going from a working process implementation to async/threads
         | with shared memory is pretty much always a mistake IMHO,
         | especially if it's only done for performance reasons. Any speed
         | gains will be eclipsed by endless whack-a-mole bug fixes, until
         | the code devolves into something unrecognizable. Especially
         | when there are other approaches similar to map-reduce and
         | scatter-gather arrays where data is processed in a distributed
         | fashion and then joined into a final representation through
         | mechanisms like copy-on-write, which are supported by very few
         | languages outside of PHP and the functional programming world.
         | 
         | The real problem here is the process spawning and context-
         | switching overhead of all versions of Windows. I'd vote to
         | scrap their process code in its entirety and write a new
         | version based on atomic operations/lists/queues/buffers/rings
         | with no locks and present an interface which emulates the
         | previous poor behavior, then run it through something like a
         | SAT solver to ensure that any errata that existing software
         | depends on is still present. Then apps could opt to use the
         | direct unix-style interface and skip the cruft, or refactor
         | their code to use the new interface.
         | 
         | Apple did something similar to this when OS X was released,
         | built on a mostly POSIX Darwin, NextSTEP, Mach and BSD Unix. I
         | have no idea how many times Microsoft has rewritten their
         | process model or if they've succeeded in getting performance on
         | par with their competitors (unlikely).
         | 
         | Edit: I realized that the PHP philosophy may not make a lot of
         | sense to people today. In the 90s, OS code was universally
         | terrible, so for example the graphics libraries of Mac and
         | Windows ran roughly 100 times slower than they should for
         | various reasons, and developers wrote blitters to make it
         | possible for games to run in real time. That was how I was
         | introduced to programming. PHP encapsulated the lackluster OS
         | calls in a cross-platform way, using existing keywords from
         | popular languages to reduce the learning curve to maybe a day
         | (unlike Perl/Ruby, which are weird in a way that can be fun but
         | impractical to grok later). So it's best to think of PHP more
         | like something like Unity, where the nonsense is abstracted and
         | developers can get down to business. Even though it looks like
         | Javascript with dollar signs on the variables. It's also more
         | like the shell, where it tries to be as close as possible to
         | bare-metal performance, even while restricted to the 100x
         | interpreter slowdown of languages like Python. I find that PHP
         | easily saturates the processor when doing things in a data-
         | driven way by piping bytes around.
        
       | js4ever wrote:
       | Finally! This and a good multi master story and I'll finally
       | start to love Postgres
        
       | waselighis wrote:
       | It sounds to me like migrating to a fully multi-threaded
       | architecture may not be worth the effort. Simply reducing the
       | number of processes from thousands to hundreds would be a huge
       | win and likely much more feasible than a complete re-
       | architecture.
        
       | chucky_z wrote:
        | I wish they would do some kind of easy shared storage instead,
        | or in addition to this. This sounds like an odd solution, but
        | I've scaled pgsql since 9 on very, very large machines, and
        | running 1 pgsql cluster per physical socket gave near-linear
        | scaling even on 100+ total-core machines with TB+ of memory.
       | 
       | The challenge with this setup is that you need to do 1 writer and
       | multiple reader clusters so you end up doing localhost
       | replication which is super weird. If that requirement was somehow
       | removed that'd be awesome for scaling really huge clusters.
        
       | t43562 wrote:
       | Calm down guys! Threading is tricky but they can rewrite it all
       | in Rust so it'll be completely ok........
       | 
       | ;-)
        
       ___________________________________________________________________
       (page generated 2023-06-19 23:00 UTC)