[HN Gopher] PostgreSQL reconsiders its process-based model
       ___________________________________________________________________
        
       PostgreSQL reconsiders its process-based model
        
       Author : todsacerdoti
       Score  : 784 points
        Date   : 2023-06-19 16:33 UTC (1 day ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | mihaic wrote:
       | I'm honestly surprised it took them so long to reach this
       | conclusion.
       | 
       | > That idea quickly loses its appeal, though, when one considers
       | trying to create and maintain a 2,000-member structure, so the
       | project is unlikely to go this way.
       | 
       | As repulsive as this might sound at first, I've seen structures
       | of hundreds of fields work fine if the hierarchy inside them is
       | well organized and they're not just flat. Still, I have no real
       | knowledge of the complexity of the code and wish the Postgres
       | devs all the luck in the world to get this working smoothly.
        
         | rsaxvc wrote:
         | This is how I made my fork of libtcc lock-free.
         | 
         | Mainline has a lock so that all backends can use global
         | variables, but only one instance can do codegen at a time.
         | 
          | It was a giant refactoring. Especially fun was when multiple
          | compilation units used the same static variable name, but it
          | all worked in the end.
        
           | 72deluxe wrote:
           | Out of curiosity, where is this fork? Sounds very
           | interesting.
        
             | rsaxvc wrote:
             | https://github.com/rsaxvc/tinycc-multithreaded
             | 
             | This is the multi-threaded compiler:
             | https://github.com/rsaxvc/tcc-swarm
             | 
             | With the multi-threaded tcc above it scales about as well
             | as multiprocess. With mainline it doesn't scale well at
             | all.
             | 
             | So far I haven't gotten around to reusing anything across
             | libtcc handles/instances, but would eventually like to
             | share mmap()'d headers across instances, as well as cache
             | include paths, and take invocation arguments through stdin
             | one compilation unit per line.
        
         | paulddraper wrote:
         | > I'm honestly surprised it took them so long to reach this
         | conclusion.
         | 
         | On the contrary, it's been discussed for ages. But it's a huge
         | change, with only modest advantages.
         | 
          | I'm skeptical of the ROI to be honest. Not that it doesn't
          | have value, but I doubt the value exceeds the effort.
        
           | 36364949thrw wrote:
           | > it's a huge change, with only modest advantages
           | 
           | +significant and unknown set of new problems, including new
           | bugs.
           | 
           | This reminds me of the time they lifted entire streets in
           | Chicago by 14 feet to address new urban requirements.
           | Chicago, we can safely assume, did not have the option of
           | just starting a brand new city a few miles away.
           | 
            | The interesting question here is: should a system design
            | that works quite well up to a certain scale be abandoned
            | in order to extend its market reach?
        
           | datavirtue wrote:
            | Yeah, and you will run headlong into other unforeseen
            | real-world issues. You may never reach the performance
            | goals.
        
         | loeg wrote:
         | Yeah. I think as a straightforward, easily correct transition
         | from 2000 globals, a giant structure isn't an awful idea. It's
         | not like the globals were organized before! You're just making
         | the ambient state (awful as it is) explicit.
        
           | stingraycharles wrote:
           | Yes, it's the most pragmatic and it's only "awful" because it
           | makes the actual problem visible. And would likely encourage
           | slowly refactoring code to handle its state in a more sane
           | way, until you're only left with the really gnarly stuff,
           | which shouldn't be too much anymore and you can put them in
           | individual thread local storages.
           | 
           | It's an easy transition path.
        
           | mihaic wrote:
           | Exactly, if you're now forced to put everything in one place
           | you're forced to acknowledge and understand the complexity of
           | your state, and might have incentives to simplify it.
        
             | Sesse__ wrote:
             | Here's MySQL's all-session-globals-in-one-place-class:
              | https://github.com/mysql/mysql-server/blob/8.0/sql/sql_class...
             | 
             | I believe I can safely say that nobody acknowledges and
             | understands the complexity of all state within that class,
             | and that whatever incentives there may be to simplify it
             | are not enough for that to actually happen.
             | 
             | (It ends on line 4692)
        
               | IshKebab wrote:
               | Right but that would still be true if they were globals
               | instead. Putting all the globals in a class doesn't make
               | any difference to how much state you have.
        
               | Sesse__ wrote:
               | > Putting all the globals in a class doesn't make any
               | difference to how much state you have.
               | 
                | I didn't make any claims about the _amount_ of state.
                | My claim was that "you're forced to acknowledge and
                | understand the complexity of your state" (i.e., that
                | moving it all together in one place helps in
                | understanding the state) is flat-out wrong.
        
               | IshKebab wrote:
               | It's not wrong. Obviously putting it all in one place
               | makes you consider just how much of it you have, rather
               | than having it hidden away all over your code.
        
           | cakoose wrote:
           | > I think as a straightforward, easily correct transition
           | from 2000 globals, a giant structure isn't an awful idea.
           | 
           | Agree.
           | 
           | > It's not like the globals were organized before!
           | 
           | Using a struct with 2000 fields loses some encapsulation.
           | 
           | When a global is defined in a ".c" file (and not exported via
           | a ".h" file), it can only be accessed in that one ".c" file,
           | sort of like a "private" field in a class.
           | 
           | Switching to a single struct would mean that all globals can
           | be accessed by all code.
           | 
           | There's probably a way to define things that allows you to
           | regain some encapsulation, though. For example, some spin on
           | the opaque type pattern:
           | https://stackoverflow.com/a/29121847/163832
        
             | pasc1878 wrote:
              | No, that is what a static in a .c file is for.
              | 
              | A plain global can be accessed from other compilation
              | units - agreed, with no .h entry it is much more error
              | prone, e.g. you don't know the type, but the variable's
              | name is still exposed to other objects.
        
               | remexre wrote:
               | Wouldn't those statics also be slated for removal with
               | this change?
        
               | junon wrote:
                | At most they'd be determined to be read-only
                | constants that are inlined during constant folding.
                | That mostly covers integer-sized scalar values that
                | fit into registers, and nothing you've taken the
                | address of either - those remain as static data.
        
               | cakoose wrote:
                | I think there might be a terminology mix-up here. In
                | C, a global variable with the `static` keyword is
                | still mutable, so it typically can't be
                | constant-folded or inlined.
               | 
               | The `static` modifier in that context just means that the
               | symbol is not exported, so other ".c" files can't access
               | it.
        
               | bourgeoismedia wrote:
               | A static variable in C is mutable in the same sense that
               | a local variable is, but since it's not visible outside
               | the current compilation unit the optimizer is allowed to
               | observe that it's never actually modified or published
               | and constant fold it away.
               | 
               | Check out the generated assembly for this simple program,
               | notice that kBase is folded even though it's not marked
               | const: https://godbolt.org/z/h45vYo5x5
        
               | cakoose wrote:
               | It is also possible for a link-time optimizer to observe
               | that a non-static global variable is never modified and
               | optimize that away too.
               | 
               | But the Postgres mailing list is talking about 2000
               | global variables being a hurdle to multi-threading. I
               | doubt they just didn't realize that most of them can be
               | optimized into constants.
        
               | anarazel wrote:
                | Yea. Just about none of them could be optimized to
                | constants because, uh, they're not constant. We're not
                | perfect, but we do add const etc. to TU-level
                | statics/globals that are actually read-only. And if
                | they are actually read-only, we don't care about them
                | in the context of threading anyway, since they
                | wouldn't need any different behaviour.
        
           | cogman10 wrote:
           | I think my bigger fear is around security. A process per
           | connection keeps things pretty secure for that connection
           | regardless of what the global variables are doing (somewhat
           | hard to mess that up with no concurrency going on in a
           | process).
           | 
           | Merge all that into one process with many threads and it
           | becomes a nightmare problem to ensure some random addon
           | didn't decide to change a global var mid processing (which
           | causes wrong data to be read).
        
             | dfox wrote:
             | All postgres processes run under the same system user and
             | all the access checking happens completely in userspace.
        
               | fdr wrote:
               | Access checking, yes, but the scope of memory corruption
               | does increase unavoidably, given the main thing the
               | pgsql-hackers investigating threads want: one virtual
               | memory context when toggling between concurrent work.
               | 
               | Of course, there's a huge amount of shared space already,
               | so a willful corruption can already do virtually
               | anything. But, more is more.
        
           | magicalhippo wrote:
           | We did this with a project I worked on. I came on after the
           | code was mature.
           | 
           | While we didn't have 2000 globals, we did have a non-trivial
           | amount, spread over about 300kLOC of C++.
           | 
           | We started by just stuffing them into a "context" struct, and
           | every function that accessed a global thus needed to take a
           | context instance as a new parameter. This was tedious but
           | easy.
           | 
            | However, the upside was that this highlighted poor
            | architecture. Over time we refactored those bits and the
            | main context struct shrank significantly.
           | 
           | The result was better and more modular code, and overall well
           | worth the effort in our case, in my opinion.
        
         | MuffinFlavored wrote:
         | > if the hierarchy inside them is well organized
         | 
         | is this another way to say "in a 2000 member structure, only 10
         | have significant voting power"?
        
           | Ankhers wrote:
           | This statement is not about people, it is about a C struct.
        
         | FooBarWidget wrote:
         | I don't get it. How is a 2000-member structure any different
         | from having 2000 global variables? How is maintaining the
         | struct possibly harder than maintaining the globals?
         | Refactoring globals to struct members is semantically nearly
         | identical, it may as well just be a mechanical, cosmetic
         | change, while also giving the possibility to move to a threaded
         | architecture.
        
           | ComputerGuru wrote:
            | Because global variables can be confined to individual
            | source files, exclusively visible in that compilation
            | unit. That makes them far easier to reason about than
            | hoisting them into the "global and globally visible"
            | situation you get with one gargantuan struct. Which is why
            | a more invasive refactor might be required.
        
             | imtringued wrote:
             | Just use thread local variables.
             | 
             | I abuse them for ridiculous things.
        
               | jeltz wrote:
               | That is the plan for PostgreSQL.
        
               | ComputerGuru wrote:
                | Yeah, I was really into that before there was even a
                | cross-compiler/cross-platform syntax for declaring TLS
                | values in C++, but I have since "upgraded" to avoiding
                | TLS altogether where possible. The quality of the
                | implementations varies greatly from compiler and
                | platform to compiler and platform, you run into weird
                | issues with thread-exit destructors if they're not
                | primitive types, they run afoul of any
                | fibers/coroutines/etc. that have since become
                | extremely prevalent, and a few other things.
        
         | comboy wrote:
         | I've never really been limited by CPU when running postgres
         | (few TB instances). The bottleneck is always IO. Do others have
          | different experience? Plus there's elegance and a feeling
          | of being in control when you know a query is associated with
          | a specific process, which you can deal with and monitor just
          | like any other process.
         | 
         | But I'm very much clueless about internals, so this is a
         | question rather than an opinion.
        
           | hyperman1 wrote:
           | I see postgres become CPU bound regularly: Lots of hash
           | joins, copy from or to CSV, index or materialized view
           | rebuild. Postgis eats CPU. Tds_fdw tends to spend a lot of
           | time doing charset conversion, more than actually networking
           | to mssql.
           | 
           | I was surprised when starting with postgres. Then again, I
           | have smaller databases (A few TB) and the cache hit ratio
           | tends to be about 95%. Combine that with SSDs, and it becomes
           | understandable.
           | 
           | Even so, I am wary of this change. Postgres is very reliable,
           | and I have no problem throwing some extra hardware to it in
           | return. But these people have proven they know what they are
           | doing, so I'll go with their opinion.
        
             | aetherson wrote:
             | I've also definitely seen a lot of CPU bounding on
             | postgres.
        
           | sargun wrote:
           | With modern SSDs that can push 1M IOPs+, you can get into a
           | situation where I/O latency starts to become a problem, but
           | in my experience, they far outpace what the CPU can do. Even
           | the I/O stack can be optimized further in some of these
           | cases, but often it comes with the trade off of shifting more
           | work into the CPU.
        
           | ilyt wrote:
           | >I've never really been limited by CPU when running postgres
           | (few TB instances). The bottleneck is always IO.
           | 
            | Throw a few NVMe drives at it and it might be.
        
             | dfox wrote:
              | Throwing a ridiculous amount of RAM at it is the more
              | correct assessment. NVMe reads are still "I/O", and that
              | is slow. And for at least 10 years, buying enough RAM to
              | have all of the interesting parts of an OLTP psql
              | database either in shared_buffers or in the OS-level
              | buffer cache has been completely feasible.
        
               | rcxdude wrote:
               | an array of modern SSDs can get to a similar bandwidth to
               | RAM, albeit with significantly worse latency still. It's
               | not that hard to push the bottleneck elsewhere in a lot
               | of workloads. High performance fileservers, for example,
               | need pretty beefy CPUs to keep up.
        
               | ilyt wrote:
               | > NVMe reads are still an "I/O" and that is slow
               | 
               | It's orders of magnitude faster than SAS/SATA SSDs and
               | you can throw 10 of them into 1U server. It's nowhere
               | near "slow" and still easy enough to be CPU bottlenecked
               | before you get IO bottlenecked.
               | 
               | But yes, pair of 1TB RAM servers gotta cost you less than
               | half year's worth of developer salary
        
           | phamilton wrote:
            | I've generally had buffer-cache hit rates in the 99.9%
            | range, which ends up being minimal read I/O. (This is on
            | AWS Aurora, where there's no disk cache and so
            | shared_buffers is the primary cache, but an equivalent
            | measure for vanilla postgres exists.)
            | 
            | In those scenarios, there's very little read I/O. CPU is
            | the primary bottleneck. That's why we run up as many as 10
            | Aurora readers (autoscaled with traffic).
        
           | paulddraper wrote:
           | Depends on your queries.
           | 
           | If you push a lot of work into the database including JSON
           | and have a lot of buffer memory...CPU can easily be limiting.
        
           | Diggsey wrote:
            | It's not just CPU - memory usage is also higher. In
            | particular, idle connections still consume significant
            | memory, and this is why PostgreSQL has much lower
            | connection limits than e.g. MySQL. Pooling can help in
            | some cases, but
           | pooling also breaks some important PostgreSQL features (like
           | prepared statements...) since poolers generally can't
           | preserve session state. Other features (eg. notify) are just
           | incompatible with pooling. And pooling cannot help with
           | connections that are idle but inside a transaction.
           | 
           | That said, many of these things are solvable without a full
           | switch to a threaded model (eg. by having pooling built-in
           | and session-state-aware).
        
             | ComputerGuru wrote:
             | > solvable without a full switch to a threaded model (eg.
             | by having pooling built-in and session-state-aware).
             | 
              | Yeeeeesssss, but solving that is solving the hardest
              | part of switching to a threaded model. It requires the
              | team to come to terms with the global state and
              | encapsulate session state in a non-global struct.
        
             | anarazel wrote:
             | > That said, many of these things are solvable without a
             | full switch to a threaded model (eg. by having pooling
             | built-in and session-state-aware).
             | 
             | The thing is that that's a lot easier with threads. Much of
             | the session state lives in process private memory (prepared
             | statements etc), and it can't be statically sized ahead of
             | time. If you move all that state into dynamically allocated
             | shared memory, you've basically paid all the price for
             | threading already, except you can't use any tooling for
             | threads.
        
         | saulrh wrote:
         | Also, even if a 2k-member structure is obnoxious, consider the
         | alternative - having to think about and manage 2k global
         | variables is probably even worse!
        
           | megous wrote:
           | Each set of globals is in a module it relates to, not in some
           | central file where everything has to be in one struct.
           | 
           | If anything, it's probably easier to understand.
        
         | hans_castorp wrote:
         | > I'm honestly surprised it took them so long to reach this
         | conclusion.
         | 
         | Oracle also uses a process model on Linux. At some point (I
         | think starting with 12.x), it can now be configured on Linux to
         | use a threaded model, but the default is still a process-per-
         | connection model.
         | 
          | Why does everybody think it's a bad thing in Postgres, but
          | nobody thinks it's a bad thing in Oracle?
        
           | patmorgan23 wrote:
            | Well, for one, Postgres is open source and widely used,
            | so anyone can pick it up and look at its internals. That's
            | not the case for Oracle DB.
        
         | topspin wrote:
         | > I'm honestly surprised it took them so long to reach this
         | conclusion.
         | 
         | I'm not. You can get a long way with conventional IPC, and OS
         | processes provide a lot of value. For most PostgreSQL instances
         | the TLB flush penalty is _at least_ 3rd or 4th on the list of
         | performance concerns, _far_ below prevailing storage and
         | network bottlenecks.
         | 
         | I share the concerns cited in this LWN story. Reworking this
         | massive code base around multithreading carries a large amount
         | of risk. PostgreSQL developers will have to level up
         | substantially to pull it off.
         | 
         | A PostgreSQL endorsed "second-system" with the (likely
         | impossible, but close enough that it wouldn't matter) goal of
         | 100% client compatibility could be a better approach. Adopting
         | a memory safe language would make this both tractable and
         | attractive (to both developers and users.) The home truth is
         | that any "new process model" effort would actually play out
         | exactly this way, so why not be deliberate about it?
        
           | gmokki wrote:
           | Would something like opt-in sharing of pages between
           | processes that oracle has been trying to get into kernel be
            | the correct option:
            | https://lwn.net/ml/linux-kernel/cover.1682453344.git.khalid....
           | 
            | Postmaster would just share the already-shared memory
            | between processes (containing also the locks). That
            | explicit part of memory would opt in to thread-like
            | sharing and thus get faster/less TLB switching and lower
            | memory usage, while all the rest of the state would still
            | be per-process and safe.
           | 
           | tl;dr super share the existing shared memory area with kernel
           | patch
           | 
           | All operating systems not supporting it would keep working as
           | is.
        
             | topspin wrote:
             | Yes, it would mitigate the TLB problem. Interesting that
             | Oracle is also looking to solve this problem, but not by
             | multithreading the Oracle RDBMS.
        
           | atonse wrote:
           | Would this basically be a new front end? Like the part that
           | handles sockets and input?
           | 
            | Or more of a rewrite of subsystems? Like the query
            | planner or storage engine etc.?
        
             | topspin wrote:
             | Both, I'd imagine.
             | 
             | With regard to client compatibility there are related
             | precedents for this already; the PostgreSQL wire protocol
             | has emerged as a de facto standard. Cockroachdb and
             | ClickHouse are two examples that come to mind.
        
           | nextaccountic wrote:
            | From what I gather, postgres isn't doing conventional IPC;
            | instead it uses shared memory, which means the same
            | mechanism threads use, but with way higher complexity.
        
             | mgaunard wrote:
             | What do you think IPC is?
        
             | topspin wrote:
             | As does Oracle, and others. I'm aware.
             | 
             | IPC, to me, includes the conventional shared memory
             | resources (memory segments, locks, semaphores, condition
             | variable, etc.) used by these systems: resources acquired
             | by processes for the purpose of communication with other
             | processes.
             | 
             | I get it though. The most general concept of shared memory
             | is not coupled to an OS "process." You made me question
             | whether my concept of term IPC was valid, however. So what
             | does one do when a question appears? Stop thinking
             | immediately and consult a language model!
             | 
             | Q: Is shared memory considered a form of interprocess
             | communication?
             | 
             | GPT-4: Yes, shared memory is indeed considered a form of
             | interprocess communication (IPC). It's one of the several
             | mechanisms provided by an operating system to allow
             | processes to share and exchange data.
             | 
             | ...
             | 
             | Why does citing ChatGPT make me feel so ugly inside?
        
               | faangsticle wrote:
               | > Why does citing ChatGPT make me feel so ugly inside?
               | 
                | It's the modern "let me Google that for you". Just
                | like people don't care what the #1 result on Google
                | is, they also don't care what ChatGPT has to say about
                | it. If they did, they'd ask it themselves.
        
               | pritambarhate wrote:
               | Without a credible source to reconfirm what ChatGPT said,
               | one can't really assume what ChatGPT says is correct.
        
               | TeMPOraL wrote:
               | I always understood IPC, "interprocess communication", in
               | general sense, as anything and everything that can be
               | used by processes to communicate with each other - of
               | course with a narrowing provision that common use of the
               | term refers to those means that are typically used for
                | that purpose, are relatively efficient, and the
                | processes in question run on the same machine.
               | 
               | In that view, I always saw shared memory as IPC, in that
               | it is a tool commonly used to exchange data between
               | processes, but of course it is not strictly tied to any
               | process in particular. This is similar to files, which if
               | you squint are a form of IPC too, and are also not tied
               | to any specific process.
               | 
               | > _Why does citing ChatGPT make me feel so ugly inside?_
               | 
               | That's probably because, in cases like this, it's not
               | much different to stating it yourself, but is more noisy.
        
             | wbl wrote:
             | Not necessarily. Man 3 shmem if you want a journey back to
             | some bad ideas.
        
         | shepardrtc wrote:
         | I think this is a situation where a message-passing Actor-based
         | model would do well. Maybe pass variable updates to a single
         | writer process/thread through channels or a queue.
         | 
         | Years ago I wrote an algorithmic trader in Python (and Cython
         | for the hotspots) using Multiprocessing and I was able to get
         | away with a lot using that approach. I had one process
         | receiving websocket updates from the exchange, another process
         | writing them to an order book that used a custom data
         | structure, and multiple other processes reading from that data
         | structure. Ran well enough that trade decisions could be made
         | in a few thousand nanoseconds on an average EC2 instance. Not
         | sure what their latency requirements are, though I imagine they
         | may need to be faster.
         | 
          | Obviously mutexes are the bottleneck for them at this
          | point, and while my idea might be a bit slower in a low-load
          | situation, perhaps it would be faster once you start getting
          | to higher load.
        
           | hamandcheese wrote:
           | I think the Actor model is fine if you start there, but I
           | can't imagine incrementally adopting it in a large,
           | preexisting code base.
        
           | ilyt wrote:
            | That would most likely be several times slower than the
            | current model.
        
       | bb88 wrote:
       | This reminds me of this poster: "You must be this tall..."
       | 
       | https://bholley.net/blog/2015/must-be-this-tall-to-write-mul...
       | 
       | Back about a decade ago I was "auditing" someone else's threaded
       | code. And couldn't figure it out. But he was the company's
       | "golden child" so by default it must be working code because he
       | wrote it.
       | 
       | And then it started causing deadlocks in prod.
       | 
       | "What do you want me to do about it? It's the golden child's
       | code. He's not even gonna show up til 2pm today."
        
         | wmf wrote:
         | The thing is... multi-process with a bespoke shared memory
         | system isn't better than multithreading; it's much worse.
        
           | citrin_ru wrote:
            | The difference is between everything shared (threads) and
            | some parts shared explicitly (processes with shared
            | memory). I'm not sure the 2nd is worse.
        
           | paulddraper wrote:
            | It kinda is though. The process barrier is better at
            | enforcing careful, deliberate interactions.
        
           | throwawaylinux wrote:
           | By bespoke you mean using standard interfaces to create
           | shared memory pools?
           | 
           | They do roll some of their own locking primitives, but that's
           | not particularly unusual in a large portable program (and
           | quite likely what they wanted is/was not available in glibc
           | or other standard libraries, at least when first written).
        
           | __turbobrew__ wrote:
           | In Linux, multi process with shared memory regions is
           | basically just threads. The kernel doesn't know anything
           | about threads, it knows about processes and it lets you share
           | memory regions between those processes if you so desire.
        
           | anarazel wrote:
           | I'm not sure if I'd judge it as harshly, but you have a good
           | point: A lot of debugging / validation tooling understands
           | threads, but not memory shared between processes.
        
       | georgewfraser wrote:
       | I wonder if it would be easier to create a C virtual machine that
       | emulates all the OS interaction, then recompile Postgres and the
       | extensions to run on this. Perhaps TruffleC would work?
       | 
       | https://dl.acm.org/doi/10.1145/2647508.2647528
        
         | anarazel wrote:
         | Hard to believe that would provide any benefit without also
         | causing massive slowdowns.
        
       | thecopy wrote:
       | This feels like developers are bored and want a challenge.
        
         | dboreham wrote:
         | It's a multi-decade ask from many PG users and a serious pain
         | point for many deployments.
        
       | benlivengood wrote:
       | This is interesting because Google just created AlloyDb[0] which
       | is decidedly multiprocess for performance and switches out the
       | storage layer from a read/write model to write+replicate + read-
       | only model.
       | 
       | The deep dive[1] has some details; the tl;dr: is that the main
       | process only has to output Write Ahead Logs to the durable
       | storage layer which minimizes transaction latency. The log
       | processing service materializes postgres-compatible on-disk
       | blocks that read-only replicas can read from, with a caching
       | layer for block reads which sends cache invalidations from the
       | LPS to read replicas.
       | 
       | I'm not sure if similar benefits could be seen within a single
       | machine; using network DMA or even rDMA to transfer bytes to and
       | from remote machines also avoids TLB invalidation. There are some
       | mentions in the mailing list of waiting for Linux to support
       | shared page mappings between processes as a solution.
       | 
       | I'm not exactly sure I understand the reasoning behind process
       | separation as crash recovery; as far as I understand each
       | connection is responsible for correctness and so if a process
       | crashes there seems to be an assumption that the database can
       | recover and keep working by killing that process but that seems
       | like it risks silent data corruption; perhaps it's equivalently
       | mitigated by separating materialization of blocks from the sync'd
       | WAL in a separate process from the multithreaded connection
       | process producing WAL entries?
       | 
       | [0] https://cloud.google.com/alloydb [1]
       | https://cloud.google.com/blog/products/databases/alloydb-for...
        
       | Icathian wrote:
       | Some interesting discussion on this here also:
       | https://news.ycombinator.com/item?id=36284487
        
       | [deleted]
        
       | levkk wrote:
       | Pretty sure Tom Lane said this will be a disaster in that same
       | pgsql-hackers thread. Not entirely sure what benefits the multi-
       | threaded model will have when you can easily saturate the entire
       | CPU with just 128 connections and a pooler. So I doubt there is
       | consensus or even strong desire from the community to undertake
       | this boil the ocean project.
       | 
       | On the other hand, having the ability to shut down and cleanup
       | the entire memory space of a single connection by just
       | disconnecting is really nice, especially if you have extensions
       | that do interesting things.
        
         | ilyt wrote:
         | >Not entirely sure what benefits the multi-threaded model will
         | have when you can easily saturate the entire CPU with just 128
         | connections and a pooler.
         | 
         | That all of those would work faster because of the performance
         | benefits mentioned in the article.
        
         | BoardsOfCanada wrote:
         | From the article:
         | 
         | > Tom Lane said: "I think this will be a disaster. There is far
         | too much code that will get broken". He added later that the
         | cost of this change would be "enormous", it would create "more
         | than one security-grade bug", and that the benefits would not
         | justify the cost.
        
           | timcobb wrote:
           | You can think of this as an opportunity to rewrite in Rust.
        
             | tracker1 wrote:
             | AfterPostgres
        
               | krylon wrote:
               | May I humbly suggest PostPostgres, or Post2gres?
        
               | tomjakubowski wrote:
               | 2Post2Furigres
        
               | moi2388 wrote:
               | Postgr3s: Tokyo Thread
        
       | glintik wrote:
       | Mysql guys: Smart move, Postgres!
        
       | pjmlp wrote:
       | So when we as a whole decided that multiprocessing is a much
       | better approach from a security and application-stability point of
       | view, they decide to go with threads?
        
         | blinkingled wrote:
         | Horses for courses I guess - purely threaded vs purely MP both
         | have different sets of tradeoffs, and shoehorning one over the
         | other always fails some use cases. The article says they are
         | also considering the possibility of having to keep both process
         | and thread models indefinitely for this and other reasons.
         | 
         | I know nothing of PG internals but I can see why process per
         | connection model doesn't work for large machines and/or high
         | number of connections. One way to do it would be to keep
         | connection handling per thread and still keep multiprocess
         | approach where it makes sense for security and doesn't add
         | linear overheads.
        
       | RcouF1uZ4gsC wrote:
       | This would be one of those places where a language like Rust
       | would be helpful. In C/C++, with undefined behavior and crashes,
       | process isolation makes a lot of sense to limit the blast
       | radius. Rust's borrow checker gives you at compile time a lot of
       | the safety that you would rely on process isolation for.
        
         | mattashii wrote:
         | Yes, but note that the blast radius of a PostgreSQL process
         | crash is already "the whole system reboots", so there are not a
         | lot of differences between process- and thread-based PostgreSQL
         | written in C.
         | 
         | Rewriting in Rust would be interesting, but it would also
          | probably be too invasive to make it worthwhile at all - all code
         | in PostgreSQL is C, while not all code in PostgreSQL interacts
         | with the intrinsics of processes vs threads. Any rewrite to
         | Rust would likely take several times more effort than a port to
         | threads.
        
           | megous wrote:
            | A PostgreSQL process crash may also just mean one query fails.
        
       | beebmam wrote:
       | Why change postgres? Just fork it if you want to change something
       | this fundamental.
        
         | anarazel wrote:
         | Because it makes it a lot easier to address some of postgres'
         | weaknesses? This is a proposal by long time contributor to
         | postgres, that a number of other long time contributors agree
          | with (and others disagree with!). Why shouldn't Heikki have
          | brought this up for discussion?
        
       | ceeam wrote:
       | Is there any reason at all people use intrinsically bug-prone and
       | broken multithreading mode instead of fork() and IPC apart from
       | WinAPI having no proper fork?
        
         | adwn wrote:
         | Yes, and some of those reasons are even listed in the article.
        
           | ceeam wrote:
           | TLB misses? They are just a detail of particular CPU
           | implementation, and the architectures change. Also, aren't
           | they per core and not per process? What would that solve then
           | to switch to MT?
        
             | adwn wrote:
             | > _TLB misses? They are just a detail of particular CPU
             | implementation, and the architectures change._
             | 
             | TLBs are "just a detail" of roughly 100% of server,
             | desktop, and mobile CPUs.
             | 
              | > _Also, aren't they per core and not per process? What
             | would that solve then to switch to MT?_
             | 
             | TLB entries are per address space. Threads share an address
             | space, processes do not.
        
       | tragomaskhalos wrote:
       | Worked on a codebase which was separate processes, each of which
       | has a shedload of global variables. It was a nightmare working
       | out what was going on, not helped by the fact that there was no
       | naming convention for the globals, plus they were not declared in
       | a single place. I believe their use was a performance move, ie
       | having the linker pin a var to a specific memory location rather
       | than copying it to the stack as a variable and referencing it by
       | offset the whole time. Premature optimisation? Optimisation at
       | all? Who knows, but there's a good reason coding standards
       | typically militate against globals.
        
         | TomMasz wrote:
         | There are _2000_ globals here, so more like a couple of
          | shedloads. While this is something you'd sort of expect for a
         | product that's been around 30+ years, it really seems like
         | there's a lot of optimization that could happen and still stick
         | with the process model.
        
         | JdeBP wrote:
         | Per discussion on this very page, in the headlined article, and
         | in the mailing list discussion it references, PostgreSQL is not
         | in that category. It has lots of _static storage duration_
         | variables, which _do not_ necessarily have external linkage.
         | 
         | Robert Haas pointed out in one message that an implementation
         | pattern was to use things like file-scope static storage
         | duration variables to provide session-local state for
         | individual components. This is why they've been arguing against
         | a single giant structure declared in "session.h" as an
         | approach, as it requires every future addition to session state
         | to touch the central core of the entire program.
         | 
         | They want to keep the advantage of the fact that these
         | variables are in fact _not global_. They are local; and the
         | problem is rather that they have static storage duration and
         | are not per-thread, and thus are not per-session in a thread-
         | per-session model.
        
         | ajkjk wrote:
         | There's something to be said for globals whose access is well-
         | managed, though.
         | 
         | IMO: if the variable is _truly_ global, i.e. code all over the
         | codebase cares about it, then it should just be global instead
         | of pretending like it's not with some fancy architecture.
         | 
         | The tricky part is reacting to changes to a global variable.
         | Writing a bunch of "on update" logic leads to madness. The
         | ideal solution is for there to be some sort of one-directional
         | flow for updates, like when a React component tree is re-
         | rendered... but that's very hard to build in an application
         | that doesn't start out using a library like React in the first
         | place.
        
       | shariat wrote:
       | this will be the beginning of the end of postgres
        
       | haburka wrote:
       | That sounds like really hard programming. I'm glad I write react
       | and get paid possibly much more.
        
         | paulddraper wrote:
         | We're glad you're writing React too :)
        
         | Exuma wrote:
         | I feel this sort of undertaking could only be done by those
         | programmers who truly value domain knowledge above all else
          | (money, etc). I'm more of the entrepreneurial mind, so I
         | generally only learn as much as needed to do some task (even if
         | it's very difficult), but just seeking information as a means
         | to an end doesn't feel fulfilling to me. Of course many people
          | DO find that, and it's upon those people's shoulders that heroic
         | things like this rest, and I'm very thankful to them.
        
       | cryptica wrote:
       | Seems like a bad idea. Processes are more elegant and scalable
       | than threads as they discourage the use of shared memory. Shared
       | memory is often a bad idea. You end up with different threads
       | competing and queuing up to access or write the same data (e.g.
       | waiting on each other to acquire a lock with mutexes - This
       | immediately disqualifies the system from becoming embarrassingly
       | parallel) and it becomes the OS's problem to figure out when to
       | allow which thread to access what memory... This is bad because
       | the OS doesn't care about optimizing memory access for your
       | specific use case. It will treat your 'high performance' database
       | in the same way as it treats a run-of-the-mill Gimp desktop
       | application....
       | 
       | With the process model, it encourages using separate memory for
       | each process; this forces developers to think about things like
       | memory consistency and availability and gives them more
       | flexibility in terms of scalability across multiple CPU cores or
       | even hosts. Processes are far better abstractions than threads
       | for modeling concurrent systems since their logic is
       | fundamentally the same regardless of whether they run across
       | different CPU cores or different hosts.
       | 
       | > The overhead of cross-process context switches is inherently
       | higher than switching between threads in the same process
       | 
       | I remember researching this a while back. It depends on the
        | specific OS and hardware. It's not so straightforward, and this
       | is something which tends to change over time and the differences
       | are usually insignificant anyway.
       | 
       | Also, it's important not to conflate performance with scalability
       | - These two characteristics are orthogonal at best and oftentimes
       | conflicting.
       | 
       | Oftentimes, to scale horizontally, a system needs to incur a
       | performance penalty as additional work is required to route and
       | coordinate actions across multiple CPUs or hosts. A scalable
        | system can service a much larger (or even sometimes
        | theoretically unlimited) number of requests but it will typically
       | perform worse than a non-scalable system if you judge it on a
       | requests-per-CPU-core basis.
        
       | uudecoded wrote:
       | I am going to go ahead and trust Tom Lane on this one, over
       | someone who is working on "serverless Postgres". Godspeed to the
       | forthcoming fork.
        
         | aseipp wrote:
         | Heikki Linnakangas is one of the top Postgres contributors of
         | all time, he isn't just "someone." The fact he's working for a
         | startup on a fork (that already exists, which you can run right
         | now on your local machine) doesn't warrant any snide dismissal.
         | Robert Haas admitted that it would be a huge amount of work and
         | that it would only be achievable by a small few people anyway,
         | Heikki being among them.
         | 
         | Anyway, I think there are definitely limits that are starting
         | to appear with Postgres in some spots. This is probably one of
         | the most difficult possible solutions to some of those
         | problems, but even if they don't switch to a fully threaded
          | model, improvements like better CPU efficiency and connection
          | handling will all go a substantial way. Doing some of the really
         | hard work is better than none of it, probably.
        
           | selimnairb wrote:
           | I wonder what AWS's PostgreSQL-compatible Aurora looks like
           | under the hood. Does it use threading, processes, both?
        
       | winrid wrote:
       | At the end of the day, this doesn't solve any problems. Small
       | setups use postgres directly just fine, and large setups use
       | pgbouncer, and having process isolation with extensions is a good
       | thing and probably simplifies things a lot.
       | 
       | My $0.02
        
       | rburhum wrote:
       | Sorry if I offend anybody, but this sounds like such a bad idea.
       | I have been running various versions of postgres in production
       | for 15 years with thousands of processes on super beefy machines,
       | and I can tell you without a doubt that sometimes those processes
        | crash - especially if you are running any of the extensions.
        | Nevertheless, Postgres has 99% of the time proven to be
        | resilient. The idea that a bad client can bring the whole cluster
        | down because it hit a bug sounds scary. Ever try creating a
        | spatial index on thousands/millions of records that have nasty,
        | overly complex or badly digitized geometries? Sadly, crashes are
        | part of that workflow, and changing this from process to
        | threading would mean all the other clients also crashing and
        | losing their connections. Accepting that as a potential problem
        | just because I want to avoid context-switching overhead or cache
        | misses? No thanks.
        
         | mike_hock wrote:
         | Also, reducing context switching overhead (or any other CPU
         | overhead) is probably not gonna fix the garbage I/O
         | performance.
        
         | Shorel wrote:
         | Reading your comment makes me think it is not only a good idea,
         | it is a necessity.
         | 
         | Relying on crashing as a bug recovery system is a good idea?
         | Crashing is just part of the workflow? That's insane, and a
         | good argument against PostgreSQL in any production system.
         | 
         | It is possible PostgreSQL doesn't migrate to a thread based
         | model, and I am not arguing they should.
         | 
         | But debug and patch the causes of these crashes? Absolutely
         | yes, and the sooner, the better.
        
           | dialogbox wrote:
           | It's all about trade off.
           | 
           | Building a database which is never gonna crash might be
            | possible, but at what cost? Can you name any single real-world
            | system that achieved that? Also, there can be regressions. More
           | tests? Sure but again, at what cost?
           | 
           | While we are trying to get there, having a crash proof
           | architecture is also a very practical approach.
        
           | bhaney wrote:
           | > Relying on crashing as a bug recovery system is a good
           | idea? Crashing is just part of the workflow? That's insane
           | 
           | Erlang users don't seem to agree with you
        
           | mindwok wrote:
           | They are still debugging and patching the causes. The crash
           | detection is just to try and prevent a single bug from
           | bringing down the whole system.
        
           | paulddraper wrote:
           | Cars are designed with airbags?!
           | 
           | Like, they are _supposed_ to crash?!?
        
           | anarazel wrote:
            | We do fix crashes etc., even if postgres manages to
           | restart.
           | 
           | I think the post upthread references an out-of-core extension
           | we don't control, which in turn depends on many external
           | libraries it doesn't control either...
        
           | fabian2k wrote:
           | A database has to handle situations outside its control, e.g.
           | someone cutting the power to the server. That should not
           | result in a corrupted database, and with Postgres it doesn't.
           | 
           | The fundamental problem is that when you're sharing memory,
           | you cannot safely just stop a single process when
           | encountering an unexpected error. You do not know the current
           | state of your shared data, and if it could lead to further
           | corruption. So restarting everything is the only safe choice
           | in this case.
        
           | rtpg wrote:
           | We don't want stuff to crash. But we also want data integrity
           | to be maintained. We also want things to work. In a world
           | with extensions written in C to support a lot of cool things
           | with Postgres, you want to walk and chew bubblegum on this
           | front.
           | 
           | Though to your point, a C extension can totally destroy your
           | data in other ways, and there are likely ways to add more
           | barriers. And hey, we should fix bugs!
        
         | dan-robertson wrote:
         | Is the actual number you got 99%? Seems low to me but I don't
         | really know about Postgres. That's 3 and a half days of
         | downtime per year, or an hour and a half per week.
        
           | dfox wrote:
           | Well, hour and half per week is the amount of downtime that
           | you need for modestly sized database (units of TB) accessed
           | by legacy clients that have ridiculously long running
           | transactions that interfere with autovacuum.
        
         | zeroimpl wrote:
         | However, it's already the case that if a postgres process
          | crashes, the whole cluster gets restarted. I've occasionally
          | seen this message:
          | 
          |     WARNING: terminating connection because of crash of another
          |              server process
          |     DETAIL: The postmaster has commanded this server process to
          |             roll back the current transaction and exit, because
          |             another server process exited abnormally and possibly
          |             corrupted shared memory.
          |     HINT: In a moment you should be able to reconnect to the
          |           database and repeat your command.
          |     LOG: all server processes terminated; reinitializing
        
           | niccl wrote:
           | yes, but postmaster is still running to roll back the
           | transaction. If you crash a single multi-threaded process,
           | you may lose postmaster as well and then sadness would ensue
        
             | jtc331 wrote:
             | If you read the thread you'd see the discussion includes
             | still having e.g. postmaster as a separate process.
        
             | mattashii wrote:
             | The threaded design wouldn't necessarily be single-process,
             | it would just not have 1 process for every connection.
             | Things like crash detection could still be handled in a
             | separate process. The reason to use threading in most cases
             | is to reduce communication and switching overhead, but for
             | low-traffic backends like a crash handler the overhead of
             | it being a process is quite limited - when it gets
             | triggered context switching overhead is the least of your
             | problems.
        
               | Yoric wrote:
               | Seconded. For instance, Firefox' crash reporter has
               | always been a separate process, even at the time Firefox
               | was mostly single-process, single-threaded. Last time I
               | checked, this was still the case.
        
             | cyberax wrote:
             | PostgreSQL can recover from abruptly aborted transactions
             | (think "pulled the power cord") by replaying the journal.
             | This is not going to change anyway.
        
             | cogman10 wrote:
             | Transaction roll back is a part of the WAL. Databases write
             | to the disk an intent to change things, what should be
             | changed, and a "commit" of the change when finished so that
             | all changes happen as a unit. If the DB process is
             | interrupted during that log write then all changes
             | associated with that transaction are rolled back.
             | 
             | Threaded vs process won't affect that.
        
               | dfox wrote:
                | Running the whole DBMS as a bunch of threads in a single
                | process changes how fast the recovery from some kind of
                | temporary inconsistency is. In the ideal world, this
               | should not happen, but in reality it does and you do not
               | want to bring the whole thing down because of some
               | superficial data corruption.
               | 
               | On the other hand, all cases of fixable corrupted data in
               | PostgreSQL I have seen were result of somebody doing
               | something totally dumb (rsyncing live cluster, even
               | between architectures), while on InnoDB it seems to
               | happen somewhat randomly without any obvious reason of
               | somebody doing stupid things.
        
             | anarazel wrote:
             | We would still have a separate process doing that part of
             | postmaster's work.
        
             | tracker1 wrote:
             | You can still have a master control process separate from
             | the client connections.
        
             | moonchrome wrote:
             | Restart on crash doesn't sound that difficult to do.
        
           | lelanthran wrote:
           | > However, it's already the case that if a postgres process
           | crashes, the whole cluster gets restarted. I've occasionally
           | seen this message:
           | 
           | Sure, but the blast radius of corruption is limited to that
           | shared memory, not all the memory of all the processes. You
           | can at least use the fact that a process has crashed to
           | ensure that the corruption doesn't spread.
           | 
           | (This is why it restarts: there is no guarantee that the
           | shared memory is valid, so the other processes are stopped
           | before they attempt to use that potentially invalid memory)
           | 
           | With threads, _all_ memory is shared memory. A single thread
            | that crashes can make other threads' data invalid _before_ the
           | detection of the crash.
        
       | BoardsOfCanada wrote:
        | I recently looked through the source code of postgresql and every
        | source file starts with a (really good) description of what the
        | file is supposed to do, which made it really easy to get into
        | the code compared to other open source projects I've seen. So
       | thanks for that.
        
         | nonethewiser wrote:
         | Uncle Bob hates this.
        
         | Alekhine wrote:
         | I have no idea why that isn't standard practice in every
         | codebase. I should be able to figure out your code without
         | having to ask, or dig through issues or commit messages. Just
         | tell me what it's for!
        
           | nologic01 wrote:
            | the average programmer thinks they are writing significantly
           | above average clean code, so no need to document it :-)
        
           | ComputerGuru wrote:
           | It kind of is in rust now, with module-level documentation
           | given its own specific AST representation instead of just
           | being a comment at the top of the file (a file is a module).
        
           | SoylentOrange wrote:
           | Because it takes a lot of time and because the comments can
           | get outdated. I also want this for all my code bases. But do
           | I always do this myself? No, especially on green field
           | projects. I will sometimes go back and annotate them later.
        
             | mhh__ wrote:
             | They can get outdated but they usually don't. It's a good
              | litmus test for whether a file is too big or small if its
              | purpose is hard to nail down.
        
             | withinboredom wrote:
             | Even outdated comments can tell you the original purpose of
             | the code, which helps if you're looking for a bug.
             | Especially if you're looking for a bug.
             | 
             | If someone didn't take the time to update the comments and
             | the reviewers didn't point it out, then you've probably
             | found the bug because someone was cowboying some shitty
             | code.
        
               | alex_reg wrote:
               | I have the opposite experience.
               | 
               | Outdated comments are often way worse than no comments,
               | because they can give you wrong ideas that aren't true
               | anymore, and send you off in the wrong direction before
               | you finally figure out the comment was wrong.
        
               | elteto wrote:
                | Indeed. I recently found this piece of code:
                | 
                |     if (X) assert(false); // we never do X, ever, anywhere.
                | 
                | Then I look over to the other pane, where I have a
                | different, but related file open:
                | 
                |     if (exact same X) { do_useful_stuff(); }
                | 
                | It got a chuckle out of me.
        
               | jgilias wrote:
               | Did you update the comment? :-)
        
               | withinboredom wrote:
               | // there are two kinds of mutually exclusive commentors
               | 
               | enum kinds { writers; readers; updaters; }
        
             | akira2501 wrote:
             | Trying to understand what I previously wrote and why I
             | wrote it takes more time than I ever care to spend. I'd
             | much rather have the comments, plus at this point, by
             | making them a "first class" part of my code, I find them
             | much easier to write and I find the narrative style I use
             | incredibly useful in laying out a new structure but also in
             | refactoring old ones.
        
       | cmrdporcupine wrote:
       | It sounds like the specific concerns here are actually around
        | buffer pool management performance in and around the TLB: _"Once
        | you have a significant number of connections we end up spending a
        | *lot* of time in TLB misses, and that's inherent to the process
        | model, because you can't share the TLB across processes."_
       | 
        | Many of the comments here seem to be missing this, talking
        | instead about CPU-boundedness generally and thread-per-request vs
        | process models. But that is orthogonal: the concern is quite
        | specific to the VM subsystem, and it looks like a legitimate
        | bottleneck in the approach Postgres has to take for buffer/page
        | management under its current process model.
       | 
        | I'm no Postgres hacker (or a Linux kernel hacker), and I only did
        | a 6-month stint doing DB internals, but it _feels_ to me like
        | perhaps the right answer here is that, instead of Postgres
        | getting deep down in the weeds refactoring and rewriting to a
        | thread-based model -- with all the risks that people have pointed
        | out -- some assistance could be sought through specific targeted
        | patches to the Linux kernel?
       | 
        | The addition of e.g. userfaultfd shows that there is room for
        | innovation and acceptance of changes in and around the kernel re:
        | page management. Some new flags for mmap, shm_open, etc., to
        | handle specific targeted use cases that would help Postgres out?
       | 
       | Also wouldn't be the first time that people have done custom
       | kernel patches or tuning parameters to crank performance out of a
       | database.
        
         | thesnide wrote:
         | Exactly my thinking. If the problem is TLB evictions, why not
         | improve these? PG isn't the only software that is hit by those.
         | 
         | And if you carefully craft your CPU scheduling and address
         | spaces mapping, you can reduce them by a lot.
         | 
         | Yet rewrites are always easier and more sexy. At first.
        
           | anarazel wrote:
            | The TLB issue is more a hardware issue than a software/OS
            | one. To my knowledge neither x86 nor ARM provides a way to
            | _partially_ share TLB contents between processes/entities.
            | TLB entries can be tagged with a process context identifier,
            | but that's an all-or-nothing thing. Either the entire address
            | space is shared, or it's not.
           | 
           | > Yet rewrites are always easier and more sexy. At first.
           | 
           | Moving to threads would not at all be a rewrite.
        
       | eclipticplane wrote:
        | For the record, I think this will be a disaster. There is far too
        | much code that will get broken, largely silently, and much of it
        | is not under our control.
        | 
        |     regards, tom lane
       | 
       | (via https://lwn.net/ml/pgsql-
       | hackers/4178104.1685978307@sss.pgh....)
       | 
       | If Tom Lane says it will be a disaster, I believe it will be a
       | disaster.
        
         | abhibeckert wrote:
         | Reminds me of PHP 6...
         | 
         | For those who don't follow PHP closely - that version was an
         | attempted refactor of the string implementation which
         | essentially shut down nearly all work on PHP for a decade,
         | stagnating the language until it became pretty terrible
         | compared to other options. They finally gave up and started
         | work on PHP 7 which uses the (perfectly good) PHP 5 strings.
         | 
         | Ten years of wasted time by the best internal PHP developers
         | crippled the project - I'm amazed it survived at all.
        
           | progmetaldev wrote:
           | I've used PHP in the past (PHP 4 and 5), as well as some
           | simple templated projects in PHP 7. I try to keep up on news
           | with what is happening in the PHP world, and it's difficult
           | because of the hate for the language. Is the solution to
           | Unicode strings still to just use the "mb_*" functions?
           | 
           | I got my real professional start using PHP, and have built
           | even financial systems in the language (since ported to .NET
           | 6 for my ease of maintenance, and better number handling).
           | I'm still very interested in the language itself, in case I
           | ever have the need to freelance or provide a solution to a
           | client that can't afford what I can build in .NET (although
           | to be honest, at this point I'm roughly able to code at the
           | same speed in .NET as in PHP, but with the added type-safety,
           | although I know PHP has really stepped up in providing this).
        
             | Hayvok wrote:
             | I believe so - most (all?) string functions have an mb_
             | equivalent, for working on multibyte strings.
             | 
              | Regular PHP strings are actually pretty great, since you
              | can treat them like byte arrays. Fun fact: PHP's streaming
              | API has an "in-memory" option and it's... just a string
              | under the hood.
             | 
             | Just don't forget to use multibyte functions when you're
             | handling things like user input.
        
           | robomc wrote:
           | I have the "Professional PHP6" book which I feel like should
           | be a collectors item or something.
           | 
            | Weird book IMO, because it has _a lot_ of content that's
           | just about general software development, rather than anything
           | to do with PHP specifically, or the theoretical PHP6 APIs in
           | particular.
        
             | pmontra wrote:
             | PHP used to be the first computer language learned by
             | people wanting to create a scripted web page. This was more
             | true in the 90s but maybe it stuck. So it would be OK to
             | add some general guidance about writing software and
             | organizing projects.
        
           | SoftTalker wrote:
           | _Things You Should Never Do_
           | 
           | https://www.joelonsoftware.com/2000/04/06/things-you-
           | should-...
           | 
           | An oldie but a goodie
        
             | h0l0cube wrote:
             | > Well, yes. They did. They did it by making the single
             | worst strategic mistake that any software company can make:
             | 
             | > They decided to rewrite the code from scratch.
             | 
             | Absolutely not what's being proposed for Postgres.
        
               | yxre wrote:
               | Process isolation affects so many things in C. The
               | strategy change is going to require changes to so many
               | modules that it will either be a re-write or buggy.
               | 
               | In practical terms, if every line needs to be audited and
               | updated, it is a re-write
        
               | h0l0cube wrote:
               | > either be a re-write or buggy
               | 
               | A large refactor at best. It will touch lots of parts of
               | the code base, but the vast majority of the source code
               | would remain intact. Otherwise they could just Rewrite it
               | in Rust(tm) while they're at it
               | 
               | > if every line needs to be audited and updated, it is a
               | re-write
               | 
               | I'm not sure why you believe every line needs to be
               | updated. Most code is thread agnostic.
        
               | anarazel wrote:
               | What makes you think that it will require that many
               | changes? There will be some widespread mechanical changes
               | (which can be verified to be complete with a bit of low
               | level work, like a script using objdump/nm to look for
               | non-TLS mutable variables) and some areas changing more
               | heavily (e.g. connection establishment, crash detection,
               | signal handling, minor details of the locking code). But
               | large portions of the code won't need to change. Note
               | that we/postgres already shares a lot of state across
               | processes.
        
               | giraffe_lady wrote:
               | I'm not the person you asked and I don't have any
               | particular knowledge of postgres internals.
               | 
               | Experience with other systems has taught me that in a
               | system that's been in active use and development for
               | decades, entanglement will be deep, subtle, and
               | pervasive. If this isn't true of postgres then it's an
               | absolute freak anomaly of a codebase. It is that in other
               | ways, so it's possible.
               | 
               | But the article mentions there being thousands of global
               | variables. And Tom Lane himself says he considers it
               | untenable for exactly this reason. That's a _very_ good
               | reason to think that it will require that many changes
               | imo.
        
               | h0l0cube wrote:
               | > that many changes
               | 
               | The 'that many changes' in question is a complete
               | rewrite. Many changes across many files, yes, but nothing
               | even approaching a rewrite.
               | 
               | > I don't have any particular knowledge of postgres
               | 
               | Judging by their bio, the person you're replying to does.
        
             | bandrami wrote:
             | The lesson that "cruft is problems someone solved before
             | you" is unfortunately entirely lost on most devs today
        
             | HeavyStorm wrote:
             | Love this article. Completely changed the way I think about
             | certain projects.
        
             | IgorPartola wrote:
             | I think this is a great article that takes a maximalist
             | point and that's its flaw.
             | 
             | You should rewrite code only when the cost of adding a new
             | feature (one that is actually necessary) to the old
             | codebase becomes comparable to designing your entire system
             | from scratch to allow for that feature to be added easily.
             | That is to say that the cost of the rewrite should become
             | comparable to the cost of continuing development. I have
             | been a part of a couple of rewrites like that, one of them
             | quite complex, and yes they were warranted and yes they
             | worked.
             | 
             | But having said that you should absolutely be conservative
             | with rewriting code. It's a bad habit to always jump to a
             | rewrite.
        
               | devjab wrote:
               | I think it's very dependent on how you use words like
               | "rewrite" or "refactor". The point the author makes about
               | the two page function, and all the bug-fixes (lessons
               | learned) makes sense only if you "rewrite" from scratch
               | without looking at the history. You can absolutely
               | "rewrite" the function in a manner that is "refactoring",
               | but will often get called "rewrite" in the real world.
               | This may be because "refactor" is sort of this English CS
               | term that doesn't have a real translation or usage in
               | many languages and "rewrite" is sort of universal for
               | changing text, but in CS is sort of "rebuilding" things.
               | 
               | I don't think you necessarily need to be conservative
               | about rewriting things. We do it all the time in fact. We
               | build something to get it out there and see the usage,
               | and then we build it better and then we do it again.
               | Which often involves a lot of "rewriting" but thanks to
               | principles like SOLID's single responsibility makes this
               | rather easy to both do and maintain (we write a lot of
               | semi-functional code and try to avoid using OOP unless
               | necessary, so we don't really use all the parts of SOLID
               | religiously).
               | 
               | I do agree that it's never a good idea to get into things
               | with the mind-set of "we can do this better if we start
               | from scratch" because you can't.
        
               | giraffe_lady wrote:
               | There's currently a trend towards shitting on
               | microservices-everything, imo largely justified. But
               | missing from that is that identifying a logical feature
               | and moving it to a microservice is one of the safer ways
               | to begin a gradual rewrite of a critical system. _And_
               | usually possible to get cross-dept buyin for various not
               | always wholesome reasons. It may not always be the best
                | technical solution but it's often the best political one
               | when a rewrite is necessary.
        
             | pphysch wrote:
             | The issue here isn't "rewriting" per se but "stopping
             | development".
             | 
             | You shouldn't stop development on your important products.
             | 
             | Letting a couple of your talented programmers loose on a
             | greenfield reimplementation is a perfectly sane strategic
             | move.
             | 
             | Stopping development on important products because you are
             | 100% certain that the reimplementation will be successful
             | by $DEADLINE is a foolish gamble.
        
               | devjab wrote:
               | Isn't that how you end up with Python 2 and 3 though?
        
               | pphysch wrote:
               | Yes, that was a rough migration process, but the long-
               | term result is we have an improved language and growing
               | community instead of Python going the way of PHP and
               | Perl.
               | 
               | Between the 3 P's, Python's strategic decisions in 2000s
               | were clearly the most successful.
               | 
               | And it wasn't a total rewrite.
        
               | wiz21c wrote:
               | Python3 changes were many "little" things; some more
                | fundamental than others (unicode str). So I guess they
               | were able to split the work in tiny pieces and,
               | ultimately, were able to manage the project...
        
               | paulddraper wrote:
               | The problem is if it snowballs
        
               | pphysch wrote:
               | Any project, whether it's a "rewrite" or not, can succumb
               | to scope creep and poor project management.
        
               | hurril wrote:
                | The big problem there is that the people you are letting
                | loose on the alternative (A) are lost from the original
                | (O), so O loses the steam that A gains. You still have to
                | produce bug fixes and features for _both_ O and A to keep
                | them in sync. So you essentially have a doubled required
                | production rate to be delivered with the same staff.
               | 
                | So in order for there to be a net gain, the gang working
                | on the alternative has to find wins so big as to be nigh
                | impossible.
               | 
               | This is a very very hard problem in our domain. 99% of
               | the time, we have to simply resist the urge to _just
               | rewrite the sucker_. No! Don't do it! (And this is
               | incredibly hard because we all want to.)
        
               | pphysch wrote:
                | Getting some _Mythical Man Month_ vibes here.
                | Productivity isn't a zero-sum game.
        
           | zilti wrote:
           | Implying it had ever not been terrible compared to other
           | options
        
           | kqr wrote:
           | On the other hand, there's also the case of the Lunar Module
           | guidance software that was hard-coded to run exactly every
           | two seconds. If the previous subroutine call was still
           | running when the next one was due, the previous one was
           | harshly terminated (with weird side effects).
           | 
           | One of the main programmers suggested making it so that the
           | next guidance routine wouldn't run until the previous one was
           | done. This would make the code less sensitive to race
           | conditions and allow more useful functionality for the pilots
           | (who were the actual users and did seem to want it). However
           | everyone assumed the two-second constant was implicitly
           | embedded everywhere.
           | 
           | It wasn't -- only in a few places -- and with that fixed the
           | code got more general and the proof of concept ran better
           | than ever in about every simulator available. The amount of
           | control it gave pilots was years ahead of the curve. But it
           | never got a chance to fly on a real mission because what was
           | there was "good enough" and nobody bothered to try.
           | 
           | In our combined comments there's a lesson about growing
           | experiments and figuring out how to achieve failure quickly.
        
         | idiomaticrust wrote:
         | He is right. Such rewrites cause a lot of problems if your
         | compiler doesn't help you with avoiding data races.
         | 
         | But there is another way.
        
           | zilti wrote:
           | Indeed, Zig is a nice language for this
        
           | hnarn wrote:
           | > But there is another way.
           | 
           | Ok?
        
             | mycall wrote:
             | Microsoft SQL Server has SQLOS which is another way [0].
             | 
             | [0] https://www.thegeekdiary.com/what-is-sql-server-
             | operating-sy...
        
             | carstenhag wrote:
             | The person probably implied that Postgres should switch to
             | another toolchain that guarantees more things at compile
             | time, so probably Rust.
        
               | bb88 wrote:
               | You can take a chunk of code and just rewrite it in Rust.
               | You'll learn a lot quickly by this.
        
               | steve_adams_86 wrote:
               | It's sort of like the inverse of the Matrix when Neo
               | learns kung fu. You realize that you actually don't know
               | how to program :)
        
               | tylerhou wrote:
               | The boundaries within database code are not clear. There
               | are too many interlocking parts to take a nontrivial
               | chunk and rewrite it Rust.
        
               | blincoln wrote:
               | If the existing code is old-school enough to use
               | thousands of global variables in a thread-unsafe way,
               | seems like changing it enough to compile as safe Rust
               | code would push the "non-trivial" envelope pretty far.
        
             | chc wrote:
             | I think it's meant to imply the solution given in their
             | username ("idiomatic Rust").
        
               | lelanthran wrote:
               | > I think it's meant to imply the solution given in their
               | username ("idiomatic Rust").
               | 
               | I think "Idiom: a tic (Rust)" can also fit if I squint
               | hard enough and decide it looks like a definition from an
               | online dictionary :-)
        
             | avgcorrection wrote:
             | Don't mind the gimmick gallery (username).
        
           | [deleted]
        
         | lenkite wrote:
         | Feel like the PostgreSQL Core Team should just build a new
         | database from scratch using what they have learned from
         | experience instead of attempting such a fundamental
         | architectural migration. It would give them more freedom to
         | change things also. Call it "postgendb" and provide a data
         | migrator.
        
           | rdevsrex wrote:
           | That's a great idea. I've been considering whether or not to
           | use Cockroach Db at work, and I love the fact that it's
           | distributed from the get go.
           | 
            | Why not work on something like that instead of changing
            | something that works? Especially since the process model
            | really only runs into trouble on large systems.
        
         | kristiandupont wrote:
         | Yeah.
         | 
         | Without being familiar with the Postgres source, this seems to
         | be what I call a "somersault problem": hard to break down into
         | sub-goals. I have heard that the Postgres codebase is solid
         | which makes it easier but it's still mature and highly complex.
         | It doesn't sound feasible to me.
         | 
         | https://kristiandupont.medium.com/somersault-problems-69c478...
        
           | seedless-sensat wrote:
           | The original post does describe several sub-problems. The
           | group could first chip away at global state, signals,
           | libraries. They can do this before changing the process model
           | in any way.
        
             | kristiandupont wrote:
             | Good point.
        
         | osigurdson wrote:
          | Heikki Linnakangas has a good understanding of Postgres as
          | well, however. We all want Postgres to be competitive on
          | numbers of connections, don't we?
        
         | gremlinsinc wrote:
          | Maybe a better option would be finding a team to create nugres,
          | aka a fork for this and other experiments, so that mainline
          | remains stable.
        
           | mattashii wrote:
           | There are several forks of PostgreSQL, in various levels of
           | license, additional features and activity. However,
           | maintaining a fork in addition to a main project is
           | inherently more expensive than maintaining just a single
           | project, so adding features to new major releases of the main
           | project is generally preferred over forking every release
           | into its own, newly named, project. After all, that is what
              | we have major (feature) releases and stabilization windows
           | (beta releases) for.
        
             | j16sdiz wrote:
              | This won't work well for a multiyear project... Either you
              | have to stall the release process, divide the work into
              | smaller parts, or fork.
        
         | datavirtue wrote:
         | This should be considered a research effort, assuming it will
         | be a complete rewrite. In light of that, you should not draw
         | down resources from the established code base to work on it.
         | 
         | Ignoring the above, first state the explicit requirements
         | driving this change and let people weigh in on those. This
         | sounds like a geeky dev itch.
        
         | duped wrote:
         | That's an awful message with the only sensible reply.
        
         | kccqzy wrote:
          | I don't expect you or others to buy into any particular code
          | change at this point, or to contribute time into it. Just to
          | accept that it's a worthwhile goal. If the implementation turns
          | out to be a disaster, then it won't be accepted, of course. But
          | I'm optimistic.
         | 
         | The reply is much more reasonable than this blanket assertion
         | of a disaster.
        
           | giraffe_lady wrote:
           | As an outsider it doesn't sound like something a few people
           | could spin off in a branch in a couple months and see how
           | code review goes. They're talking about doing it over
           | multiple (yearly?) releases. It seems like it'll take a lot
           | of expert attention, which won't be available for other work
           | and the changes themselves will impact all other ongoing
           | work.
           | 
           | I'm not trying to naysay it per se, bc again I don't have
           | technical knowledge of this codebase. But that's exactly the
           | sort of scenario that can cause a large project to splinter
           | or stall for years. Talking about "the implementation" absent
           | the context that would be necessary to _create_ that
           | implementation seems naively optimistic, or at worst
           | irresponsible.
        
             | lbriner wrote:
             | You are talking about implementation, the OP was talking
             | about raising the concept with interested parties and
             | seeing whether it is worth even _starting_ to think about
             | it.
             | 
             | They could fork, they could add threading to some sub
             | systems and roll it out over several versions.
             | 
                | I don't know enough about the code; of course, it is a
                | hard problem, but the solution might be to build it from
                | the ground up as a threaded system, using the skills
                | learned over 30 years, and taking the hit on the rebuild
                | instead of reworking what is there.
             | 
             | I am most interested because I didn't realise there was a
             | performance problem in the first place.
        
               | axman6 wrote:
                | Am I going crazy, or has the obvious implementation of
                | such a change been lost on people? If they were
               | proposing taking a multi-threaded app and splitting it
               | into a multi-process one, I would predict they would find
               | a hell of a lot of unexpected or unknown implicit
               | communication between threads, which would be a nightmare
               | to untangle.
               | 
               | Going the other way, there is an extremely well
               | understood interface between all the processes which run
               | in isolation: shared memory. Nearly by definition this
               | must be well coordinated between the processes.
               | 
               | So the first step in moving to a multi-threaded
               | implementation would be to change nearly nothing about
               | each process, and then just run each process in its own
               | pthread, keeping all the shared memory 'n all.
               | 
                | You would expect performance to be about the same, maybe
                | a little better with the reduced TLB churn, but the
               | architecture is basically unchanged. At that point, you
               | can start to look at what are more appropriate
               | communication/synchronisation mechanisms now you're
               | working in the same address space.
               | 
               | I just don't understand why so many people seem to think
               | this requires an enormous rewrite - having developed as a
               | multi-process system means you've had to make so much of
               | the problematic things explicit and control for them, and
               | none of these threads would know anything at all about
               | each other's internals.
        
       | jasonhansel wrote:
       | I'm rather surprised that their focus is on improving vertical
       | scalability, rather than on adding more features for scaling
       | Postgres horizontally.
        
         | tracker1 wrote:
          | If you're more interested in horizontal scaling, you may want
          | to look into CockroachDB, which has a Postgres-compatible
          | protocol but is still quite different. There are a lot more
          | limitations with CDB than with Pg, though.
         | 
          | With the changes suggested, I'm not sure it's the best idea
          | from where Postgres is... it might be an opportunity to rewrite
          | bits in Rust, but even then, there is a _LOT_ that can go
          | wrong. The use of shared memory is apparently already in place,
          | and the separate processes and inter-process communication
          | aren't the most dangerous part... it's the presumptions,
          | variables, and other contextual bits that are currently process
          | globals that wouldn't be in the "after" version.
         | 
         | The overall surface is just massive... That doesn't even get
         | into plugin compatibility.
        
       | mynonameaccount wrote:
        | "the benefits would not justify the cost". PostgreSQL, like any
        | software, at some point in its life needs to be refactored. Why
        | not refactor with a thread model? Of course there will be bugs.
        | Of course it will be difficult. But I think it is a worthwhile
        | endeavor. It doesn't sound like this will happen, but a new
        | project would be cool.
        
         | timtom39 wrote:
          | > like any software, at some point in its life needs to be
          | refactored.
         | 
         | This is simply not true for most software. Software has a
         | product life cycle like everything else and major
         | refactors/rewrites should be weighed carefully against
         | cost/risk of the refactor. Many traditional engineering fields
         | do much better at this analysis.
         | 
         | Although, because I run a contracting shop, I have personally
         | profited greatly by clients thinking this is true and being
         | unable to convince them otherwise.
        
         | smsm42 wrote:
          | "Difficult" doesn't even begin to do it justice. Making code
          | that has 2k global variables, and probably an order of
          | magnitude more underlying assumptions, thread-safe (the code
          | now has to know that every time you touch X you may influence,
          | or be influenced by, any other thread that touches X) is a
          | gargantuan task, and will absolutely for sure involve many
          | iterations which any sane person would never let anywhere near
          | valuable data (and how long would it take until you'd consider
          | it safe enough?). And making this all performant would be even
          | harder: shared-state code requires a completely different
          | approach to thinking about workload distribution, and something
          | that performs well when running in isolated processes may very
          | well get bogged down in locking hell or cache races when
          | sharing state. I am not doubting Postgres has some very smart
          | people, much, much smarter than me in any case, but I'd say it
          | could be more practical to write a new core from scratch than
          | to try to "refactor" a core that organically grew for decades
          | on the assumptions of a share-nothing model.
        
         | djur wrote:
         | What you're talking about is a rewrite, not a refactor.
        
         | gremlinsinc wrote:
          | A better option would be to just create an experimental fork
          | that has a different name and is obviously a different product,
          | but based on the original source. That way pg gets updates and
          | remains stable, and if they fail, they fail, and it doesn't
          | hurt all the pg in production.
        
       | wielebny wrote:
       | Having been using and administering a lot of PostgreSQL servers,
       | I hope they don't lose any stability over this.
       | 
       | I've seen (and reported) bugs that caused panics/segfaults in
       | specific backend processes - not just connection handlers, but
       | also processes related to WAL writing or replication. The way
       | it's built right now, a child process can simply be forced to
       | quit without affecting other processes. Hopefully switching to
       | threads won't force the whole PostgreSQL server to panic and
       | shut down.
        
         | tracker1 wrote:
         | Most likely, the postmaster will maintain a separate process,
         | much like today with pg, or similar to Firefox or Chrome's
         | control process that can catch the panic'd process, cleanup and
         | restart them. The WAL can be recovered as well if there were
         | broken transactions in flight.
        
         | jtc331 wrote:
         | Because of shared memory most panics and seg faults in a worker
         | process take down the entire server already (this wasn't always
         | the case, but not doing so was a bug).
        
         | vbezhenar wrote:
         | Of course it will. That's better than continue working with
         | damaged memory structures and unpredictable consequences. For
         | database it's more important than ever. Imagine writing
         | corrupted data because other thread went crazy.
        
           | wizofaus wrote:
           | You're implying that only an OS can provide memory separation
           | between units of execution - at least in .NET AppDomains give
           | you the same protection within a single process, so why
           | couldn't postgres have its own such mechanism? I'd also think
           | with a database engine shared state is not just in-memory -
           | i.e. one process can potentially corrupt the behaviour of
           | another by what it writes to disk, so moving to a single-
           | process model doesn't necessarily introduce problems that
           | could never have existed previously (but, yes, would arguably
           | make them more likely)
        
             | vbezhenar wrote:
             | I don't know .NET enough to comment here, but I'm pretty
             | sure that if you would manage to run bare metal C inside
             | your .NET app (should be possible), it'll destroy all your
             | domains easily. RAM is RAM. The only memory protection that
             | we have is across process boundary (even that protection is
             | not perfect with shared memory, but at least it allows to
             | protect private memory).
             | 
             | At least I'm not aware of any way to protect private thread
             | memory from other threads.
             | 
             | Postgres is C and that's not going to change ever.
        
               | wizofaus wrote:
               | I certainly wasn't suggesting it would make sense to
               | rewrite Postgres to run on .NET (using any language, even
               | managed C++, assuming anyone still uses that). Yes, it's
               | inherent in the C/C++ language that it's able to randomly
               | access any memory that a process has access to, and
               | obviously on that basis OS-provided process-separation is
               | the "best" protection you can get, just pointing out that
               | it's not the only possibility.
        
             | szundi wrote:
             | For a decades old codebase probably only the OS can.
             | 
             | Point is it getting worse if this is changed.
        
             | SigmundA wrote:
             | No AppDomains are not as good as processes, I have tried to
             | go that route before, you cannot stop unruly code reliably
             | in an app domain (you must use thread.abort() which is not
             | good) and memory can still leak in any native code used
             | there.
             | 
             | The only reliable way to stop bad code like say an infinite
             | loop is to run in another process even in .Net.
             | 
             | They also removed Appdomain in later versions of .Net
             | because they had little benefit and weak protections
             | compared to a a full process.
        
               | wizofaus wrote:
               | Not claiming they're as good, just noting that there are
               | alternative ways to provide memory barriers, though
               | obviously if it's not enforced at the language/runtime
               | level, it requires either super strong developer disciple
               | or the use of some other tool to do so. I can't find
               | anything suggesting AppDomains have been removed
               | completely though, just they're not fully supported on
               | non-Windows platforms, which is interesting, I wonder if
               | that means they do have OS-level support.
        
               | SigmundA wrote:
               | https://learn.microsoft.com/en-
               | us/dotnet/api/system.appdomai...
               | 
               | "On .NET Core, the AppDomain implementation is limited by
               | design and does not provide isolation, unloading, or
               | security boundaries. For .NET Core, there is exactly one
               | AppDomain. Isolation and unloading are provided through
               | AssemblyLoadContext. Security boundaries should be
               | provided by process boundaries and appropriate remoting
               | techniques."
               | 
               | AppDomains pretty much only let you load and unload
               | assemblies, and provided little else. If you wanted to
               | stop bad code you still used Thread.Abort, which left
               | your runtime in a potentially bad state due to the
               | lack of isolation between threads.
               | 
               | The only way to do something like an AppDomain to replace
               | process isolation would be to re-write the whole OS in a
               | memory safe language similar to
               | https://en.wikipedia.org/wiki/Midori_(operating_system) /
               | https://en.wikipedia.org/wiki/Singularity_(operating_syst
               | em)
        
               | wizofaus wrote:
               | Is that saying global variables are shared between
               | AppDomains on .NET Core, then? Scary if so - we have a
               | bunch of .NET Framework code we're looking at porting
               | to .NET Core in the near future, and I know it relies
               | on AppDomain separation currently. It's not the first
               | Framework-to-Core conversion I've done, but I don't
               | remember changes in AppDomain behaviour causing any
               | issues the first time.
               | 
               | As it happens, I already know there are bits of code
               | currently not working "as expected" precisely because
               | of AppDomain separation - i.e. attempting to use a
               | shared-memory cache to improve performance, and in one
               | or two cases to share state. I got the impression
               | whoever wrote that code didn't understand that there
               | were even two AppDomains involved, and used various
               | ugly hacks to "fall back" to alternative means of
               | state-sharing - but in fact the fall-back is the only
               | thing that ever actually works.
        
               | electroly wrote:
               | > Is that saying global variables are shared between
               | AppDomains on .NET core then?
               | 
               | No, you can't create a second AppDomain _at all_.
               | AppDomains are dead and buried; you would need to remove
               | all of that from your code in order to migrate to current
               | .NET. The class only remains to serve a couple of
               | ancillary functions that don't involve actually
               | creating additional AppDomains.
        
               | wizofaus wrote:
               | We're not creating them ourselves, they're created by
               | IIS.
        
             | dikei wrote:
             | .NET is a managed-language with a VM. In such language, a
             | memory error in managed-code will often trigger a jump back
             | to the VM, where they can attempt to recover from there.
             | 
             | For native code, there's no such safety net. Likewise, even
             | for managed language, an error in the interpreter code will
             | still crash the VM, since there's nothing to fallback to
             | anymore.
        
               | wizofaus wrote:
               | True, if you're talking unrestricted native code, I'd
               | essentially agree with the OP's implication that only the
               | OS (and the CPU itself) is capable of providing that sort
               | of memory protection. I guess I was just wondering what
               | something like AppDomains in C might even look like (e.g.
               | all global variables are implicitly "thread_local"), and
               | how much could be done at compile-time using tools to
               | prevent potentially "dangerous" memory accesses. I've
               | never looked at the postgres source in any detail so I'm
               | likely underestimating the difficulty of it.
        
         | eastern wrote:
         | 100%. Same here. There's a lot of baby in the processes, not
         | just bathwater.
         | 
         | As a longstanding PG dev/DBA who doesn't know much about its
         | internals, I would say that they should just move connection
         | pooling into the main product.
         | 
         | Essentially, pgbouncer should be part of PG and should be
         | able to manage connections with knowledge of what each
         | connection is doing. That, plus some sort of dynamic max
         | connections setting based on what's actually going on.
         | 
         | That'll remove almost all the dev/DBA pain from separate
         | processes.
        
       | [deleted]
        
       | rbancroft wrote:
       | Changing something so fundamental seems like it should be a
       | rewrite.
        
       | MR4D wrote:
       | Changing the entire architecture of PG to suit the 0.1% of edge
       | cases seems like a poor trade off.
       | 
       | Is there a much larger percentage of users that really needs
       | this?
        
       | doctor_eval wrote:
       | I know I'm probably being naive about this, but is it stupid
       | to ask if there's a way to make multi-process work better on
       | Linux - rather than "fixing" PG?
       | 
       | I feel like the thread vs process thing is one of those
       | pendulums/fads that comes and goes. I'd hate to see PG go down a
       | rabbit hole only to discover the OS could be modified to make
       | things go better.
       | 
       | (I understand not all PG instances run on Linux, just using it as
       | an example)
        
         | chmod775 wrote:
         | > I feel like the thread vs process thing is one of those
         | pendulums/fads that comes and goes.
         | 
         | In this context threads can be understood as processes that
         | share the same address space and, vice versa, processes as
         | threads with separate address spaces.
         | 
         | One gives you isolation, the other convenience and performance.
         | Either can be desirable.
         | 
         | What would you change about this?
        
         | dikei wrote:
         | That'll likely be an even bigger task, and harder to get into
         | mainline kernel.
         | 
         | Linux multi-process is already pretty efficient compared to
         | Windows. However, multi-process is inherently less efficient
         | than multi-thread due to more safety predicates / isolation
         | guaranteed by the kernel, I feel lowering it might lead to more
         | security issues, similar to how Hyper Threading triggered a
         | bunch of issues with Intel Processors.
        
           | doctor_eval wrote:
           | Right - yeah I was really just wondering if some of the
           | safety predicates could be reduced when there is a
           | relationship between processes, such as the mitigations
           | against cache attacks. I think the cache misses caused by
           | multi-process were one of the reasons given that it's slower
           | than threading. But I don't understand why this is
           | necessarily the case given that the shared memory and
           | executable text ultimately refer to the same data. But I
           | suppose this would need to work with processor affinity and
           | other elements to prevent the cache being knocked around by
           | non-PG processes, and I guess this is one place where it
           | starts getting complicated.
           | 
           | That said, please understand that I'm just being curious -
           | I really don't know what I'm talking about; I haven't
           | built a Linux kernel or dabbled in Unix internals in like
           | 20 years. But thanks for replying :) PostgreSQL is my
           | favourite open source project and I'm spooked by the
           | threading naysayers.
        
             | anarazel wrote:
             | The TLB is basically keyed by (address space, virtual
             | address % granularity), or needs to be flushed entirely
             | when switching between different views of the address space
             | (e.g. switching between processes). Unless your address
             | space is exactly the same, you're at least going to
             | duplicate TLB contents. Leading to a lower hit rate.
             | 
             | This isn't really an OS issue, more a hardware one,
             | although potential hardware improvements would likely have
             | to be explicitly utilized by operating systems.
             | 
             | Note that the TLB issue is different from the data /
             | instruction cache situation.
        
       | papito wrote:
       | This has Python 3 vibes.
        
       | newaccount74 wrote:
       | A big advantage of the process-based model is its resilience
       | against many classes of errors.
       | 
       | If a bug in PostgreSQL (or in an extension) causes the server to
       | crash, then only that process will crash. Postmaster will detect
       | the child process termination, and send an error message to the
       | client. The connection will be lost, but other connections will
       | be unaffected.
       | 
       | It's not foolproof (there are ways to bring the whole server
       | down), but it does protect against many error conditions.
       | 
       | It is possible to trap on some exceptions in a threaded
       | environment, but cleaning up after e.g. an attempted NULL
       | pointer dereference is going to be very difficult or
       | impossible.
        
         | anarazel wrote:
         | We would still have a separate supervisor process if we
         | moved connections to threads.
        
       | baggy_trough wrote:
       | I hope they are conservative about this, because even the
       | smartest and best programmers in the world cannot create bug free
       | multithreaded code.
        
         | jerf wrote:
         | I mentally snarked to myself that "obviously they should
         | rewrite it in Rust first".
         | 
         | Then, after more thought, I'm not entirely sure that would be a
         | bad approach. I say this not to advocate for actually rewriting
         | it in Rust, but as a way of describing how difficult this is.
         | I'm not actually sure rewriting the relevant bits of the
         | system in Rust _wouldn't_ be easier in the end, and
         | obviously, that's really, really hard.
         | 
         | This is a _really_ hard transition.
         | 
         | I don't think multithreaded code quality should be measured
         | in absolutes. Some things are so difficult as to be
         | effectively impossible - the lock-based approach dominant in
         | the 90s convinced developers this was true of all
         | multithreading, but it's not multithreaded code that's
         | impossibly difficult, it's lock-based multithreading. Other
         | approaches range from doable to not that hard once you learn
         | the relevant techniques (Haskell's full immutability and
         | Rust's borrow checker are both very solid), though of course
         | even "not that hard" adds up to a lot of bugs at the scale
         | of something like Postgres. But it's not like the current
         | model is immune to that either.
        
         | xwdv wrote:
         | Nonsense, multithreaded code can be written as bug free as
         | regular code. No need to fear.
        
           | preordained wrote:
           | It _can_ be. Anything can be. It is far more treacherous,
           | though.
        
           | taeric wrote:
           | I think the point is that some mistakes in process based code
           | are not realized as the bugs that they will be in threaded
           | code?
        
           | dboreham wrote:
           | This is true. However, the blast radius may be smaller with a
           | process model. Also recovering from a fatal error in one
           | session could possibly be easier. I say this as a 30-year
           | threading proponent.
        
           | PhilipRoman wrote:
           | I'm assuming you're referring to formally proven programs. If
           | that's the case, do you have any pointers?
           | 
           | Aside from the trivial while(!transactionSucceeded){retry()}
           | loop, I have trouble proving the correctness of my programs
           | when the number of threads is not small and finite.
        
           | baggy_trough wrote:
           | In theory, yes. In practice, no.
        
           | ajkjk wrote:
           | It is just harder.
        
         | mmphosis wrote:
         | _Concurrency isn't a "nice layer over pthreads" - the most
         | important thing is isolation - anything that mucks up isolation
         | is a mistake.
         | 
         | -- Joe Armstrong_
         | 
         | Threads are evil. https://www.sqlite.org/faq.html#q6
         | https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...
         | 
         | Nginx uses an asynchronous event-driven approach, rather than
         | threads, to handle requests.
         | https://aosabook.org/en/v2/nginx.html
         | http://www.kegel.com/c10k.html
        
         | jrott wrote:
         | The fact that they are planning on doing this across multiple
         | releases gives me hope that they'll be cautious with this.
        
         | johannes1234321 wrote:
         | The code already is multithreaded. They have shared state just
         | across multiple processes instead of threads within a process.
         | 
         | They might even reduce complexity that way.
        
           | usefulcat wrote:
           | It's not the same at all for global variables, of which pgsql
           | apparently has around a couple thousand.
           | 
           | If every process is single threaded, you don't have to
           | consider the possibility of race conditions when accessing
           | any of those ~2000 global variables. And you can pretty much
           | guarantee that little if any of the existing code was written
           | with that possibility in mind.
        
             | ants_a wrote:
             | Those global variables would be converted to thread locals
             | and most of the code would be oblivious of the change. This
             | is not the hard part of the change.
        
         | anarazel wrote:
         | Postgres is already concurrent today. There's a _lot_ of shared
         | state between the processes (via shared memory).
        
       | mastax wrote:
       | It would be interesting to have something between threads and
       | processes. I'll call them heavy-threads for sake of discussion.
       | 
       | Like light-threads, heavy-threads would share the same process-
       | security-boundary and therefore switching between them would be
       | cheap. No need to flush TLB, I$, D$.
       | 
       | Like processes, heavy-threads would have mostly-separate address
       | spaces by default. Similar to forking a process, they could share
       | read-only mappings for shared libraries, code, COW global
       | variables, and explicitly defined shared writable memory regions.
       | 
       | Like processes, heavy-threads would isolate failure states. A C++
       | exception, UNIX signal, segfault, etc. would kill only the heavy-
       | thread responsible.
        
         | mike_hearn wrote:
         | There are some problems.
         | 
         | 1. Mostly separate address spaces requires changing the TLB on
         | context switch (modern hw lets it be partial). You could use
         | MPKs to share a single address space with fast protection
         | switches.
         | 
         | 2. Threads share the global heap, but your heavy threads would
         | require explicitly defined shared writeable memory regions, so
         | presumably each one has its own heap. That's a fair bit of
         | overhead.
         | 
         | 3. Failure isolation is more complicated than deciding what to
         | kill.
         | 
         | To expand on the last point: Postgres _doesn't_ fully
         | isolate failures to a single process, because processes do
         | share memory and might corrupt those shared regions. But
         | even without shared memory, failure recovery isn't always
         | easy; software has to be written specifically to plan for
         | it. You can kill processes because everything in the OS is
         | written around allowing for that possibility - shells, for
         | example, know what to do if a sub-process is killed
         | unexpectedly. Killing a heavy thread (=process) is no good
         | if the parent process will wait forever for a reply from it
         | because it wasn't written to handle the process going away.
        
         | anarazel wrote:
         | > Like light-threads, heavy-threads would share the same
         | process-security-boundary and therefore switching between them
         | would be cheap. No need to flush TLB, I$, D$.
         | 
         | > Like processes, heavy-threads would have mostly-separate
         | address spaces by default. Similar to forking a process, they
         | could share read-only mappings for shared libraries, code, COW
         | global variables, and explicitly defined shared writable memory
         | regions.
         | 
         | I don't think you realistically can have separate address
         | spaces and not have TLB etc impact. If they're separate address
         | spaces, you need separate TLB entries => lower TLB hit ratio.
        
         | dasyatidprime wrote:
         | So what would be different between those and forked processes?
        
         | ShroudedNight wrote:
         | I've been pondering this too; I've been somewhat surprised
         | that few operating systems have played with reserving
         | per-thread address space as thread-local storage, or
         | requiring something akin to a 'far' pointer to access
         | commonly-addressed shared memory.
        
         | wbl wrote:
         | You cannot COW and share the TLB state. The caches aren't
         | flushed on process changes either: it's that the data is
         | different, so evictions happen.
        
         | mattashii wrote:
         | > No need to flush TLB
         | 
         | The TLB isn't "flushed" so much as rendered useless across
         | different address spaces. Switching processes means
         | switching address spaces, which means the TLB contents must
         | be replaced with the new process's entries - which does
         | eventually flush the TLB, but only over time, not
         | necessarily the moment you switch processes.
         | 
         | > Like processes, heavy-threads would have mostly-separate
         | address spaces by default.
         | 
         | This thus conflicts with the need to not flush TLBs. You can't
         | not change TLB contents across address spaces.
        
       | lukeschlather wrote:
       | This sounds like a problem that would border on the complexity of
       | replacing the GIL in Ruby or Python. The performance benefits are
       | obvious but it seems like the correctness problems would be
       | myriad and a constant source of (unpleasant) surprises.
        
         | narrator wrote:
         | The correctness problem should be handled by a suite of
         | automated tests, which PostgreSQL has. If all tests pass,
         | the application must work correctly. The project is too big
         | and has too many developers to make much progress without
         | full test coverage. Where else would up-to-date
         | documentation of PostgreSQL's correct behavior exist? In
         | some developer's head? SQLite is famous for its extreme
         | approach to testing, including out-of-memory conditions and
         | other rare circumstances: https://www.sqlite.org/testing.html
        
           | abalashov wrote:
           | > If all tests pass, the application must work correctly.
           | 
           | These are "famous last words" in many contexts, but when
           | talking about difficult-to-reproduce parallelism issues, I
           | just don't think it's a particularly applicable viewpoint at
           | all. No disrespect. :)
        
           | lukeschlather wrote:
           | Parallelism is often incredibly hard to write automated tests
           | for, and this will most likely create parallelism issues that
           | were not dreamed of by the authors of the test suite.
        
         | MuffinFlavored wrote:
         | Does GIL stand for Global Interpreter Lock?
        
           | Yujf wrote:
           | yes
        
         | dialogbox wrote:
         | Even the performance benefits are not as big as in the GIL
         | case.
         | 
         | The biggest problem with the process model might be the cost
         | of having too many DB connections: each client needs a
         | dedicated server process, with the attendant memory usage
         | and context-switching overhead. And without a connection
         | pool, connection-time overhead is very high.
         | 
         | But this problem has been well addressed with connection
         | pools, or with middleware instead of exposing the DB
         | directly. That works very well so far.
         | 
         | Oracle has supported a thread-based model, and it's been
         | usable for decades - I tried the thread-based configuration
         | option (MTS, or shared server) back in the 1990s. But no one
         | likes it, at least within my Oracle DBA network.
         | 
         | It would be a great research project, but it would be a big
         | problem if the community pushes this too early.
        
         | cactusfrog wrote:
         | This is different because there isn't a whole ecosystem of
         | packages that depend on access to a thread unsafe C API.
         | Getting the GIL out of core Python isn't too challenging.
         | Getting all of the packages that depend on Python's C API
         | working is.
        
           | masklinn wrote:
           | Another component of the GIL story is that removing the
           | GIL requires adding fine-grained locks, which (aside from
           | making VM development more complicated) significantly
           | increases lock traffic and thus runtime costs, noticeably
           | hurting single-threaded performance - which matters a
           | lot.
           | 
           | Postgres starts from a share-nothing architecture, so
           | it's quite a bit easier to evaluate the addition of
           | sharing.
        
             | bsder wrote:
             | > which noticeably impacts single-threaded performance,
             | which is of major import.
             | 
             | 1) I don't buy this a priori. Almost everybody who removed
             | a gigantic lock suddenly realizes that there was more
             | contention than they thought and that atomizing it made
             | performance improve.
             | 
             | 2) Had Python bitten the bullet and removed the GIL back at
             | Python 3.0, the performance would likely already be back to
             | normal or better. You can't optimize hypothetically.
             | Optimization on something like Python is an accumulation of
             | lots of small wins.
        
               | masklinn wrote:
               | > I don't buy this a priori.
               | 
               | You don't have to buy anything, that's been the result of
               | every attempt so far and a big reason for their
               | rejection. The latest effort only gained some traction
               | because the backers also did optimisation work which
               | compensated (and then was merged separately).
               | 
               | > Almost everybody who removed a gigantic lock
               | 
               | See that's the issue with your response, you're not
               | actually reading the comment you're replying to.
               | 
               | And the "almost" is a big tell.
               | 
               | > suddenly realizes that there was more contention than
               | they thought and that atomizing it made performance
               | improve.
               | 
               | There is no contention on the GIL in single-threaded
               | workloads.
               | 
               | > Had Python bitten the bullet and removed the GIL back
               | at Python 3.0
               | 
               | It would have taken several more years and been
               | completely DOA.
        
             | anarazel wrote:
             | Postgres already shares a lot of state between processes
             | via shared memory. There's not a whole lot that would
             | initially change from a concurrency perspective.
        
             | ComputerGuru wrote:
             | > which (aside from making VM development more complicated)
             | significantly increases lock traffic and thus runtime
             | costs, which noticeably impacts single-threaded
             | performance, which is of major import.
             | 
             | I don't think that's a fair characterization of the
             | trade-offs. Acquiring an uncontended mutex is basically
             | free (and fairly side-effect free), so single-threaded
             | performance will not be noticeably impacted.
             | 
             | Every large C project I'm aware of (read: kernels) that has
             | publicly switched from coarse locks to fine-grained locks
             | has considered it to be a huge win with little to no impact
             | on single-threaded performance. You can even gain
             | performance if you chop up objects or allocations into
             | finer-grained blobs to fit your finer-grained locking
             | strategy because it can play nicer with cache friendliness
             | (accessing one bit of code doesn't kick the other bits of
             | code out of the cache).
        
           | erikpukinskis wrote:
           | > there isn't a whole ecosystem of packages that depend on
           | access to a thread unsafe C API
           | 
           | They mentioned a similar issue for Postgres extensions, no?
           | 
           | > Haas, though, is not convinced that it would ever be
           | possible to remove support for the process-based mode.
           | Threads might not perform better for all use cases, or some
           | important extensions may never gain support for running in
           | threads.
        
             | scolby33 wrote:
             | I question how important an extension is if there's not
             | enough incentive to port it to the new paradigm, at least
             | eventually.
        
               | abalashov wrote:
               | Well. The thing with that is just that there are a lot of
               | extensions. Like, a lot!
        
       | gjvc wrote:
       | surely this is "Some guy reconsiders the process-based model of
       | PostgreSQL"
        
         | Icathian wrote:
         | Uh. Heikki is definitely not just "some guy". Dude is one of
         | the top contributors to Postgres.
        
           | gjvc wrote:
           | How does that make him immune to having dumb ideas? See, I'm
           | judging the idea on merit.
           | 
           | You're just defending your hero who has gone rogue.
        
             | Icathian wrote:
             | I must have missed all the nuanced judgment in your
             | original post. Maybe you can quote some for me.
        
               | gjvc wrote:
               | not rising to this sarcastic bait.
        
             | anarazel wrote:
             | Heikki is far from the only "senior" postgres contributor
             | thinking that this is the right direction in the long term.
             | 
             | > You're just defending your hero who has gone rogue.
             | 
             | That doesn't sound like judging an idea on its merit, it
             | sounds like judging a person for something you haven't
             | analyzed yourself.
        
               | gjvc wrote:
               | > Heikki is far from the only "senior" postgres
               | contributor thinking that this is the right direction in
               | the long term.
               | 
               | sounds like groupthink
        
       | jpgvm wrote:
       | A close-to-impossible task; if anyone can do it, though, it's
       | probably Heikki.
       | 
       | Unfortunately I expect this to go the way of zheap et al.
       | Fundamental design changes like this have just had such a rough
       | time of succeeding thus far.
       | 
       | I think for such a change to work it probably needs not just the
       | support of Neon but also of say Microsoft (current stewards of
       | Citus) that have larger engineering resources to throw at the
       | problem and grind out all the bugs.
        
         | rashidujang wrote:
         | Hey I'm fairly new to the who's who in the PostgreSQL world,
         | would you mind telling why Heikki might be able to pull this
         | off?
        
           | anarazel wrote:
           | Not who you asked, but: He is a longtime contributor who has
           | written/redesigned important parts of postgres (WAL format,
           | concurrent WAL insertion, 2PC support, parts of SSI support,
           | much more). And he is just a nice person to work with.
        
             | rashidujang wrote:
             | Cool! He seems like a powerhouse in this space - thank you
             | for the answer
        
         | anarazel wrote:
         | I know that at least people from EDB (Robert) and Microsoft
         | (Thomas, me) are quite interested in eventually making this
         | transition, it's not just Heikki. Personally I won't have a lot
         | of cycles for the next release or two, but after that...
        
           | jpgvm wrote:
           | That gives me some faith, I hope everyone is able to come
           | together to make it happen.
        
       | sargun wrote:
       | I'm curious whether they could take advantage of vfork /
       | CLONE_VM to get the benefits of shared memory and cheaper
       | context switches while still keeping the benefits of the
       | scheduler and sysadmin-friendliness.
       | 
       | The other thing that might be interesting is FUTEX_SWAP / UMCG.
       | Although it doesn't remove the overhead induced by context
       | switches entirely (specifically, you would still deal with TLB
       | misses), you can avoid dealing with things like speculative
       | execution exploit mitigations.
        
         | nneonneo wrote:
         | Per the article, Postgres has many, many global variables, many
         | of which track per-session state; much session state is "freed"
         | via process exit rather than being explicitly cleaned up.
         | Switching to CLONE_VM requires these problems to all be solved.
        
         | why-el wrote:
         | what about support for Windows?
        
       | jupp0r wrote:
       | Please don't use mutable global state in your work. Global
       | variables are universally bad and don't provide much of a
       | benefit. The number of desirable architectural refactorings
       | I've witnessed turn into a muddy mess because of them is
       | daunting. This is one more example.
        
         | orthoxerox wrote:
         | You know what a database is, don't you? It is the place where you
         | store your mutable global state. You can't kick the can down
         | the road forever, _someone_ has to tackle the complexity of
         | managing state.
        
           | jupp0r wrote:
           | Databases are great, especially those that do not use
           | global variables in their implementation.
        
         | slashdev wrote:
         | Thank you for sharing your ideological views, but this is not
         | the appropriate venue for that. If you want to have a software
         | _engineering_ discussion about the trade offs involved in
         | sharing global mutable state, this is a good venue for that.
         | All engineering is trade offs. As soon as you make blanket
         | statements that X is always bad, you've transitioned into the
         | realm of ideology. Now presumably you mean to say it's almost
         | always bad. But that really depends on the context. It may well
         | be almost always bad in average software projects, but
         | PostgreSQL is not your average software project. Databases are
         | a different realm.
        
           | refulgentis wrote:
           | Global mutable state being a poor choice in software
           | architecture isn't an ideology. There is no ideology that
           | argues it is awesome.
           | 
           | If you want to have a software _engineering_ discussion about
           | the trade offs involved in sharing global mutable state, this
           | is a good venue for that.
           | 
           | All engineering is trade offs. As soon as you start telling
           | people they're making blanket statements that X is always
           | bad, you've transitioned into the realm of nitpicking.
        
             | slashdev wrote:
             | It's awesome where performance considerations are
             | paramount. It's awesome in databases. It's awesome in
             | embedded software. It's awesome in operating system
             | kernels.
             | 
             | The fact is sometimes it's good. Saying it's universally
             | bad is going beyond the realm of logic and evidence and
             | into the realm of ideology.
        
               | jupp0r wrote:
               | Can you explain how having a global variable is more
               | performant than passing a pointer to an object as a
               | function argument in practice?
        
             | megous wrote:
             | Using globals is simpler, it's also pretty natural in event
             | driven architectures. Passing everything via function
             | arguments is welcome for library code, but there's little
             | point to using it in application code. It just complicates
             | things.
        
               | jupp0r wrote:
               | The problems it causes for Postgres are outlined in the
               | article on LWN.
        
               | megous wrote:
               | > Globals work well enough when each server process has
               | its own set...
               | 
               | PostgreSQL uses a process model. So the article just
                | states that globals work fine for PostgreSQL.
               | 
               | > Knizhnik has already done a threads port of PostgreSQL.
               | The global-variable problem, he said, was not that
               | difficult.
               | 
               | I see no big problem based on information from person who
               | did some porting already.
        
               | jupp0r wrote:
               | Knizhnik made these variables thread local, which is fine
               | if you have a fixed association of threads to data. This
                | loses some flexibility if your runtime needs to
               | incorporate multiple sessions on one thread (for example
               | to hide IO latency) in the future. In the end, the best
               | solution is to associate the data that belongs to a
                | session with the session itself, making it independent of
               | which thread it's running on. This is described by
               | Knizhnik as "cumbersome", which is exactly why people
               | should have not started with global variables in the
               | first place. (No blame, Postgres is from 1986 and times
               | were very different back then).
        
           | jupp0r wrote:
           | Discrediting my argument by labeling it as ideology and by
           | implying that "blanket statements are always bad" is a
           | logical fallacy that does not touch the merits of what is
           | being discussed; I would argue that your argument, not
           | mine, is the one that does not belong here.
           | 
           | If you want to contribute to the discussion, I'd be happy to
           | be given an example of successful usage of global variables
           | that made a project a long term success under changing
           | requirements compared to the alternatives.
        
       | [deleted]
        
       | CodeWriter23 wrote:
       | "no objections" <> "consent"
        
         | ed25519FUUU wrote:
         | Have you ever tried to move a large organization forward in a
         | certain direction? It's really hard. At some point you have to
         | make a decision.
        
           | timcobb wrote:
           | Not in something like Postgres, I hope
        
           | CodeWriter23 wrote:
           | I have. What I've observed more is outside attackers with
           | their own agenda use the "nobody objected because they were
           | unprepared and unable to respond in the 2 minutes I gave them
           | to object" as proof their agenda is supported.
        
       | neilwilson wrote:
       | It's always amazed me with databases why they don't go the other
       | way.
       | 
       | Create an operating system specifically for the database and make
       | it so you boot the database.
       | 
       | Databases seem to spend most of their time working around the
       | operating system abstractions. So why not look at the OS, and
       | streamline it for database use - dropping all the stuff a
       | database will never need.
       | 
       | That then is a completely separate project which is far easier to
       | get started rather than shoehorning the database into an
       | operating system thread model that is already a hack of the
       | process model.
        
         | girvo wrote:
         | That was/is part of the promise of the whole unikernel thing,
         | no?
         | 
         | https://mirage.io/ or similar could then let you boot your
         | database. That said, it's not really taken off from what I can
         | tell, so I'm guessing there's more to it than that.
        
           | anarazel wrote:
           | Imo unikernels are a complicated solution in search of a
           | problem, which turns out to not exist.
           | 
           | There certainly are times OSs get in the way. But it's hard
           | enough to write a good database, we don't need to maintain a
           | third of an OS in addition.
        
             | girvo wrote:
             | Yeah indeed, that was my feeling on it as well. As much as
             | Linux et al might get in ones way at times, what we get for
             | free by relying on them is too useful to ignore for most
             | tasks I think.
             | 
             | That said, perhaps at AWS or Google scale that would be
             | different? I wonder if they've looked at this stuff
             | internally.
        
         | lelanthran wrote:
         | > Create an operating system specifically for the database and
         | make it so you boot the database.
         | 
         | (Others downthread have pointed out unikernels and I agree with
         | the criticisms)
         | 
         | This proposal is an excellent Phd project for someone like me
         | :-)
         | 
         | It ticks all of the things I like to work on the most[1]:
         | 
         | Will involve writing low-level OS code
         | 
         | Get to hyper-focus on performance
         | 
         | Writing a language parser and executor
         | 
         | Implement scheduler, threads, processes, etc.
         | 
         | Implement the listening protocol in the kernel.
         | 
         | I have to say, though, it might be easier to start off with a
         | rump kernel (netBSD), then add in a specific RAW disk access
         | that bypasses the OS (no, or fewer, syscalls to use it), create
         | a kernel module for accepting a limited type of task and
         | executing that task in-kernel (avoiding a context-switch on
         | every syscall)[2].
         | 
         | Programs in userspace must have the lowest priority (using
         | starvation-prevention mechanisms to ensure that user input
         | would _eventually_ get processed).
         | 
         | I'd expect a not-insignificant speedup by doing all the work in
         | the kernel.
         | 
         | The way it is now,
         | 
         | userspace requests read() on a socket (context-switch to
         | kernel),
         | 
         | gets data (context-switch to userspace),
         | 
         | parses a query,
         | 
         | requests read on disk (multiple context-switches to kernel for
         | open, stat, etc, multiple switches back to userspace after each
         | call is complete). This latency is probably fairly well
         | mitigated with mmap, though.
         | 
         | logs diagnostic (multiple context-switches to and from kernel)
         | 
         | requests write on client socket (context switch to kernel back
         | and forth until all data is written).
         | 
         | The goal of the DBOS would be to remove almost all the context-
         | switching between userspace and kernel.
         | 
         | [1] My side projects include a bootable (but unfinished) x86
         | OS, various programming languages, performant (or otherwise) C
         | libraries.
         | 
         | [2] Similar to the way RealTime Linux calls work (caller shares
         | a memory buffer with rt kernel module, populates the buffer and
         | issues a call, kernel only returns when that task is complete).
         | The BPF mechanism works the same. It's the only way to reduce
         | latency to the absolute physical minimum.
        
         | mrweasel wrote:
         | You mean like Microsoft SQL Server, which basically runs a
         | small OS on top of Windows or Linux?
         | 
         | This is actually part of the reason why Microsoft was able to
         | port SQL Server to Linux fairly easily, if I recall correctly.
        
         | pizza234 wrote:
         | > Create an operating system specifically for the database and
         | make it so you boot the database.
         | 
         | I have the impression that this is similar to the adhoc
         | filesystem idea; this seems in principle very advantageous (why
         | employing two layers that do approximately the same thing on
         | top of each other?), but in reality, when implemented (by
         | Oracle), it led to only a minor improvement (a few % points,
         | AFAIR).
        
         | samus wrote:
         | You can get most of these speedups by using advanced APIs like
         | io_uring and friends, while still benefiting from an OS
         | that takes care of the messy and thankless task of
         | hardware support.
        
         | orthoxerox wrote:
         | Sounds like IBM OS/400.
        
         | dialogbox wrote:
         | I'm not sure what you mean by OS. If you mean a whole new
         | kernel, that would take decades, and it could support only a
         | small range of hardware. If you mean a specialized Linux
         | distro, many companies do that already.
         | 
         | Either way, I don't see how it would make the process-based /
         | thread-based problem any easier.
        
           | rossmohax wrote:
           | This project could borrow a lot from unikernels. If they
           | mandate running it as a VM, there is no HW to support.
        
       | formerly_proven wrote:
       | ngmi
        
       | c00lio wrote:
       | Why should TLB flush performance ever be a problem on big
       | machines? You can have one process per core with 128 or more
       | cores, never flush any TLB if you pin those processes. And as it
       | is a database, shoveling data from/to disk/SSD is your main
       | concern anyways.
        
         | scottlamb wrote:
         | PostgreSQL uses synchronous IO, so you won't saturate the CPU
         | with one process (or thread) per core.
         | 
         | That said, I think there have been efforts to use io_uring on
         | Linux. I'm not sure how that would work with the process per
         | connection model. Haven't been following it...
        
           | c00lio wrote:
           | Problem with all kinds of asynchronous I/O is that your
           | processes then need internal multiplexing, akin to what
           | certain lightweight userspace thread models are doing. In the
           | end, it might be harder to introduce than just using OS
           | threads.
        
           | anarazel wrote:
           | > That said, I think there have been efforts to use io_uring
           | on Linux. I'm not sure how that would work with the process
           | per connection model. Haven't been following it...
           | 
           | There's some minor details that are easier with threads in
           | that context, but on the whole it doesn't make much of a
           | difference.
        
             | scottlamb wrote:
             | I don't understand how it works with thread per connection
             | either. io_uring is designed for systems that have a thread
             | and ring per core, for you to give it a bunch of IO to do
             | at once (batches and chains), and your threads to do other
             | work in the meantime. The syscall cost is amortized or even
             | (through IORING_SETUP_SQPOLL) eliminated. If your code is
             | instead designed to be synchronous and thus can only do one
             | IO at a time and needs a syscall to block on it, I don't
             | think there's much if any benefit in using io_uring.
             | 
             | Possibly they'd have a ring per connection and just get an
             | advantage when there's parallel IO going on for a single
             | query? or these per-connection processes wouldn't directly
             | do IO but send it via IPC to some IO-handling
             | thread/process? Not sure either of those models are
             | actually an improvement over the status quo, but who knows.
        
               | anarazel wrote:
               | > io_uring is designed for systems that have a thread and
               | ring per core
               | 
               | That's not needed to benefit from io_uring
               | 
               | > for you to give it a bunch of IO to do at once (batches
               | and chains), and your threads to do other work in the
               | meantime.
               | 
               | You can see substantial gains even if you just submit
               | multiple IOs at once, and then block waiting for any of
               | them to complete. The cost of blocking on IO is amortized
               | to some degree over multiple IOs. Of course it's even
               | better to not block at all...
               | 
               | > If your code is instead designed to be synchronous and
               | thus can only do one IO at a time and needs a syscall to
               | block on it, I don't think there's much if any benefit in
               | using io_uring.
               | 
               | We/I have done the work to issue multiple IOs at a time
               | as part of the patchset introducing AIO support (with
               | among others, an io_uring backend). There's definitely
               | more to do, particularly around index scans, but ...
        
               | scottlamb wrote:
               | Oh, I hadn't realized until now I was talking with
               | someone actually doing this work. Thanks for popping into
               | this discussion!
               | 
               | > > io_uring is designed for systems that have a thread
               | and ring per core
               | 
               | > That's not needed to benefit from io_uring
               | 
               | 90% sure I read Axboe saying that's what he designed
               | io_uring for. If it helps in other scenarios, though,
               | great.
               | 
               | > Of course it's even better to not block at all...
               | 
               | Out of curiosity, is that something you ever want/hope to
               | achieve in PostgreSQL? Many high-performance systems use
               | this model, but switching a synchronous system in plain C
               | to it sounds uncomfortably exciting, both in terms of the
               | transition itself and the additional complexity of
               | maintaining the result. To me it seems like a much
               | riskier change than the process->thread one discussed
               | here that Tom Lane already stated will be a disaster.
               | 
               | > We/I have done the work to issue multiple IOs at a time
               | as part of the patchset introducing AIO support (with
               | among others, an io_uring backend). There's definitely
               | more to do, particularly around index scans, but ...
               | 
               | Nice.
               | 
               | Is the benefit you're getting simply from adding IO
               | parallelism where there was none, or is there also a CPU
               | reduction?
               | 
               | Is having a large number of rings (as when supporting a
               | large number of incoming connections) practical? I'm
               | thinking of each ring being a significant reserved block
               | of RAM, but maybe in this scenario that's not really
               | true. A smallish ring for a smallish number of IOs for
               | the query is enough.
               | 
               | Speaking of large number of incoming connections,
               | would/could the process->thread change be a step toward
               | having a thread per active query rather than per
               | (potentially idle) connection? To me it seems like it
               | could be: all the idle ones could just be watched over by
               | one thread and queries dispatched. That'd be a nice
               | operational improvement if it meant folks no longer
               | needed a pooler [1] to get decent performance. All else
               | being equal, fewer moving parts is more pleasant...
               | 
               | [1] or even if they only needed one layer of pooler
               | instead of two, as I read some people have!
        
               | anarazel wrote:
               | > > Of course it's even better to not block at all...
               | 
               | > Out of curiosity, is that something you ever want/hope
               | to achieve in PostgreSQL? Many high-performance systems
               | use this model, but switching a synchronous system in
               | plain C to it sounds uncomfortably exciting, both in
               | terms of the delta and the additional complexity of
               | maintaining the result. To me it seems like a much
               | riskier change than the process->thread one discussed
               | here that Tom Lane already stated will be a disaster.
               | 
               | Depends on how you define it. In a lot of scenarios you
               | can avoid blocking by scheduling IO in a smart way - and
                | I think we can get quite far towards that for a lot of
               | workloads and the wins are _substantial_. But that
               | obviously cannot alone guarantee that you never block.
               | 
               | I think we can get quite far avoiding blocking, but I
               | don't think we're going to a complete asynchronous model
               | in the foreseeable future. But it seems more feasible to
               | incrementally make common blocking locations support
               | asynchronicity. E.g. when a query scans multiple
               | partitions, switch to processing a different partition
               | while waiting for IO.
               | 
               | > Is having a large number of rings (as when supporting a
               | large number of incoming connections) practical? I'm
               | thinking of each ring being a significant reserved block
               | of RAM, but maybe in this scenario that's not really
               | true. A smallish ring for a smallish number of IOs for
               | the query is enough.
               | 
               | It depends on the kernel version etc. The amount of
               | memory isn't huge but initially it was affected by
               | RLIMIT_MEMLOCK... That's one reason why the AIO patchset
               | has a smaller number of io_uring "instances" than the
               | allowed connections. The other reason is that we need to
               | be able to complete IOs that other backends started
               | (otherwise there would be deadlocks), which in turn
               | requires having the file descriptor for each ring
               | available in all processes... Which wouldn't be fun with
               | a high max_connections.
               | 
               | > Speaking of large number of incoming connections,
               | would/could the process->thread be a step toward having a
               | thread per active query rather than per (potentially
               | idle) connection?
               | 
               | Yes. Moving to threads really mainly would be to make
               | subsequent improvements more realistic...
               | 
               | > That'd be a nice operational improvement if it meant
               | folks no longer needed a pooler [1] to get decent
               | performance. All else being equal, fewer moving parts is
               | more pleasant...
               | 
               | You'd likely often still want a pooler on the
               | "application server" side, to avoid TCP / SSL connection
               | establishment overhead. But that can be a quite simple
               | implementation.
        
       | chasil wrote:
       | Oracle has similar problems.
       | 
       | On UNIX systems, Oracle uses a multi-process model, and you can
       | see these:
       | 
       |     $ ps -ef | grep smon
       |     USER    PID    PPID  STARTED  TIME  %CPU %MEM COMMAND
       |     oracle  22131  1     Mar 28   3:09  0.0  4.0  ora_smon_yourdb
       | 
       | Windows creates processes about 100x more slowly than Linux, so
       | Oracle runs threaded on that platform in one great big PID.
       | 
       | Sybase was the first major database that fully adopted threads
       | from an architectural perspective, and Microsoft SQL Server has
       | certainly retained and improved on that model.
        
         | EvanAnderson wrote:
         | > Windows forks processes about 100x slower than Linux...
         | 
         | I work with a Windows-based COTS webapp that uses Postgres w/o
         | any connection pooling. It's nearly excruciating to use because
         | it spins up new Postgres processes for each page load. If not
         | for the fact that the Postgres install is "turnkey" with the
         | app I'd just move Postgres over to a Linux machine.
        
           | devit wrote:
           | Use pgbouncer
        
             | ethbr0 wrote:
             | Was curious about this as an architectural solution as
             | well.
             | 
             | We're really talking about X-per-client as the primary
             | reason to move away from processes, right?
             | 
             | So if you can get most of the benefit via pooling... why
             | inherit the pain of porting?
             | 
             | Presumably latency jitter would be a difficult problem with
             | pools, but it seems easier (and safer) than porting
             | processes -> threads.
             | 
             | Disclaimer: High performance / low latency DB code is
             | pretty far outside my wheelhouse.
        
               | anarazel wrote:
                | pgbouncer is not transparent; you lose features,
                | particularly when using the pooling modes that
                | actually allow a larger number of active concurrent
               | connections. Solving those issues is a _lot_ easier with
               | threads than with processes.
        
               | ddorian43 wrote:
               | > We're really talking about X-per-client as the primary
               | reason to move away from processes, right?
               | 
               | Many other things too. Like better sharing of caches.
               | Lower overhead of thread instead of process. Etc. (read
               | the thread)
        
               | ilyt wrote:
               | The reasons are explained in article. Read the article
        
               | ethbr0 wrote:
               | I appear to have missed them, then.
               | 
               | Could you point out, aside from the large numbers of
               | clients I mentioned (and the development overhead of
               | implementing multi-process memory management code), what
               | the article mentions is a primary drawback of using
               | processes over threads?
        
               | ilyt wrote:
               | > The overhead of cross-process context switches is
               | inherently higher than switching between threads in the
               | same process - and my suspicion is that that overhead
               | will continue to increase. Once you have a significant
               | number of connections we end up spending a _lot_ of time
                | in TLB misses, and that's inherent to the process model,
               | because you can't share the TLB across processes.
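The shared-address-space point can be sketched in any threaded language: threads mutate the same heap object directly (through one set of page-table entries), where separate processes would each need an explicit shared-memory segment and cross-process locking. A minimal Python illustration, not PostgreSQL code:

```python
import threading

shared = {"n": 0}          # one heap object, visible to every thread
lock = threading.Lock()

def bump(times):
    # Every thread dereferences the same address; with processes this
    # update would require shared memory set up explicitly by the parent.
    for _ in range(times):
        with lock:
            shared["n"] += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["n"])  # 40000: all four threads updated the same memory
```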
        
               | ethbr0 wrote:
               | Yes, that's per-client performance scaling ("significant
               | number of connections"), which indicates a pooled
               | connection model might mitigate most of the performance
               | impact while allowing some core code to remain process-
               | oriented (and thus, not rewritten).
        
             | treis wrote:
             | That helps a lot but it's not a replacement for large
             | number of persistent connections. If you had that you could
             | simplify things in the application layer and do interesting
             | things with the DB.
        
           | ComputerGuru wrote:
           | If you run postgres under WSLv1 (now available on Server
           | Edition as well), the WSL subsystem handles processes and
           | virtual memory in a way that has been specifically designed
           | to optimize process initialization as compared to the
           | traditional Win32 approach.
        
           | chasil wrote:
           | It would not be difficult to simply "pg_dump" all the data to
           | Postgres on a Linux machine, then quietly set the clients to
           | use the new server.
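The migration described is essentially a single pipeline; the hostnames and database name below are placeholders:

```shell
# dump from the Windows instance and restore directly into the Linux one
pg_dump -h old-windows-host -U postgres appdb | psql -h new-linux-host -U postgres appdb
```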
        
         | blinkingled wrote:
         | Didn't Oracle switch to threaded model in 12c - at least on
         | Linux I remember there being a parameter to do that - it
         | dropped the number of processes significantly.
        
           | chasil wrote:
            | No, I ran that on v19.
            | 
            |     $ ps -ef | grep smon
            |     UID        PID  PPID  C STIME TTY          TIME CMD
            |     oracle   22131     1  0 Mar28 ?        00:03:09 ora_smon_yourdb
            | 
            |     $ $ORACLE_HOME/bin/sqlplus -silent '/ as sysdba'
            |     select version_full from v$instance;
            | 
            |     VERSION_FULL
            |     -----------------
            |     19.18.0.0.0
        
             | blinkingled wrote:
             | https://oracle-base.com/articles/12c/multithreaded-model-
             | usi...
             | 
             | Probably still requires the parameter to be set.
        
               | chasil wrote:
                | Contrast this to Microsoft SQL Server:
                | 
                |     $ systemctl status mssql-server
                |     * mssql-server.service - Microsoft SQL Server Database Engine
                |        Loaded: loaded (/usr/lib/systemd/system/mssql-server.service; disabled; vendor preset: disabled)
                |        Active: active (running) since Mon 2023-06-19 15:48:05 CDT; 1min 18s ago
                |          Docs: https://docs.microsoft.com/en-us/sql/linux
                |      Main PID: 2125 (sqlservr)
                |         Tasks: 123
                |        CGroup: /system.slice/mssql-server.service
                |                +-2125 /opt/mssql/bin/sqlservr
                |                +-2156 /opt/mssql/bin/sqlservr
        
               | CoolCold wrote:
                | I'm not sure which surprises me more - seeing it's not
                | enabled on boot, or seeing mssql under systemd
        
               | blinkingled wrote:
               | Yeah multiprocess isn't Microsoft's style given how
               | expensive creating processes is on Windows.
               | 
               | Oracle - never had a scalability issue on very big Linux,
               | Solaris and HPUX systems though - they do it well in my
               | experience.
        
           | hans_castorp wrote:
           | > Didn't Oracle switch to threaded model in 12c
           | 
           | It's optional, and the default is still a process model on
           | Linux.
        
       | 0xbadcafebee wrote:
       | So compromise. Take the current process model, add threading and
       | shared memory, with feature flags to limit number of processes
       | and number of threads.
       | 
       | Want to run an extension that isn't threadsafe? Run with 10
        | processes, 1 thread. Want to run high-performance? Run with 1
       | process, 10 threads. Afraid of "stability issues"? Run with 1
       | process, 1 thread.
       | 
       | Will it be hard to do? Sure. Impossible? Not at all. Plan for it,
       | give a very long runway, throw all your new features into the
       | next major version branch, and tell people everything else is off
       | the table for the next few years. If you're _really sure_
       | threading is going to be increasingly necessary, better to start
        | now than to wait until it's too late. But this idea of "oh it's
       | hard", "oh it's dangerous", "too complicated", etc is bullshit.
       | We've built fucking spaceships that visit other planets. We can
       | make a database with threads that doesn't break. Otherwise we
       | admit that basic software development using practices from the
       | past 30 years is too much for us to figure out.
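The compromise described above would amount to two server knobs; the parameter names below are hypothetical and do not exist in any PostgreSQL release:

```
# hypothetical postgresql.conf fragment -- neither parameter exists today
backend_processes = 10    # many processes, e.g. for non-threadsafe extensions
threads_per_process = 1   # raise for density, keep at 1 for isolation
```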
        
       | MichaelMoser123 wrote:
       | i wonder if gpt-11 will be able to do this kind of project on its
       | own...
        
       | est wrote:
       | Not sure what's going on here, but one connection per process
       | seems... ancient.
       | 
        | Using a threaded model is difficult, so how about pre-fork? A
        | few connections per process would be a good improvement.
        
         | anarazel wrote:
         | The issue is costlier (runtime, complexity, memory) resource
         | sharing, not the cost of the fork itself. Pre-forking isn't
         | going to help with any of that.
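Pre-forking illustrates the sharing problem directly: a forked worker only gets a copy-on-write snapshot of the parent's memory, so nothing it writes afterwards is visible to the parent or to sibling workers without explicit shared memory. A small Unix-only Python sketch of that limitation (illustrative, not Postgres code):

```python
import os

counter = 0                        # parent state at fork time
r, w = os.pipe()                   # pipe for reporting the child's view

pid = os.fork()                    # "pre-fork" one worker
if pid == 0:                       # child: private copy-on-write memory
    counter += 1                   # only the child's copy changes
    os.write(w, str(counter).encode())
    os._exit(0)

os.waitpid(pid, 0)                 # parent: wait for the worker to finish
child_counter = int(os.read(r, 16))
print(child_counter)  # 1: the worker updated its own copy
print(counter)        # 0: the parent never sees that update
```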
        
       | EGreg wrote:
       | I hope they don't do it.
       | 
       | I've had a similar situation with PHP, where we had written quite
       | a large engine (https://github.com/Qbix/Platform) with many
       | features (https://qbix.com/features.pdf) . It took advantage of
       | the fact that PHP isolated each script and gave it its own global
       | variables, etc. In fact, much of the request handling did stuff
        | like this:
        | 
        |     Q_Request::requireFields(['a', 'b', 'c']);
        |     $uri = Q_Dispatcher::uri();
        | 
        | instead of stuff like this:
        | 
        |     $this->getContext()->request()->requireFields(['a', 'b', 'c']);
        |     $this->getContext()->dispatcher()->uri();
        | 
        | Over the last few years, I have run across many compelling
        | things:
        | 
        |     amp
        |     reactPHP
        |     Swoole (native extension)
        |     Fibers (inside PHP itself)
       | 
       | It seemed so cool! PHP could behave like Node! It would have an
       | event loop and everything. Fibers were basically PHP's version of
       | Swoole's coroutines, etc. etc.
       | 
       | Then I realized... we would have to go through the entire code
       | and redo how it all works. We'd also no longer benefit from PHP's
       | process isolation. If one process crapped out or had a memory
       | leak, it could take down everything else.
       | 
       | There's a reason PHP still runs 80% of all web servers in the
       | world (https://kinsta.com/blog/is-php-dead/) ... and one of the
       | biggest is that commodity servers can host terrible PHP code and
       | it's mostly isolated in little processes that finish "quickly"
       | before they can wreak havoc on other processes or on long-running
       | stuff.
       | 
       | So now back to postgres. It's been praised for its rock-solid
       | reliability and security. It's got so many features and the MVCC
       | is very flexible. It seems to use a lot of global variables. They
       | can spend their time on many other things, like making it
       | byzantine-fault-tolerant, or something.
       | 
       | The clincher for me was when I learned that php-fpm (which spins
       | up processes which sleep when waiting for I/O) is only 50% slower
       | than all those fancy things above. Sure, PHP with Swoole can
       | outperform even Node.js, and can handle twice as many requests.
        | But we'd rather focus on so many other things we need to do :)
        
         | zackmorris wrote:
         | I've been using PHP for decades and have found its isolated
         | process model to be about the best around, certainly for any
         | mainstream language. Also Symfony's Process component
         | encapsulates most of the errata around process management in a
         | cross-platform way:
         | 
         | https://symfony.com/doc/current/components/process.html
         | 
         | Going from a working process implementation to async/threads
         | with shared memory is pretty much always a mistake IMHO,
         | especially if it's only done for performance reasons. Any speed
         | gains will be eclipsed by endless whack-a-mole bug fixes, until
         | the code devolves into something unrecognizable. Especially
         | when there are other approaches similar to map-reduce and
         | scatter-gather arrays where data is processed in a distributed
         | fashion and then joined into a final representation through
         | mechanisms like copy-on-write, which are supported by very few
         | languages outside of PHP and the functional programming world.
         | 
         | The real problem here is the process spawning and context-
         | switching overhead of all versions of Windows. I'd vote to
         | scrap their process code in its entirety and write a new
         | version based on atomic operations/lists/queues/buffers/rings
         | with no locks and present an interface which emulates the
         | previous poor behavior, then run it through something like a
         | SAT solver to ensure that any errata that existing software
         | depends on is still present. Then apps could opt to use the
         | direct unix-style interface and skip the cruft, or refactor
         | their code to use the new interface.
         | 
         | Apple did something similar to this when OS X was released,
         | built on a mostly POSIX Darwin, NextSTEP, Mach and BSD Unix. I
         | have no idea how many times Microsoft has rewritten their
         | process model or if they've succeeded in getting performance on
         | par with their competitors (unlikely).
         | 
         | Edit: I realized that the PHP philosophy may not make a lot of
         | sense to people today. In the 90s, OS code was universally
         | terrible, so for example the graphics libraries of Mac and
         | Windows ran roughly 100 times slower than they should for
         | various reasons, and developers wrote blitters to make it
         | possible for games to run in real time. That was how I was
         | introduced to programming. PHP encapsulated the lackluster OS
         | calls in a cross-platform way, using existing keywords from
         | popular languages to reduce the learning curve to maybe a day
         | (unlike Perl/Ruby, which are weird in a way that can be fun but
         | impractical to grok later). So it's best to think of PHP more
         | like something like Unity, where the nonsense is abstracted and
         | developers can get down to business. Even though it looks like
         | Javascript with dollar signs on the variables. It's also more
         | like the shell, where it tries to be as close as possible to
         | bare-metal performance, even while restricted to the 100x
         | interpreter slowdown of languages like Python. I find that PHP
         | easily saturates the processor when doing things in a data-
         | driven way by piping bytes around.
        
       | bob1029 wrote:
       | I think an interesting point of comparison is the latest
       | incarnation of SQL Server. You can't even point at 1 specific
       | machine anymore with their hyperscale architecture.
       | 
       | https://learn.microsoft.com/en-us/azure/azure-sql/database/h...
        
       | js4ever wrote:
        | Finally! This and a good multi-master story and I'll finally
       | start to love Postgres
        
       | waselighis wrote:
       | It sounds to me like migrating to a fully multi-threaded
       | architecture may not be worth the effort. Simply reducing the
       | number of processes from thousands to hundreds would be a huge
       | win and likely much more feasible than a complete re-
       | architecture.
        
       | sbr464 wrote:
       | Unpopular idea: age limit on votes/contributions.
       | 
       | We are in a unique time period and generation that has strong
       | opinions based on a history that only exists within itself.
       | 
        | At some point it needs a rewrite.
        
       | chucky_z wrote:
        | I wish they would do some kind of easy shared storage instead, or
        | in addition to it. This sounds like an odd solution, however I've
       | scaled pgsql since 9 on very, very large machines and doing 1
       | pgsql cluster per physical socket ended up doing near-linear
       | scaling even on 100+ total core machines with TB+ of memory.
       | 
       | The challenge with this setup is that you need to do 1 writer and
       | multiple reader clusters so you end up doing localhost
       | replication which is super weird. If that requirement was somehow
       | removed that'd be awesome for scaling really huge clusters.
        
       | t43562 wrote:
       | Calm down guys! Threading is tricky but they can rewrite it all
       | in Rust so it'll be completely ok........
       | 
       | ;-)
        
       ___________________________________________________________________
       (page generated 2023-06-20 23:02 UTC)