[HN Gopher] PostgreSQL reconsiders its process-based model
___________________________________________________________________
PostgreSQL reconsiders its process-based model
Author : todsacerdoti
Score : 784 points
Date : 2023-06-19 16:33 UTC (1 day ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| mihaic wrote:
| I'm honestly surprised it took them so long to reach this
| conclusion.
|
| > That idea quickly loses its appeal, though, when one considers
| trying to create and maintain a 2,000-member structure, so the
| project is unlikely to go this way.
|
| As repulsive as this might sound at first, I've seen structures
| of hundreds of fields work fine if the hierarchy inside them is
| well organized and they're not just flat. Still, I have no real
| knowledge of the complexity of the code and wish the Postgres
| devs all the luck in the world to get this working smoothly.
| rsaxvc wrote:
| This is how I made my fork of libtcc lock-free.
|
| Mainline has a lock so that all backends can use global
| variables, but only one instance can do codegen at a time.
|
| It was a giant refactoring. Especially fun was when multiple
| compilation units used the same static variable name, but it
| all worked in the end.
| 72deluxe wrote:
| Out of curiosity, where is this fork? Sounds very
| interesting.
| rsaxvc wrote:
| https://github.com/rsaxvc/tinycc-multithreaded
|
| This is the multi-threaded compiler:
| https://github.com/rsaxvc/tcc-swarm
|
| With the multi-threaded tcc above it scales about as well
| as multiprocess. With mainline it doesn't scale well at
| all.
|
| So far I haven't gotten around to reusing anything across
| libtcc handles/instances, but would eventually like to
| share mmap()'d headers across instances, as well as cache
| include paths, and take invocation arguments through stdin
| one compilation unit per line.
| paulddraper wrote:
| > I'm honestly surprised it took them so long to reach this
| conclusion.
|
| On the contrary, it's been discussed for ages. But it's a huge
| change, with only modest advantages.
|
| I'm skeptical of the ROI, to be honest. Not that it doesn't
| have value, but whether it has more value than the effort.
| 36364949thrw wrote:
| > it's a huge change, with only modest advantages
|
| +significant and unknown set of new problems, including new
| bugs.
|
| This reminds me of the time they lifted entire streets in
| Chicago by 14 feet to address new urban requirements.
| Chicago, we can safely assume, did not have the option of
| just starting a brand new city a few miles away.
|
| The interesting question here is whether a system design that
| works quite well up to a certain scale should be abandoned in
| order to extend its market reach.
| datavirtue wrote:
| Yeah, and you will run headlong into other unforeseen real-
| world issues. You may never reach the performance goals.
| loeg wrote:
| Yeah. I think as a straightforward, easily correct transition
| from 2000 globals, a giant structure isn't an awful idea. It's
| not like the globals were organized before! You're just making
| the ambient state (awful as it is) explicit.
| stingraycharles wrote:
| Yes, it's the most pragmatic option, and it's only "awful"
| because it makes the actual problem visible. It would likely
| encourage slowly refactoring code to handle its state in a
| saner way, until you're left with only the really gnarly
| stuff, which shouldn't be much anymore and can go into
| individual thread-local storage.
|
| It's an easy transition path.
| mihaic wrote:
| Exactly, if you're now forced to put everything in one place
| you're forced to acknowledge and understand the complexity of
| your state, and might have incentives to simplify it.
| Sesse__ wrote:
| Here's MySQL's all-session-globals-in-one-place-class:
| https://github.com/mysql/mysql-
| server/blob/8.0/sql/sql_class...
|
| I believe I can safely say that nobody acknowledges and
| understands the complexity of all state within that class,
| and that whatever incentives there may be to simplify it
| are not enough for that to actually happen.
|
| (It ends on line 4692)
| IshKebab wrote:
| Right but that would still be true if they were globals
| instead. Putting all the globals in a class doesn't make
| any difference to how much state you have.
| Sesse__ wrote:
| > Putting all the globals in a class doesn't make any
| difference to how much state you have.
|
| I didn't make any claims about the _amount_ of state. My
| claim was that "you're forced to acknowledge and
| understand the complexity of your state" (i.e., moving it
| all together in one place helps understanding the state)
| is plain-out wrong.
| IshKebab wrote:
| It's not wrong. Obviously putting it all in one place
| makes you consider just how much of it you have, rather
| than having it hidden away all over your code.
| cakoose wrote:
| > I think as a straightforward, easily correct transition
| from 2000 globals, a giant structure isn't an awful idea.
|
| Agree.
|
| > It's not like the globals were organized before!
|
| Using a struct with 2000 fields loses some encapsulation.
|
| When a global is defined in a ".c" file (and not exported via
| a ".h" file), it can only be accessed in that one ".c" file,
| sort of like a "private" field in a class.
|
| Switching to a single struct would mean that all globals can
| be accessed by all code.
|
| There's probably a way to define things that allows you to
| regain some encapsulation, though. For example, some spin on
| the opaque type pattern:
| https://stackoverflow.com/a/29121847/163832
| pasc1878 wrote:
| No, that is what a static in a .c file is for.
|
| A plain global can be accessed from other compilation units -
| agreed that with no .h entry it is much more error prone,
| e.g. you don't know the type, but the variable's name is
| exposed to other objects.
| remexre wrote:
| Wouldn't those statics also be slated for removal with
| this change?
| junon wrote:
| At most they'd be determined to be read-only constants that
| are inlined during constant folding. That mostly covers
| integral-sized scalar values that fit into registers, and
| nothing you've taken the address of either - those remain as
| static data.
| cakoose wrote:
| I think there might be a terminology mix-up here. In C, a
| global variable with the `static` keyword is still mutable.
| So it typically can't be constant-folded/inlined.
|
| The `static` modifier in that context just means that the
| symbol is not exported, so other ".c" files can't access
| it.
| bourgeoismedia wrote:
| A static variable in C is mutable in the same sense that
| a local variable is, but since it's not visible outside
| the current compilation unit the optimizer is allowed to
| observe that it's never actually modified or published
| and constant fold it away.
|
| Check out the generated assembly for this simple program,
| notice that kBase is folded even though it's not marked
| const: https://godbolt.org/z/h45vYo5x5
| cakoose wrote:
| It is also possible for a link-time optimizer to observe
| that a non-static global variable is never modified and
| optimize that away too.
|
| But the Postgres mailing list is talking about 2000
| global variables being a hurdle to multi-threading. I
| doubt they just didn't realize that most of them can be
| optimized into constants.
| anarazel wrote:
| Yea. Just about none of them could be optimized to
| constants because, uh, they're not constant. We're not
| perfect, but we do add const etc to TU level
| statics/globals that are actually read only. And if they
| are actually read only, we don't care about them in the
| context of threading anyway, since they wouldn't need any
| different behaviour anyway.
| cogman10 wrote:
| I think my bigger fear is around security. A process per
| connection keeps things pretty secure for that connection
| regardless of what the global variables are doing (somewhat
| hard to mess that up with no concurrency going on in a
| process).
|
| Merge all that into one process with many threads and it
| becomes a nightmare problem to ensure some random addon
| didn't decide to change a global var mid processing (which
| causes wrong data to be read).
| dfox wrote:
| All postgres processes run under the same system user and
| all the access checking happens completely in userspace.
| fdr wrote:
| Access checking, yes, but the scope of memory corruption
| does increase unavoidably, given the main thing the
| pgsql-hackers investigating threads want: one virtual
| memory context when toggling between concurrent work.
|
| Of course, there's a huge amount of shared space already,
| so a willful corruption can already do virtually
| anything. But, more is more.
| magicalhippo wrote:
| We did this with a project I worked on. I came on after the
| code was mature.
|
| While we didn't have 2000 globals, we did have a non-trivial
| amount, spread over about 300kLOC of C++.
|
| We started by just stuffing them into a "context" struct, and
| every function that accessed a global thus needed to take a
| context instance as a new parameter. This was tedious but
| easy.
|
| However the upside was that this highlighted poor
| architecture. Over time we refactored those bits and the main
| context struct shrunk significantly.
|
| The result was better and more modular code, and overall well
| worth the effort in our case, in my opinion.
| MuffinFlavored wrote:
| > if the hierarchy inside them is well organized
|
| is this another way to say "in a 2000 member structure, only 10
| have significant voting power"?
| Ankhers wrote:
| This statement is not about people, it is about a C struct.
| FooBarWidget wrote:
| I don't get it. How is a 2000-member structure any different
| from having 2000 global variables? How is maintaining the
| struct possibly harder than maintaining the globals?
| Refactoring globals to struct members is semantically nearly
| identical, it may as well just be a mechanical, cosmetic
| change, while also giving the possibility to move to a threaded
| architecture.
| ComputerGuru wrote:
| Because global variables can be confined to individual cpp
| files, exclusively visible in that compilation unit. That
| makes them far easier to reason about than hoisting them into
| the "global and globally visible" option of one gargantuan
| struct. Which is why a more invasive refactor might be
| required.
| imtringued wrote:
| Just use thread local variables.
|
| I abuse them for ridiculous things.
| jeltz wrote:
| That is the plan for PostgreSQL.
| ComputerGuru wrote:
| Yeah, I was really into that before there was even a
| cross-compiler/cross-platform syntax for declaring TLS
| values in C++, but I have since "upgraded" to avoiding TLS
| altogether where possible. The quality of the
| implementations varies greatly from compiler to compiler and
| platform to platform, you run into weird issues at thread
| exit if they're not primitive types, they run afoul of the
| fibers/coroutines/etc. that have since become extremely
| prevalent, and a few other things.
| comboy wrote:
| I've never really been limited by CPU when running postgres
| (few-TB instances). The bottleneck is always IO. Do others
| have a different experience? Plus there's elegance and a
| feeling of being in control when you know a query is
| associated with a specific process which you can deal with
| and monitor just like any other process.
|
| But I'm very much clueless about internals, so this is a
| question rather than an opinion.
| hyperman1 wrote:
| I see postgres become CPU bound regularly: Lots of hash
| joins, copy from or to CSV, index or materialized view
| rebuild. Postgis eats CPU. Tds_fdw tends to spend a lot of
| time doing charset conversion, more than actually networking
| to mssql.
|
| I was surprised when starting with postgres. Then again, I
| have smaller databases (A few TB) and the cache hit ratio
| tends to be about 95%. Combine that with SSDs, and it becomes
| understandable.
|
| Even so, I am wary of this change. Postgres is very reliable,
| and I have no problem throwing some extra hardware to it in
| return. But these people have proven they know what they are
| doing, so I'll go with their opinion.
| aetherson wrote:
| I've also definitely seen a lot of CPU bounding on
| postgres.
| sargun wrote:
| With modern SSDs that can push 1M IOPs+, you can get into a
| situation where I/O latency starts to become a problem, but
| in my experience, they far outpace what the CPU can do. Even
| the I/O stack can be optimized further in some of these
| cases, but often it comes with the trade off of shifting more
| work into the CPU.
| ilyt wrote:
| >I've never really been limited by CPU when running postgres
| (few TB instances). The bottleneck is always IO.
|
| Throw a few NVMe drives at it and it might be.
| dfox wrote:
| Throw a ridiculous amount of RAM at it is a more correct
| assessment. NVMe reads are still "I/O" and that is slow.
| And for at least 10 years, buying enough RAM to have all of
| the interesting parts of an OLTP psql database either in
| shared_buffers or in the OS-level buffer cache has been
| completely feasible.
| rcxdude wrote:
| an array of modern SSDs can get to a similar bandwidth to
| RAM, albeit with significantly worse latency still. It's
| not that hard to push the bottleneck elsewhere in a lot
| of workloads. High performance fileservers, for example,
| need pretty beefy CPUs to keep up.
| ilyt wrote:
| > NVMe reads are still an "I/O" and that is slow
|
| It's orders of magnitude faster than SAS/SATA SSDs and
| you can throw 10 of them into 1U server. It's nowhere
| near "slow" and still easy enough to be CPU bottlenecked
| before you get IO bottlenecked.
|
| But yes, a pair of 1TB-RAM servers will cost you less than
| half a year's worth of developer salary
| phamilton wrote:
| I've generally had buffer-cache hit rates in the 99.9% range,
| which ends up being minimal read I/O. (This is on AWS Aurora,
| where there's no disk cache and so shared_buffers is the
| primary cache, but an equivalent measure for vanilla postgres
| exists.)
|
| In those scenarios, there's very little read I/O. CPU is the
| primary bottleneck. That's why we run as many as 10 Aurora
| readers (autoscaled with traffic).
| paulddraper wrote:
| Depends on your queries.
|
| If you push a lot of work into the database including JSON
| and have a lot of buffer memory...CPU can easily be limiting.
| Diggsey wrote:
| It's not just CPU - memory usage is also higher. In
| particular, idle connections still consume significant
| memory, and this is why PostgreSQL has much lower connection
| limits than e.g. MySQL. Pooling can help in some cases, but
| pooling also breaks some important PostgreSQL features (like
| prepared statements...) since poolers generally can't
| preserve session state. Other features (eg. notify) are just
| incompatible with pooling. And pooling cannot help with
| connections that are idle but inside a transaction.
|
| That said, many of these things are solvable without a full
| switch to a threaded model (eg. by having pooling built-in
| and session-state-aware).
| ComputerGuru wrote:
| > solvable without a full switch to a threaded model (eg.
| by having pooling built-in and session-state-aware).
|
| Yeeeeesssss, but solving that is solving the hardest part
| of switching to a threaded model. It requires the team to
| come to terms with the global state and encapsulate session
| state in a non-global struct.
| anarazel wrote:
| > That said, many of these things are solvable without a
| full switch to a threaded model (eg. by having pooling
| built-in and session-state-aware).
|
| The thing is that that's a lot easier with threads. Much of
| the session state lives in process private memory (prepared
| statements etc), and it can't be statically sized ahead of
| time. If you move all that state into dynamically allocated
| shared memory, you've basically paid all the price for
| threading already, except you can't use any tooling for
| threads.
| saulrh wrote:
| Also, even if a 2k-member structure is obnoxious, consider the
| alternative - having to think about and manage 2k global
| variables is probably even worse!
| megous wrote:
| Each set of globals is in a module it relates to, not in some
| central file where everything has to be in one struct.
|
| If anything, it's probably easier to understand.
| hans_castorp wrote:
| > I'm honestly surprised it took them so long to reach this
| conclusion.
|
| Oracle also uses a process model on Linux. At some point (I
| think starting with 12.x), it can now be configured on Linux to
| use a threaded model, but the default is still a process-per-
| connection model.
|
| Why does everybody think it's a bad thing in Postgres, but
| nobody thinks it's a bad thing in Oracle?
| patmorgan23 wrote:
| Well, for one, Postgres is open source and widely used, so
| anyone can pick it up and look at its internals. That's not
| the case for Oracle DB.
| topspin wrote:
| > I'm honestly surprised it took them so long to reach this
| conclusion.
|
| I'm not. You can get a long way with conventional IPC, and OS
| processes provide a lot of value. For most PostgreSQL instances
| the TLB flush penalty is _at least_ 3rd or 4th on the list of
| performance concerns, _far_ below prevailing storage and
| network bottlenecks.
|
| I share the concerns cited in this LWN story. Reworking this
| massive code base around multithreading carries a large amount
| of risk. PostgreSQL developers will have to level up
| substantially to pull it off.
|
| A PostgreSQL endorsed "second-system" with the (likely
| impossible, but close enough that it wouldn't matter) goal of
| 100% client compatibility could be a better approach. Adopting
| a memory safe language would make this both tractable and
| attractive (to both developers and users.) The home truth is
| that any "new process model" effort would actually play out
| exactly this way, so why not be deliberate about it?
| gmokki wrote:
| Would something like the opt-in sharing of pages between
| processes that Oracle has been trying to get into the kernel
| be the correct option? https://lwn.net/ml/linux-
| kernel/cover.1682453344.git.khalid....
|
| Postmaster would just share the already-shared memory between
| processes (containing also the locks). That explicit part of
| memory would opt in to thread-like sharing and thus get
| faster/less TLB switching and lower memory usage, while all
| the rest of the state would still be per-process and safe.
|
| tl;dr super share the existing shared memory area with kernel
| patch
|
| All operating systems not supporting it would keep working as
| is.
| topspin wrote:
| Yes, it would mitigate the TLB problem. Interesting that
| Oracle is also looking to solve this problem, but not by
| multithreading the Oracle RDBMS.
| atonse wrote:
| Would this basically be a new front end? Like the part that
| handles sockets and input?
|
| Or more of a rewrite of subsystems? Like the query planner or
| storage engine etc.?
| topspin wrote:
| Both, I'd imagine.
|
| With regard to client compatibility there are related
| precedents for this already; the PostgreSQL wire protocol
| has emerged as a de facto standard. Cockroachdb and
| ClickHouse are two examples that come to mind.
| nextaccountic wrote:
| From what I gather postgres isn't doing conventional IPC but
| instead it uses shared memory, which means the same mechanism
| threads use but with way higher complexity
| mgaunard wrote:
| What do you think IPC is?
| topspin wrote:
| As does Oracle, and others. I'm aware.
|
| IPC, to me, includes the conventional shared memory
| resources (memory segments, locks, semaphores, condition
| variable, etc.) used by these systems: resources acquired
| by processes for the purpose of communication with other
| processes.
|
| I get it though. The most general concept of shared memory
| is not coupled to an OS "process." You made me question
| whether my concept of the term IPC was valid, however. So
| what does one do when a question appears? Stop thinking
| immediately and consult a language model!
|
| Q: Is shared memory considered a form of interprocess
| communication?
|
| GPT-4: Yes, shared memory is indeed considered a form of
| interprocess communication (IPC). It's one of the several
| mechanisms provided by an operating system to allow
| processes to share and exchange data.
|
| ...
|
| Why does citing ChatGPT make me feel so ugly inside?
| faangsticle wrote:
| > Why does citing ChatGPT make me feel so ugly inside?
|
| It's the modern "let me Google that for you". Just like
| people don't care what the #1 result on Google is, they
| also don't care what ChatGPT has to say about it. If they
| did, they'd ask it themselves.
| pritambarhate wrote:
| Without a credible source to reconfirm what ChatGPT said,
| one can't really assume what ChatGPT says is correct.
| TeMPOraL wrote:
| I always understood IPC, "interprocess communication", in a
| general sense, as anything and everything that can be used
| by processes to communicate with each other - of course with
| the narrowing provision that common use of the term refers
| to those means that are typically used for that purpose, are
| relatively efficient, and where the processes in question
| run on the same machine.
|
| In that view, I always saw shared memory as IPC, in that
| it is a tool commonly used to exchange data between
| processes, but of course it is not strictly tied to any
| process in particular. This is similar to files, which if
| you squint are a form of IPC too, and are also not tied
| to any specific process.
|
| > _Why does citing ChatGPT make me feel so ugly inside?_
|
| That's probably because, in cases like this, it's not
| much different to stating it yourself, but is more noisy.
| wbl wrote:
| Not necessarily. Man 3 shmem if you want a journey back to
| some bad ideas.
| shepardrtc wrote:
| I think this is a situation where a message-passing Actor-based
| model would do well. Maybe pass variable updates to a single
| writer process/thread through channels or a queue.
|
| Years ago I wrote an algorithmic trader in Python (and Cython
| for the hotspots) using Multiprocessing and I was able to get
| away with a lot using that approach. I had one process
| receiving websocket updates from the exchange, another process
| writing them to an order book that used a custom data
| structure, and multiple other processes reading from that data
| structure. Ran well enough that trade decisions could be made
| in a few thousand nanoseconds on an average EC2 instance. Not
| sure what their latency requirements are, though I imagine they
| may need to be faster.
|
| Obviously mutexes are the bottleneck for them at this point,
| and while my idea might be a bit slower than a low-load
| situation, perhaps it would be faster when you start getting to
| higher load.
| hamandcheese wrote:
| I think the Actor model is fine if you start there, but I
| can't imagine incrementally adopting it in a large,
| preexisting code base.
| ilyt wrote:
| That would most likely be several times slower than current
| model
| bb88 wrote:
| This reminds me of this poster: "You must be this tall..."
|
| https://bholley.net/blog/2015/must-be-this-tall-to-write-mul...
|
| Back about a decade ago I was "auditing" someone else's threaded
| code. And couldn't figure it out. But he was the company's
| "golden child" so by default it must be working code because he
| wrote it.
|
| And then it started causing deadlocks in prod.
|
| "What do you want me to do about it? It's the golden child's
| code. He's not even gonna show up til 2pm today."
| wmf wrote:
| The thing is... multi-process with a bespoke shared memory
| system isn't better than multithreading; it's much worse.
| citrin_ru wrote:
| The difference is between everything shared (threads) and
| some parts shared explicitly (processes with shared memory).
| I'm not sure the second is worse.
| paulddraper wrote:
| It kinda is, though. The process barrier is better at
| enforcing careful, deliberate interactions.
| throwawaylinux wrote:
| By bespoke you mean using standard interfaces to create
| shared memory pools?
|
| They do roll some of their own locking primitives, but that's
| not particularly unusual in a large portable program (and
| quite likely what they wanted is/was not available in glibc
| or other standard libraries, at least when first written).
| __turbobrew__ wrote:
| In Linux, multi process with shared memory regions is
| basically just threads. The kernel doesn't know anything
| about threads, it knows about processes and it lets you share
| memory regions between those processes if you so desire.
| anarazel wrote:
| I'm not sure if I'd judge it as harshly, but you have a good
| point: A lot of debugging / validation tooling understands
| threads, but not memory shared between processes.
| georgewfraser wrote:
| I wonder if it would be easier to create a C virtual machine that
| emulates all the OS interaction, then recompile Postgres and the
| extensions to run on this. Perhaps TruffleC would work?
|
| https://dl.acm.org/doi/10.1145/2647508.2647528
| anarazel wrote:
| Hard to believe that would provide any benefit without also
| causing massive slowdowns.
| thecopy wrote:
| This feels like developers are bored and want a challenge.
| dboreham wrote:
| It's a multi-decade ask from many PG users and a serious pain
| point for many deployments.
| benlivengood wrote:
| This is interesting because Google just created AlloyDb[0] which
| is decidedly multiprocess for performance and switches out the
| storage layer from a read/write model to write+replicate + read-
| only model.
|
| The deep dive[1] has some details; the tl;dr: is that the main
| process only has to output Write Ahead Logs to the durable
| storage layer which minimizes transaction latency. The log
| processing service materializes postgres-compatible on-disk
| blocks that read-only replicas can read from, with a caching
| layer for block reads which sends cache invalidations from the
| LPS to read replicas.
|
| I'm not sure if similar benefits could be seen within a single
| machine; using network DMA or even RDMA to transfer bytes to and
| from remote machines also avoids TLB invalidation. There are some
| mentions in the mailing list of waiting for Linux to support
| shared page mappings between processes as a solution.
|
| I'm not exactly sure I understand the reasoning behind process
| separation as crash recovery. As far as I understand, each
| connection is responsible for correctness, so if a process
| crashes there seems to be an assumption that the database can
| recover and keep working by killing that process - but that
| seems like it risks silent data corruption. Perhaps it's
| equivalently mitigated by materializing blocks from the sync'd
| WAL in a separate process from the multithreaded connection
| process producing WAL entries?
|
| [0] https://cloud.google.com/alloydb [1]
| https://cloud.google.com/blog/products/databases/alloydb-for...
| Icathian wrote:
| Some interesting discussion on this here also:
| https://news.ycombinator.com/item?id=36284487
| [deleted]
| levkk wrote:
| Pretty sure Tom Lane said this will be a disaster in that same
| pgsql-hackers thread. Not entirely sure what benefits the multi-
| threaded model will have when you can easily saturate the entire
| CPU with just 128 connections and a pooler. So I doubt there is
| consensus or even strong desire from the community to undertake
| this boil the ocean project.
|
| On the other hand, having the ability to shut down and cleanup
| the entire memory space of a single connection by just
| disconnecting is really nice, especially if you have extensions
| that do interesting things.
| ilyt wrote:
| >Not entirely sure what benefits the multi-threaded model will
| have when you can easily saturate the entire CPU with just 128
| connections and a pooler.
|
| That all of those would work faster, because of the
| performance benefits mentioned in the article
| BoardsOfCanada wrote:
| From the article:
|
| > Tom Lane said: "I think this will be a disaster. There is far
| too much code that will get broken". He added later that the
| cost of this change would be "enormous", it would create "more
| than one security-grade bug", and that the benefits would not
| justify the cost.
| timcobb wrote:
| You can think of this as an opportunity to rewrite in Rust.
| tracker1 wrote:
| AfterPostgres
| krylon wrote:
| May I humbly suggest PostPostgres, or Post2gres?
| tomjakubowski wrote:
| 2Post2Furigres
| moi2388 wrote:
| Postgr3s: Tokyo Thread
| glintik wrote:
| Mysql guys: Smart move, Postgres!
| pjmlp wrote:
| So just when we as a whole decided that multiprocessing is a
| much better approach from a security and application-stability
| point of view, they decide to go with threads?
| blinkingled wrote:
| Horses for courses I guess - purely threaded vs purely MP both
| have different set of tradeoffs and shoehorning one over the
| other always fails some use cases. The article says they are
| also considering the possibility of having to keep both process
| and thread models indefinitely for this and other reasons.
|
| I know nothing of PG internals but I can see why process per
| connection model doesn't work for large machines and/or high
| number of connections. One way to do it would be to keep
| connection handling per thread and still keep multiprocess
| approach where it makes sense for security and doesn't add
| linear overheads.
| RcouF1uZ4gsC wrote:
| This would be one of those places where a language like Rust
| would be helpful. In C/C++, with undefined behavior and
| crashes, process isolation makes a lot of sense to limit the
| blast radius. Rust's borrow checker gives you at compile time
| a lot of the safety that you would otherwise rely on process
| isolation for.
| mattashii wrote:
| Yes, but note that the blast radius of a PostgreSQL process
| crash is already "the whole system reboots", so there are not a
| lot of differences between process- and thread-based PostgreSQL
| written in C.
|
| Rewriting in Rust would be interesting, but it would also
| probably be too invasive to be worthwhile at all - all code
| in PostgreSQL is C, while not all code in PostgreSQL
| interacts with the intrinsics of processes vs. threads. Any
| rewrite to Rust would likely take several times more effort
| than a port to threads.
| megous wrote:
| A PostgreSQL process crash may also just mean one query fails.
| beebmam wrote:
| Why change postgres? Just fork it if you want to change something
| this fundamental.
| anarazel wrote:
| Because it makes it a lot easier to address some of postgres'
| weaknesses? This is a proposal by long time contributor to
| postgres, that a number of other long time contributors agree
| with (and others disagree with!). Why shouldn't have Heikki
| brought this up for discussion?
| ceeam wrote:
| Is there any reason at all that people use the intrinsically
| bug-prone and broken multithreading model instead of fork()
| and IPC, apart from WinAPI having no proper fork?
| adwn wrote:
| Yes, and some of those reasons are even listed in the article.
| ceeam wrote:
| TLB misses? They are just a detail of particular CPU
| implementation, and the architectures change. Also, aren't
| they per core and not per process? What would that solve then
| to switch to MT?
| adwn wrote:
| > _TLB misses? They are just a detail of particular CPU
| implementation, and the architectures change._
|
| TLBs are "just a detail" of roughly 100% of server,
| desktop, and mobile CPUs.
|
| > _Also, aren 't they per core and not per process? What
| would that solve then to switch to MT?_
|
| TLB entries are per address space. Threads share an address
| space, processes do not.
| tragomaskhalos wrote:
| Worked on a codebase which was separate processes, each of
| which had a shedload of global variables. It was a nightmare
| working out what was going on, not helped by the fact that
| there was no naming convention for the globals, plus they were
| not declared in a single place. I believe their use was a
| performance move, i.e. having the linker pin a var to a
| specific memory location rather than copying it to the stack
| and referencing it by offset the whole time. Premature
| optimisation? Optimisation at all? Who knows, but there's a
| good reason coding standards typically militate against
| globals.
| TomMasz wrote:
| There are _2000_ globals here, so more like a couple of
| shedloads. While this is something you'd sort of expect for a
| product that's been around 30+ years, it really seems like
| there's a lot of optimization that could happen and still stick
| with the process model.
| JdeBP wrote:
| Per discussion on this very page, in the headlined article, and
| in the mailing list discussion it references, PostgreSQL is not
| in that category. It has lots of _static storage duration_
| variables, which _do not_ necessarily have external linkage.
|
| Robert Haas pointed out in one message that an implementation
| pattern was to use things like file-scope static storage
| duration variables to provide session-local state for
| individual components. This is why they've been arguing against
| a single giant structure declared in "session.h" as an
| approach, as it requires every future addition to session state
| to touch the central core of the entire program.
|
| They want to keep the advantage of the fact that these
| variables are in fact _not global_. They are local; and the
| problem is rather that they have static storage duration and
| are not per-thread, and thus are not per-session in a thread-
| per-session model.
| ajkjk wrote:
| There's something to be said for globals whose access is well-
| managed, though.
|
| IMO: if the variable is _truly_ global, i.e. code all over the
| codebase cares about it, then it should just be global instead
| of pretending like it's not with some fancy architecture.
|
| The tricky part is reacting to changes to a global variable.
| Writing a bunch of "on update" logic leads to madness. The
| ideal solution is for there to be some sort of one-directional
| flow for updates, like when a React component tree is re-
| rendered... but that's very hard to build in an application
| that doesn't start out using a library like React in the first
| place.
| shariat wrote:
| this will be the beginning of the end of postgres
| haburka wrote:
| That sounds like really hard programming. I'm glad I write react
| and get paid possibly much more.
| paulddraper wrote:
| We're glad you're writing React too :)
| Exuma wrote:
| I feel this sort of undertaking could only be done by those
| programmers who truly value domain knowledge above all else
| (money, etc). I'm more of the entrepreneurial mind so I
| generally only learn as much as needed to do some task (even if
| it's very difficult), but just seeking information as a means
| to an end doesn't feel fulfilling to me. Of course many people
| DO find that, and it's upon those people's shoulders that heroic
| things like this rest, and I'm very thankful to them.
| cryptica wrote:
| Seems like a bad idea. Processes are more elegant and scalable
| than threads as they discourage the use of shared memory. Shared
| memory is often a bad idea. You end up with different threads
| competing and queuing up to access or write the same data (e.g.
| waiting on each other to acquire a lock with mutexes - This
| immediately disqualifies the system from becoming embarrassingly
| parallel) and it becomes the OS's problem to figure out when to
| allow which thread to access what memory... This is bad because
| the OS doesn't care about optimizing memory access for your
| specific use case. It will treat your 'high performance' database
| in the same way as it treats a run-of-the-mill Gimp desktop
| application....
|
| With the process model, it encourages using separate memory for
| each process; this forces developers to think about things like
| memory consistency and availability and gives them more
| flexibility in terms of scalability across multiple CPU cores or
| even hosts. Processes are far better abstractions than threads
| for modeling concurrent systems since their logic is
| fundamentally the same regardless of whether they run across
| different CPU cores or different hosts.
|
| > The overhead of cross-process context switches is inherently
| higher than switching between threads in the same process
|
| I remember researching this a while back. It depends on the
| specific OS and hardware. It's not so straightforward, and this
| is something which tends to change over time; the differences
| are usually insignificant anyway.
|
| Also, it's important not to conflate performance with scalability
| - These two characteristics are orthogonal at best and oftentimes
| conflicting.
|
| Oftentimes, to scale horizontally, a system needs to incur a
| performance penalty as additional work is required to route and
| coordinate actions across multiple CPUs or hosts. A scalable
| system can service a much larger (sometimes even theoretically
| unlimited) number of requests, but it will typically perform
| worse than a non-scalable system if you judge it on a
| requests-per-CPU-core basis.
| uudecoded wrote:
| I am going to go ahead and trust Tom Lane on this one, over
| someone who is working on "serverless Postgres". Godspeed to the
| forthcoming fork.
| aseipp wrote:
| Heikki Linnakangas is one of the top Postgres contributors of
| all time, he isn't just "someone." The fact he's working for a
| startup on a fork (that already exists, which you can run right
| now on your local machine) doesn't warrant any snide dismissal.
| Robert Haas admitted that it would be a huge amount of work and
| that it would only be achievable by a small few people anyway,
| Heikki being among them.
|
| Anyway, I think there are definitely limits that are starting
| to appear with Postgres in some spots. This is probably one of
| the most difficult possible solutions to some of those
| problems, but even if they don't switch to a fully threaded
| model, being more CPU efficient, handling connections better,
| etc. will all go a substantial way. Doing some of the really
| hard work is better than none of it, probably.
| selimnairb wrote:
| I wonder what AWS's PostgreSQL-compatible Aurora looks like
| under the hood. Does it use threading, processes, both?
| winrid wrote:
| At the end of the day, this doesn't solve any problems. Small
| setups use postgres directly just fine, and large setups use
| pgbouncer, and having process isolation with extensions is a good
| thing and probably simplifies things a lot.
|
| My $0.02
| rburhum wrote:
| Sorry if I offend anybody, but this sounds like such a bad idea.
| I have been running various versions of postgres in production
| for 15 years with thousands of processes on super beefy machines,
| and I can tell you without a doubt that sometimes those processes
| crash - especially if you are running any of the extensions.
| Nevertheless, Postgres has 99% of the time proven to be
| resilient. The idea that a bad client can bring the whole cluster
| down because it hit a bug sounds scary. Ever try creating a
| spatial index on thousands/millions of records that have nasty,
| overly complex or badly digitized geometries? Sadly, crashes are
| part of that workflow, and changing this from processes to
| threading would mean all the other clients also crashing and
| losing their connections. Accepting this as a potential problem
| just to avoid context-switching overhead or cache misses? No
| thanks.
| mike_hock wrote:
| Also, reducing context switching overhead (or any other CPU
| overhead) is probably not gonna fix the garbage I/O
| performance.
| Shorel wrote:
| Reading your comment makes me think it is not only a good idea,
| it is a necessity.
|
| Relying on crashing as a bug recovery system is a good idea?
| Crashing is just part of the workflow? That's insane, and a
| good argument against PostgreSQL in any production system.
|
| It is possible PostgreSQL doesn't migrate to a thread based
| model, and I am not arguing they should.
|
| But debug and patch the causes of these crashes? Absolutely
| yes, and the sooner, the better.
| dialogbox wrote:
| It's all about trade off.
|
| Building a database which is never gonna crash might be
| possible, but at what cost? Can you name a single real-world
| system that achieved that? Also, there can be regressions. More
| tests? Sure, but again, at what cost?
|
| While we are trying to get there, having a crash proof
| architecture is also a very practical approach.
| bhaney wrote:
| > Relying on crashing as a bug recovery system is a good
| idea? Crashing is just part of the workflow? That's insane
|
| Erlang users don't seem to agree with you
| mindwok wrote:
| They are still debugging and patching the causes. The crash
| detection is just to try and prevent a single bug from
| bringing down the whole system.
| paulddraper wrote:
| Cars are designed with airbags?!
|
| Like, they are _supposed_ to crash?!?
| anarazel wrote:
| We do fix crashes etc., even if postgres manages to
| restart.
|
| I think the post upthread references an out-of-core extension
| we don't control, which in turn depends on many external
| libraries it doesn't control either...
| fabian2k wrote:
| A database has to handle situations outside its control, e.g.
| someone cutting the power to the server. That should not
| result in a corrupted database, and with Postgres it doesn't.
|
| The fundamental problem is that when you're sharing memory,
| you cannot safely just stop a single process when
| encountering an unexpected error. You do not know the current
| state of your shared data, and if it could lead to further
| corruption. So restarting everything is the only safe choice
| in this case.
| rtpg wrote:
| We don't want stuff to crash. But we also want data integrity
| to be maintained. We also want things to work. In a world
| with extensions written in C to support a lot of cool things
| with Postgres, you want to walk and chew bubblegum on this
| front.
|
| Though to your point, a C extension can totally destroy your
| data in other ways, and there are likely ways to add more
| barriers. And hey, we should fix bugs!
| dan-robertson wrote:
| Is the actual number you got 99%? Seems low to me but I don't
| really know about Postgres. That's 3 and a half days of
| downtime per year, or an hour and a half per week.
| dfox wrote:
| Well, hour and half per week is the amount of downtime that
| you need for modestly sized database (units of TB) accessed
| by legacy clients that have ridiculously long running
| transactions that interfere with autovacuum.
| zeroimpl wrote:
| However, it's already the case that if a postgres process
| crashes, the whole cluster gets restarted. I've occasionally
| seen this message:
|
|     WARNING: terminating connection because of crash of another
|     server process
|     DETAIL: The postmaster has commanded this server process to
|     roll back the current transaction and exit, because another
|     server process exited abnormally and possibly corrupted
|     shared memory.
|     HINT: In a moment you should be able to reconnect to the
|     database and repeat your command.
|     LOG: all server processes terminated; reinitializing
| niccl wrote:
| yes, but postmaster is still running to roll back the
| transaction. If you crash a single multi-threaded process,
| you may lose postmaster as well and then sadness would ensue
| jtc331 wrote:
| If you read the thread you'd see the discussion includes
| still having e.g. postmaster as a separate process.
| mattashii wrote:
| The threaded design wouldn't necessarily be single-process,
| it would just not have 1 process for every connection.
| Things like crash detection could still be handled in a
| separate process. The reason to use threading in most cases
| is to reduce communication and switching overhead, but for
| low-traffic backends like a crash handler the overhead of
| it being a process is quite limited - when it gets
| triggered context switching overhead is the least of your
| problems.
| Yoric wrote:
| Seconded. For instance, Firefox' crash reporter has
| always been a separate process, even at the time Firefox
| was mostly single-process, single-threaded. Last time I
| checked, this was still the case.
| cyberax wrote:
| PostgreSQL can recover from abruptly aborted transactions
| (think "pulled the power cord") by replaying the journal.
| This is not going to change anyway.
| cogman10 wrote:
| Transaction roll back is a part of the WAL. Databases write
| to the disk an intent to change things, what should be
| changed, and a "commit" of the change when finished so that
| all changes happen as a unit. If the DB process is
| interrupted during that log write then all changes
| associated with that transaction are rolled back.
|
| Threaded vs process won't affect that.
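| A toy sketch of that commit protocol (illustrative only; the
| names and in-memory "log" are invented, and this is nothing like
| Postgres's actual WAL format):

```python
# Minimal write-ahead-log sketch: every change is recorded in the log
# before being applied, and recovery replays only transactions whose
# commit record made it into the log -- everything else is implicitly
# rolled back, whether threads or processes did the writing.
class WALStore:
    def __init__(self):
        self.log = []   # stands in for the on-disk, append-only WAL
        self.data = {}  # stands in for the table heap

    def begin(self, txid):
        self.log.append(("begin", txid, None))

    def write(self, txid, key, value):
        # Log the intent first; nothing touches self.data yet.
        self.log.append(("write", txid, (key, value)))

    def commit(self, txid):
        self.log.append(("commit", txid, None))
        for op, t, payload in self.log:
            if op == "write" and t == txid:
                self.data[payload[0]] = payload[1]

    def recover(self):
        # Crash recovery: rebuild state from committed transactions only.
        committed = {t for op, t, _ in self.log if op == "commit"}
        self.data = {}
        for op, t, payload in self.log:
            if op == "write" and t in committed:
                self.data[payload[0]] = payload[1]
```

| Replaying after a simulated crash mid-transaction keeps only the
| committed work, which is the property that doesn't change with a
| threads port.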
| dfox wrote:
| Running the whole DBMS as a bunch of threads in a single
| process changes how fast the recovery from some kind of
| temporary inconsistency is. In an ideal world, this
| should not happen, but in reality it does, and you do not
| want to bring the whole thing down because of some
| superficial data corruption.
|
| On the other hand, all cases of fixable corrupted data in
| PostgreSQL I have seen were the result of somebody doing
| something totally dumb (rsyncing a live cluster, even
| between architectures), while on InnoDB it seems to
| happen somewhat randomly, without any obvious sign of
| somebody doing stupid things.
| anarazel wrote:
| We would still have a separate process doing that part of
| postmaster's work.
| tracker1 wrote:
| You can still have a master control process separate from
| the client connections.
| moonchrome wrote:
| Restart on crash doesn't sound that difficult to do.
| lelanthran wrote:
| > However, it's already the case that if a postgres process
| crashes, the whole cluster gets restarted. I've occasionally
| seen this message:
|
| Sure, but the blast radius of corruption is limited to that
| shared memory, not all the memory of all the processes. You
| can at least use the fact that a process has crashed to
| ensure that the corruption doesn't spread.
|
| (This is why it restarts: there is no guarantee that the
| shared memory is valid, so the other processes are stopped
| before they attempt to use that potentially invalid memory)
|
| With threads, _all_ memory is shared memory. A single thread
| that crashes can make other threads data invalid _before_ the
| detection of the crash.
| BoardsOfCanada wrote:
| I recently looked through the source code of postgresql, and every
| source file starts with a (really good) description of what the
| file is supposed to do, which made it really easy to get into
| the code compared to other open source projects I've seen. So
| thanks for that.
| nonethewiser wrote:
| Uncle Bob hates this.
| Alekhine wrote:
| I have no idea why that isn't standard practice in every
| codebase. I should be able to figure out your code without
| having to ask, or dig through issues or commit messages. Just
| tell me what it's for!
| nologic01 wrote:
| the average programmer thinks they are writing significantly
| above-average clean code, so no need to document it :-)
| ComputerGuru wrote:
| It kind of is in rust now, with module-level documentation
| given its own specific AST representation instead of just
| being a comment at the top of the file (a file is a module).
| SoylentOrange wrote:
| Because it takes a lot of time and because the comments can
| get outdated. I also want this for all my code bases. But do
| I always do this myself? No, especially on green field
| projects. I will sometimes go back and annotate them later.
| mhh__ wrote:
| They can get outdated, but they usually don't. It's a good
| litmus test for whether a file is too big or small if its
| purpose is hard to nail down.
| withinboredom wrote:
| Even outdated comments can tell you the original purpose of
| the code, which helps if you're looking for a bug.
| Especially if you're looking for a bug.
|
| If someone didn't take the time to update the comments and
| the reviewers didn't point it out, then you've probably
| found the bug because someone was cowboying some shitty
| code.
| alex_reg wrote:
| I have the opposite experience.
|
| Outdated comments are often way worse than no comments,
| because they can give you wrong ideas that aren't true
| anymore, and send you off in the wrong direction before
| you finally figure out the comment was wrong.
| elteto wrote:
| Indeed. I recently found this piece of code:
|
|     if (X) assert(false); // we never do X, ever, anywhere.
|
| Then I look over to the other pane, where I have a
| different, but related file open:
|
|     if (exact same X) { do_useful_stuff(); }
|
| It got a chuckle out of me.
| jgilias wrote:
| Did you update the comment? :-)
| withinboredom wrote:
|     // there are two kinds of mutually exclusive commentors
|     enum kinds { writers; readers; updaters; }
| akira2501 wrote:
| Trying to understand what I previously wrote and why I
| wrote it takes more time than I ever care to spend. I'd
| much rather have the comments, plus at this point, by
| making them a "first class" part of my code, I find them
| much easier to write and I find the narrative style I use
| incredibly useful in laying out a new structure but also in
| refactoring old ones.
| cmrdporcupine wrote:
| It sounds like the specific concerns here are actually around
| buffer pool management performance in and around the TLB: _"Once
| you have a significant number of connections we end up spending a
| *lot* of time in TLB misses, and that's inherent to the process
| model, because you can't share the TLB across processes."_
|
| Many of the comments here seem to be missing this and talking
| about CPU-boundedness generally and thread-per-request vs process
| etc models, but this seems orthogonal to that, and is actually
| quite specific about the VM subsystem and seems like a legitimate
| bottleneck with the approach Postgres has to take for buffer/page
| mgmt with the process model it has now.
|
| I'm no Postgres hacker (or a Linux kernel hacker), and I only did
| a 6 month stint doing DB internals, but it _feels_ to me like
| perhaps the right answer here is that instead of Postgres getting
| deep down in the weeds refactoring and rewriting to a thread
| based model -- with all the risks in that that people have
| pointed out -- some assistance could be reached for by working on
| specific targeted patches in the Linux kernel?
|
| The addition of e.g. userfaultfd shows that there is room for
| innovation and acceptance of changes in and around kernel re:
| page management. Some new flags for mmap, shm_open, etc. to
| handle some specific targeted use cases to help Postgres out?
|
| Also wouldn't be the first time that people have done custom
| kernel patches or tuning parameters to crank performance out of a
| database.
| thesnide wrote:
| Exactly my thinking. If the problem is TLB evictions, why not
| improve these? PG isn't the only software that is hit by those.
|
| And if you carefully craft your CPU scheduling and address
| spaces mapping, you can reduce them by a lot.
|
| Yet rewrites are always easier and more sexy. At first.
| anarazel wrote:
| The TLB issue is more a hardware issue than a software/OS
| one. To my knowledge, neither x86 nor arm provides a way to
| _partially_ share TLB contents between processes/entities.
| TLB entries can be tagged with a process context identifier,
| but that's an all-or-nothing thing. Either the entire address
| space is shared, or it's not.
|
| > Yet rewrites are always easier and more sexy. At first.
|
| Moving to threads would not at all be a rewrite.
| eclipticplane wrote:
| For the record, I think this will be a disaster. There is far
| too much code that will get broken, largely silently, and
| much of it is not under our control. regards,
| tom lane
|
| (via https://lwn.net/ml/pgsql-
| hackers/4178104.1685978307@sss.pgh....)
|
| If Tom Lane says it will be a disaster, I believe it will be a
| disaster.
| abhibeckert wrote:
| Reminds me of PHP 6...
|
| For those who don't follow PHP closely - that version was an
| attempted refactor of the string implementation which
| essentially shut down nearly all work on PHP for a decade,
| stagnating the language until it became pretty terrible
| compared to other options. They finally gave up and started
| work on PHP 7 which uses the (perfectly good) PHP 5 strings.
|
| Ten years of wasted time by the best internal PHP developers
| crippled the project - I'm amazed it survived at all.
| progmetaldev wrote:
| I've used PHP in the past (PHP 4 and 5), as well as some
| simple templated projects in PHP 7. I try to keep up on news
| with what is happening in the PHP world, and it's difficult
| because of the hate for the language. Is the solution to
| Unicode strings still to just use the "mb_*" functions?
|
| I got my real professional start using PHP, and have built
| even financial systems in the language (since ported to .NET
| 6 for my ease of maintenance, and better number handling).
| I'm still very interested in the language itself, in case I
| ever have the need to freelance or provide a solution to a
| client that can't afford what I can build in .NET (although
| to be honest, at this point I'm roughly able to code at the
| same speed in .NET as in PHP, but with the added type-safety,
| although I know PHP has really stepped up in providing this).
| Hayvok wrote:
| I believe so - most (all?) string functions have an mb_
| equivalent, for working on multibyte strings.
|
| Regular PHP strings are actually pretty great, since you
| can treat them like byte arrays. Fun fact: PHPs streaming
| API has an "in-memory" option and it's... just a string
| under the hood.
|
| Just don't forget to use multibyte functions when you're
| handling things like user input.
| robomc wrote:
| I have the "Professional PHP6" book which I feel like should
| be a collectors item or something.
|
| Weird book IMO, because it has _a lot_ of content that's
| just about general software development, rather than anything
| to do with PHP specifically, or the theoretical PHP6 APIs in
| particular.
| pmontra wrote:
| PHP used to be the first computer language learned by
| people wanting to create a scripted web page. This was more
| true in the 90s but maybe it stuck. So it would be OK to
| add some general guidance about writing software and
| organizing projects.
| SoftTalker wrote:
| _Things You Should Never Do_
|
| https://www.joelonsoftware.com/2000/04/06/things-you-
| should-...
|
| An oldie but a goodie
| h0l0cube wrote:
| > Well, yes. They did. They did it by making the single
| worst strategic mistake that any software company can make:
|
| > They decided to rewrite the code from scratch.
|
| Absolutely not what's being proposed for Postgres.
| yxre wrote:
| Process isolation affects so many things in C. The
| strategy change is going to require changes to so many
| modules that it will either be a re-write or buggy.
|
| In practical terms, if every line needs to be audited and
| updated, it is a re-write
| h0l0cube wrote:
| > either be a re-write or buggy
|
| A large refactor at best. It will touch lots of parts of
| the code base, but the vast majority of the source code
| would remain intact. Otherwise they could just Rewrite it
| in Rust(tm) while they're at it
|
| > if every line needs to be audited and updated, it is a
| re-write
|
| I'm not sure why you believe every line needs to be
| updated. Most code is thread agnostic.
| anarazel wrote:
| What makes you think that it will require that many
| changes? There will be some widespread mechanical changes
| (which can be verified to be complete with a bit of low
| level work, like a script using objdump/nm to look for
| non-TLS mutable variables) and some areas changing more
| heavily (e.g. connection establishment, crash detection,
| signal handling, minor details of the locking code). But
| large portions of the code won't need to change. Note
| that we/postgres already shares a lot of state across
| processes.
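| A sketch of what that kind of completeness check might look
| like, run here against captured `readelf -sW`-style output
| rather than a live binary (the symbol names are invented for
| illustration; a real version would also check that the
| containing section is writable, to skip read-only data):

```python
# Scan symbol-table output for data symbols that are NOT thread-local:
# after a threads port, any mutable state left as a plain OBJECT symbol
# (rather than TLS) is a candidate bug. Sample lines mimic the column
# layout of `readelf -sW`; the names are made up.
SAMPLE = """\
    12: 0000000000004010     4 OBJECT  GLOBAL DEFAULT   24 some_session_var
    13: 0000000000004014     4 TLS     GLOBAL DEFAULT   18 per_session_state
    14: 0000000000001139    38 FUNC    GLOBAL DEFAULT   14 main
"""

def non_tls_objects(symtab_text: str) -> list[str]:
    hits = []
    for line in symtab_text.splitlines():
        parts = line.split()
        # readelf -sW columns: Num: Value Size Type Bind Vis Ndx Name
        if len(parts) >= 8 and parts[3] == "OBJECT":
            hits.append(parts[7])
    return hits
```

| Flagging `some_session_var` while passing over the TLS and FUNC
| entries is the whole point: the scan turns "did we miss a
| variable?" into a mechanical check.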
| giraffe_lady wrote:
| I'm not the person you asked and I don't have any
| particular knowledge of postgres internals.
|
| Experience with other systems has taught me that in a
| system that's been in active use and development for
| decades, entanglement will be deep, subtle, and
| pervasive. If this isn't true of postgres then it's an
| absolute freak anomaly of a codebase. It is that in other
| ways, so it's possible.
|
| But the article mentions there being thousands of global
| variables. And Tom Lane himself says he considers it
| untenable for exactly this reason. That's a _very_ good
| reason to think that it will require that many changes
| imo.
| h0l0cube wrote:
| > that many changes
|
| The 'that many changes' in question is a complete
| rewrite. Many changes across many files, yes, but nothing
| even approaching a rewrite.
|
| > I don't have any particular knowledge of postgres
|
| Judging by their bio, the person you're replying to does.
| bandrami wrote:
| The lesson that "cruft is problems someone solved before
| you" is unfortunately entirely lost on most devs today
| HeavyStorm wrote:
| Love this article. Completely changed the way I think about
| certain projects.
| IgorPartola wrote:
| I think this is a great article that takes a maximalist
| point and that's its flaw.
|
| You should rewrite code only when the cost of adding a new
| feature (one that is actually necessary) to the old
| codebase becomes comparable to designing your entire system
| from scratch to allow for that feature to be added easily.
| That is to say that the cost of the rewrite should become
| comparable to the cost of continuing development. I have
| been a part of a couple of rewrites like that, one of them
| quite complex, and yes they were warranted and yes they
| worked.
|
| But having said that you should absolutely be conservative
| with rewriting code. It's a bad habit to always jump to a
| rewrite.
| devjab wrote:
| I think it's very dependent on how you use words like
| "rewrite" or "refactor". The point the author makes about
| the two page function, and all the bug-fixes (lessons
| learned) makes sense only if you "rewrite" from scratch
| without looking at the history. You can absolutely
| "rewrite" the function in a manner that is "refactoring",
| but will often get called "rewrite" in the real world.
| This may be because "refactor" is sort of this English CS
| term that doesn't have a real translation or usage in
| many languages and "rewrite" is sort of universal for
| changing text, but in CS is sort of "rebuilding" things.
|
| I don't think you necessarily need to be conservative
| about rewriting things. We do it all the time in fact. We
| build something to get it out there and see the usage,
| and then we build it better and then we do it again.
| Which often involves a lot of "rewriting" but thanks to
| principles like SOLID's single responsibility makes this
| rather easy to both do and maintain (we write a lot of
| semi-functional code and try to avoid using OOP unless
| necessary, so we don't really use all the parts of SOLID
| religiously).
|
| I do agree that it's never a good idea to get into things
| with the mind-set of "we can do this better if we start
| from scratch" because you can't.
| giraffe_lady wrote:
| There's currently a trend towards shitting on
| microservices-everything, imo largely justified. But
| missing from that is that identifying a logical feature
| and moving it to a microservice is one of the safer ways
| to begin a gradual rewrite of a critical system. _And_
| usually possible to get cross-dept buyin for various not
| always wholesome reasons. It may not always be the best
| technical solution but it 's often the best political one
| when a rewrite is necessary.
| pphysch wrote:
| The issue here isn't "rewriting" per se but "stopping
| development".
|
| You shouldn't stop development on your important products.
|
| Letting a couple of your talented programmers loose on a
| greenfield reimplementation is a perfectly sane strategic
| move.
|
| Stopping development on important products because you are
| 100% certain that the reimplementation will be successful
| by $DEADLINE is a foolish gamble.
| devjab wrote:
| Isn't that how you end up with Python 2 and 3 though?
| pphysch wrote:
| Yes, that was a rough migration process, but the long-
| term result is we have an improved language and growing
| community instead of Python going the way of PHP and
| Perl.
|
| Between the 3 P's, Python's strategic decisions in the 2000s
| were clearly the most successful.
|
| And it wasn't a total rewrite.
| wiz21c wrote:
| Python3 changes were many "little" things; some more
| fundamental than other (unicode str). So I guess they
| were able to split the work in tiny pieces and,
| ultimately, were able to manage the project...
| paulddraper wrote:
| The problem is if it snowballs
| pphysch wrote:
| Any project, whether it's a "rewrite" or not, can succumb
| to scope creep and poor project management.
| hurril wrote:
| The big problem there is that the people you are letting
| loose on the alternative, are lost from the original, so
| O loses steam that A gains. You still have to produce bug
| fixes and features to _both_ O and A to keep them in
| sync. So you essentially have a doubled required
| production rate to be delivered using the same staff.
|
| So in order for there to be a net gain, the gang working
| on the alternative has to be able to find such big wins
| as to be nigh impossible.
|
| This is a very very hard problem in our domain. 99% of
| the time, we have to simply resist the urge to _just
| rewrite the sucker_. No! Don't do it! (And this is
| incredibly hard because we all want to.)
| pphysch wrote:
| Getting some _Mythical Man Month_ vibes here.
| Productivity isn 't a zero-sum game.
| zilti wrote:
| Implying it had ever not been terrible compared to other
| options
| kqr wrote:
| On the other hand, there's also the case of the Lunar Module
| guidance software that was hard-coded to run exactly every
| two seconds. If the previous subroutine call was still
| running when the next one was due, the previous one was
| harshly terminated (with weird side effects).
|
| One of the main programmers suggested making it so that the
| next guidance routine wouldn't run until the previous one was
| done. This would make the code less sensitive to race
| conditions and allow more useful functionality for the pilots
| (who were the actual users and did seem to want it). However
| everyone assumed the two-second constant was implicitly
| embedded everywhere.
|
| It wasn't -- only in a few places -- and with that fixed the
| code got more general and the proof of concept ran better
| than ever in about every simulator available. The amount of
| control it gave pilots was years ahead of the curve. But it
| never got a chance to fly on a real mission because what was
| there was "good enough" and nobody bothered to try.
|
| In our combined comments there's a lesson about growing
| experiments and figuring out how to achieve failure quickly.
| idiomaticrust wrote:
| He is right. Such rewrites cause a lot of problems if your
| compiler doesn't help you with avoiding data races.
|
| But there is another way.
| zilti wrote:
| Indeed, Zig is a nice language for this
| hnarn wrote:
| > But there is another way.
|
| Ok?
| mycall wrote:
| Microsoft SQL Server has SQLOS which is another way [0].
|
| [0] https://www.thegeekdiary.com/what-is-sql-server-
| operating-sy...
| carstenhag wrote:
| The person probably implied that Postgres should switch to
| another toolchain that guarantees more things at compile
| time, so probably Rust.
| bb88 wrote:
| You can take a chunk of code and just rewrite it in Rust.
| You'll learn a lot quickly by this.
| steve_adams_86 wrote:
| It's sort of like the inverse of the Matrix when Neo
| learns kung fu. You realize that you actually don't know
| how to program :)
| tylerhou wrote:
| The boundaries within database code are not clear. There
| are too many interlocking parts to take a nontrivial
| chunk and rewrite it in Rust.
| blincoln wrote:
| If the existing code is old-school enough to use
| thousands of global variables in a thread-unsafe way,
| seems like changing it enough to compile as safe Rust
| code would push the "non-trivial" envelope pretty far.
| chc wrote:
| I think it's meant to imply the solution given in their
| username ("idiomatic Rust").
| lelanthran wrote:
| > I think it's meant to imply the solution given in their
| username ("idiomatic Rust").
|
| I think "Idiom: a tic (Rust)" can also fit if I squint
| hard enough and decide it looks like a definition from an
| online dictionary :-)
| avgcorrection wrote:
| Don't mind the gimmick gallery (username).
| [deleted]
| lenkite wrote:
| I feel like the PostgreSQL core team should just build a new
| database from scratch, using what they have learned from
| experience, instead of attempting such a fundamental
| architectural migration. It would also give them more freedom
| to change things. Call it "postgendb" and provide a data
| migrator.
| rdevsrex wrote:
| That's a great idea. I've been considering whether or not to
| use CockroachDB at work, and I love the fact that it's
| distributed from the get-go.
|
| Why not work on something like that instead of changing
| something that works? Especially since the process model
| really only runs into trouble on large systems.
| kristiandupont wrote:
| Yeah.
|
| Without being familiar with the Postgres source, this seems to
| be what I call a "somersault problem": hard to break down into
| sub-goals. I have heard that the Postgres codebase is solid
| which makes it easier but it's still mature and highly complex.
| It doesn't sound feasible to me.
|
| https://kristiandupont.medium.com/somersault-problems-69c478...
| seedless-sensat wrote:
| The original post does describe several sub-problems. The
| group could first chip away at global state, signals,
| libraries. They can do this before changing the process model
| in any way.
| kristiandupont wrote:
| Good point.
| osigurdson wrote:
| Heikki Linnakangas has a deep understanding of Postgres as
| well, though. And we all want Postgres to be competitive on
| connection counts, don't we?
| gremlinsinc wrote:
| Maybe a better option would be finding a team to create nugres,
| aka a fork for this and other experiments. So that mainline
| remains stable.
| mattashii wrote:
| There are several forks of PostgreSQL, with various licenses,
| additional features, and levels of activity. However,
| maintaining a fork in addition to a main project is
| inherently more expensive than maintaining a single
| project, so adding features to new major releases of the main
| project is generally preferred over forking every release
| into its own, newly named, project. After all, that is what
| we have major (feature) releases and stabilization windows
| (beta releases) for.
| j16sdiz wrote:
| This won't work well for a multiyear project. Either you
| have to stall the release process, divide the work into
| smaller parts, or fork.
| datavirtue wrote:
| This should be considered a research effort, assuming it will
| be a complete rewrite. In light of that, you should not draw
| down resources from the established code base to work on it.
|
| Ignoring the above, first state the explicit requirements
| driving this change and let people weigh in on those. This
| sounds like a geeky dev itch.
| duped wrote:
| That's an awful message with the only sensible reply.
| kccqzy wrote:
| I don't expect you or others to buy into any particular code
| change at this point, or to contribute time into it.
| Just to accept that it's a worthwhile goal. If the
| implementation turns out to be a disaster, then it
| won't be accepted, of course. But I'm optimistic.
|
| The reply is much more reasonable than this blanket assertion
| of a disaster.
| giraffe_lady wrote:
| As an outsider it doesn't sound like something a few people
| could spin off in a branch in a couple months and see how
| code review goes. They're talking about doing it over
| multiple (yearly?) releases. It seems like it'll take a lot
| of expert attention, which won't be available for other work
| and the changes themselves will impact all other ongoing
| work.
|
| I'm not trying to naysay it per se, bc again I don't have
| technical knowledge of this codebase. But that's exactly the
| sort of scenario that can cause a large project to splinter
| or stall for years. Talking about "the implementation" absent
| the context that would be necessary to _create_ that
| implementation seems naively optimistic, or at worst
| irresponsible.
| lbriner wrote:
| You are talking about implementation, the OP was talking
| about raising the concept with interested parties and
| seeing whether it is worth even _starting_ to think about
| it.
|
| They could fork, they could add threading to some sub
| systems and roll it out over several versions.
|
| I don't know enough about the code but, of course, it is a
| hard problem but the solution might be to build it from the
| ground up as a threaded system, using the skills learned
| over 30 years and taking the hit on the rebuild instead of
| reworking what is there.
|
| I am most interested because I didn't realise there was a
| performance problem in the first place.
| axman6 wrote:
| Am I going crazy, or has the obvious implementation of
| such a change been lost on people? If they were
| proposing taking a multi-threaded app and splitting it
| into a multi-process one, I would predict they would find
| a hell of a lot of unexpected or unknown implicit
| communication between threads, which would be a nightmare
| to untangle.
|
| Going the other way, there is an extremely well
| understood interface between all the processes which run
| in isolation: shared memory. Nearly by definition this
| must be well coordinated between the processes.
|
| So the first step in moving to a multi-threaded
| implementation would be to change nearly nothing about
| each process, and then just run each process in its own
| pthread, keeping all the shared memory 'n all.
|
| You would expect performance to be about the same, maybe
| a little better with the reduced TLB churn, but the
| architecture is basically unchanged. At that point, you
| can start to look at what more appropriate
| communication/synchronisation mechanisms are available
| now that you're working in the same address space.
|
| I just don't understand why so many people seem to think
| this requires an enormous rewrite - having developed as a
| multi-process system means you've had to make so many of
| the problematic things explicit and control for them, and
| none of these threads would know anything at all about
| each other's internals.
| jasonhansel wrote:
| I'm rather surprised that their focus is on improving vertical
| scalability, rather than on adding more features for scaling
| Postgres horizontally.
| tracker1 wrote:
| If you're more interested in horizontal scaling, you may want
| to look into CockroachDB, which has a Postgres compatible
| protocol, but still quite different. There are a lot more
| limitations with CDB over Pg though.
|
| With the changes suggested, I'm not sure it's the best idea
| from where Postgres is... it might be an opportunity to
| rewrite bits in Rust, but even then, there is a _LOT_ that
| can go wrong. The use of shared memory is apparently already
| in place, and the separate processes and inter-process
| communication aren't the most dangerous part... it's the
| presumptions, variables, and other contextual bits that are
| currently process globals that wouldn't be in the "after"
| version.
|
| The overall surface is just massive... That doesn't even get
| into plugin compatibility.
| mynonameaccount wrote:
| "the benefits would not justify the cost". PostgreSQL, like any
| software, at some point in its life needs to be refactored. Why
| not refactor with a thread model? Of course there will be bugs.
| Of course it will be difficult. But I think it is a worthwhile
| endeavor. It doesn't sound like this will happen, but a new
| project would be cool.
| timtom39 wrote:
| > like any software, at some point in its life needs to be
| refactored.
|
| This is simply not true for most software. Software has a
| product life cycle like everything else and major
| refactors/rewrites should be weighed carefully against
| cost/risk of the refactor. Many traditional engineering fields
| do much better at this analysis.
|
| Although, because I run a contracting shop, I have personally
| profited greatly by clients thinking this is true and being
| unable to convince them otherwise.
| smsm42 wrote:
| "Difficult" doesn't even begin to do it justice. Making code
| that has 2k global variables (and probably an order of
| magnitude more underlying assumptions) thread-safe is a
| gargantuan task: the code now has to know that every time it
| touches X it may influence, or be influenced by, every other
| thread that touches X. It will absolutely involve many
| iterations that no sane person would let anywhere near
| valuable data (and how long would it take until you'd consider
| it safe enough?). And making it all performant would be even
| harder: shared-state code requires a completely different
| approach to thinking about workload distribution, so something
| that performs well in isolated processes may get bogged down
| in locking or cache-contention hell once state is shared. I'm
| not doubting Postgres has some very smart people, much smarter
| than me in any case, but it could be more practical to write a
| new core from scratch than to "refactor" a core that grew
| organically for decades around share-nothing assumptions.
| djur wrote:
| What you're talking about is a rewrite, not a refactor.
| gremlinsinc wrote:
| A better option would be to create an experimental fork that
| has a different name and is obviously a different product, but
| is based on the original source. That way pg keeps getting
| updates and remains stable, and if they fail, they fail without
| hurting all the pg in production.
| wielebny wrote:
| Having been using and administering a lot of PostgreSQL servers,
| I hope they don't lose any stability over this.
|
| I've seen (and reported) bugs that caused panics/segfaults in
| specific psql processes. Not just connection backends, but also
| processes related to WAL writing or replication. The way it's
| built right now, a child process can simply be forced to quit
| without affecting other processes. Hopefully switching to
| threads won't force the whole of PostgreSQL to panic and shut
| down.
| tracker1 wrote:
| Most likely, the postmaster will remain a separate process,
| much like today with pg, or similar to Firefox or Chrome's
| control process, which can catch a panicked worker, clean up,
| and restart it. The WAL can be recovered as well if there were
| broken transactions in flight.
| jtc331 wrote:
| Because of shared memory most panics and seg faults in a worker
| process take down the entire server already (this wasn't always
| the case, but not doing so was a bug).
| vbezhenar wrote:
| Of course it will. That's better than continuing to work with
| damaged memory structures and unpredictable consequences. For a
| database that matters more than for anything else. Imagine
| writing corrupted data because another thread went crazy.
| wizofaus wrote:
| You're implying that only an OS can provide memory separation
| between units of execution - at least in .NET AppDomains give
| you the same protection within a single process, so why
| couldn't postgres have its own such mechanism? I'd also think
| with a database engine shared state is not just in-memory -
| i.e. one process can potentially corrupt the behaviour of
| another by what it writes to disk, so moving to a single-
| process model doesn't necessarily introduce problems that
| could never have existed previously (but, yes, would arguably
| make them more likely)
| vbezhenar wrote:
| I don't know .NET well enough to comment here, but I'm pretty
| sure that if you managed to run bare-metal C inside your
| .NET app (which should be possible), it would destroy all your
| domains easily. RAM is RAM. The only memory protection we
| have is across process boundaries (even that protection is
| not perfect with shared memory, but at least it protects
| private memory).
|
| At least I'm not aware of any way to protect private thread
| memory from other threads.
|
| Postgres is C and that's not going to change ever.
| wizofaus wrote:
| I certainly wasn't suggesting it would make sense to
| rewrite Postgres to run on .NET (using any language, even
| managed C++, assuming anyone still uses that). Yes, it's
| inherent in the C/C++ language that it's able to randomly
| access any memory that a process has access to, and
| obviously on that basis OS-provided process-separation is
| the "best" protection you can get, just pointing out that
| it's not the only possibility.
| szundi wrote:
| For a decades old codebase probably only the OS can.
|
| Point is it getting worse if this is changed.
| SigmundA wrote:
| No AppDomains are not as good as processes, I have tried to
| go that route before, you cannot stop unruly code reliably
| in an app domain (you must use thread.abort() which is not
| good) and memory can still leak in any native code used
| there.
|
| The only reliable way to stop bad code like say an infinite
| loop is to run in another process even in .Net.
|
| They also removed Appdomain in later versions of .Net
| because they had little benefit and weak protections
| compared to a a full process.
| wizofaus wrote:
| Not claiming they're as good, just noting that there are
| alternative ways to provide memory barriers, though
| obviously if it's not enforced at the language/runtime
| level, it requires either super-strong developer discipline
| or some other tool to enforce it. I can't find anything
| suggesting AppDomains have been removed completely, though,
| just that they're not fully supported on non-Windows
| platforms, which is interesting; I wonder if that means
| they do have OS-level support.
| SigmundA wrote:
| https://learn.microsoft.com/en-
| us/dotnet/api/system.appdomai...
|
| "On .NET Core, the AppDomain implementation is limited by
| design and does not provide isolation, unloading, or
| security boundaries. For .NET Core, there is exactly one
| AppDomain. Isolation and unloading are provided through
| AssemblyLoadContext. Security boundaries should be
| provided by process boundaries and appropriate remoting
| techniques."
|
| AppDomains pretty much only allowed you to load unload
| assemblies and provided little else. If you wanted to
| stop bad code you still used Thread.Abort which left your
| runtime in a potentially bad state due to no isolation
| between threads.
|
| The only way to do something like an AppDomain to replace
| process isolation would be to re-write the whole OS in a
| memory safe language similar to
| https://en.wikipedia.org/wiki/Midori_(operating_system) /
| https://en.wikipedia.org/wiki/Singularity_(operating_syst
| em)
| wizofaus wrote:
| Is that saying global variables are shared between
| AppDomains on .NET core then? Scary if so, we have a
| bunch of .NET framework code we're looking at porting to
| .NET core in the near future, and I know it relies on
| AppDomain separation currently. It's not the first
| framework->Core conversion I've done, but I don't
| remember changes in AppDomain behaviour causing any
| issues the first time.
|
| As it happens I already know there are bits of code
| currently not working "as expected" exactly because of
| AppDomain separation - i.e. attempting to use a shared-
| memory cache to improve performance and in one or two
| cases in an attempt to share state, and I got the
| impression whoever wrote that code didn't understand that
| there even were two AppDomains involved, and used various
| ugly hacks to "fall back" to alternative means of state-
| sharing, but in fact the fall-back is the only thing that
| actually ever works.
| electroly wrote:
| > Is that saying global variables are shared between
| AppDomains on .NET core then?
|
| No, you can't create a second AppDomain _at all_.
| AppDomains are dead and buried; you would need to remove
| all of that from your code in order to migrate to current
| .NET. The class only remains to serve a couple of ancillary
| functions that don't involve actually creating additional
| AppDomains.
| wizofaus wrote:
| We're not creating them ourselves, they're created by
| IIS.
| dikei wrote:
| .NET is a managed language with a VM. In such a language, a
| memory error in managed code will often trigger a jump back
| to the VM, which can attempt to recover from there.
|
| For native code, there's no such safety net. Likewise, even
| for a managed language, an error in the interpreter itself
| will still crash the VM, since there's nothing to fall back
| to anymore.
| wizofaus wrote:
| True, if you're talking unrestricted native code, I'd
| essentially agree with the OP's implication that only the
| OS (and the CPU itself) is capable of providing that sort
| of memory protection. I guess I was just wondering what
| something like AppDomains in C might even look like (e.g.
| all global variables are implicitly "thread_local"), and
| how much could be done at compile-time using tools to
| prevent potentially "dangerous" memory accesses. I've
| never looked at the postgres source in any detail so I'm
| likely underestimating the difficulty of it.
| eastern wrote:
| 100%. Same here. There's a lot of baby in the processes, not
| just bathwater.
|
| As a longstanding PG dev/DBA who doesn't know much about its
| internals, I would say that they should just move connection
| pooling into the main product.
|
| Essentially, pgbouncer should be part of PG and should be able
| to manage connections with knowledge of what each connection
| is doing. That, plus some sort of dynamic max-connection
| setting based on what's actually going on.
|
| That'll remove almost all the dev/DBA pain from separate
| processes.
| [deleted]
| rbancroft wrote:
| Changing something so fundamental seems like it should be a
| rewrite.
| MR4D wrote:
| Changing the entire architecture of PG to suit the 0.1% of edge
| cases seems like a poor trade off.
|
| Are there a much larger percentage of users that really need
| this?
| doctor_eval wrote:
| I know I'm probably being naive about this, but is it stupid to
| ask if there's a way to make multi process work better on Linux -
| rather than "fixing" PG?
|
| I feel like the thread vs process thing is one of those
| pendulums/fads that comes and goes. I'd hate to see PG go down a
| rabbit hole only to discover the OS could be modified to make
| things go better.
|
| (I understand not all PG instances run on Linux, just using it as
| an example)
| chmod775 wrote:
| > I feel like the thread vs process thing is one of those
| pendulums/fads that comes and goes.
|
| In this context threads can be understood as processes that
| share the same address space and vice-versa processes as
| threads with separate address space.
|
| One gives you isolation, the other convenience and performance.
| Either can be desirable.
|
| What would you change about this?
| dikei wrote:
| That'll likely be an even bigger task, and harder to get into
| the mainline kernel.
|
| Linux multi-process is already pretty efficient compared to
| Windows. However, multi-process is inherently less efficient
| than multi-thread due to more safety predicates / isolation
| guaranteed by the kernel, I feel lowering it might lead to more
| security issues, similar to how Hyper Threading triggered a
| bunch of issues with Intel Processors.
| doctor_eval wrote:
| Right - yeah I was really just wondering if some of the
| safety predicates could be reduced when there is a
| relationship between processes, such as the mitigations
| against cache attacks. I think the cache misses caused by
| multi-process were one of the reasons given that it's slower
| than threading. But I don't understand why this is
| necessarily the case given that the shared memory and
| executable text ultimately refer to the same data. But I
| suppose this would need to work with processor affinity and
| other elements to prevent the cache being knocked around by
| non-PG processes, and I guess this is one place where it
| starts getting complicated.
|
| That said, please understand that I'm just being curious - I
| really don't know what I'm talking about, I haven't built a
| Linux kernel or dabbled in Unix internals in like 20 years,
| but thanks for replying :) Postgresql is my favourite open
| source project and I'm spooked by the threading naysayers.
| anarazel wrote:
| The TLB is basically keyed by (address space, virtual
| address % granularity), or needs to be flushed entirely
| when switching between different views of the address space
| (e.g. switching between processes). Unless your address
| space is exactly the same, you're at least going to
| duplicate TLB contents. Leading to a lower hit rate.
|
| This isn't really an OS issue, more a hardware one,
| although potential hardware improvements would likely have
| to be explicitly utilized by operating systems.
|
| Note that the TLB issue is different from the data /
| instruction cache situation.
| papito wrote:
| This has Python 3 vibes.
| newaccount74 wrote:
| A big advantage of the process-based model is its resilience
| against many classes of errors.
|
| If a bug in PostgreSQL (or in an extension) causes the server to
| crash, then only that process will crash. Postmaster will detect
| the child process termination, and send an error message to the
| client. The connection will be lost, but other connections will
| be unaffected.
|
| It's not foolproof (there are ways to bring the whole server
| down), but it does protect against many error conditions.
|
| It is possible to trap some exceptions in a threaded
| environment, but cleaning up after e.g. an attempted NULL
| pointer dereference is going to be very difficult or
| impossible.
| anarazel wrote:
| We would still have a separate supervisor process if we moved
| connections to threads.
| baggy_trough wrote:
| I hope they are conservative about this, because even the
| smartest and best programmers in the world cannot create bug free
| multithreaded code.
| jerf wrote:
| I mentally snarked to myself that "obviously they should
| rewrite it in Rust first".
|
| Then, after more thought, I'm not entirely sure that would be a
| bad approach. I say this not to advocate for actually rewriting
| it in Rust, but as a way of describing how difficult this is.
| I'm not actually sure rewriting the relevant bits of the system
| in Rust _wouldn't_ be easier in the end, and obviously, that's
| really, really hard.
|
| This is a _really_ hard transition.
|
| I don't think multithreaded code quality should be measured in
| absolutes. There are approaches so difficult as to be
| effectively impossible, namely the lock-based approach that
| was dominant in the 90s and convinced developers that
| multithreading is impossibly difficult; but it's not
| multithreaded code that's impossibly difficult, it's
| lock-based multithreading. Other approaches range from doable
| to not that hard once you learn the relevant techniques
| (Haskell's full immutability and Rust's borrow checker are
| both very solid), but of course even "not that hard" produces
| a lot of bugs when scaled up to something like Postgres. Then
| again, the current model isn't immune to that either.
| xwdv wrote:
| Nonsense, multithreaded code can be written to be as bug-free
| as regular code. No need to fear.
| preordained wrote:
| It _can_ be. Anything can be. It is far more treacherous,
| though.
| taeric wrote:
| I think the point is that some mistakes in process based code
| are not realized as the bugs that they will be in threaded
| code?
| dboreham wrote:
| This is true. However, the blast radius may be smaller with a
| process model. Also recovering from a fatal error in one
| session could possibly be easier. I say this as a 30-year
| threading proponent.
| PhilipRoman wrote:
| I'm assuming you're referring to formally proven programs. If
| that's the case, do you have any pointers?
|
| Aside from the trivial while(!transactionSucceeded){retry()}
| loop, I have trouble proving the correctness of my programs
| when the number of threads is not small and finite.
| baggy_trough wrote:
| In theory, yes. In practice, no.
| ajkjk wrote:
| It is just harder.
| mmphosis wrote:
| _Concurrency isn't a "nice layer over pthreads" - the most
| important thing is isolation - anything that mucks up isolation
| is a mistake.
|
| -- Joe Armstrong_
|
| Threads are evil. https://www.sqlite.org/faq.html#q6
| https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...
|
| Nginx uses an asynchronous event-driven approach, rather than
| threads, to handle requests.
| https://aosabook.org/en/v2/nginx.html
| http://www.kegel.com/c10k.html
| jrott wrote:
| The fact that they are planning on doing this across multiple
| releases gives me hope that they'll be cautious with this.
| johannes1234321 wrote:
| The code is effectively already multithreaded: it has shared
| state, just across multiple processes instead of threads
| within one process.
|
| They might even reduce complexity that way.
| usefulcat wrote:
| It's not the same at all for global variables, of which pgsql
| apparently has around a couple thousand.
|
| If every process is single threaded, you don't have to
| consider the possibility of race conditions when accessing
| any of those ~2000 global variables. And you can pretty much
| guarantee that little if any of the existing code was written
| with that possibility in mind.
| ants_a wrote:
| Those global variables would be converted to thread-locals,
| and most of the code would be oblivious to the change. This
| is not the hard part of the change.
| anarazel wrote:
| Postgres is already concurrent today. There's a _lot_ of shared
| state between the processes (via shared memory).
| mastax wrote:
| It would be interesting to have something between threads and
| processes. I'll call them heavy-threads for sake of discussion.
|
| Like light-threads, heavy-threads would share the same process-
| security-boundary and therefore switching between them would be
| cheap. No need to flush TLB, I$, D$.
|
| Like processes, heavy-threads would have mostly-separate address
| spaces by default. Similar to forking a process, they could share
| read-only mappings for shared libraries, code, COW global
| variables, and explicitly defined shared writable memory regions.
|
| Like processes, heavy-threads would isolate failure states. A C++
| exception, UNIX signal, segfault, etc. would kill only the heavy-
| thread responsible.
| mike_hearn wrote:
| There are some problems.
|
| 1. Mostly separate address spaces requires changing the TLB on
| context switch (modern hw lets it be partial). You could use
| MPKs to share a single address space with fast protection
| switches.
|
| 2. Threads share the global heap, but your heavy threads would
| require explicitly defined shared writeable memory regions, so
| presumably each one has its own heap. That's a fair bit of
| overhead.
|
| 3. Failure isolation is more complicated than deciding what to
| kill.
|
| To expand on the last point: Postgres _doesn't_ fully isolate
| failures to a single process, because the processes do share
| memory and might corrupt those shared memory regions. But even
| without shared memory, failure recovery isn't always easy.
| Software has to be written specifically to plan for it. You can
| kill processes because everything in the OS is written around
| allowing for that possibility, for example, shells know what to
| do if a sub-process is killed unexpectedly. Killing a heavy
| thread (=process) is no good if the parent process is going to
| wait for a reply from it forever because it wasn't written to
| handle the process going away.
| anarazel wrote:
| > Like light-threads, heavy-threads would share the same
| process-security-boundary and therefore switching between them
| would be cheap. No need to flush TLB, I$, D$.
|
| > Like processes, heavy-threads would have mostly-separate
| address spaces by default. Similar to forking a process, they
| could share read-only mappings for shared libraries, code, COW
| global variables, and explicitly defined shared writable memory
| regions.
|
| I don't think you realistically can have separate address
| spaces and not have TLB etc impact. If they're separate address
| spaces, you need separate TLB entries => lower TLB hit ratio.
| dasyatidprime wrote:
| So what would be different between those and forked processes?
| ShroudedNight wrote:
| I've been pondering/ruminating on this too; I've been
| somewhat surprised that few operating systems have played with
| reserving per-thread address space as thread-local storage, or
| requiring something akin to a 'far' pointer to access commonly-
| addressed shared memory.
| wbl wrote:
| You cannot COW and share the TLB state. The caches aren't
| flushed on process switches either; it's that the data is
| different, so evictions happen.
| mattashii wrote:
| > No need to flush TLB
|
| TLB isn't "flushed" so much as it is useless across different
| memory address spaces. Switching processes means switching
| address spaces, which means you have to switch the contents of
| the TLB to the new process' TLB entries, which eventually
| indeed flushes the TLB, but that is only over time, not
| necessarily the moment you switch processes.
|
| > Like processes, heavy-threads would have mostly-separate
| address spaces by default.
|
| This thus conflicts with the need to not flush TLBs. You can't
| not change TLB contents across address spaces.
| lukeschlather wrote:
| This sounds like a problem that would border on the complexity of
| replacing the GIL in Ruby or Python. The performance benefits are
| obvious but it seems like the correctness problems would be
| myriad and a constant source of (unpleasant) surprises.
| narrator wrote:
| The correctness problem should be handled by a suite of
| automated tests which PostgreSQL has. If all tests pass, the
| application must work correctly. The project is too big, and
| has too many developers to make much progress without full test
| coverage. Where else would up-to-date documentation regarding
| the correct behavior of PostgreSQL exist? In some developer's
| head? SQLite is pretty famous for its extreme approach to
| testing, including out-of-memory conditions and other rare
| circumstances: https://www.sqlite.org/testing.html
| abalashov wrote:
| > If all tests pass, the application must work correctly.
|
| These are "famous last words" in many contexts, but when
| talking about difficult-to-reproduce parallelism issues, I
| just don't think it's a particularly applicable viewpoint at
| all. No disrespect. :)
| lukeschlather wrote:
| Parallelism is often incredibly hard to write automated tests
| for, and this will most likely create parallelism issues that
| were not dreamed of by the authors of the test suite.
| MuffinFlavored wrote:
| Does GIL stand for Global Interpreter Lock?
| Yujf wrote:
| yes
| dialogbox wrote:
| Even the performance benefits are not big enough, compared to
| the GIL case.
|
| The biggest problem of the process model might be the cost of
| having too many DB connections: each client needs a dedicated
| server process, so there is memory usage and context-switching
| overhead. And without a connection pool, connection-time
| overhead is very high.
|
| This problem has been well addressed with connection pools, or
| by having middleware instead of exposing the DB directly. That
| works very well so far.
|
| Oracle has supported a thread-based model, and it's been
| usable for decades. I remember trying the thread-based
| configuration option (MTS, or shared server) in the 1990s. But
| no one likes it, at least within my Oracle DBA network.
|
| It would be a great research project, but it would be a big
| problem if the community pushes this too early.
| cactusfrog wrote:
| This is different because there isn't a whole ecosystem of
| packages that depend on access to a thread unsafe C API.
| Getting the GIL out of core Python isn't too challenging.
| Getting all of the packages that depend on Python's C API
| working is.
| masklinn wrote:
    | Another component of the GIL story is that removing the GIL
    | requires adding fine-grained locks, which (aside from making
    | VM development more complicated) significantly increases lock
    | traffic and thus runtime costs, which noticeably impacts
    | single-threaded performance, which is of major import.
    |
    | Postgres starts from a share-nothing architecture, so it's
    | quite a bit easier to evaluate the addition of sharing.
| bsder wrote:
| > which noticeably impacts single-threaded performance,
| which is of major import.
|
| 1) I don't buy this a priori. Almost everybody who removed
| a gigantic lock suddenly realizes that there was more
| contention than they thought and that atomizing it made
| performance improve.
|
| 2) Had Python bitten the bullet and removed the GIL back at
| Python 3.0, the performance would likely already be back to
| normal or better. You can't optimize hypothetically.
| Optimization on something like Python is an accumulation of
| lots of small wins.
| masklinn wrote:
| > I don't buy this a priori.
|
| You don't have to buy anything, that's been the result of
| every attempt so far and a big reason for their
| rejection. The latest effort only gained some traction
| because the backers also did optimisation work which
| compensated (and then was merged separately).
|
| > Almost everybody who removed a gigantic lock
|
| See that's the issue with your response, you're not
| actually reading the comment you're replying to.
|
| And the "almost" is a big tell.
|
| > suddenly realizes that there was more contention than
| they thought and that atomizing it made performance
| improve.
|
| There is no contention on the gil in single threaded
| workloads.
|
| > Had Python bitten the bullet and removed the GIL back
| at Python 3.0
|
| It would have taken several more years and been
| completely DOA.
| anarazel wrote:
| Postgres already shares a lot of state between processes
| via shared memory. There's not a whole lot that would
| initially change from a concurrency perspective.
| ComputerGuru wrote:
| > which (aside from making VM development more complicated)
| significantly increases lock traffic and thus runtime
| costs, which noticeably impacts single-threaded
| performance, which is of major import.
|
| I don't think that's a fair characterization of the trade
| offs. Acquiring uncontended mutexes is basically free (and
| fairly side-effect free) so single-threaded performance
| will not be noticeably impacted.
|
| Every large C project I'm aware of (read: kernels) that has
| publicly switched from coarse locks to fine-grained locks
| has considered it to be a huge win with little to no impact
| on single-threaded performance. You can even gain
| performance if you chop up objects or allocations into
| finer-grained blobs to fit your finer-grained locking
| strategy because it can play nicer with cache friendliness
| (accessing one bit of code doesn't kick the other bits of
| code out of the cache).
| erikpukinskis wrote:
| > there isn't a whole ecosystem of packages that depend on
| access to a thread unsafe C API
|
| They mentioned a similar issue for Postgres extensions, no?
|
| > Haas, though, is not convinced that it would ever be
| possible to remove support for the process-based mode.
| Threads might not perform better for all use cases, or some
| important extensions may never gain support for running in
| threads.
| scolby33 wrote:
| I question how important an extension is if there's not
| enough incentive to port it to the new paradigm, at least
| eventually.
| abalashov wrote:
| Well. The thing with that is just that there are a lot of
| extensions. Like, a lot!
| gjvc wrote:
| surely this is "Some guy reconsiders the process-based model of
| PostgreSQL"
| Icathian wrote:
| Uh. Heikki is definitely not just "some guy". Dude is one of
| the top contributors to Postgres.
| gjvc wrote:
| How does that make him immune to having dumb ideas? See, I'm
| judging the idea on merit.
|
| You're just defending your hero who has gone rogue.
| Icathian wrote:
| I must have missed all the nuanced judgment in your
| original post. Maybe you can quote some for me.
| gjvc wrote:
| not rising to this sarcastic bait.
| anarazel wrote:
| Heikki is far from the only "senior" postgres contributor
| thinking that this is the right direction in the long term.
|
| > You're just defending your hero who has gone rogue.
|
| That doesn't sound like judging an idea on its merit, it
| sounds like judging a person for something you haven't
| analyzed yourself.
| gjvc wrote:
| > Heikki is far from the only "senior" postgres
| contributor thinking that this is the right direction in
| the long term.
|
| sounds like groupthink
| jpgvm wrote:
| A close-to-impossible task; if anyone can do it, though, it's
| probably Heikki.
|
| Unfortunately I expect this to go the way of zheap et al.
| Fundamental design changes like this have just had such a rough
| time of succeeding thus far.
|
| I think for such a change to work it probably needs not just the
| support of Neon but also of say Microsoft (current stewards of
| Citus) that have larger engineering resources to throw at the
| problem and grind out all the bugs.
| rashidujang wrote:
| Hey I'm fairly new to the who's who in the PostgreSQL world,
| would you mind telling why Heikki might be able to pull this
| off?
| anarazel wrote:
    | Not who you asked, but: he is a longtime contributor who has
    | written/redesigned important parts of postgres (the WAL
    | format, concurrent WAL insertion, 2PC support, parts of SSI
    | support, much more). And he is just a nice person to work
    | with.
| rashidujang wrote:
| Cool! He seems like a powerhouse in this space - thank you
| for the answer
| anarazel wrote:
| I know that at least people from EDB (Robert) and Microsoft
| (Thomas, me) are quite interested in eventually making this
| transition, it's not just Heikki. Personally I won't have a lot
| of cycles for the next release or two, but after that...
| jpgvm wrote:
| That gives me some faith, I hope everyone is able to come
| together to make it happen.
| sargun wrote:
| I'm curious whether they can take advantage of vfork / CLONE_VM
| to get the benefits of shared memory and lower-overhead context
| switches, while still keeping the benefits of the scheduler and
| sysadmin-friendliness.
|
| The other thing that might be interesting is FUTEX_SWAP / UMCG.
| Although it doesn't remove the overhead induced by context
| switches entirely (specifically, you would still deal with TLB
| misses), you can avoid dealing with things like speculative
| execution exploit mitigations.
| nneonneo wrote:
| Per the article, Postgres has many, many global variables, many
| of which track per-session state; much session state is "freed"
| via process exit rather than being explicitly cleaned up.
| Switching to CLONE_VM requires these problems to all be solved.
| why-el wrote:
| what about support for Windows?
| jupp0r wrote:
| Please don't use mutable global state in your work. Global
| variables are universally bad and don't provide much of a
| benefit. The number of desirable architectural refactorings that
| I've witnessed turn into a muddy mess because of them is
| daunting. This is one more example.
| orthoxerox wrote:
  | You know what a database is, don't you? It is the place where
  | you store your mutable global state. You can't kick the can
  | down the road forever; _someone_ has to tackle the complexity
  | of managing state.
| jupp0r wrote:
| Databases are great, especially those who do not use global
| variables in their implementation.
| slashdev wrote:
| Thank you for sharing your ideological views, but this is not
| the appropriate venue for that. If you want to have a software
| _engineering_ discussion about the trade offs involved in
| sharing global mutable state, this is a good venue for that.
| All engineering is trade offs. As soon as you make blanket
| statements that X is always bad, you've transitioned into the
| realm of ideology. Now presumably you mean to say it's almost
| always bad. But that really depends on the context. It may well
| be almost always bad in average software projects, but
| PostgreSQL is not your average software project. Databases are
| a different realm.
| refulgentis wrote:
| Global mutable state being a poor choice in software
| architecture isn't an ideology. There is no ideology that
| argues it is awesome.
|
| If you want to have a software _engineering_ discussion about
| the trade offs involved in sharing global mutable state, this
| is a good venue for that.
|
| All engineering is trade offs. As soon as you start telling
| people they're making blanket statements that X is always
| bad, you've transitioned into the realm of nitpicking.
| slashdev wrote:
| It's awesome where performance considerations are
| paramount. It's awesome in databases. It's awesome in
| embedded software. It's awesome in operating system
| kernels.
|
| The fact is sometimes it's good. Saying it's universally
| bad is going beyond the realm of logic and evidence and
| into the realm of ideology.
| jupp0r wrote:
| Can you explain how having a global variable is more
| performant than passing a pointer to an object as a
| function argument in practice?
| megous wrote:
| Using globals is simpler, it's also pretty natural in event
| driven architectures. Passing everything via function
| arguments is welcome for library code, but there's little
| point to using it in application code. It just complicates
| things.
| jupp0r wrote:
| The problems it causes for Postgres are outlined in the
| article on LWN.
| megous wrote:
| > Globals work well enough when each server process has
| its own set...
|
            | PostgreSQL uses a process model, so the article just
            | states that globals work fine for PostgreSQL.
|
| > Knizhnik has already done a threads port of PostgreSQL.
| The global-variable problem, he said, was not that
| difficult.
|
| I see no big problem based on information from person who
| did some porting already.
| jupp0r wrote:
              | Knizhnik made these variables thread-local, which is
              | fine if you have a fixed association of threads to
              | data. This loses some flexibility if your runtime
              | needs to incorporate multiple sessions on one thread
              | (for example to hide IO latency) in the future. In
              | the end, the best solution is to associate the data
              | that belongs to a session with the session itself,
              | making it independent of which thread it's running
              | on. This is described by Knizhnik as "cumbersome",
              | which is exactly why people should not have started
              | with global variables in the first place. (No blame,
              | Postgres is from 1986 and times were very different
              | back then.)
| jupp0r wrote:
    | Discrediting my argument by labeling it as ideology, and by
    | implying that "blanket statements are always bad", is a
    | logical fallacy that does not touch the merits of what is
    | being discussed. I would argue that your argument, not mine,
    | is the one that does not belong here.
    |
    | If you want to contribute to the discussion, I'd be happy to
    | be given an example of successful usage of global variables
    | that made a project a long-term success under changing
    | requirements, compared to the alternatives.
| [deleted]
| CodeWriter23 wrote:
| "no objections" <> "consent"
| ed25519FUUU wrote:
| Have you ever tried to move a large organization forward in a
| certain direction? It's really hard. At some point you have to
| make a decision.
| timcobb wrote:
| Not in something like Postgres, I hope
| CodeWriter23 wrote:
| I have. What I've observed more is outside attackers with
| their own agenda use the "nobody objected because they were
| unprepared and unable to respond in the 2 minutes I gave them
| to object" as proof their agenda is supported.
| neilwilson wrote:
| It's always amazed me with databases why they don't go the other
| way.
|
| Create an operating system specifically for the database and make
| it so you boot the database.
|
| Databases seem to spend most of their time working around the
| operating system abstractions. So why not look at the OS, and
| streamline it for database use - dropping all the stuff a
| database will never need.
|
| That would then be a completely separate project, one far easier
| to get started on than shoehorning the database into an
| operating system thread model that is already a hack of the
| process model.
| girvo wrote:
| That was/is part of the promise of the whole unikernel thing,
| no?
|
| https://mirage.io/ or similar could then let you boot your
| database. That said, it's not really taken off from what I can
| tell, so I'm guessing there's more to it than that.
| anarazel wrote:
| Imo unikernels are a complicated solution in search of a
| problem, which turns out to not exist.
|
| There certainly are times OSs get in the way. But it's hard
| enough to write a good database, we don't need to maintain a
| third of an OS in addition.
| girvo wrote:
| Yeah indeed, that was my feeling on it as well. As much as
| Linux et al might get in ones way at times, what we get for
| free by relying on them is too useful to ignore for most
| tasks I think.
|
| That said, perhaps at AWS or Google scale that would be
| different? I wonder if they've looked at this stuff
| internally.
| lelanthran wrote:
| > Create an operating system specifically for the database and
| make it so you boot the database.
|
| (Others downthread have pointed out unikernels and I agree with
| the criticisms)
|
| This proposal is an excellent Phd project for someone like me
| :-)
|
| It ticks all of the things I like to work on the most[1]:
|
| Will involve writing low-level OS code
|
| Get to hyper-focus on performance
|
| Writing a language parser and executor
|
| Implement scheduler, threads, processes, etc.
|
| Implement the listening protocol in the kernel.
|
| I have to say, though, it might be easier to start off with a
| rump kernel (netBSD), then add in a specific RAW disk access
| that bypasses the OS (no, or fewer, syscalls to use it), create
| a kernel module for accepting a limited type of task and
  | executing that task in-kernel (avoiding a context-switch on
| every syscall)[2].
|
| Programs in userspace must have the lowest priority (using
| starvation-prevention mechanisms to ensure that user input
| would _eventually_ get processed).
|
  | I'd expect a not-insignificant speedup from doing all the work
  | in the kernel.
|
| The way it is now,
|
| userspace requests read() on a socket (context-switch to
| kernel),
|
| gets data (context-switch to userspace),
|
| parses a query,
|
| requests read on disk (multiple context-switches to kernel for
| open, stat, etc, multiple switches back to userspace after each
| call is complete). This latency is probably fairly well
| mitigated with mmap, though.
|
| logs diagnostic (multiple context-switches to and from kernel)
|
| requests write on client socket (context switch to kernel back
| and forth until all data is written).
|
| The goal of the DBOS would be to remove almost all the context-
| switching between userspace and kernel.
|
| [1] My side projects include a bootable (but unfinished) x86
| OS, various programming languages, performant (or otherwise) C
| libraries.
|
| [2] Similar to the way RealTime Linux calls work (caller shares
| a memory buffer with rt kernel module, populates the buffer and
| issues a call, kernel only returns when that task is complete).
| The BPF mechanism works the same. It's the only way to reduce
| latency to the absolute physical minimum.
| mrweasel wrote:
| You mean like Microsoft SQL Server, which basically runs a
| small OS on top of Windows or Linux?
|
| This is actually part of the reason why Microsoft was able to
| port SQL Server to Linux fairly easily, if I recall correctly.
| pizza234 wrote:
| > Create an operating system specifically for the database and
| make it so you boot the database.
|
  | I have the impression that this is similar to the ad-hoc
  | filesystem idea; it seems in principle very advantageous (why
  | employ two layers that do approximately the same thing on top
  | of each other?), but in reality, when implemented (by Oracle),
  | it led to only a minor improvement (a few % points, AFAIR).
| samus wrote:
  | You can get most of these speedups by using advanced APIs like
  | io_uring and friends, while still benefiting from an OS, which
  | takes care of the messy and thankless task of hardware support.
| orthoxerox wrote:
| Sounds like IBM OS/400.
| dialogbox wrote:
  | I'm not sure what you mean by OS. If you mean a whole new
  | kernel, it will take decades, and it could support only a small
  | range of hardware. If you mean a specialized Linux distro, many
  | companies do that already.
  |
  | I don't see how that would make the process-based / thread-
  | based problem any easier.
| rossmohax wrote:
| This project could borrow a lot from unikernels. If they
| mandate running it as a VM, there is no HW to support.
| formerly_proven wrote:
| ngmi
| c00lio wrote:
| Why should TLB flush performance ever be a problem on big
| machines? You can have one process per core with 128 or more
| cores, never flush any TLB if you pin those processes. And as it
| is a database, shoveling data from/to disk/SSD is your main
| concern anyways.
| scottlamb wrote:
| PostgreSQL uses synchronous IO, so you won't saturate the CPU
| with one process (or thread) per core.
|
| That said, I think there have been efforts to use io_uring on
| Linux. I'm not sure how that would work with the process per
| connection model. Haven't been following it...
| c00lio wrote:
| Problem with all kinds of asynchronous I/O is that your
| processes then need internal multiplexing, akin to what
| certain lightweight userspace thread models are doing. In the
| end, it might be harder to introduce than just using OS
| threads.
| anarazel wrote:
| > That said, I think there have been efforts to use io_uring
| on Linux. I'm not sure how that would work with the process
| per connection model. Haven't been following it...
|
    | There are some minor details that are easier with threads in
    | that context, but on the whole it doesn't make much of a
    | difference.
| scottlamb wrote:
| I don't understand how it works with thread per connection
| either. io_uring is designed for systems that have a thread
| and ring per core, for you to give it a bunch of IO to do
| at once (batches and chains), and your threads to do other
| work in the meantime. The syscall cost is amortized or even
| (through IORING_SETUP_SQPOLL) eliminated. If your code is
| instead designed to be synchronous and thus can only do one
| IO at a time and needs a syscall to block on it, I don't
| think there's much if any benefit in using io_uring.
|
| Possibly they'd have a ring per connection and just get an
| advantage when there's parallel IO going on for a single
| query? or these per-connection processes wouldn't directly
| do IO but send it via IPC to some IO-handling
| thread/process? Not sure either of those models are
| actually an improvement over the status quo, but who knows.
| anarazel wrote:
| > io_uring is designed for systems that have a thread and
| ring per core
|
| That's not needed to benefit from io_uring
|
| > for you to give it a bunch of IO to do at once (batches
| and chains), and your threads to do other work in the
| meantime.
|
| You can see substantial gains even if you just submit
| multiple IOs at once, and then block waiting for any of
| them to complete. The cost of blocking on IO is amortized
| to some degree over multiple IOs. Of course it's even
| better to not block at all...
|
| > If your code is instead designed to be synchronous and
| thus can only do one IO at a time and needs a syscall to
| block on it, I don't think there's much if any benefit in
| using io_uring.
|
| We/I have done the work to issue multiple IOs at a time
| as part of the patchset introducing AIO support (with
| among others, an io_uring backend). There's definitely
| more to do, particularly around index scans, but ...
| scottlamb wrote:
| Oh, I hadn't realized until now I was talking with
| someone actually doing this work. Thanks for popping into
| this discussion!
|
| > > io_uring is designed for systems that have a thread
| and ring per core
|
| > That's not needed to benefit from io_uring
|
| 90% sure I read Axboe saying that's what he designed
| io_uring for. If it helps in other scenarios, though,
| great.
|
| > Of course it's even better to not block at all...
|
| Out of curiosity, is that something you ever want/hope to
| achieve in PostgreSQL? Many high-performance systems use
| this model, but switching a synchronous system in plain C
| to it sounds uncomfortably exciting, both in terms of the
| transition itself and the additional complexity of
| maintaining the result. To me it seems like a much
| riskier change than the process->thread one discussed
| here that Tom Lane already stated will be a disaster.
|
| > We/I have done the work to issue multiple IOs at a time
| as part of the patchset introducing AIO support (with
| among others, an io_uring backend). There's definitely
| more to do, particularly around index scans, but ...
|
| Nice.
|
| Is the benefit you're getting simply from adding IO
| parallelism where there was none, or is there also a CPU
| reduction?
|
| Is having a large number of rings (as when supporting a
| large number of incoming connections) practical? I'm
| thinking of each ring being a significant reserved block
| of RAM, but maybe in this scenario that's not really
| true. A smallish ring for a smallish number of IOs for
| the query is enough.
|
| Speaking of large number of incoming connections,
| would/could the process->thread change be a step toward
| having a thread per active query rather than per
| (potentially idle) connection? To me it seems like it
| could be: all the idle ones could just be watched over by
| one thread and queries dispatched. That'd be a nice
| operational improvement if it meant folks no longer
| needed a pooler [1] to get decent performance. All else
| being equal, fewer moving parts is more pleasant...
|
| [1] or even if they only needed one layer of pooler
| instead of two, as I read some people have!
| anarazel wrote:
| > > Of course it's even better to not block at all...
|
| > Out of curiosity, is that something you ever want/hope
| to achieve in PostgreSQL? Many high-performance systems
| use this model, but switching a synchronous system in
| plain C to it sounds uncomfortably exciting, both in
| terms of the delta and the additional complexity of
| maintaining the result. To me it seems like a much
| riskier change than the process->thread one discussed
| here that Tom Lane already stated will be a disaster.
|
| Depends on how you define it. In a lot of scenarios you
| can avoid blocking by scheduling IO in a smart way - and
| I think we can quite far towards that for a lot of
| workloads and the wins are _substantial_. But that
| obviously cannot alone guarantee that you never block.
|
| I think we can get quite far avoiding blocking, but I
| don't think we're going to a complete asynchronous model
| in the foreseeable future. But it seems more feasible to
| incrementally make common blocking locations support
| asynchronicity. E.g. when a query scans multiple
| partitions, switch to processing a different partition
| while waiting for IO.
|
| > Is having a large number of rings (as when supporting a
| large number of incoming connections) practical? I'm
| thinking of each ring being a significant reserved block
| of RAM, but maybe in this scenario that's not really
| true. A smallish ring for a smallish number of IOs for
| the query is enough.
|
| It depends on the kernel version etc. The amount of
| memory isn't huge but initially it was affected by
| RLIMIT_MEMLOCK... That's one reason why the AIO patchset
| has a smaller number of io_uring "instances" than the
| allowed connections. The other reason is that we need to
| be able to complete IOs that other backends started
| (otherwise there would be deadlocks), which in turn
| requires having the file descriptor for each ring
| available in all processes... Which wouldn't be fun with
| a high max_connections.
|
| > Speaking of large number of incoming connections,
| would/could the process->thread be a step toward having a
| thread per active query rather than per (potentially
| idle) connection?
|
| Yes. Moving to threads really mainly would be to make
| subsequent improvements more realistic...
|
| > That'd be a nice operational improvement if it meant
| folks no longer needed a pooler [1] to get decent
| performance. All else being equal, fewer moving parts is
| more pleasant...
|
| You'd likely often still want a pooler on the
| "application server" side, to avoid TCP / SSL connection
| establishment overhead. But that can be a quite simple
| implementation.
| chasil wrote:
| Oracle has similar problems.
|
| On UNIX systems, Oracle uses a multi-process model, and you can
| see these:
|
|     $ ps -ef | grep smon
|     USER    PID    PPID  STARTED  TIME  %CPU  %MEM  COMMAND
|     oracle  22131  1     Mar 28   3:09  0.0   4.0   ora_smon_yourdb
|
| Windows forks processes about 100x slower than Linux, so Oracle
| runs threaded on that platform in one great big PID.
|
| Sybase was the first major database that fully adopted threads
| from an architectural perspective, and Microsoft SQL Server has
| certainly retained and improved on that model.
| EvanAnderson wrote:
| > Windows forks processes about 100x slower than Linux...
|
| I work with a Windows-based COTS webapp that uses Postgres w/o
| any connection pooling. It's nearly excruciating to use because
| it spins-up new Postgres processes for each page load. If not
| for the fact that the Postgres install is "turnkey" with the
| app I'd just move Postgres over to a Linux machine.
| devit wrote:
| Use pgbouncer
| ethbr0 wrote:
| Was curious about this as an architectural solution as
| well.
|
| We're really talking about X-per-client as the primary
| reason to move away from processes, right?
|
| So if you can get most of the benefit via pooling... why
| inherit the pain of porting?
|
| Presumably latency jitter would be a difficult problem with
| pools, but it seems easier (and safer) than porting
| processes -> threads.
|
| Disclaimer: High performance / low latency DB code is
| pretty far outside my wheelhouse.
| anarazel wrote:
        | pgbouncer is not transparent; you lose features,
        | particularly when using the pooling modes that actually
        | allow a larger number of active concurrent connections.
        | Solving those issues is a _lot_ easier with threads than
        | with processes.
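        | For reference, the transaction pooling being discussed is
        | configured roughly like this (a sketch; names, addresses,
        | and sizes are illustrative):

```ini
; pgbouncer.ini (sketch)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; transaction pooling multiplexes many clients over few server
; connections, at the cost of session features such as session-
; level prepared statements, LISTEN/NOTIFY, and advisory locks
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```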
| ddorian43 wrote:
| > We're really talking about X-per-client as the primary
| reason to move away from processes, right?
|
| Many other things too. Like better sharing of caches.
| Lower overhead of thread instead of process. Etc. (read
| the thread)
| ilyt wrote:
| The reasons are explained in article. Read the article
| ethbr0 wrote:
| I appear to have missed them, then.
|
| Could you point out, aside from the large numbers of
| clients I mentioned (and the development overhead of
| implementing multi-process memory management code), what
| the article mentions is a primary drawback of using
| processes over threads?
| ilyt wrote:
| > The overhead of cross-process context switches is
| inherently higher than switching between threads in the
| same process - and my suspicion is that that overhead
| will continue to increase. Once you have a significant
| number of connections we end up spending a _lot_ of time
            | in TLB misses, and that's inherent to the process model,
| because you can't share the TLB across processes.
| ethbr0 wrote:
| Yes, that's per-client performance scaling ("significant
| number of connections"), which indicates a pooled
| connection model might mitigate most of the performance
| impact while allowing some core code to remain process-
| oriented (and thus, not rewritten).
| treis wrote:
| That helps a lot but it's not a replacement for large
| number of persistent connections. If you had that you could
| simplify things in the application layer and do interesting
| things with the DB.
| ComputerGuru wrote:
| If you run postgres under WSLv1 (now available on Server
| Edition as well), the WSL subsystem handles processes and
| virtual memory in a way that has been specifically designed
| to optimize process initialization as compared to the
| traditional Win32 approach.
| chasil wrote:
| It would not be difficult to simply "pg_dump" all the data to
| Postgres on a Linux machine, then quietly set the clients to
| use the new server.
| blinkingled wrote:
| Didn't Oracle switch to threaded model in 12c - at least on
| Linux I remember there being a parameter to do that - it
| dropped the number of processes significantly.
| chasil wrote:
    | No, I ran that on v19:
    |
    |     $ ps -ef | grep smon
    |     UID     PID    PPID  C  STIME  TTY  TIME      CMD
    |     oracle  22131  1     0  Mar28  ?    00:03:09  ora_smon_yourdb
    |
    |     $ $ORACLE_HOME/bin/sqlplus -silent '/ as sysdba'
    |     select version_full from v$instance;
    |
    |     VERSION_FULL
    |     -----------------
    |     19.18.0.0.0
| blinkingled wrote:
| https://oracle-base.com/articles/12c/multithreaded-model-
| usi...
|
| Probably still requires the parameter to be set.
| chasil wrote:
        | Contrast this to Microsoft SQL Server:
        |
        |     $ systemctl status mssql-server
        |     * mssql-server.service - Microsoft SQL Server Database Engine
        |       Loaded: loaded (/usr/lib/systemd/system/mssql-server.service;
        |               disabled; vendor preset: disabled)
        |       Active: active (running) since Mon 2023-06-19 15:48:05 CDT;
        |               1min 18s ago
        |         Docs: https://docs.microsoft.com/en-us/sql/linux
        |     Main PID: 2125 (sqlservr)
        |        Tasks: 123
        |       CGroup: /system.slice/mssql-server.service
        |               +-2125 /opt/mssql/bin/sqlservr
        |               +-2156 /opt/mssql/bin/sqlservr
| CoolCold wrote:
| I'm not sure what I wonder on more - seeing its not
| enabled on boot or seeing mssql under systemd
| blinkingled wrote:
| Yeah multiprocess isn't Microsoft's style given how
| expensive creating processes is on Windows.
|
| Oracle - never had a scalability issue on very big Linux,
| Solaris and HPUX systems though - they do it well in my
| experience.
| hans_castorp wrote:
| > Didn't Oracle switch to threaded model in 12c
|
| It's optional, and the default is still a process model on
| Linux.
| 0xbadcafebee wrote:
| So compromise. Take the current process model, add threading and
| shared memory, with feature flags to limit number of processes
| and number of threads.
|
| Want to run an extension that isn't threadsafe? Run with 10
| processes, 1 thread. Want to run high-performance? Run with 1
| process, 10 threads. Afraid of "stability issues"? Run with 1
| process, 1 thread.
|
| Will it be hard to do? Sure. Impossible? Not at all. Plan for it,
| give a very long runway, throw all your new features into the
| next major version branch, and tell people everything else is off
| the table for the next few years. If you're _really sure_
| threading is going to be increasingly necessary, better to start
| now than to wait until it's too late. But this idea of "oh it's
| hard", "oh it's dangerous", "too complicated", etc. is bullshit.
| We've built fucking spaceships that visit other planets. We can
| make a database with threads that doesn't break. Otherwise we
| admit that basic software development using practices from the
| past 30 years is too much for us to figure out.
| MichaelMoser123 wrote:
| i wonder if gpt-11 will be able to do this kind of project on its
| own...
| est wrote:
| Not sure what's going on here, but one connection per process
| seems... ancient.
|
| Using a threaded model is difficult - how about pre-forking? A
| few connections per process would be a good improvement.
| anarazel wrote:
| The issue is costlier (runtime, complexity, memory) resource
| sharing, not the cost of the fork itself. Pre-forking isn't
| going to help with any of that.
| EGreg wrote:
| I hope they don't do it.
|
| I've had a similar situation with PHP, where we had written quite
| a large engine (https://github.com/Qbix/Platform) with many
| features (https://qbix.com/features.pdf) . It took advantage of
| the fact that PHP isolated each script and gave it its own global
| variables, etc. In fact, much of the request handling did stuff
| like this:
|
|         Q_Request::requireFields(['a', 'b', 'c']);
|         $uri = Q_Dispatcher::uri();
|
| instead of stuff like this:
|
|         $this->getContext()->request()->requireFields(['a', 'b', 'c']);
|         $this->getContext()->dispatcher()->uri();
|
| Over the last few years, I have run across many compelling
| things:
|
|         amp
|         ReactPHP
|         Swoole (native extension)
|         Fibers (inside PHP itself)
|
| It seemed so cool! PHP could behave like Node! It would have an
| event loop and everything. Fibers were basically PHP's version of
| Swoole's coroutines, etc. etc.
|
| Then I realized... we would have to go through the entire code
| and redo how it all works. We'd also no longer benefit from PHP's
| process isolation. If one process crapped out or had a memory
| leak, it could take down everything else.
|
| There's a reason PHP still runs 80% of all web servers in the
| world (https://kinsta.com/blog/is-php-dead/) ... and one of the
| biggest is that commodity servers can host terrible PHP code and
| it's mostly isolated in little processes that finish "quickly"
| before they can wreak havoc on other processes or on long-running
| stuff.
|
| So now back to postgres. It's been praised for its rock-solid
| reliability and security. It's got so many features and the MVCC
| is very flexible. It seems to use a lot of global variables. They
| can spend their time on many other things, like making it
| byzantine-fault-tolerant, or something.
|
| The clincher for me was when I learned that php-fpm (which spins
| up processes that sleep while waiting for I/O) is only 50% slower
| than all those fancy things above. Sure, PHP with Swoole can
| outperform even Node.js and handle twice as many requests. But
| we'd rather focus on the many other things we need to do :)
| zackmorris wrote:
| I've been using PHP for decades and have found its isolated
| process model to be about the best around, certainly for any
| mainstream language. Also Symfony's Process component
| encapsulates most of the errata around process management in a
| cross-platform way:
|
| https://symfony.com/doc/current/components/process.html
|
| Going from a working process implementation to async/threads
| with shared memory is pretty much always a mistake IMHO,
| especially if it's only done for performance reasons. Any speed
| gains will be eclipsed by endless whack-a-mole bug fixes, until
| the code devolves into something unrecognizable. Especially
| when there are other approaches similar to map-reduce and
| scatter-gather arrays where data is processed in a distributed
| fashion and then joined into a final representation through
| mechanisms like copy-on-write, which are supported by very few
| languages outside of PHP and the functional programming world.
|
| The real problem here is the process spawning and context-
| switching overhead of all versions of Windows. I'd vote to
| scrap their process code in its entirety and write a new
| version based on atomic operations/lists/queues/buffers/rings
| with no locks and present an interface which emulates the
| previous poor behavior, then run it through something like a
| SAT solver to ensure that any errata that existing software
| depends on is still present. Then apps could opt to use the
| direct unix-style interface and skip the cruft, or refactor
| their code to use the new interface.
|
| Apple did something similar to this when OS X was released,
| built on a mostly POSIX Darwin, NextSTEP, Mach and BSD Unix. I
| have no idea how many times Microsoft has rewritten their
| process model or if they've succeeded in getting performance on
| par with their competitors (unlikely).
|
| Edit: I realized that the PHP philosophy may not make a lot of
| sense to people today. In the 90s, OS code was universally
| terrible, so for example the graphics libraries of Mac and
| Windows ran roughly 100 times slower than they should for
| various reasons, and developers wrote blitters to make it
| possible for games to run in real time. That was how I was
| introduced to programming. PHP encapsulated the lackluster OS
| calls in a cross-platform way, using existing keywords from
| popular languages to reduce the learning curve to maybe a day
| (unlike Perl/Ruby, which are weird in a way that can be fun but
| impractical to grok later). So it's best to think of PHP as
| something like Unity, where the nonsense is abstracted and
| developers can get down to business, even though it looks like
| Javascript with dollar signs on the variables. It's also more
| like the shell, where it tries to be as close as possible to
| bare-metal performance, even while restricted to the 100x
| interpreter slowdown of languages like Python. I find that PHP
| easily saturates the processor when doing things in a data-
| driven way by piping bytes around.
| bob1029 wrote:
| I think an interesting point of comparison is the latest
| incarnation of SQL Server. You can't even point at 1 specific
| machine anymore with their hyperscale architecture.
|
| https://learn.microsoft.com/en-us/azure/azure-sql/database/h...
| js4ever wrote:
| Finally! This plus a good multi-master story and I'll finally
| start to love Postgres
| waselighis wrote:
| It sounds to me like migrating to a fully multi-threaded
| architecture may not be worth the effort. Simply reducing the
| number of processes from thousands to hundreds would be a huge
| win and likely much more feasible than a complete re-
| architecture.
| sbr464 wrote:
| Unpopular idea: age limit on votes/contributions.
|
| We are in a unique time period and generation that has strong
| opinions based on a history that only exists within itself.
|
| At some point it needs a rewrite.
| chucky_z wrote:
| I wish they would do some kind of easy shared storage instead, or
| in addition to it. This sounds like an odd solution, however I've
| scaled pgsql since 9 on very, very large machines and doing 1
| pgsql cluster per physical socket ended up doing near-linear
| scaling even on 100+ total core machines with TB+ of memory.
|
| The challenge with this setup is that you need to do 1 writer and
| multiple reader clusters so you end up doing localhost
| replication which is super weird. If that requirement was somehow
| removed that'd be awesome for scaling really huge clusters.
| t43562 wrote:
| Calm down guys! Threading is tricky but they can rewrite it all
| in Rust so it'll be completely ok........
|
| ;-)
___________________________________________________________________
(page generated 2023-06-20 23:02 UTC)