[HN Gopher] PostgreSQL reconsiders its process-based model
___________________________________________________________________
PostgreSQL reconsiders its process-based model
Author : todsacerdoti
Score : 459 points
Date : 2023-06-19 16:33 UTC (6 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| mihaic wrote:
| I'm honestly surprised it took them so long to reach this
| conclusion.
|
| > That idea quickly loses its appeal, though, when one considers
| trying to create and maintain a 2,000-member structure, so the
| project is unlikely to go this way.
|
| As repulsive as this might sound at first, I've seen structures
| of hundreds of fields work fine if the hierarchy inside them is
| well organized and they're not just flat. Still, I have no real
| knowledge of the complexity of the code and wish the Postgres
| devs all the luck in the world to get this working smoothly.
| rsaxvc wrote:
| This is how I made my fork of libtcc lock-free.
|
| Mainline has a lock so that all backends can use global
| variables, but only one instance can do codegen at a time.
|
| It was a giant refactoring. Especially fun was when multiple
| compilation units used the same static variable name, but it
| all worked in the end.
| paulddraper wrote:
| > I'm honestly surprised it took them so long to reach this
| conclusion.
|
| On the contrary, it's been discussed for ages. But it's a huge
| change, with only modest advantages.
|
| I'm skeptical of the ROI to be honest. Not that it doesn't have
| value, but that it has more value than the effort.
| 36364949thrw wrote:
| > it's a huge change, with only modest advantages
|
| +significant and unknown set of new problems, including new
| bugs.
|
| This reminds me of the time they lifted entire streets in
| Chicago by 14 feet to address new urban requirements.
| Chicago, we can safely assume, did not have the option of
| just starting a brand new city a few miles away.
|
| The interesting question here is should a system design that
| works quite well up to a certain scale be abandoned in order
| to extend its market reach.
| datavirtue wrote:
| Yeah, and you will run headlong into other unforeseen real
| world issues. You may never reach the performance goals.
| loeg wrote:
| Yeah. I think as a straightforward, easily correct transition
| from 2000 globals, a giant structure isn't an awful idea. It's
| not like the globals were organized before! You're just making
| the ambient state (awful as it is) explicit.
| stingraycharles wrote:
| Yes, it's the most pragmatic and it's only "awful" because it
| makes the actual problem visible. And would likely encourage
| slowly refactoring code to handle its state in a more sane
| way, until you're only left with the really gnarly stuff,
| which shouldn't be too much anymore and you can put them in
| individual thread local storages.
|
| It's an easy transition path.
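| A minimal sketch of that transition path, assuming C11 and
| made-up names (nothing here is taken from the PostgreSQL
| source): former globals are grouped by subsystem inside one
| session struct, and a single thread-local pointer replaces the
| process-wide variables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical groupings; none of these names come from PostgreSQL. */
struct planner_state  { int join_collapse_limit; double random_page_cost; };
struct executor_state { long rows_processed; int work_mem_kb; };

/* Former globals gathered into one struct, grouped by subsystem. */
struct session_state {
    struct planner_state  planner;
    struct executor_state executor;
};

/* One pointer per thread stands in for the old process-wide globals. */
static _Thread_local struct session_state *current_session = NULL;

/* Bind a session to the calling thread. */
void session_attach(struct session_state *s) { current_session = s; }

/* Code that used to touch a global now goes through the session. */
void executor_count_row(void)
{
    assert(current_session != NULL);
    current_session->executor.rows_processed++;
}

long executor_rows(void)
{
    assert(current_session != NULL);
    return current_session->executor.rows_processed;
}
```

| Each worker thread would call session_attach() once with its
| own struct, so two threads never see each other's counters.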
| mihaic wrote:
| Exactly, if you're now forced to put everything in one place
| you're forced to acknowledge and understand the complexity of
| your state, and might have incentives to simplify it.
| Sesse__ wrote:
| Here's MySQL's all-session-globals-in-one-place-class:
| https://github.com/mysql/mysql-server/blob/8.0/sql/sql_class...
|
| I believe I can safely say that nobody acknowledges and
| understands the complexity of all state within that class,
| and that whatever incentives there may be to simplify it
| are not enough for that to actually happen.
|
| (It ends on line 4692)
| IshKebab wrote:
| Right but that would still be true if they were globals
| instead. Putting all the globals in a class doesn't make
| any difference to how much state you have.
| cakoose wrote:
| > I think as a straightforward, easily correct transition
| from 2000 globals, a giant structure isn't an awful idea.
|
| Agree.
|
| > It's not like the globals were organized before!
|
| Using a struct with 2000 fields loses some encapsulation.
|
| When a global is defined in a ".c" file (and not exported via
| a ".h" file), it can only be accessed in that one ".c" file,
| sort of like a "private" field in a class.
|
| Switching to a single struct would mean that all globals can
| be accessed by all code.
|
| There's probably a way to define things that allows you to
| regain some encapsulation, though. For example, some spin on
| the opaque type pattern:
| https://stackoverflow.com/a/29121847/163832
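| A rough sketch of how the opaque type pattern could keep
| per-module state private even when it hangs off one big
| struct. The lock_stats module and its functions are invented
| for illustration; the two halves would normally live in a
| header and a ".c" file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what a hypothetical lock_stats.h would expose ---- */
struct lock_stats;                       /* incomplete type: fields hidden */
struct lock_stats *lock_stats_new(void);
void lock_stats_record_wait(struct lock_stats *ls);
unsigned long lock_stats_waits(const struct lock_stats *ls);
void lock_stats_free(struct lock_stats *ls);

/* ---- what only lock_stats.c would contain ---- */
struct lock_stats { unsigned long waits; };

struct lock_stats *lock_stats_new(void)
{
    /* calloc zeroes the counter; returns NULL on allocation failure */
    return calloc(1, sizeof(struct lock_stats));
}

void lock_stats_record_wait(struct lock_stats *ls)
{
    assert(ls != NULL);
    ls->waits++;
}

unsigned long lock_stats_waits(const struct lock_stats *ls)
{
    assert(ls != NULL);
    return ls->waits;
}

void lock_stats_free(struct lock_stats *ls) { free(ls); }
```

| The session-wide struct would then hold only a
| "struct lock_stats *" member, so other modules can pass the
| pointer around but cannot read or write the fields directly.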
| pasc1878 wrote:
| No that is what a static in a .c file is for.
|
| A plain global can be accessed from other compilation units.
| Agreed, with no .h entry it is much more error-prone (e.g.
| you don't know the type), but the variable's name is still
| exposed to other objects.
| remexre wrote:
| Wouldn't those statics also be slated for removal with
| this change?
| cogman10 wrote:
| I think my bigger fear is around security. A process per
| connection keeps things pretty secure for that connection
| regardless of what the global variables are doing (somewhat
| hard to mess that up with no concurrency going on in a
| process).
|
| Merge all that into one process with many threads and it
| becomes a nightmare problem to ensure some random addon
| didn't decide to change a global var mid processing (which
| causes wrong data to be read).
| dfox wrote:
| All postgres processes run under the same system user and
| all the access checking happens completely in userspace.
| fdr wrote:
| Access checking, yes, but the scope of memory corruption
| does increase unavoidably, given the main thing the
| pgsql-hackers investigating threads want: one virtual
| memory context when toggling between concurrent work.
|
| Of course, there's a huge amount of shared space already,
| so a willful corruption can already do virtually
| anything. But, more is more.
| magicalhippo wrote:
| We did this with a project I worked on. I came on after the
| code was mature.
|
| While we didn't have 2000 globals, we did have a non-trivial
| amount, spread over about 300kLOC of C++.
|
| We started by just stuffing them into a "context" struct, and
| every function that accessed a global thus needed to take a
| context instance as a new parameter. This was tedious but
| easy.
|
| However the upside was that this highlighted poor
| architecture. Over time we refactored those bits and the main
| context struct shrunk significantly.
|
| The result was better and more modular code, and overall well
| worth the effort in our case, in my opinion.
| MuffinFlavored wrote:
| > if the hierarchy inside them is well organized
|
| is this another way to say "in a 2000 member structure, only 10
| have significant voting power"?
| Ankhers wrote:
| This statement is not about people, it is about a C struct.
| FooBarWidget wrote:
| I don't get it. How is a 2000-member structure any different
| from having 2000 global variables? How is maintaining the
| struct possibly harder than maintaining the globals?
| Refactoring globals to struct members is semantically nearly
| identical, it may as well just be a mechanical, cosmetic
| change, while also giving the possibility to move to a threaded
| architecture.
| ComputerGuru wrote:
| Because global variables can be confined to individual cpp
| files, exclusively visible in that compilation unit. It makes
| them far easier to reason with than hoisting them to the
| "global and globally visible" option if you just use a
| gargantuan struct. Which is why a more invasive refactor
| might be required.
| imtringued wrote:
| Just use thread local variables.
|
| I abuse them for ridiculous things.
| comboy wrote:
| I've never really been limited by CPU when running postgres
| (few TB instances). The bottleneck is always IO. Do others have
| different experience? Plus there's elegance and a feeling of
| being in control when you know query is associated with
| specific process which you can deal with and monitor just like
| any other process.
|
| But I'm very much clueless about internals, so this is a
| question rather than an opinion.
| hyperman1 wrote:
| I see postgres become CPU bound regularly: Lots of hash
| joins, copy from or to CSV, index or materialized view
| rebuild. Postgis eats CPU. Tds_fdw tends to spend a lot of
| time doing charset conversion, more than actually networking
| to mssql.
|
| I was surprised when starting with postgres. Then again, I
| have smaller databases (A few TB) and the cache hit ratio
| tends to be about 95%. Combine that with SSDs, and it becomes
| understandable.
|
| Even so, I am wary of this change. Postgres is very reliable,
| and I have no problem throwing some extra hardware to it in
| return. But these people have proven they know what they are
| doing, so I'll go with their opinion.
| aetherson wrote:
| I've also definitely seen a lot of CPU bounding on
| postgres.
| sargun wrote:
| With modern SSDs that can push 1M IOPs+, you can get into a
| situation where I/O latency starts to become a problem, but
| in my experience, they far outpace what the CPU can do. Even
| the I/O stack can be optimized further in some of these
| cases, but often it comes with the trade off of shifting more
| work into the CPU.
| ilyt wrote:
| >I've never really been limited by CPU when running postgres
| (few TB instances). The bottleneck is always IO.
|
| Throw a few NVMe drives at it and it might.
| dfox wrote:
| Throw a ridiculous amount of RAM at it is a more correct
| assessment. NVMe reads are still an "I/O" and that is slow.
| And for at least 10 years buying enough RAM to have all of
| the interesting parts of OLTP psql database either in
| shared_buffers or in the OS-level buffer cache is
| completely feasible.
| ilyt wrote:
| > NVMe reads are still an "I/O" and that is slow
|
| It's orders of magnitude faster than SAS/SATA SSDs and
| you can throw 10 of them into 1U server. It's nowhere
| near "slow" and still easy enough to be CPU bottlenecked
| before you get IO bottlenecked.
|
| But yes, pair of 1TB RAM servers gotta cost you less than
| half year's worth of developer salary
| phamilton wrote:
| I've generally had buffer-cache hit rates in the 99.9% range,
| which ends up being minimal read I/O. (This is on AWS Aurora,
| where there's no disk cache and so shared_buffers is the
| primary cache, but an equivalent measure for vanilla postgres
| exists.)
|
| In those scenarios, there's very little read I/O. CPU is the
| primary bottleneck. That's why we run up as many as 10 Aurora
| readers (autoscaled with traffic).
| paulddraper wrote:
| Depends on your queries.
|
| If you push a lot of work into the database including JSON
| and have a lot of buffer memory...CPU can easily be limiting.
| Diggsey wrote:
| It's not just CPU - memory usage is also higher. In
| particular, idle connections still consume significant memory,
| and this is why PostgreSQL has so much lower connection
| limits than eg. MySQL. Pooling can help in some cases, but
| pooling also breaks some important PostgreSQL features (like
| prepared statements...) since poolers generally can't
| preserve session state. Other features (eg. notify) are just
| incompatible with pooling. And pooling cannot help with
| connections that are idle but inside a transaction.
|
| That said, many of these things are solvable without a full
| switch to a threaded model (eg. by having pooling built-in
| and session-state-aware).
| ComputerGuru wrote:
| > solvable without a full switch to a threaded model (eg.
| by having pooling built-in and session-state-aware).
|
| Yeeeeesssss, but solving that is solving the hardest part
| of switching to a threaded model. It requires the team to
| come to terms with the global state and encapsulating session
| state in a non-global struct.
| saulrh wrote:
| Also, even if a 2k-member structure is obnoxious, consider the
| alternative - having to think about and manage 2k global
| variables is probably even worse!
| megous wrote:
| Each set of globals is in a module it relates to, not in some
| central file where everything has to be in one struct.
|
| If anything, it's probably easier to understand.
| hans_castorp wrote:
| > I'm honestly surprised it took them so long to reach this
| conclusion.
|
| Oracle also uses a process model on Linux. At some point (I
| think starting with 12.x), it can now be configured on Linux to
| use a threaded model, but the default is still a process-per-
| connection model.
|
| Why does everybody think it's a bad thing in Postgres, but
| nobody thinks it's a bad thing in Oracle?
| topspin wrote:
| > I'm honestly surprised it took them so long to reach this
| conclusion.
|
| I'm not. You can get a long way with conventional IPC, and OS
| processes provide a lot of value. For most PostgreSQL instances
| the TLB flush penalty is _at least_ 3rd or 4th on the list of
| performance concerns, _far_ below prevailing storage and
| network bottlenecks.
|
| I share the concerns cited in this LWN story. Reworking this
| massive code base around multithreading carries a large amount
| of risk. PostgreSQL developers will have to level up
| substantially to pull it off.
|
| A PostgreSQL endorsed "second-system" with the (likely
| impossible, but close enough that it wouldn't matter) goal of
| 100% client compatibility could be a better approach. Adopting
| a memory safe language would make this both tractable and
| attractive (to both developers and users.) The home truth is
| that any "new process model" effort would actually play out
| exactly this way, so why not be deliberate about it?
| atonse wrote:
| Would this basically be a new front end? Like the part that
| handles sockets and input?
|
| Or more of a rewrite of subsystems? Like the query planner or
| storage engine etc?
| topspin wrote:
| Both, I'd imagine.
|
| With regard to client compatibility there are related
| precedents for this already; the PostgreSQL wire protocol
| has emerged as a de facto standard. Cockroachdb and
| ClickHouse are two examples that come to mind.
| nextaccountic wrote:
| From what I gather postgres isn't doing conventional IPC but
| instead it uses shared memory, which means the same mechanism
| threads use but with way higher complexity
| mgaunard wrote:
| What do you think IPC is?
| topspin wrote:
| As does Oracle, and others. I'm aware.
|
| IPC, to me, includes the conventional shared memory
| resources (memory segments, locks, semaphores, condition
| variable, etc.) used by these systems: resources acquired
| by processes for the purpose of communication with other
| processes.
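| As a small illustration of shared memory used as IPC, here
| is a sketch assuming a POSIX system with anonymous shared
| mappings (Linux, the BSDs, macOS). It is not how PostgreSQL
| sets up its shared segment; the function name is made up.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent reads after a child process wrote to
 * a shared mapping, or -1 if mmap/fork failed. */
int shared_counter_demo(void)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {              /* child: a separate process */
        *shared = 42;            /* the write lands in the shared page */
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);    /* the child has finished writing by now */
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);

    int seen = *shared;          /* parent observes the child's write */
    munmap(shared, sizeof *shared);
    return seen;
}
```

| Real systems add locks or atomics on top of this; waiting for
| the child to exit is the only synchronization in the sketch.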
|
| I get it though. The most general concept of shared memory
| is not coupled to an OS "process." You made me question
| whether my concept of the term IPC was valid, however. So what
| does one do when a question appears? Stop thinking
| immediately and consult a language model!
|
| Q: Is shared memory considered a form of interprocess
| communication?
|
| GPT-4: Yes, shared memory is indeed considered a form of
| interprocess communication (IPC). It's one of the several
| mechanisms provided by an operating system to allow
| processes to share and exchange data.
|
| ...
|
| Why does citing ChatGPT make me feel so ugly inside?
| faangsticle wrote:
| > Why does citing ChatGPT make me feel so ugly inside?
|
| It's the modern let me Google that for you. Just like
| people don't care what the #1 result on Google is, they
| also don't care what ChatGPT has to say about it. If they
| did, they'd ask it themselves.
| TeMPOraL wrote:
| I always understood IPC, "interprocess communication", in
| general sense, as anything and everything that can be
| used by processes to communicate with each other - of
| course with a narrowing provision that common use of the
| term refers to those means that are typically used for
| that purpose, are relatively efficient, and the processes
| in question run on the same machine.
|
| In that view, I always saw shared memory as IPC, in that
| it is a tool commonly used to exchange data between
| processes, but of course it is not strictly tied to any
| process in particular. This is similar to files, which if
| you squint are a form of IPC too, and are also not tied
| to any specific process.
|
| > _Why does citing ChatGPT make me feel so ugly inside?_
|
| That's probably because, in cases like this, it's not
| much different to stating it yourself, but is more noisy.
| wbl wrote:
| Not necessarily. Man 3 shmem if you want a journey back to
| some bad ideas.
| shepardrtc wrote:
| I think this is a situation where a message-passing Actor-based
| model would do well. Maybe pass variable updates to a single
| writer process/thread through channels or a queue.
|
| Years ago I wrote an algorithmic trader in Python (and Cython
| for the hotspots) using Multiprocessing and I was able to get
| away with a lot using that approach. I had one process
| receiving websocket updates from the exchange, another process
| writing them to an order book that used a custom data
| structure, and multiple other processes reading from that data
| structure. Ran well enough that trade decisions could be made
| in a few thousand nanoseconds on an average EC2 instance. Not
| sure what their latency requirements are, though I imagine they
| may need to be faster.
|
| Obviously mutexes are the bottleneck for them at this point,
| and while my idea might be a bit slower than a low-load
| situation, perhaps it would be faster when you start getting to
| higher load.
| hamandcheese wrote:
| I think the Actor model is fine if you start there, but I
| can't imagine incrementally adopting it in a large,
| preexisting code base.
| ilyt wrote:
| That would most likely be several times slower than current
| model
| bb88 wrote:
| This reminds me of this poster: "You must be this tall..."
|
| https://bholley.net/blog/2015/must-be-this-tall-to-write-mul...
|
| Back about a decade ago I was "auditing" someone else's threaded
| code. And couldn't figure it out. But he was the company's
| "golden child" so by default it must be working code because he
| wrote it.
|
| And then it started causing deadlocks in prod.
|
| "What do you want me to do about it? It's the golden child's
| code. He's not even gonna show up til 2pm today."
| wmf wrote:
| The thing is... multi-process with a bespoke shared memory
| system isn't better than multithreading; it's much worse.
| thecopy wrote:
| This feels like developers are bored and want a challenge.
| dboreham wrote:
| It's a multi-decade ask from many PG users and a serious pain
| point for many deployments.
| Icathian wrote:
| Some interesting discussion on this here also:
| https://news.ycombinator.com/item?id=36284487
| [deleted]
| levkk wrote:
| Pretty sure Tom Lane said this will be a disaster in that same
| pgsql-hackers thread. Not entirely sure what benefits the multi-
| threaded model will have when you can easily saturate the entire
| CPU with just 128 connections and a pooler. So I doubt there is
| consensus or even strong desire from the community to undertake
| this boil the ocean project.
|
| On the other hand, having the ability to shut down and cleanup
| the entire memory space of a single connection by just
| disconnecting is really nice, especially if you have extensions
| that do interesting things.
| ilyt wrote:
| >Not entirely sure what benefits the multi-threaded model will
| have when you can easily saturate the entire CPU with just 128
| connections and a pooler.
|
| That all of those would work faster because of the performance
| benefits, as mentioned in the article
| BoardsOfCanada wrote:
| From the article:
|
| > Tom Lane said: "I think this will be a disaster. There is far
| too much code that will get broken". He added later that the
| cost of this change would be "enormous", it would create "more
| than one security-grade bug", and that the benefits would not
| justify the cost.
| timcobb wrote:
| You can think of this as an opportunity to rewrite in Rust.
| tracker1 wrote:
| AfterPostgres
| pjmlp wrote:
| So when we as a whole decided that multiprocessing is a much
| better approach from security and application stability point of
| view, they decide to go with threads?
| blinkingled wrote:
| Horses for courses I guess - purely threaded vs purely MP both
| have different set of tradeoffs and shoehorning one over the
| other always fails some use cases. The article says they are
| also considering the possibility of having to keep both process
| and thread models indefinitely for this and other reasons.
|
| I know nothing of PG internals but I can see why process per
| connection model doesn't work for large machines and/or high
| number of connections. One way to do it would be to keep
| connection handling per thread and still keep multiprocess
| approach where it makes sense for security and doesn't add
| linear overheads.
| RcouF1uZ4gsC wrote:
| This would be one of those places where a language like Rust would
| be helpful. In C/C++ with undefined behavior and crashes, process
| isolation makes a lot of sense to limit the blast radius. Rust
| borrow checker gives you at compile time a lot of the safety that
| you would rely on process isolation for.
| mattashii wrote:
| Yes, but note that the blast radius of a PostgreSQL process
| crash is already "the whole system reboots", so there are not a
| lot of differences between process- and thread-based PostgreSQL
| written in C.
|
| Rewriting in Rust would be interesting, but it would also
| probably be too invasive to make it worthwhile at all - all code
| in PostgreSQL is C, while not all code in PostgreSQL interacts
| with the intrinsics of processes vs threads. Any rewrite to
| Rust would likely take several times more effort than a port to
| threads.
| megous wrote:
| PostgreSQL process crash may also just be, one query fails.
| ceeam wrote:
| Is there any reason at all people use intrinsically bug-prone and
| broken multithreading mode instead of fork() and IPC apart from
| WinAPI having no proper fork?
| adwn wrote:
| Yes, and some of those reasons are even listed in the article.
| ceeam wrote:
| TLB misses? They are just a detail of particular CPU
| implementation, and the architectures change. Also, aren't
| they per core and not per process? What would that solve then
| to switch to MT?
| adwn wrote:
| > _TLB misses? They are just a detail of particular CPU
| implementation, and the architectures change._
|
| TLBs are "just a detail" of roughly 100% of server,
| desktop, and mobile CPUs.
|
| > _Also, aren't they per core and not per process? What
| would that solve then to switch to MT?_
|
| TLB entries are per address space. Threads share an address
| space, processes do not.
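| The address-space difference itself (not the TLB behaviour,
| which cannot be observed this simply) can be shown with a
| short POSIX sketch. Function names are invented; it assumes
| pthreads and fork() are available.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/wait.h>
#include <unistd.h>

static int value;                /* ordinary global, not shared memory */

static void *set_from_thread(void *arg)
{
    (void)arg;
    value = 1;
    return NULL;
}

/* A thread's write is visible to its creator: one address space. */
int write_from_thread(void)
{
    pthread_t t;
    value = 0;
    if (pthread_create(&t, NULL, set_from_thread, NULL) != 0)
        return -1;
    pthread_join(t, NULL);       /* join orders the write before the read */
    return value;
}

/* A forked child's write is not: the child gets its own copy. */
int write_from_child(void)
{
    value = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 1;               /* modifies the child's private copy only */
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    return value;                /* parent's copy is untouched */
}
```

| write_from_thread() returns 1 and write_from_child() returns
| 0. Separate address spaces are also why each backend process
| needs its own set of page-table translations.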
| tragomaskhalos wrote:
| Worked on a codebase which was separate processes, each of which
| has a shedload of global variables. It was a nightmare working
| out what was going on, not helped by the fact that there was no
| naming convention for the globals, plus they were not declared in
| a single place. I believe their use was a performance move, ie
| having the linker pin a var to a specific memory location rather
| than copying it to the stack as a variable and referencing it by
| offset the whole time. Premature optimisation? Optimisation at
| all? Who knows, but there's a good reason coding standards
| typically militate against globals.
| ajkjk wrote:
| There's something to be said for globals whose access is well-
| managed, though.
|
| IMO: if the variable is _truly_ global, i.e. code all over the
| codebase cares about it, then it should just be global instead
| of pretending like it's not with some fancy architecture.
|
| The tricky part is reacting to changes to a global variable.
| Writing a bunch of "on update" logic leads to madness. The
| ideal solution is for there to be some sort of one-directional
| flow for updates, like when a React component tree is re-
| rendered... but that's very hard to build in an application
| that doesn't start out using a library like React in the first
| place.
| shariat wrote:
| this will be the beginning of the end of postgres
| haburka wrote:
| That sounds like really hard programming. I'm glad I write react
| and get paid possibly much more.
| paulddraper wrote:
| We're glad you're writing React too :)
| Exuma wrote:
| I feel this sort of undertaking could only be done by those
| programmers who truly value domain knowledge above all else
| (money, etc). I'm more of the entrepreneurial mind so I
| generally only learn as much as needed to do some task (even if
| it's very difficult), but just seeking information as a means
| to an end doesn't feel fulfilling to me. Of course many people
| DO find that, and it's upon those people's shoulders that heroic
| things like this rest, and I'm very thankful to them.
| uudecoded wrote:
| I am going to go ahead and trust Tom Lane on this one, over
| someone who is working on "serverless Postgres". Godspeed to the
| forthcoming fork.
| aseipp wrote:
| Heikki Linnakangas is one of the top Postgres contributors of
| all time, he isn't just "someone." The fact he's working for a
| startup on a fork (that already exists, which you can run right
| now on your local machine) doesn't warrant any snide dismissal.
| Robert Haas admitted that it would be a huge amount of work and
| that it would only be achievable by a small few people anyway,
| Heikki being among them.
|
| Anyway, I think there are definitely limits that are starting
| to appear with Postgres in some spots. This is probably one of
| the most difficult possible solutions to some of those
| problems, but even if they don't switch to a fully threaded
| model, being more CPU efficient, better connection handling,
| etc will all go a substantial way. Doing some of the really
| hard work is better than none of it, probably.
| selimnairb wrote:
| I wonder what AWS's PostgreSQL-compatible Aurora looks like
| under the hood. Does it use threading, processes, both?
| rburhum wrote:
| Sorry if I offend anybody, but this sounds like such a bad idea.
| I have been running various versions of postgres in production
| for 15 years with thousands of processes on super beefy machines,
| and I can tell you without a doubt that sometimes those processes
| crash - especially if you are running any of the extensions.
| Nevertheless, Postgres has 99% of the time proven to be
| resilient. The idea that a bad client can bring the whole cluster
| down because it hit a bug sounds scary. Ever try creating a
| spatial index on thousands/millions of records that have nasty
| overly complex or badly digitized geometries? Sadly, crashes are
| part of that workflow, and changing this from process to
| threading would mean all the other clients also crashing and
| cutting connections. This as a potential problem because I want
| to avoid context switching overhead or cache misses, no thanks.
| mike_hock wrote:
| Also, reducing context switching overhead (or any other CPU
| overhead) is probably not gonna fix the garbage I/O
| performance.
| dan-robertson wrote:
| Is the actual number you got 99%? Seems low to me but I don't
| really know about Postgres. That's 3 and a half days of
| downtime per year, or an hour and a half per week.
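| The arithmetic behind those figures, as a tiny helper (the
| function name is made up for the example):

```c
#include <assert.h>

/* Hours of downtime over `period_hours` at a given availability (0..1). */
double downtime_hours(double availability, double period_hours)
{
    assert(availability >= 0.0 && availability <= 1.0);
    return (1.0 - availability) * period_hours;
}
```

| downtime_hours(0.99, 365 * 24) is about 87.6 hours, roughly
| 3.65 days per year, and downtime_hours(0.99, 7 * 24) is about
| 1.68 hours per week.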
| dfox wrote:
| Well, hour and half per week is the amount of downtime that
| you need for modestly sized database (units of TB) accessed
| by legacy clients that have ridiculously long running
| transactions that interfere with autovacuum.
| zeroimpl wrote:
| However, it's already the case that if a postgres process
| crashes, the whole cluster gets restarted. I've occasionally
| seen this message:
|
|     WARNING:  terminating connection because of crash of another server process
|     DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
|     HINT:  In a moment you should be able to reconnect to the database and repeat your command.
|     LOG:  all server processes terminated; reinitializing
| niccl wrote:
| yes, but postmaster is still running to roll back the
| transaction. If you crash a single multi-threaded process,
| you may lose postmaster as well and then sadness would ensue
| jtc331 wrote:
| If you read the thread you'd see the discussion includes
| still having e.g. postmaster as a separate process.
| mattashii wrote:
| The threaded design wouldn't necessarily be single-process,
| it would just not have 1 process for every connection.
| Things like crash detection could still be handled in a
| separate process. The reason to use threading in most cases
| is to reduce communication and switching overhead, but for
| low-traffic backends like a crash handler the overhead of
| it being a process is quite limited - when it gets
| triggered context switching overhead is the least of your
| problems.
| Yoric wrote:
| Seconded. For instance, Firefox' crash reporter has
| always been a separate process, even at the time Firefox
| was mostly single-process, single-threaded. Last time I
| checked, this was still the case.
| cyberax wrote:
| PostgreSQL can recover from abruptly aborted transactions
| (think "pulled the power cord") by replaying the journal.
| This is not going to change anyway.
| cogman10 wrote:
| Transaction roll back is a part of the WAL. Databases write
| to the disk an intent to change things, what should be
| changed, and a "commit" of the change when finished so that
| all changes happen as a unit. If the DB process is
| interrupted during that log write then all changes
| associated with that transaction are rolled back.
|
| Threaded vs process won't affect that.
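| A toy model of the intent-then-commit idea described above.
| This is a deliberately simplified in-memory sketch with
| invented record types, not PostgreSQL's actual WAL format or
| recovery algorithm.

```c
#include <assert.h>
#include <stddef.h>

enum rec_type { REC_SET, REC_COMMIT };

struct log_rec {
    enum rec_type type;
    int txid;
    int key;     /* used by REC_SET only */
    int value;   /* used by REC_SET only */
};

#define NKEYS 4

/* Did transaction `txid` get a commit record into the log? */
static int committed(const struct log_rec *log, size_t n, int txid)
{
    for (size_t i = 0; i < n; i++)
        if (log[i].type == REC_COMMIT && log[i].txid == txid)
            return 1;
    return 0;
}

/* Rebuild the table from the log, skipping uncommitted transactions. */
void replay(const struct log_rec *log, size_t n, int table[NKEYS])
{
    for (size_t i = 0; i < n; i++) {
        const struct log_rec *r = &log[i];
        if (r->type != REC_SET || !committed(log, n, r->txid))
            continue;            /* interrupted transaction: nothing applied */
        assert(r->key >= 0 && r->key < NKEYS);
        table[r->key] = r->value;
    }
}
```

| If the log ends before a transaction's commit record, replay
| leaves none of its changes behind, whether the writer was a
| process or a thread.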
| dfox wrote:
| Running the whole DBMS as a bunch of threads in single
| process changes how fast is the recovery from some kind
| of temporary inconsistency. In the ideal world, this
| should not happen, but in reality it does and you do not
| want to bring the whole thing down because of some
| superficial data corruption.
|
| On the other hand, all cases of fixable corrupted data in
| PostgreSQL I have seen were result of somebody doing
| something totally dumb (rsyncing live cluster, even
| between architectures), while on InnoDB it seems to
| happen somewhat randomly without any obvious reason of
| somebody doing stupid things.
| anarazel wrote:
| We would still have a separate process doing that part of
| postmaster's work.
| tracker1 wrote:
| You can still have a master control process separate from
| the client connections.
| moonchrome wrote:
| Restart on crash doesn't sound that difficult to do.
| BoardsOfCanada wrote:
| I recently looked through the source code of postgresql and every
| source file starts with a (really good) description of what the
| file is supposed to do, which made it really easy to get in to
| the code compared to other open source projects I've seen. So
| thanks for that.
| Alekhine wrote:
| I have no idea why that isn't standard practice in every
| codebase. I should be able to figure out your code without
| having to ask, or dig through issues or commit messages. Just
| tell me what it's for!
| nologic01 wrote:
| the average programmer thinks they are writing significantly
| above average clean code, so no need to document it :-)
| ComputerGuru wrote:
| It kind of is in rust now, with module-level documentation
| given its own specific AST representation instead of just
| being a comment at the top of the file (a file is a module).
| SoylentOrange wrote:
| Because it takes a lot of time and because the comments can
| get outdated. I also want this for all my code bases. But do
| I always do this myself? No, especially on green field
| projects. I will sometimes go back and annotate them later.
| mhh__ wrote:
| They can get outdated but they usually don't. It's a good
| litmus test for if a file is too big / small if its
| purpose is hard to nail down.
| withinboredom wrote:
| Even outdated comments can tell you the original purpose of
| the code, which helps if you're looking for a bug.
| Especially if you're looking for a bug.
|
| If someone didn't take the time to update the comments and
| the reviewers didn't point it out, then you've probably
| found the bug because someone was cowboying some shitty
| code.
| alex_reg wrote:
| I have the opposite experience.
|
| Outdated comments are often way worse than no comments,
| because they can give you wrong ideas that aren't true
| anymore, and send you off in the wrong direction before
| you finally figure out the comment was wrong.
| elteto wrote:
| Indeed. I recently found this piece of code:
|
|     if (X) assert(false); // we never do X, ever, anywhere.
|
| Then I look over to the other pane, where I have a
| different, but related file open:
|
|     if (exact same X) { do_useful_stuff(); }
|
| It got a chuckle out of me.
| akira2501 wrote:
| Trying to understand what I previously wrote and why I
| wrote it takes more time than I ever care to spend. I'd
| much rather have the comments, plus at this point, by
| making them a "first class" part of my code, I find them
| much easier to write and I find the narrative style I use
| incredibly useful in laying out a new structure but also in
| refactoring old ones.
| cmrdporcupine wrote:
| It sounds like the specific concerns here are actually around
| buffer pool management performance in and around the TLB: _"Once
| you have a significant number of connections we end up spending a
| *lot* of time in TLB misses, and that's inherent to the process
| model, because you can't share the TLB across processes."_
|
| Many of the comments here seem to be missing this and talking
| about CPU-boundedness generally and thread-per-request vs process
| etc models, but this seems orthogonal to that, and is actually
| quite specific about the VM subsystem and seems like a legitimate
| bottleneck with the approach Postgres has to take for buffer/page
| mgmt with the process model it has now.
|
| I'm no Postgres hacker (or a Linux kernel hacker), and I only did
| a 6 month stint doing DB internals, but it _feels_ to me like
| perhaps the right answer here is that instead of Postgres getting
| deep down in the weeds refactoring and rewriting to a thread
| based model -- with all the risks in that that people have
| pointed out -- some assistance could be reached for by working on
| specific targeted patches in the Linux kernel?
|
| The addition of e.g. userfaultfd shows that there is room for
| innovation and acceptance of changes in and around kernel re:
| page management. Some new flags for mmap, shm_open, etc. to
| handle some specific targeted use cases to help Postgres out?
|
| Also wouldn't be the first time that people have done custom
| kernel patches or tuning parameters to crank performance out of a
| database.
| eclipticplane wrote:
| For the record, I think this will be a disaster. There is far
| too much code that will get broken, largely silently, and
| much of it is not under our control.
|
| regards, tom lane
|
| (via https://lwn.net/ml/pgsql-
| hackers/4178104.1685978307@sss.pgh....)
|
| If Tom Lane says it will be a disaster, I believe it will be a
| disaster.
| idiomaticrust wrote:
| He is right. Such rewrites cause a lot of problems if your
| compiler doesn't help you with avoiding data races.
|
| But there is another way.
| hnarn wrote:
| > But there is another way.
|
| Ok?
| mycall wrote:
| Microsoft SQL Server has SQLOS which is another way [0].
|
| [0] https://www.thegeekdiary.com/what-is-sql-server-
| operating-sy...
| carstenhag wrote:
| The person probably implied that Postgres should switch to
| another toolchain that guarantees more things at compile
| time, so probably Rust.
| bb88 wrote:
| You can take a chunk of code and just rewrite it in Rust.
| You'll learn a lot quickly by this.
| steve_adams_86 wrote:
| It's sort of like the inverse of the Matrix when Neo
| learns kung fu. You realize that you actually don't know
| how to program :)
| tylerhou wrote:
| The boundaries within database code are not clear. There
| are too many interlocking parts to take a nontrivial
| chunk and rewrite it in Rust.
| blincoln wrote:
| If the existing code is old-school enough to use
| thousands of global variables in a thread-unsafe way,
| seems like changing it enough to compile as safe Rust
| code would push the "non-trivial" envelope pretty far.
| chc wrote:
| I think it's meant to imply the solution given in their
| username ("idiomatic Rust").
| avgcorrection wrote:
| Don't mind the gimmick gallery (username).
| [deleted]
| gremlinsinc wrote:
| Maybe a better option would be finding a team to create nugres,
| aka a fork for this and other experiments. So that mainline
| remains stable.
| datavirtue wrote:
| This should be considered a research effort, assuming it will
| be a complete rewrite. In light of that, you should not draw
| down resources from the established code base to work on it.
|
| Ignoring the above, first state the explicit requirements
| driving this change and let people weigh in on those. This
| sounds like a geeky dev itch.
| duped wrote:
| That's an awful message with the only sensible reply.
| jasonhansel wrote:
| I'm rather surprised that their focus is on improving vertical
| scalability, rather than on adding more features for scaling
| Postgres horizontally.
| tracker1 wrote:
| If you're more interested in horizontal scaling, you may want
| to look into CockroachDB, which has a Postgres compatible
| protocol, but still quite different. There are a lot more
| limitations with CDB over Pg though.
|
| With the changes suggested, I'm not sure it's the best idea
| from where Postgres is... it might be an opportunity to rewrite
| bits in Rust, but even then, there is a _LOT_ that can go
| wrong. The use of shared memory is apparently already in place,
| and the separate process and inter-process communication isn't
| the most dangerous part... it's the presumption, variables and
| other contextual bits that are currently process globals that
| wouldn't be in the "after" version.
|
| The overall surface is just massive... That doesn't even get
| into plugin compatibility.
| mynonameaccount wrote:
| "the benefits would not justify the cost". PostgreSQL, like any
| software, at some point in its life needs to be refactored. Why
| not refactor with a thread model? Of course there will be bugs.
| Of course it will be difficult. But I think it is a worthwhile
| endeavor. Doesn't sound like this will happen but a new project
| would be cool.
| timtom39 wrote:
| > like any software, at some point in its life needs to be
| refactored.
|
| This is simply not true for most software. Software has a
| product life cycle like everything else and major
| refactors/rewrites should be weighed carefully against
| cost/risk of the refactor. Many traditional engineering fields
| do much better at this analysis.
|
| Although, because I run a contracting shop, I have personally
| profited greatly by clients thinking this is true and being
| unable to convince them otherwise.
| smsm42 wrote:
| "Difficult" doesn't even begin to do it justice. Making a code
| which has 2k global variables and probably an order of magnitude
| more underlying assumptions thread-safe (the code should know that
| now every time you touch X you may be influenced or influence all
| other threads that may touch X) is a gargantuan task, and will
| absolutely for sure involve many iterations which any sane
| person would never let anywhere near valuable data (and how
| long would it take until you'd consider it safe enough?). And
| making this all performant - given that shared-state code
| requires completely different approach to thinking about
| workload distribution, something that performs when running in
| isolated processes may very well get bogged down in locking or
| cache races hell when sharing the state - would be even harder.
| I am not doubting Postgres has some very smart people - much,
| much smarter than me, in any case - but I'd say it could be
| more practical to write new core from scratch than trying to
| "refactor" the core that organically grew for decades with
| assumptions of share-nothing model.
| djur wrote:
| What you're talking about is a rewrite, not a refactor.
| gremlinsinc wrote:
| a better option would just create an experimental fork that has
| a different name and is obviously a different product but based
| on the original source. That way pg gets updates and remains
| stable and if they fail, they fail and it doesn't hurt all the
| pg in production.
| wielebny wrote:
| Having been using and administering a lot of PostgreSQL servers,
| I hope they don't lose any stability over this.
|
| I've seen (and reported) bugs that caused panics/segfaults in
| specific psql processes. Not just connections, also processes
| related to wal writing or replication. The way it's built right
| now, a child process can be just forced to quit and it does not
| affect other processes. Hopefully switching to threads won't
| force whole PostgreSQL to panic and shut down.
| tracker1 wrote:
| Most likely, the postmaster will maintain a separate process,
| much like today with pg, or similar to Firefox or Chrome's
| control process that can catch the panic'd process, cleanup and
| restart them. The WAL can be recovered as well if there were
| broken transactions in flight.
| jtc331 wrote:
| Because of shared memory most panics and seg faults in a worker
| process take down the entire server already (this wasn't always
| the case, but not doing so was a bug).
| vbezhenar wrote:
| Of course it will. That's better than continuing to work with
| damaged memory structures and unpredictable consequences. For
| a database it's more important than ever. Imagine writing
| corrupted data because other thread went crazy.
| wizofaus wrote:
| You're implying that only an OS can provide memory separation
| between units of execution - at least in .NET AppDomains give
| you the same protection within a single process, so why
| couldn't postgres have its own such mechanism? I'd also think
| with a database engine shared state is not just in-memory -
| i.e. one process can potentially corrupt the behaviour of
| another by what it writes to disk, so moving to a single-
| process model doesn't necessarily introduce problems that
| could never have existed previously (but, yes, would arguably
| make them more likely)
| vbezhenar wrote:
| I don't know .NET enough to comment here, but I'm pretty
| sure that if you would manage to run bare metal C inside
| your .NET app (should be possible), it'll destroy all your
| domains easily. RAM is RAM. The only memory protection that
| we have is across process boundary (even that protection is
| not perfect with shared memory, but at least it allows to
| protect private memory).
|
| At least I'm not aware of any way to protect private thread
| memory from other threads.
|
| Postgres is C and that's not going to change ever.
| wizofaus wrote:
| I certainly wasn't suggesting it would make sense to
| rewrite Postgres to run on .NET (using any language, even
| managed C++, assuming anyone still uses that). Yes, it's
| inherent in the C/C++ language that it's able to randomly
| access any memory that a process has access to, and
| obviously on that basis OS-provided process-separation is
| the "best" protection you can get, just pointing out that
| it's not the only possibility.
| SigmundA wrote:
| No AppDomains are not as good as processes, I have tried to
| go that route before, you cannot stop unruly code reliably
| in an app domain (you must use thread.abort() which is not
| good) and memory can still leak in any native code used
| there.
|
| The only reliable way to stop bad code like say an infinite
| loop is to run in another process even in .Net.
|
| They also removed Appdomain in later versions of .Net
| because they had little benefit and weak protections
| compared to a full process.
| wizofaus wrote:
| Not claiming they're as good, just noting that there are
| alternative ways to provide memory barriers, though
| obviously if it's not enforced at the language/runtime
| level, it requires either super strong developer discipline
| or the use of some other tool to do so. I can't find
| anything suggesting AppDomains have been removed
| completely though, just they're not fully supported on
| non-Windows platforms, which is interesting, I wonder if
| that means they do have OS-level support.
| SigmundA wrote:
| https://learn.microsoft.com/en-
| us/dotnet/api/system.appdomai...
|
| "On .NET Core, the AppDomain implementation is limited by
| design and does not provide isolation, unloading, or
| security boundaries. For .NET Core, there is exactly one
| AppDomain. Isolation and unloading are provided through
| AssemblyLoadContext. Security boundaries should be
| provided by process boundaries and appropriate remoting
| techniques."
|
| AppDomains pretty much only allowed you to load and unload
| assemblies and provided little else. If you wanted to
| stop bad code you still used Thread.Abort which left your
| runtime in a potentially bad state due to no isolation
| between threads.
|
| The only way to do something like an AppDomain to replace
| process isolation would be to re-write the whole OS in a
| memory safe language similar to
| https://en.wikipedia.org/wiki/Midori_(operating_system) /
| https://en.wikipedia.org/wiki/Singularity_(operating_system)
| wizofaus wrote:
| Is that saying global variables are shared between
| AppDomains on .NET core then? Scary if so, we have a
| bunch of .NET framework code we're looking at porting to
| .NET core in the near future, and I know it relies on
| AppDomain separation currently. It's not the first
| framework->Core conversion I've done, but I don't
| remember changes in AppDomain behaviour causing any
| issues the first time.
|
| As it happens I already know there are bits of code
| currently not working "as expected" exactly because of
| AppDomain separation - i.e. attempting to use a shared-
| memory cache to improve performance and in one or two
| cases in an attempt to share state, and I got the
| impression whoever wrote that code didn't understand that
| there even were two AppDomains involved, and used various
| ugly hacks to "fall back" to alternative means of state-
| sharing, but in fact the fall-back is the only thing that
| actually ever works.
| [deleted]
| rbancroft wrote:
| Changing something so fundamental seems like it should be a
| rewrite.
| papito wrote:
| This has Python 3 vibes.
| newaccount74 wrote:
| A big advantage of the process-based model is its resilience
| against many classes of errors.
|
| If a bug in PostgreSQL (or in an extension) causes the server to
| crash, then only that process will crash. Postmaster will detect
| the child process termination, and send an error message to the
| client. The connection will be lost, but other connections will
| be unaffected.
|
| It's not foolproof (there are ways to bring the whole server
| down), but it does protect against many error conditions.
|
| It is possible to trap on some exceptions in a threaded
| environment, but cleaning up after eg. an attempted NULL pointer
| dereference is going to be very difficult or impossible.
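The supervision pattern described above can be sketched with plain POSIX calls. This is a minimal illustration under stated assumptions, not PostgreSQL's postmaster code: `run_worker` and `enum worker_outcome` are hypothetical names, and the "bug" is simulated with `abort()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcome of one hypothetical worker, as seen by the supervising parent. */
enum worker_outcome { WORKER_ERROR = -1, WORKER_EXITED = 0, WORKER_CRASHED = 1 };

/*
 * Fork a worker and wait for it. If should_crash is nonzero the worker
 * calls abort() (raising SIGABRT), standing in for a bug in one backend.
 */
enum worker_outcome run_worker(int should_crash)
{
    pid_t pid = fork();
    if (pid < 0)
        return WORKER_ERROR;

    if (pid == 0) {
        /* Child: only this process dies; the parent keeps running. */
        if (should_crash)
            abort();
        _exit(0);
    }

    /* Parent: learn how the child ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return WORKER_ERROR;

    if (WIFSIGNALED(status))
        return WORKER_CRASHED;   /* killed by a signal, e.g. SIGABRT */

    assert(WIFEXITED(status));
    return WORKER_EXITED;
}
```

Calling `run_worker(1)` lets the parent observe the crash and carry on. A real supervisor would then also have to decide what to do about any shared memory the dead worker may have been modifying, which is the harder part raised elsewhere in the thread.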
| anarazel wrote:
| We would still have a separate supervisor process if we moved
| connections to threads.
| baggy_trough wrote:
| I hope they are conservative about this, because even the
| smartest and best programmers in the world cannot create bug free
| multithreaded code.
| jerf wrote:
| I mentally snarked to myself that "obviously they should
| rewrite it in Rust first".
|
| Then, after more thought, I'm not entirely sure that would be a
| bad approach. I say this not to advocate for actually rewriting
| it in Rust, but as a way of describing how difficult this is.
| I'm not actually sure rewriting the relevant bits of the system
| in Rust _wouldn't_ be easier in the end, and obviously, that's
| really, really hard.
|
| This is a _really_ hard transition.
|
| I don't think multithread code quality should be measured in
| absolutes. There are things that are so difficult as to be
| effectively impossible, which is the lock-based approach that
| was dominant in the 90s, and convinced developers that it's
| just impossibly difficult, but it's not multithreaded code
| that's impossibly difficult, it's lock-based multithreading.
| Other approaches range from doable to even not that hard once
| you learn the relevant techniques (Haskell's full immutability
| & Rust's borrow checker are both very solid), but of course
| even "not that hard" becomes a lot of bugs when scaled up to
| something like Postgres. But it's not like the current model is
| immune to that either.
| xwdv wrote:
| Nonsense, multithreaded code can be written as bug free as
| regular code. No need to fear.
| preordained wrote:
| It _can_ be. Anything can be. It is far more treacherous,
| though.
| taeric wrote:
| I think the point is that some mistakes in process based code
| are not realized as the bugs that they will be in threaded
| code?
| dboreham wrote:
| This is true. However, the blast radius may be smaller with a
| process model. Also recovering from a fatal error in one
| session could possibly be easier. I say this as a 30-year
| threading proponent.
| PhilipRoman wrote:
| I'm assuming you're referring to formally proven programs. If
| that's the case, do you have any pointers?
|
| Aside from the trivial while(!transactionSucceeded){retry()}
| loop, I have trouble proving the correctness of my programs
| when the number of threads is not small and finite.
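One common concrete form of the retry loop quoted above is a compare-and-swap loop. The sketch below assumes that interpretation (the commenter's `transactionSucceeded` / `retry()` are pseudocode); `add_with_retry` and `run_adders` are hypothetical names, and the code uses C11 atomics with POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_ADDERS 16

static atomic_int shared_total;

/* Add delta to *target, retrying until the compare-and-swap succeeds. */
void add_with_retry(atomic_int *target, int delta)
{
    int seen = atomic_load(target);
    /* On failure the current value is written back into `seen`,
     * so each retry recomputes seen + delta from fresh data. */
    while (!atomic_compare_exchange_weak(target, &seen, seen + delta)) {
        /* another thread changed *target first; try again */
    }
}

static void *adder(void *arg)
{
    int adds = *(int *)arg;
    for (int i = 0; i < adds; i++)
        add_with_retry(&shared_total, 1);
    return NULL;
}

/* Run n_threads adders; returns the final total, or -1 on thread failure. */
int run_adders(int n_threads, int adds_each)
{
    pthread_t threads[MAX_ADDERS];
    int created = 0;

    assert(n_threads > 0 && n_threads <= MAX_ADDERS);
    atomic_store(&shared_total, 0);

    for (int i = 0; i < n_threads; i++) {
        if (pthread_create(&threads[i], NULL, adder, &adds_each) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    return created == n_threads ? atomic_load(&shared_total) : -1;
}
```

The loop is easy to reason about because the whole "transaction" is a single word. The difficulty the comment points at starts when the state being updated no longer fits in one atomic operation.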
| baggy_trough wrote:
| In theory, yes. In practice, no.
| ajkjk wrote:
| It is just harder.
| mmphosis wrote:
| _Concurrency isn't a "nice layer over pthreads" - the most
| important thing is isolation - anything that mucks up isolation
| is a mistake.
|
| -- Joe Armstrong_
|
| Threads are evil. https://www.sqlite.org/faq.html#q6
| https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...
|
| Nginx uses an asynchronous event-driven approach, rather than
| threads, to handle requests.
| https://aosabook.org/en/v2/nginx.html
| http://www.kegel.com/c10k.html
| jrott wrote:
| The fact that they are planning on doing this across multiple
| releases gives me hope that they'll be cautious with this.
| johannes1234321 wrote:
| The code already is multithreaded. They have shared state just
| across multiple processes instead of threads within a process.
|
| They might even reduce complexity that way.
| usefulcat wrote:
| It's not the same at all for global variables, of which pgsql
| apparently has around a couple thousand.
|
| If every process is single threaded, you don't have to
| consider the possibility of race conditions when accessing
| any of those ~2000 global variables. And you can pretty much
| guarantee that little if any of the existing code was written
| with that possibility in mind.
| ants_a wrote:
| Those global variables would be converted to thread locals
| and most of the code would be oblivious of the change. This
| is not the hard part of the change.
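As a rough illustration of the conversion described above, C11's `_Thread_local` gives every thread its own copy of what used to be a per-process global. The names here (`session_query_count`, `run_query`, `struct session`) are made up for the example and are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-session counter. In a process-per-connection design this
 * could be a plain global; _Thread_local gives each thread a private copy. */
static _Thread_local int session_query_count = 0;

/* Existing code keeps touching the "global" without knowing about threads. */
static void run_query(void)
{
    session_query_count++;
}

struct session {
    int queries_to_run;   /* input */
    int observed_count;   /* output: the counter as seen by this thread */
};

static void *session_main(void *arg)
{
    struct session *s = arg;
    for (int i = 0; i < s->queries_to_run; i++)
        run_query();
    s->observed_count = session_query_count;
    return NULL;
}

/* Run two sessions concurrently; returns 0 on success, -1 on failure. */
int run_two_sessions(struct session *a, struct session *b)
{
    pthread_t ta, tb;

    assert(a != NULL && b != NULL);
    if (pthread_create(&ta, NULL, session_main, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, session_main, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

Each session sees only its own count, which is why `run_query` needs no changes. This covers session-private state only; anything that was deliberately shared still needs its own synchronization.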
| anarazel wrote:
| Postgres is already concurrent today. There's a _lot_ of shared
| state between the processes (via shared memory).
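To make "shared state between processes" concrete, here is a small sketch using an anonymous shared mapping and `fork()`. It is an assumption-laden toy, not how PostgreSQL sets up its shared memory; `count_with_processes` is a hypothetical helper, and it relies on `atomic_int` being lock-free on the target platform.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS on glibc under strict modes */
#include <assert.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork n_children processes that all increment one counter living in
 * shared memory. Returns the final count, or -1 on failure.
 */
int count_with_processes(int n_children, int increments_each)
{
    assert(n_children >= 0 && increments_each >= 0);

    /* MAP_SHARED makes the page visible to every process forked later. */
    atomic_int *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    atomic_init(counter, 0);

    int forked = 0;
    for (int i = 0; i < n_children; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: separate address space, same physical page. */
            for (int j = 0; j < increments_each; j++)
                atomic_fetch_add(counter, 1);
            _exit(0);
        }
        forked++;
    }

    /* Reap every child that was actually started. */
    for (int i = 0; i < forked; i++)
        wait(NULL);

    int total = (forked == n_children) ? atomic_load(counter) : -1;
    munmap(counter, sizeof *counter);
    return total;
}
```

The concurrency hazards are the same ones threads have: without the atomic increment the children would race on the counter even though they are separate processes.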
| mastax wrote:
| It would be interesting to have something between threads and
| processes. I'll call them heavy-threads for sake of discussion.
|
| Like light-threads, heavy-threads would share the same process-
| security-boundary and therefore switching between them would be
| cheap. No need to flush TLB, I$, D$.
|
| Like processes, heavy-threads would have mostly-separate address
| spaces by default. Similar to forking a process, they could share
| read-only mappings for shared libraries, code, COW global
| variables, and explicitly defined shared writable memory regions.
|
| Like processes, heavy-threads would isolate failure states. A C++
| exception, UNIX signal, segfault, etc. would kill only the heavy-
| thread responsible.
| mike_hearn wrote:
| There are some problems.
|
| 1. Mostly separate address spaces requires changing the TLB on
| context switch (modern hw lets it be partial). You could use
| MPKs to share a single address space with fast protection
| switches.
|
| 2. Threads share the global heap, but your heavy threads would
| require explicitly defined shared writeable memory regions, so
| presumably each one has its own heap. That's a fair bit of
| overhead.
|
| 3. Failure isolation is more complicated than deciding what to
| kill.
|
| To expand on the last point, Postgres _doesn't_ isolate
| failures to a single process because they do share memory and
| might corrupt those shared memory regions. But even if you
| don't have shared memory failure recovery isn't always easy.
| Software has to be written specifically to plan for it. You can
| kill processes because everything in the OS is written around
| allowing for that possibility, for example, shells know what to
| do if a sub-process is killed unexpectedly. Killing a heavy
| thread (=process) is no good if the parent process is going to
| wait for a reply from it forever because it wasn't written to
| handle the process going away.
| dasyatidprime wrote:
| So what would be different between those and forked processes?
| ShroudedNight wrote:
| I've been pondering / ruminating with this too; I've been
| somewhat surprised that few operating systems have played with
| reserving per-thread address space as thread-local storage, or
| requiring something akin to a 'far' pointer to access commonly-
| addressed shared memory.
| wbl wrote:
| You cannot COW and share the TLB state. The caches aren't
| flushed in process changes either: it's that the data is
| different so evictions happen.
| mattashii wrote:
| > No need to flush TLB
|
| TLB isn't "flushed" so much as it is useless across different
| memory address spaces. Switching processes means switching
| address spaces, which means you have to switch the contents of
| the TLB to the new process' TLB entries, which eventually
| indeed flushes the TLB, but that is only over time, not
| necessarily the moment you switch processes.
|
| > Like processes, heavy-threads would have mostly-separate
| address spaces by default.
|
| This thus conflicts with the need to not flush TLBs. You can't
| not change TLB contents across address spaces.
| lukeschlather wrote:
| This sounds like a problem that would border on the complexity of
| replacing the GIL in Ruby or Python. The performance benefits are
| obvious but it seems like the correctness problems would be
| myriad and a constant source of (unpleasant) surprises.
| narrator wrote:
| The correctness problem should be handled by a suite of
| automated tests which PostgreSQL has. If all tests pass, the
| application must work correctly. The project is too big, and
| has too many developers to make much progress without full test
| coverage. Where else would up-to-date documentation regarding
| the correct behavior of PostgreSQL exist? In some developer's
| head? SQLite is pretty famous for their extreme approach to
| testing including out of memory conditions, and other rare
| circumstances: https://www.sqlite.org/testing.html
| abalashov wrote:
| > If all tests pass, the application must work correctly.
|
| These are "famous last words" in many contexts, but when
| talking about difficult-to-reproduce parallelism issues, I
| just don't think it's a particularly applicable viewpoint at
| all. No disrespect. :)
| lukeschlather wrote:
| Parallelism is often incredibly hard to write automated tests
| for, and this will most likely create parallelism issues that
| were not dreamed of by the authors of the test suite.
| MuffinFlavored wrote:
| Does GIL stand for Global Interpreter Lock?
| Yujf wrote:
| yes
| cactusfrog wrote:
| This is different because there isn't a whole ecosystem of
| packages that depend on access to a thread unsafe C API.
| Getting the GIL out of core Python isn't too challenging.
| Getting all of the packages that depend on Python's C API
| working is.
| masklinn wrote:
| Another component of the GIL story is that removing the GIL
| requires adding fine grained locks, which (aside from making
| VM development more complicated) significantly increases lock
| traffic and thus runtime costs, which noticeably impacts
| single-threaded performance, which is of major import.
|
| Postgres starts from a share-nothing architecture, it's quite
| a bit easier to evaluate the addition of sharing.
| bsder wrote:
| > which noticeably impacts single-threaded performance,
| which is of major import.
|
| 1) I don't buy this a priori. Almost everybody who removed
| a gigantic lock suddenly realizes that there was more
| contention than they thought and that atomizing it made
| performance improve.
|
| 2) Had Python bitten the bullet and removed the GIL back at
| Python 3.0, the performance would likely already be back to
| normal or better. You can't optimize hypothetically.
| Optimization on something like Python is an accumulation of
| lots of small wins.
| anarazel wrote:
| Postgres already shares a lot of state between processes
| via shared memory. There's not a whole lot that would
| initially change from a concurrency perspective.
| ComputerGuru wrote:
| > which (aside from making VM development more complicated)
| significantly increases lock traffic and thus runtime
| costs, which noticeably impacts single-threaded
| performance, which is of major import.
|
| I don't think that's a fair characterization of the trade
| offs. Acquiring uncontended mutexes is basically free (and
| fairly side-effect free) so single-threaded performance
| will not be noticeably impacted.
|
| Every large C project I'm aware of (read: kernels) that has
| publicly switched from coarse locks to fine-grained locks
| has considered it to be a huge win with little to no impact
| on single-threaded performance. You can even gain
| performance if you chop up objects or allocations into
| finer-grained blobs to fit your finer-grained locking
| strategy because it can play nicer with cache friendliness
| (accessing one bit of code doesn't kick the other bits of
| code out of the cache).
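The coarse-versus-fine-grained distinction above can be sketched as lock striping: one mutex per slice of the data rather than one for everything. This is a generic illustration with hypothetical names (`stripes_init`, `stripe_increment`, `stripes_total`), not code from any of the kernels or interpreters mentioned.

```c
#include <assert.h>
#include <pthread.h>

#define N_STRIPES 8

/* One lock per stripe instead of a single lock guarding every counter. */
struct stripe {
    pthread_mutex_t lock;
    long count;
};

static struct stripe stripes[N_STRIPES];

/* Call once before any other function here. */
void stripes_init(void)
{
    for (int i = 0; i < N_STRIPES; i++) {
        int rc = pthread_mutex_init(&stripes[i].lock, NULL);
        assert(rc == 0);
        (void)rc;
        stripes[i].count = 0;
    }
}

/* Threads working on keys in different stripes never wait on each other. */
void stripe_increment(unsigned key)
{
    struct stripe *s = &stripes[key % N_STRIPES];
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_mutex_unlock(&s->lock);
}

/* Sum all stripes, taking each lock only briefly. */
long stripes_total(void)
{
    long total = 0;
    for (int i = 0; i < N_STRIPES; i++) {
        pthread_mutex_lock(&stripes[i].lock);
        total += stripes[i].count;
        pthread_mutex_unlock(&stripes[i].lock);
    }
    return total;
}
```

A single-threaded caller pays one uncontended lock and unlock per operation either way; whether that cost is negligible for a given workload is exactly what the two sides of this subthread disagree about, so it is best measured rather than assumed.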
| erikpukinskis wrote:
| > there isn't a whole ecosystem of packages that depend on
| access to a thread unsafe C API
|
| They mentioned a similar issue for Postgres extensions, no?
|
| > Haas, though, is not convinced that it would ever be
| possible to remove support for the process-based mode.
| Threads might not perform better for all use cases, or some
| important extensions may never gain support for running in
| threads.
| scolby33 wrote:
| I question how important an extension is if there's not
| enough incentive to port it to the new paradigm, at least
| eventually.
| abalashov wrote:
| Well. The thing with that is just that there are a lot of
| extensions. Like, a lot!
| gjvc wrote:
| surely this is "Some guy reconsiders the process-based model of
| PostgreSQL"
| Icathian wrote:
| Uh. Heikki is definitely not just "some guy". Dude is one of
| the top contributors to Postgres.
| gjvc wrote:
| How does that make him immune to having dumb ideas? See, I'm
| judging the idea on merit. You're just defending your hero.
| sargun wrote:
| I'm curious if they can take advantage of vfork / CLONE_VM, to
| get the benefits of sharing memory and lower overhead context
| switches, with the trade of still getting benefits from the
| scheduler, and sysadmin-friendliness.
|
| The other thing that might be interesting is FUTEX_SWAP / UMCG.
| Although it doesn't remove the overhead induced by context
| switches entirely (specifically, you would still deal with TLB
| misses), you can avoid dealing with things like speculative
| execution exploit mitigations.
| nneonneo wrote:
| Per the article, Postgres has many, many global variables, many
| of which track per-session state; much session state is "freed"
| via process exit rather than being explicitly cleaned up.
| Switching to CLONE_VM requires these problems to all be solved.
| why-el wrote:
| what about support for Windows?
| jupp0r wrote:
| Please don't use mutable global state in your work. Global
| variables are universally bad and don't provide much of a
| benefit. The number of desirable architectural refactorings that
| I've witnessed turning into a muddy mess because of them is
| daunting. This is one more example of this.
| orthoxerox wrote:
| You know what a database is, don't you? It is the place where you
| store your mutable global state. You can't kick the can down
| the road forever, _someone_ has to tackle the complexity of
| managing state.
| slashdev wrote:
| Thank you for sharing your ideological views, but this is not
| the appropriate venue for that. If you want to have a software
| _engineering_ discussion about the trade offs involved in
| sharing global mutable state, this is a good venue for that.
| All engineering is trade offs. As soon as you make blanket
| statements that X is always bad, you've transitioned into the
| realm of ideology. Now presumably you mean to say it's almost
| always bad. But that really depends on the context. It may well
| be almost always bad in average software projects, but
| PostgreSQL is not your average software project. Databases are
| a different realm.
| refulgentis wrote:
| Global mutable state being a poor choice in software
| architecture isn't an ideology. There is no ideology that
| argues it is awesome.
|
| If you want to have a software _engineering_ discussion about
| the trade offs involved in sharing global mutable state, this
| is a good venue for that.
|
| All engineering is trade offs. As soon as you start telling
| people they're making blanket statements that X is always
| bad, you've transitioned into the realm of nitpicking.
| slashdev wrote:
| It's awesome where performance considerations are
| paramount. It's awesome in databases. It's awesome in
| embedded software. It's awesome in operating system
| kernels.
|
| The fact is sometimes it's good. Saying it's universally
| bad is going beyond the realm of logic and evidence and
| into the realm of ideology.
| megous wrote:
| Using globals is simpler, it's also pretty natural in event
| driven architectures. Passing everything via function
| arguments is welcome for library code, but there's little
| point to using it in application code. It just complicates
| things.
| [deleted]
| CodeWriter23 wrote:
| "no objections" <> "consent"
| ed25519FUUU wrote:
| Have you ever tried to move a large organization forward in a
| certain direction? It's really hard. At some point you have to
| make a decision.
| timcobb wrote:
| Not in something like Postgres, I hope
| formerly_proven wrote:
| ngmi
| chasil wrote:
| Oracle has similar problems.
|
| On UNIX systems, Oracle uses a multi-process model, and you can
| see these:
|
|     $ ps -ef | grep smon
|     USER       PID  PPID  STARTED  TIME %CPU %MEM COMMAND
|     oracle   22131     1   Mar 28  3:09  0.0  4.0 ora_smon_yourdb
|
| Windows forks processes about 100x slower than Linux, so Oracle
| runs threaded on that platform in one great big PID.
|
| Sybase was the first major database that fully adopted threads
| from an architectural perspective, and Microsoft SQL Server has
| certainly retained and improved on that model.
| EvanAnderson wrote:
| > Windows forks processes about 100x slower than Linux...
|
| I work with a Windows-based COTS webapp that uses Postgres w/o
| any connection pooling. It's nearly excruciating to use because
| it spins-up new Postgres processes for each page load. If not
| for the fact that the Postgres install is "turnkey" with the
| app I'd just move Postgres over to a Linux machine.
| devit wrote:
| Use pgbouncer
| ethbr0 wrote:
| Was curious about this as an architectural solution as
| well.
|
| We're really talking about X-per-client as the primary
| reason to move away from processes, right?
|
| So if you can get most of the benefit via pooling... why
| inherit the pain of porting?
|
| Presumably latency jitter would be a difficult problem with
| pools, but it seems easier (and safer) than porting
| processes -> threads.
|
| Disclaimer: High performance / low latency DB code is
| pretty far outside my wheelhouse.
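The pooling idea raised above can be sketched in a few lines. This is a toy illustration only, not how PgBouncer is implemented: `FakeBackend`, `ConnectionPool`, and `serve_clients` are hypothetical names, and no real database is involved. It shows many short-lived clients sharing a small, fixed set of long-lived backends instead of each client getting its own.

```python
import queue
import threading


class FakeBackend:
    """Stand-in for one long-lived server backend (hypothetical, no real DB)."""

    def __init__(self, backend_id):
        self.backend_id = backend_id

    def execute(self, sql):
        # A real backend would run the query; here we just report who ran it.
        return (self.backend_id, sql)


class ConnectionPool:
    """Lends a fixed set of backends to any number of clients, one at a time."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for backend_id in range(size):
            self._idle.put(FakeBackend(backend_id))

    def run(self, sql):
        backend = self._idle.get()       # blocks while every backend is busy
        try:
            return backend.execute(sql)
        finally:
            self._idle.put(backend)      # hand the backend back for reuse


def serve_clients(pool, n_clients):
    """Simulate n_clients, each on its own thread, issuing one query."""
    results = []
    results_lock = threading.Lock()

    def client(i):
        outcome = pool.run(f"SELECT {i}")
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With `ConnectionPool(size=3)` and 50 clients, every query is served by one of only three backends. The blocking `get()` is where clients queue when all backends are busy, which is plausibly where the latency jitter mentioned above would show up.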
| ddorian43 wrote:
| > We're really talking about X-per-client as the primary
| reason to move away from processes, right?
|
| Many other things too. Like better sharing of caches.
| Lower overhead of thread instead of process. Etc. (read
| the thread)
| ilyt wrote:
| The reasons are explained in the article. Read the article.
| treis wrote:
| That helps a lot but it's not a replacement for large
| number of persistent connections. If you had that you could
| simplify things in the application layer and do interesting
| things with the DB.
| ComputerGuru wrote:
| If you run postgres under WSLv1 (now available on Server
| Edition as well), the WSL subsystem handles processes and
| virtual memory in a way that has been specifically designed
| to optimize process initialization as compared to the
| traditional Win32 approach.
| chasil wrote:
| It would not be difficult to simply "pg_dump" all the data to
| Postgres on a Linux machine, then quietly set the clients to
| use the new server.
| blinkingled wrote:
| Didn't Oracle switch to threaded model in 12c - at least on
| Linux I remember there being a parameter to do that - it
| dropped the number of processes significantly.
| chasil wrote:
| No, I ran that on v19.
|     $ ps -ef | grep smon
|     UID        PID  PPID  C STIME TTY          TIME CMD
|     oracle   22131     1  0 Mar28 ?        00:03:09 ora_smon_yourdb
|     $ $ORACLE_HOME/bin/sqlplus -silent '/ as sysdba'
|     select version_full from v$instance;
|     VERSION_FULL
|     -----------------
|     19.18.0.0.0
| blinkingled wrote:
| https://oracle-base.com/articles/12c/multithreaded-model-
| usi...
|
| Probably still requires the parameter to be set.
| chasil wrote:
| Contrast this to Microsoft SQL Server:
|     $ systemctl status mssql-server
|     * mssql-server.service - Microsoft SQL Server Database Engine
|        Loaded: loaded (/usr/lib/systemd/system/mssql-server.service; disabled; vendor preset: disabled)
|        Active: active (running) since Mon 2023-06-19 15:48:05 CDT; 1min 18s ago
|          Docs: https://docs.microsoft.com/en-us/sql/linux
|      Main PID: 2125 (sqlservr)
|         Tasks: 123
|        CGroup: /system.slice/mssql-server.service
|                +-2125 /opt/mssql/bin/sqlservr
|                +-2156 /opt/mssql/bin/sqlservr
| blinkingled wrote:
| Yeah multiprocess isn't Microsoft's style given how
| expensive creating processes is on Windows.
|
| Oracle - never had a scalability issue on very big Linux,
| Solaris and HPUX systems though - they do it well in my
| experience.
| hans_castorp wrote:
| > Didn't Oracle switch to threaded model in 12c
|
| It's optional, and the default is still a process model on
| Linux.
| 0xbadcafebee wrote:
| So compromise. Take the current process model, add threading and
| shared memory, with feature flags to limit number of processes
| and number of threads.
|
| Want to run an extension that isn't threadsafe? Run with 10
| processes, 1 thread. Want to run high-performance? Run with 1
| process, 10 threads. Afraid of "stability issues"? Run with 1
| process, 1 thread.
|
| Will it be hard to do? Sure. Impossible? Not at all. Plan for it,
| give a very long runway, throw all your new features into the
| next major version branch, and tell people everything else is off
| the table for the next few years. If you're _really sure_
| threading is going to be increasingly necessary, better to start
| now than to wait until it's too late. But this idea of "oh it's
| hard", "oh it's dangerous", "too complicated", etc is bullshit.
| We've built fucking spaceships that visit other planets. We can
| make a database with threads that doesn't break. Otherwise we
| admit that basic software development using practices from the
| past 30 years is too much for us to figure out.
| EGreg wrote:
| I hope they don't do it.
|
| I've had a similar situation with PHP, where we had written quite
| a large engine (https://github.com/Qbix/Platform) with many
| features (https://qbix.com/features.pdf) . It took advantage of
| the fact that PHP isolated each script and gave it its own global
| variables, etc. In fact, much of the request handling did stuff
| like this:
|     Q_Request::requireFields(['a', 'b', 'c']);
|     $uri = Q_Dispatcher::uri();
|
| instead of stuff like this:
|     $this->getContext()->request()->requireFields(['a', 'b', 'c']);
|     $this->getContext()->dispatcher()->uri();
|
| Over the last few years, I have run across many compelling
| things:
|     amp
|     reactPHP
|     Swoole (native extension)
|     Fibers (inside PHP itself)
|
| It seemed so cool! PHP could behave like Node! It would have an
| event loop and everything. Fibers were basically PHP's version of
| Swoole's coroutines, etc. etc.
|
| Then I realized... we would have to go through the entire code
| and redo how it all works. We'd also no longer benefit from PHP's
| process isolation. If one process crapped out or had a memory
| leak, it could take down everything else.
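The loss of per-process globals described above can be illustrated with a small sketch. Python stands in for PHP or C here, and every name (`current_user`, `per_thread`, `handle_request`, `run`) is hypothetical. It assumes the simplest possible case: a module-level global that was safe when each request had its own process becomes shared once requests run as threads, while thread-local storage, one common workaround, keeps a separate copy per thread.

```python
import threading

# Process-style code often keeps per-request state in a module-level global.
current_user = None              # shared by every thread in the process

# threading.local gives each thread its own copy of the attributes set on it.
per_thread = threading.local()


def handle_request(user, barrier, seen_global, seen_local):
    global current_user
    current_user = user          # every thread overwrites the same variable
    per_thread.user = user       # each thread writes to its own slot
    barrier.wait()               # wait until all threads have written
    seen_global[user] = current_user
    seen_local[user] = per_thread.user


def run(users):
    """Handle one request per user, each on its own thread."""
    barrier = threading.Barrier(len(users))
    seen_global, seen_local = {}, {}
    threads = [
        threading.Thread(target=handle_request,
                         args=(u, barrier, seen_global, seen_local))
        for u in users
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_global, seen_local
```

After `run(["alice", "bob", "carol"])`, every thread reads back its own user from `per_thread`, but all of them see the same value in `current_user`, namely whichever thread wrote last. Auditing each global for this kind of sharing is the sort of code-wide rework the comment is describing.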
|
| There's a reason PHP still runs 80% of all web servers in the
| world (https://kinsta.com/blog/is-php-dead/) ... and one of the
| biggest is that commodity servers can host terrible PHP code and
| it's mostly isolated in little processes that finish "quickly"
| before they can wreak havoc on other processes or on long-running
| stuff.
|
| So now back to postgres. It's been praised for its rock-solid
| reliability and security. It's got so many features and the MVCC
| is very flexible. It seems to use a lot of global variables. They
| can spend their time on many other things, like making it
| byzantine-fault-tolerant, or something.
|
| The clincher for me was when I learned that php-fpm (which spins
| up processes which sleep when waiting for I/O) is only 50% slower
| than all those fancy things above. Sure, PHP with Swoole can
| outperform even Node.js, and can handle twice as many requests.
| But we'd rather focus on so many other things we need to do :)
| zackmorris wrote:
| I've been using PHP for decades and have found its isolated
| process model to be about the best around, certainly for any
| mainstream language. Also Symfony's Process component
| encapsulates most of the errata around process management in a
| cross-platform way:
|
| https://symfony.com/doc/current/components/process.html
|
| Going from a working process implementation to async/threads
| with shared memory is pretty much always a mistake IMHO,
| especially if it's only done for performance reasons. Any speed
| gains will be eclipsed by endless whack-a-mole bug fixes, until
| the code devolves into something unrecognizable. Especially
| when there are other approaches similar to map-reduce and
| scatter-gather arrays where data is processed in a distributed
| fashion and then joined into a final representation through
| mechanisms like copy-on-write, which are supported by very few
| languages outside of PHP and the functional programming world.
|
| The real problem here is the process spawning and context-
| switching overhead of all versions of Windows. I'd vote to
| scrap their process code in its entirety and write a new
| version based on atomic operations/lists/queues/buffers/rings
| with no locks and present an interface which emulates the
| previous poor behavior, then run it through something like a
| SAT solver to ensure that any errata that existing software
| depends on is still present. Then apps could opt to use the
| direct unix-style interface and skip the cruft, or refactor
| their code to use the new interface.
|
| Apple did something similar to this when OS X was released,
| built on a mostly POSIX Darwin, NeXTSTEP, Mach and BSD Unix. I
| have no idea how many times Microsoft has rewritten their
| process model or if they've succeeded in getting performance on
| par with their competitors (unlikely).
|
| Edit: I realized that the PHP philosophy may not make a lot of
| sense to people today. In the 90s, OS code was universally
| terrible, so for example the graphics libraries of Mac and
| Windows ran roughly 100 times slower than they should for
| various reasons, and developers wrote blitters to make it
| possible for games to run in real time. That was how I was
| introduced to programming. PHP encapsulated the lackluster OS
| calls in a cross-platform way, using existing keywords from
| popular languages to reduce the learning curve to maybe a day
| (unlike Perl/Ruby, which are weird in a way that can be fun but
| impractical to grok later). So it's best to think of PHP more
| like something like Unity, where the nonsense is abstracted and
| developers can get down to business. Even though it looks like
| Javascript with dollar signs on the variables. It's also more
| like the shell, where it tries to be as close as possible to
| bare-metal performance, even while restricted to the 100x
| interpreter slowdown of languages like Python. I find that PHP
| easily saturates the processor when doing things in a data-
| driven way by piping bytes around.
| js4ever wrote:
| Finally! This and a good multi master story and I'll finally
| start to love Postgres
| waselighis wrote:
| It sounds to me like migrating to a fully multi-threaded
| architecture may not be worth the effort. Simply reducing the
| number of processes from thousands to hundreds would be a huge
| win and likely much more feasible than a complete re-
| architecture.
| chucky_z wrote:
| I wish they would do some kind of easy shared storage instead, or
| in addition to it. This sounds like an odd solution; however, I've
| scaled pgsql since 9 on very, very large machines and doing 1
| pgsql cluster per physical socket ended up doing near-linear
| scaling even on 100+ total core machines with TB+ of memory.
|
| The challenge with this setup is that you need to do 1 writer and
| multiple reader clusters so you end up doing localhost
| replication which is super weird. If that requirement was somehow
| removed that'd be awesome for scaling really huge clusters.
| t43562 wrote:
| Calm down guys! Threading is tricky but they can rewrite it all
| in Rust so it'll be completely ok........
|
| ;-)
___________________________________________________________________
(page generated 2023-06-19 23:00 UTC)