[HN Gopher] The PostgreSQL documentation and the limitations of ...
___________________________________________________________________
The PostgreSQL documentation and the limitations of community
Author : zdw
Score : 91 points
Date : 2023-06-14 16:34 UTC (1 day ago)
(HTM) web link (rhaas.blogspot.com)
(TXT) w3m dump (rhaas.blogspot.com)
| lovasoa wrote:
| I was recently reading the documentation for pgcrypto to
| implement user authentication in SQLPage:
|
| https://www.postgresql.org/docs/current/pgcrypto.html
|
| The page contains the documentation of many functions, all of
| which raise the following error by default when you run them: No
| function matches the given name and argument types.
|
| It turns out you first have to "install" them by running "create
| extension pgcrypto", which is obvious to someone who already
| knows postgres modules well, but not to anyone else, and isn't
| mentioned anywhere on the page!
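|
| A minimal sketch of that missing step, assuming a role that is
| allowed to create extensions in the current database. crypt() and
| gen_salt() are pgcrypto functions; the account table and its
| columns are hypothetical names used only for illustration:
|
```sql
-- One-time setup per database. Until this runs, pgcrypto functions
-- fail with "No function matches the given name and argument types".
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Hypothetical table, for illustration only.
CREATE TABLE account (
    username      text PRIMARY KEY,
    password_hash text NOT NULL
);

-- Store a salted hash ('bf' selects the Blowfish-based algorithm).
INSERT INTO account (username, password_hash)
VALUES ('alice', crypt('correct horse', gen_salt('bf')));

-- Check a login attempt: re-hash the attempt, passing the stored
-- hash as the salt argument, and compare with the stored hash.
SELECT password_hash = crypt('correct horse', password_hash) AS password_ok
FROM account
WHERE username = 'alice';
```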
| pgaddict wrote:
| I think the main limitation of our docs is that it mostly
| explains what the pieces do, not how to use them to achieve a
| particular goal. For example, we have pretty good documentation
| of all the pieces to do HA, we just don't tell people how to
| assemble them together.
|
| The reason is, I think, that flexibility is a pretty fundamental
| part of the project. We're great at providing building blocks
| (and documenting them), but we steer clear of describing a
| particular way to assemble them together.
|
| For example, we might describe a particular HA approach, but then
| that would be perceived as the "recommended / official" way, giving
| it preference over other (and equally valid) approaches and
| tooling. These "how to" docs are bound to be way more
| opinionated, so we just focus on documenting the pieces.
|
| In other words, our docs are written by devs for devs, and we
| leave the higher level stuff to tutorials written by others etc.
| derefr wrote:
| These concerns seem to be specific to cases where there are
| various competing high-level design "strategies" with political
| weight behind them.
|
| There are cases where PG is missing high-level docs where I
| don't think this applies.
|
| For example, there's no official doc on _how to write_ PL/pgSQL
| code. There's just an extremely-low-level language
| reference, covering each syntax element separately. There's no
| cookbook (other than the few examples per syntax element that
| exist to document the edge-cases of use of that syntax
| element); no tutorial; no efficiency/performance/scalability
| guide discussing when certain language features should be
| favored over others given the current way they're executed
| (e.g. is IF-ELSE, CASE-WHEN, or a series of IFs with early
| returns cheaper? when should I favor using FOR with a query,
| vs. when should I query data into an in-memory array variable
| and then use FOREACH, vs. when should I query data into a
| TEMPORARY table and then query that?); no place where you can
| get a sense for how procedure CALLs interact with MVCC (e.g.
| when they acquire + release locks, and therefore how and when
| they cause blocking on contended tables vs. how and when a
| SELECTed function that uses dblink/fdw to run independent txs
| would do so); etc. There isn't even a single mention of which
| PL/pgSQL exceptions are potentially raised by what PG builtin
| functions when called in a PL/pgSQL context; how to name those
| exceptions to match on them to catch them; or how to raise them
| yourself. I often need to dig into the PG source code to figure
| that out! (PL/pgSQL honestly feels, in docs terms, like a
| proprietary third-party language-engine "plugin" that someone
| bolted on, where the docs were expected to be provided by the
| third party, but never were. But it's not! It's a first-party
| language, and the reference implementation of how to create a
| language extension!)
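|
| As a hedged sketch of the catch-and-raise mechanics mentioned
| above: condition names such as division_by_zero and
| unique_violation are the ones listed in the "PostgreSQL Error
| Codes" appendix of the manual. The variable names and message
| text are made up for illustration:
|
```sql
DO $$
DECLARE
    numerator integer := 1;
    result    integer;
BEGIN
    -- Inner block: catch an error raised by a built-in operator.
    BEGIN
        result := numerator / 0;
    EXCEPTION
        WHEN division_by_zero THEN
            -- SQLSTATE and SQLERRM are only available inside a handler.
            RAISE NOTICE 'caught % (%)', SQLERRM, SQLSTATE;
    END;

    -- Raise an error yourself, reusing a built-in condition name.
    RAISE EXCEPTION 'widget already exists'
        USING ERRCODE = 'unique_violation';
EXCEPTION
    -- Match the error by the same condition name to catch it.
    WHEN unique_violation THEN
        RAISE NOTICE 'caught our own error, SQLSTATE %', SQLSTATE;
END
$$;
```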
| briffle wrote:
| Another good example is the difference between the documentation
| for indexes and https://use-the-index-luke.com/, which explains,
| with great examples, many of the reasons WHY you want to organize
| an index a particular way.
|
| A problem I have is so many tutorials, or 'best practices' I
| find on the internet are for older versions that don't really
| apply as well in newer versions of postgres. For example, when
| searching for logical replication, you find lots of information
| about pglogical for older versions of postgres, but many of those
| parts are now baked into postgres with a different syntax, etc.
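|
| For comparison, a rough sketch of the built-in syntax (PostgreSQL
| 10 and later), assuming wal_level = logical on the publisher and
| an orders table that already exists on both sides. Every name and
| the connection string below are placeholders:
|
```sql
-- On the publisher: declare which tables to replicate.
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber: connect to the publisher and start receiving changes.
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=publisher.example.com dbname=shop user=replicator'
    PUBLICATION orders_pub;
```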
|
| I would love to see a 'tutorials/guide' and 'best practices'
| part of the documentation that is updated with each new
| release, that give examples of the most common tasks, and
| when/why to use them, and when to move to something more
| advanced.
|
| Some really basic stuff like "these are the 3 best ways to handle
| replication in version 15, and the 2 or 3 most common ways to
| do backups, or these are the recommended ways to migrate from
| the previous version either in place, or to a new server", etc.
| akira2501 wrote:
| I really miss old-school printed documentation's "Theory of
| Operation" section. To me it's the most useful way to bridge
| this gap. The technical and operations manual describe all the
| parts and how they function, but the theory of operation really
| laid out how and _why_ all of these things were structured the
| way they were.
|
| It also forced the designers to think in those terms and to
| document the product from an overall perspective rather than a
| component perspective. It was high level enough to be useful,
| but not so high level as to be abstracted into hand holding
| tutorial exercises.
|
| I feel like most modern software documentation entirely misses
| this component and would benefit greatly from having it.
| kaycebasques wrote:
| Can you link me to a good old school "theory of operation"
| section? I get the idea but I want to see firsthand what you
| mean.
| giovannibonetti wrote:
| Related: Diataxis - A systematic framework for technical
| documentation authoring [1]
|
| "The Diataxis framework aims to solve the problem of
| structure in technical documentation. It adopts a systematic
| approach to understanding the needs of documentation users in
| their cycle of interaction with a product.
|
| Diataxis identifies four modes of documentation - tutorials,
| how-to guides, technical reference and explanation. It
| derives its structure from the relationship between
| them.(...)"
|
| [1] https://diataxis.fr/
| Rapzid wrote:
| > I think the main limitation of our docs is that it mostly
| explains what the pieces do, not how to use them to achieve a
| particular goal
|
| I honestly prefer this type of documentation. ASP.NET Core has
| the complete opposite problem where it's too example based.
| friendzis wrote:
| This reminds me of technical documentation for embedded
| devices. Usually you get _multiple_ classes of documents: data
| sheets, application notes, reference designs, user guides,
| errata.
|
| The problems described come from trying to be everything in one
| place, but it does not have to be. As I understand it, you try
| to be mostly a data sheet, which is probably a net good, because
| it is _the_ document needed to be maintained, even if hard to
| navigate.
|
| However, there are more document classes that can be produced.
| Yes, a reference design is inevitably going to be opinionated,
| whether it is produced by project team or some internet person.
| A reference design produced by project team at least has a
| fighting chance at staying somewhat up to date. And one can
| discuss tradeoffs between different approaches in an
| application note.
| emodendroket wrote:
| I suppose I can see that but I 1) rarely use the index since,
| like most users, I generally find myself looking at the docs
| after a Web search 2) have generally found the PostgreSQL docs to
| be excellent.
| fdr wrote:
| I think Haas is basically right, that the structure flows from
| the community structure, and that it's not clear alterations
| would be a net win. pgsql-hackers is producing the kind of docs
| only they can, but many useful kinds of docs they cannot produce
| (per Haas's theory, e.g. more narrative in nature) are delegated
| to the relative anarchy of the Internet, in blogs, comment
| threads, stack exchange, and such.
|
| While there should be a lot of hesitancy at the implied
| assumption that any particular arrangement is at the efficient
| frontier -- most situations can be improved in most or all
| dimensions -- an exchanged loss in the kind of documents that
| pgsql-hackers is suited to producing is hard to replace.
| kaycebasques wrote:
| Here's my perspective. I've been a technical writer (TW) for ~10
| years. 3 at an IoT startup, 7 at Google.
|
| > The strengths of this process are also its weaknesses. A
| developer is, by definition, someone who spends the majority of
| their time doing development, which is to say writing code.
| Updating the documentation becomes a task that must be completed
| so that the code one has written can get committed so that one
| can move on to the next project and write some more code.
|
| I may be misinterpreting, but I get the sense that the author
| feels that there is some kind of more optimal way to split up
| docs duties. IMO there is not. At least, not for reference docs.
| As the author said, the people implementing the code are in the
| best position to keep the reference information up-to-date.
|
| If I grokked the rest of the article correctly, the author is
| essentially saying that the engineers have trouble writing and
| maintaining the other main types of docs [1] --- guides,
| tutorials, and overviews (explanations). It also sounds like they
| are having a "too many cooks in the kitchen" problem with pushing
| through changes to the other docs. I have a simple answer to
| that: hire some strong technical writers and make it clear to
| everyone that the TWs are Responsible and Accountable [2] for
| those docs. Also, make it explicit that the engineers are a
| Consulted role when it comes to guides / tutorials / overviews.
| Writing these types of docs is hard, specialized work. As the
| author said, the engineers have lots of other priorities. Of
| course, it's a bit self-serving for a TW to say "the solution is
| to hire TWs" but I get the sense that people don't realize that
| the easiest way to get good guides / tutorials / overviews is to
| hire people who have thought long and hard about those
| specialized tasks. If you want a good database, you don't expect
| your TWs to do the job. You get a database engineer. If you want
| good tutorials / guides / overviews, you likewise shouldn't
| expect your database engineer to do the job. You get TWs.
|
| [1] https://diataxis.fr
|
| [2]
| https://en.m.wikipedia.org/wiki/Responsibility_assignment_ma...
| kaycebasques wrote:
| Just re-read the last two paragraphs from Haas. I have a couple
| further comments / questions.
|
| Quote from Haas:
|
| > But if on the other hand I propose some change to
| documentation that has existed for a long time, or some kind of
| structural change, there's a lot more room for disagreement.
| Because the change isn't strictly mechanical, the right answer
| is a lot more subjective. And because it's a change to existing
| content rather than the addition of new content, many more
| people will be familiar with it and have opinions on how it
| ought to be changed, if at all. Consequently, even when some
| developer does take time away from writing code to try to make
| some larger change to the documentation, it's often an uphill
| battle to get anything done, and people typically have to be
| content with small improvements.
|
| Honest question: do PostgreSQL engineering decisions have the
| same dynamics as what's described here for docs decisions? If
| not, what is different about engineering decisions versus docs
| decisions? Is it just that engineering decisions can be
| literally benchmarked whereas docs decisions do not seem
| benchmark-able? Are there any other potential explanations for
| different dynamics between eng and docs?
|
| If it does indeed just boil down to "docs are not
| benchmarkable" then I would suggest otherwise. You can create
| docs benchmarks. They won't have the rigor of engineering
| benchmarks but they at least establish some notion of docs
| quality and facilitate more targeted discussions during docs
| reviews. E.g. when there's a disagreement between an author and
| reviewer the author can say, "what docs benchmarks am I not
| following here?" The power of that kind of interaction is that
| you often do realize that the docs benchmarks are incomplete
| and some new dimension needs to be added to them. Or do the
| PostgreSQL contributor docs already have some guidelines along
| the lines of a "content quality checklist" and it's still not
| working?
|
| A rigorous effort to survey the PostgreSQL community and get a
| deep sense of what the overall community considers "high-
| quality docs" can itself be a super insightful experience.
| Different developer communities often need / want a different
| focus in the docs. That can be the foundation for a fairly
| authoritative docs quality checklist.
|
| Another thing that I'm very interested in, and will need to
| think through deeply some other day, is this notion that once
| you publish a doc, you can't touch it. It happens all the time
| and it's really weird.
| actuallyalys wrote:
| As someone who's been both a software developer and a technical
| writer, I agree that a lot of these problems seem best suited
| for a technical writer. The expertise with creating different
| types of documents is valuable, of course, but there's another
| benefit: The organization committing resources in the form of
| making it someone's entire responsibility.
|
| While I think updating reference documents lends itself to
| subject matter experts (especially in a project where this
| approach is already successful), I think structuring them can
| be separated and given to a technical writer with more
| technical expertise or a developer with more documentation
| expertise.
| tetha wrote:
| We're seeing similar things at work as the postgres
| documentation has.
|
| We in infra-ops can give you more details about how our
| database clusters are designed for resilience, security, safety
| than you want on more levels than most people in the company
| know exist. We also have reasoning for all of this available.
| This is really good to have for customer question sets during
| sales.
|
| However, this doesn't tell a developer how to connect his
| spring boot thingy to it, and how to connect and manage his
| service well. In fact, 80 - 90% or more of the things we know
| about our database are not relevant to a simple small-scale
| application running queries on it. And quite a lot of the
| issues you can have with running your application on a rock-
| solid database are entirely not relevant at a DBA level. Like,
| the database doesn't care if your DDL modification is backwards
| compatible.
|
| And that's something we're currently learning together with a
| foundation team at work. They are documenting how to use it
| well from their side; we're learning about easy mistakes to
| make, documenting those, and helping with the actionable
| documentation. And in hard cases we kinda have to talk about
| what the plan is, because it's usually not smart from a DBA's
| perspective.
| btilly wrote:
| Just curious. Where do you think that an open source project
| like PostgreSQL gets a budget to hire anyone? Let alone to
| dictate a new line of authority to the volunteers who are
| already maintaining it?
|
| And don't forget that there are valuable volunteers who are
| likely to go elsewhere if too many new rules are added that
| they don't want to live with.
| kaycebasques wrote:
| Open Web Docs is a potential model to draw inspiration from
| regarding funding: https://openwebdocs.org
|
| Presumably, PostgreSQL has leaders who are responsible for
| steering the ship. If the project is going to succeed long-
| term, those leaders have to find ways to keep their
| contributors happy while also creating an organizational
| structure that leads to good docs. Easier said than done, I
| know, but it really is as simple as that.
|
| Sorry if any of my comments came off naive or obtuse when it
| comes to open source dynamics. But the reality is that you
| need good docs, and I'm just trying to give an honest
| assessment from my experience of the conditions that lead to
| good docs.
| btilly wrote:
| _Sorry if any of my comments came off naive or obtuse when
| it comes to open source dynamics._
|
| If you want that apology to be meaningful, you should learn
| something.
|
| When you're talking about a highly successful open source
| project that has been going for more than 3 decades, it is
| beyond ludicrous for you to say, "If the project is going
| to succeed long-term..." It already has succeeded long-
| term. And you would be better off figuring out why it works
| rather than lecturing about how it must work.
|
| When you talk about "a potential model to draw from" for
| funding, please note that I've been involved with open
| source for about a quarter of a century. I've seen a LOT of
| funding models attempted. Mostly they run into one big
| problem. And that problem is that adding funding creates
| bruised egos because people say, "Why is he getting paid
| when I'm not?"
|
| The one funding model that DOESN'T have this problem is
| when a company decides to pay its employees to work on
| features that it wants in the project. Now there are no
| bruised egos - the money comes from the company and it is
| clear why one person gets paid while another does not.
| There are still challenges with this model - employees are
| under pressure to get their contributions accepted whether
| or not the project likes them - but we've learned how to
| navigate those.
|
| But now we're left back where we started. Companies who
| hire core developers don't generally need comprehensive
| documentation - they build internal documentation straight
| for their use case. So comprehensive external documentation
| is hard to find. Sometimes you'll wind up with things like
| an excellent introductory tutorial like
| https://docs.python.org/3/tutorial/. Usually, you don't.
| And generally it is hard to simply pay someone to take care
| of it for you.
| kaycebasques wrote:
| > When you're talking about a highly successful open
| source project that has been going for more than 3
| decades, it is beyond ludicrous for you to say, "If the
| project is going to succeed long-term..." It already has
| succeeded long-term.
|
| Yes, your reaction here totally makes sense. Feedback
| acknowledged.
|
| > If you want that apology to be meaningful, you should
| learn something.
|
| I have re-read my earlier comments and I feel that you
| are being more hostile to me than is justified. I do not
| think you are adhering to HN's code of conduct guidelines
| for comments:
| https://news.ycombinator.com/newsguidelines.html#comments
|
| > you would be better off figuring out why it works
| rather than lecturing about how it must work
|
| This doesn't seem fair. The original post is about the
| limitations of the PostgreSQL docs. Docs have been the
| focus of my career for 10 years. I have experienced and
| analyzed docs problems in many contexts: small orgs,
| large orgs, open source, closed source. I made an on-
| topic comment about ways to resolve the problems that the
| PostgreSQL docs are facing. Is it the only solution? Of
| course not. But I totally have relevant experience in
| this domain and, just like you have a good idea about
| what generally works and doesn't work regarding open
| source funding, I have a pretty good idea about what
| generally works for creating the conditions that lead to
| good docs.
|
| > So comprehensive external documentation is hard to
| find.
|
| Again, I think the web platform space is relevant here.
| Web platform documentation could easily devolve into a
| tragedy of the commons situation. Yet MDN does exist and
| is an amazing resource.
|
| Paragraphs 4 to 6 of your last comment seem to be arguing
| that hiring TWs is not an option for PostgreSQL. That is
| totally understandable. On another day maybe we would
| have arrived at that understanding on friendly terms and
| would have had a constructive conversation about how to
| create good docs when hiring TWs is not possible. But
| it's clear that my ideas aren't welcome here so I'll just
| stop now.
| wrs wrote:
| If someone gets weirdly hostile and condescending towards
| you on HN (sadly not uncommon), I recommend that you try
| to just ignore them and keep contributing. I'd like to
| hear what you have to say.
| minorninth wrote:
| PostgreSQL, like many other open-source projects, has
| sponsors and accepts donations. Here's their sponsors page:
|
| https://www.postgresql.org/about/sponsors/
|
| Also, I think it's important to note that a lot of
| contributors aren't volunteering their "free time", they're
| being paid by some other employer to contribute to PostgreSQL
| as part of their job:
|
| https://www.enterprisedb.com/blog/importance-of-giving-back-...
| btilly wrote:
| If you read
| https://www.postgresql.org/about/policies/sponsorship/
| you'll find that the list of sponsors is essentially a
| recognition for companies paying their employees to
| contribute to PostgreSQL.
|
| It isn't for contributing into a pot of money allowing some
| central PostgreSQL committee to hand out money for other
| things, like hiring people to do documentation.
| globular-toast wrote:
| It sounds like someone should write a book about postgres. I'd
| buy it. But I think it should be a supplement to the current
| docs, not a replacement.
| TX81Z wrote:
| I just ask GPT all my Postgres questions now. I think some of
| this will become a moot point over time.
| mannyv wrote:
| The problem with developers writing documentation is that they
| generally have too narrow of a view of things.
|
| Here's an artificial example:
|
| "Added a setting which allows you to change the size of a boba."
|
| It doesn't answer any really useful questions, such as: why would
| you want to change the size of a boba? What is the effect of
| various sizes of boba? How does that interact with other
| settings?
|
| As a database user, I actually want to know these things.
| Internally I have a mental model of how all these settings
| interact with the product, and I use information about the new
| setting to adjust that model.
|
| For psql in particular, the documentation (as people have pointed
| out) shies away from anything too "opinionated."
|
| But what it should do instead is have multiple opinionated
| examples. Multiple examples allow me to learn about the different
| tradeoffs and configuration options.
|
| It reminds me of the old days, when the psql docs talked about
| optimization as a "black art", and basically said "it would be
| impossible to cover everything, so we won't cover anything."
| stigok wrote:
| I find the PostgreSQL documentation to be a seriously good read.
| Something I'd print and take on the holiday with me.
| smitty1e wrote:
| Tl;dr: a designated documentation editor would be both a boon and
| likely a challenge to retain.
| tmaly wrote:
| This is a very valuable post in my opinion. Documentation is so
| critical to open source as well as to the private sector.
|
| It can make the difference in how long it takes you to complete a
| project at work.
| jrott wrote:
| At this point I believe that documentation is the most
| important marketing artifact that a project has.
| tmaly wrote:
| I always point out the readme pages of projects on github.
| How many times has that made the difference between you using
| the project or not?
| jrott wrote:
| Not zero if I can't figure out how to get started in a
| reasonable amount of time. I'll go look for another
| solution
| davidatbu wrote:
| As a regular consumer of pg docs, I vehemently agree that they
| are incredibly detailed, and at the same time, daunting to
| navigate.
| kaycebasques wrote:
| If their problem is an abundance of information that is hard to
| navigate, they really should start experimenting with
| retrieval-augmented generation search experiences [1] like
| Supabase AI. One of the great promises of LLMs for docs IMO is
| the ability to synthesize info from many sources to provide
| more targeted answers.
|
| [1] https://technicalwriting.tools/posts/playing-nicely-with-gen...
| (my blog)
| davidatbu wrote:
| I haven't read your blog yet, but I'd be lying if I said the
| same thought hasn't crossed my mind :)
| emodendroket wrote:
| How real is this problem though? I have no idea how they're
| organized because I'm usually led to whatever page I wanted to
| see by Google anyway.
| vincent-manis wrote:
| I love the approach of "change the code, write the
| documentation"; the code author is in a unique position to be
| able to explain the new behavior of the system. However, most
| FOSS projects could benefit from a technical writer, who can
| improve these first drafts to make them more usable.
|
| Back in the Stone Age, companies like IBM had vast writing
| staffs. As a result, you got entire walls full of documentation.
| For OS/360, you got Concepts And Facilities, that explained what
| the software did for you; you got reference manuals; and Program
| Logic Manuals, which explained, in exhaustive/ing detail, how the
| program worked (with flowcharts!).
|
| Arguably, the software I use today is far more complex than
| OS/360; yet we don't see the same attention to organization and
| detail in the documentation. I understand the reasons: IBM's
| dead-tree wall relied on paying tech writers, and that's
| incompatible with the tiny budgets most FOSS projects suffer. Far
| too often, I will go to a FOSS site's documentation page and
| discover a mass of links to pages that explain how to build the
| program on Itanium, or why this program is better than another
| one, with no walkthrough of "how to use this program for a
| typical case", or even "if you want to do this kind of operation,
| these are the pages you need to read". Often, the documentation
| isn't included in the repo, which lowers my confidence that the
| documentation and software are updated in step.
|
| So I celebrate projects such as PostgreSQL, Emacs, and Arch Linux
| (for the wiki)--there are many more--where there is a real effort
| to create good documentation, even when I think that
| documentation can be improved or reorganized. Let's not allow the
| perfect to be the enemy of the good.
| kaycebasques wrote:
| The technical writing community refers to this approach as
| "docs-as-code". Just mentioning that keyword in case anyone
| wants to research the space further. There is a famous Write
| The Docs talk on the topic. I think the same author turned that
| talk into a book.
|
| Fabrizio Ferri-Benedetti did a cool analysis of various common
| docs-as-code architectures:
| https://passo.uno/docs-as-code-topologies/
| justinclift wrote:
| This is kind of what the old PostgreSQL "Tech Docs" website was
| useful for, back in the day (~20 years ago).
|
| Here's a random snapshot of it from the Wayback machine:
|
| http://web.archive.org/web/20040630081140/http://techdocs.po...
|
| Much more user-level oriented than the reference stuff.
___________________________________________________________________
(page generated 2023-06-15 23:01 UTC)