[HN Gopher] The case for continuous documentation
       ___________________________________________________________________
        
       The case for continuous documentation
        
       Author : morchen
       Score  : 78 points
       Date   : 2021-06-06 08:06 UTC (14 hours ago)
        
 (HTM) web link (www.virtuallifestyle.nl)
 (TXT) w3m dump (www.virtuallifestyle.nl)
        
       | ycombiswimm wrote:
       | I completely agree that documentation should be part of the CI/CD
       | and that it should be part of the code.
        
         | Larryreverse wrote:
         | Are you the guy who wrote this? Would love to interview you for
         | my podcast.
        
         | systematical wrote:
         | The only way for documentation to be part of CI (IMO) is for
         | missing documentation to cause failed builds. There are a few
         | ways I could think of to enforce this but is there anything off
         | the shelf that does this?
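
One way to make missing documentation fail a build can be sketched with Python's standard `ast` module. This is only an illustration, not an off-the-shelf tool: the helper name `missing_docstrings` and the `EXAMPLE` source are hypothetical, and "documented" is assumed here to mean "has a docstring".

```python
import ast
from typing import List


def missing_docstrings(source: str) -> List[str]:
    """Return the names of public functions and classes that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Names starting with an underscore are treated as private and skipped.
            if node.name.startswith("_"):
                continue
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)


# Hypothetical module source used only to demonstrate the check.
EXAMPLE = '''
def documented():
    """Explain what this does."""

def undocumented():
    pass

def _private_helper():
    pass
'''

print(missing_docstrings(EXAMPLE))
```

A CI job could run a check like this over each source file and exit non-zero when the returned list is not empty.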
        
           | simonw wrote:
           | I've been doing this for nearly three years now - it works
           | really well.
           | 
           | It's not particularly sophisticated - just some tests which
           | introspect the code and then use dumb pattern matching
           | against the documentation text to check that different
           | concepts from the code are mentioned at least once in the
           | docs: https://simonwillison.net/2018/Jul/28/documentation-
           | unit-tes...
        
       | remoquete wrote:
       | The article seems to skim over the complexities of docs workflows
       | and the role of docs. In fact, docs are nowhere to be found
       | throughout the article. What are those "docs" that the OP is
       | talking about?
       | 
       | Perhaps it'd be fair to rename that article "The Case for
       | Continuous READMEs".
        
       | nwmcsween wrote:
       | Literate programming is probably the best documentation
        
       | Aeolun wrote:
       | Someone is promoting swimm.io in a sort of sideways way? I sort
       | of agree with the author, but the only solution presented is this
       | unknown product.
        
         | ycombiswimm wrote:
         | I couldn't find any other product that solves this basic
         | problem, happy to hear about them if they exist.
        
       | yosefk wrote:
       | If you essentially have one live version of your program, like
       | you do when it runs on your servers, coupled documentation is
       | probably a good idea.
       | 
       | If you have multiple active versions, like you often do with
       | software released to run on someone else's machines, coupled
       | documentation "works," but has a big downside. Namely, it prompts
       | people to "refactor mercilessly"/change everything all the time,
       | together with the documentation.
       | 
       | When you need to maintain multiple versions at a time, having a
       | _single version of the document explaining the differences
       | between all the live versions_ can somewhat curb the enthusiasm
       | for gratuitous changes (since whoever does the changes must also
       | maintain the increasingly long and ugly description in the single
       | document describing all the versions.) And someone needing to
       | work with all those versions has these differences nicely laid
       | out and _those areas not having differences also clearly
       | visible_. Whereas with multiple versions of the document you need
       | to "diff" these versions if you want to build a mental model of
       | what changed.
       | 
       | Sadly (for those agreeing with this), I presume that the above is
       | a minority opinion.
        
         | cryptica wrote:
         | I think this argument makes perfect sense. Personally I don't
         | like in-line documentation. It's mostly popular among people
         | who depend on bulky proprietary IDEs. I hate all these
         | approaches which try to trick people into using proprietary
         | tech.
         | 
         | I enjoy reading a nice documentation website maintained by the
         | open source organization; it also gives me a touch-point with
         | the organization which created the library. I also agree with
         | your nuanced argument concerning incentives. Arguments related
         | to incentives are almost always discarded by managers but they
         | are very important.
         | 
         | I do think decoupling the documentation encourages people to
         | think about documentation more carefully as a distinct and
         | important activity. I find that in-line documentation tends to
         | be neglected; as a developer, when you're in the middle of
         | coding an important feature which requires your full attention,
         | you don't want to be distracted with updating the comments all
         | the time because it breaks your train of thought. Usually
         | developers tell themselves that they will do it later and they
         | often forget. Comments are often neglected in the PR review
         | process too.
         | 
         | There is no way around it, you need to set aside some time to
         | write or update the documentation as a distinct activity. There
         | is a time for coding and there is a time for explaining.
        
           | iib wrote:
           | Can you not just not write the documentation, even if it
           | resides in the same repository, and then later make a commit
           | to update it, as a separate task?
        
             | cryptica wrote:
             | Yes, but why take up space in the actual source code and
             | repo? It increases space usage both in terms of disk space
             | (which means more download time) for the library and
             | requires more scrolling when reading the code.
             | 
             | Also, if the library is a sub-dependency which the
             | developer doesn't interact with directly, why should they
             | download the documentation for it? They will never read
             | those comments in the code anyway.
        
         | michael1999 wrote:
         | I see the value in internalizing the cost of breaking changes.
         | But rather than just suffer a doc burden to discourage change,
         | why not fix it with something like Stripe's version
         | conversions?
        
       | simonw wrote:
       | I'm adamant that the documentation for a project should live in
       | the same repository as the code itself. This is crucial for a
       | number of reasons:
       | 
       | 1. If the docs are in the same repo, a commit that changes the
       | code can update the relevant documentation (in addition to the
       | tests) as part of the same unit of work
       | 
       | 2. This means it can be enforced during code review: if a
       | developer forgets to update the docs they can be reminded before
       | they land their PR
       | 
       | 3. This also provides a version history for the documentation
       | which is synchronized with the code history. This is really
       | useful when looking at history and trying to figure out what
       | changed when.
       | 
       | 4. This also works great with branches, PRs and releases. New
       | features can have their documentation developed alongside the
       | code in a branch, which makes it easier to understand a proposed
       | change. If your software is deployed in multiple places as
       | multiple versions (or even just staging vs production) you have a
       | way to view the correct documentation for each individual
       | deployment.
       | 
       | 5. Added together, all of this builds trust. A common problem
       | I've seen with internal documentation is that no-one trusts it to
       | be up-to-date. Making it part of the regular code development
       | lifecycle can fix this.
       | 
       | 6. If you do this, you can write automated tests that enforce
       | aspects of your documentation! I call these documentation unit
       | tests, and wrote about them here:
       | https://simonwillison.net/2018/Jul/28/documentation-unit-tes... -
       | even something as simple as a test that fails if a new API
       | endpoint isn't mentioned in a markdown file using simple string
       | matching can ensure no-one forgets about the docs when they add a
       | new feature.
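
The string-matching test described in point 6 can be sketched roughly as follows. The endpoint list and docs text are hypothetical stand-ins; a real test would presumably introspect the application's routes and read the markdown files from the repository.

```python
from typing import Iterable, List


def undocumented_endpoints(endpoints: Iterable[str], docs_text: str) -> List[str]:
    """Return the endpoints that are never mentioned in the documentation text."""
    # Plain substring matching: deliberately unsophisticated.
    return [path for path in endpoints if path not in docs_text]


# Hypothetical data standing in for the app's router and its markdown docs.
ENDPOINTS = ["/api/users", "/api/orders", "/api/invoices"]
DOCS = """
# API reference

## GET /api/users
Lists users.

## GET /api/orders
Lists orders.
"""

print(undocumented_endpoints(ENDPOINTS, DOCS))
```

A test in the suite would assert that the returned list is empty, so adding an endpoint without mentioning it in the docs fails the build. Substring matching can give false passes (for example, one path being a prefix of another), which is the trade-off of keeping the check this simple.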
        
         | dllthomas wrote:
         | > a test that fails if a new API endpoint isn't mentioned [...]
         | 
         | That's a great idea. A related thought I had was testing
         | assertions about architecture by looking at the graph of
         | imports or calls.
         | 
         | From your blog post,
         | 
         | > if a change doesn't update the relevant documentation, point
         | that out in your review!
         | 
         | What you get at but don't seem to state explicitly is that
         | finding the relevant documentation is hard, for both the author
         | and reviewer. Finding _any_ relevant documentation should be
         | easy, but ideally we're finding _all_ relevant documentation,
         | and we need to be reaching sufficient confidence that we're
         | missing little enough that we're able to hold back entropy
         | enough to keep the docs useful.
         | 
         | Your tools address this for some cases! Doctest addresses this
         | for other cases. From TFA here, it sounds like Swimm.io tries
         | to address this for more cases (my gut says the article
         | oversells it but I intend to look more closely).
         | 
         | To get further, an idea I've been toying with is to treat
         | claims (implicit or explicit) in documentation as requiring
         | citation, pointing not at sources but at tests. Ideally the
         | test runner, when a test fails, can then surface all references
         | to that test. In addition to highlighting portions of the docs
         | that may need to change, this seems likely to also provide
         | crucial context when fixing the code and/or the test.
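
The "claims cite tests" idea could look something like the sketch below. The `[test: name]` citation syntax, the file names, and both helper functions are assumptions made up for illustration; nothing here is an existing tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical citation syntax: a claim in the docs ends with [test: some_test_name].
CITATION = re.compile(r"\[test:\s*([\w.]+)\]")


def index_citations(docs: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each cited test name to the (file, line number) pairs that cite it."""
    index = defaultdict(list)
    for filename, text in docs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in CITATION.finditer(line):
                index[match.group(1)].append((filename, lineno))
    return dict(index)


def docs_to_review(
    failed_tests: Iterable[str], index: Dict[str, List[Tuple[str, int]]]
) -> List[Tuple[str, int]]:
    """Return every documentation location that cites a failed test."""
    locations = []
    for name in failed_tests:
        locations.extend(index.get(name, []))
    return sorted(locations)


# Hypothetical documentation files, keyed by path.
DOCS = {
    "docs/auth.md": (
        "Tokens expire after one hour. [test: test_token_expiry]\n"
        "Logins are rate limited. [test: test_login_rate_limit]\n"
    ),
    "docs/faq.md": "Why was I logged out? Tokens expire. [test: test_token_expiry]\n",
}

print(docs_to_review(["test_token_expiry"], index_citations(DOCS)))
```

A test runner plugin could call something like `docs_to_review` with the names of the failing tests and print the resulting locations next to the failure report.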
        
         | tanaypingalkar wrote:
         | Is there any library or way to automatically generate docs for
         | API endpoints? I think it is possible to create such a
         | generator for a GraphQL API.
        
           | wmiel wrote:
           | You can check out Swagger and OpenAPI, there are libraries to
           | annotate the endpoints in the code and then generate
           | interactive docs out of that.
        
         | daurnimator wrote:
         | None of this works if the programmer isn't the same person that
         | writes the docs. e.g. if you have a copy-writer come along and
         | write/update the docs before each release, then it's not
         | captured in the same commit/branch/etc.
        
           | prepend wrote:
           | It still works ok, just not as nicely.
           | 
           | The copy editor updates the repo and while their changes
           | won't be in the same commit, they should be nearby.
           | 
           | So you still get the benefit of docs history, and being in
           | the same place.
        
             | skeeter2020 wrote:
             | I think you could also make a case that if you've adopted
             | these strict documentation requirements you don't decouple
             | the functional commits from the documentation commits. Your
             | copy editor could work in the same branch as the code
             | changes and then you only approve the PR when the
             | documentation is at the same level of QA as the code.
             | Otherwise I fear separate commits or branches are the thin
             | edge of the wedge.
        
           | simonw wrote:
           | That can still work, in a couple of ways.
           | 
           | You can have the programmer write bad documentation and file
           | an issue for it to be improved. You can then enforce that
           | releases don't go out until those issues have been resolved
           | by the copywriter.
           | 
           | You can also implement new features in a branch with multiple
           | authors. The branch doesn't get merged until the
           | documentation is in good shape.
        
           | monocasa wrote:
           | Sure it does, developers work with domain experts on features
           | all the time. Perhaps you had SDETs adding some test
           | infrastructure, or designers adding assets and layout
           | information. Either you can have feature branches, or
           | stubs/scaffolding for CI. I've seen both work.
           | 
           | The biggest issue I've seen is management and product who seem
           | allergic to the actual repos for some reason.
        
         | Noumenon72 wrote:
         | How do you read the in-repo documentation? Search for all files
         | named readme.md? I have never learned about a library from
         | documentation scattered about the repo. There's the readme at
         | the root, and everything else is on a web page, which is a
         | better way to organize and browse documentation.
        
           | klohto wrote:
           | Static site generated from the repo. Can be hosted locally or
           | online. GitHub Pages is commonly used for this use case.
        
             | brox wrote:
             | If using Django there are tools like django-docs
             | (https://django-docs.readthedocs.io/en/latest/) and the
             | recently released django-sphinx-view
             | (https://noumenal.es/django-sphinx-view/).
        
             | navotgil wrote:
             | You can use the CI to publish your md files with tools like
             | https://docusaurus.io/
        
           | Zababa wrote:
           | The webpage can be generated from markdown files which can be
           | in the repo.
        
           | iamEAP wrote:
           | Check out Backstage, and in particular, TechDocs:
           | https://backstage.io/docs/features/techdocs/techdocs-
           | overvie...
        
           | aidos wrote:
           | In the PostgreSQL codebase they have readmes scattered about
           | and it's great to jump into some subfolder and get the
           | details you need in the right context.
        
           | zelphirkalt wrote:
           | If you write your documentation in Emacs org-mode, you could
           | use include [1] to include files on other levels of your
           | repository and then you could export it all to a markdown
           | file for people, who do not know about org-mode and Emacs.
           | This would make documentation anywhere in the repository
           | discoverable / automatically included in your resulting
           | documentation file.
           | 
           | [1] https://orgmode.org/manual/Include-Files.html
        
           | simonw wrote:
           | I use documentation systems that publish the documentation
           | from the repo to a website. Most of my projects use Sphinx
           | and reStructuredText for this, but I recently tried MyST
           | (Markdown for Sphinx) and I like that a lot.
           | 
           | Some examples:
           | 
           | - https://docs.datasette.io serves documentation from
           | https://github.com/simonw/datasette/tree/main/docs - which
           | has documentation unit tests here: https://github.com/simonw/
           | datasette/blob/main/tests/test_doc...
           | 
           | - https://sqlite-utils.datasette.io/ serves from
           | https://github.com/simonw/sqlite-utils/tree/main/docs - unit
           | tests here: https://github.com/simonw/sqlite-
           | utils/blob/main/tests/test_...
           | 
           | - https://django-sql-dashboard.datasette.io/ serves from
           | markdown in https://github.com/simonw/django-sql-
           | dashboard/tree/main/doc... - I don't have documentation unit
           | tests for that yet
           | 
           | Those three are all hosted on https://www.readthedocs.org but
           | I've also used this trick on web app projects that host their
           | own documentation deployed as part of the build process.
        
             | zelphirkalt wrote:
             | This is a good approach and reStructuredText is well suited
             | for technical documentation. The only downside I see in
             | reST is, that it is mostly only readable from the standard
             | implementation in Python, which is a custom parser, not a
             | portable grammar for other languages to implement in any
             | parser tool / library. There are some libraries for parsing
             | it, but last I checked those were incomplete.
             | 
             | Emacs org-mode would be a great candidate as well, for
             | things like runnable code inside the documentation as
             | examples and export to many other formats and nice tooling
             | for viewing the files. Unfortunately many git hosts do not
             | support it well and render crappily.
             | 
             | I would recommend against switching to a non-standard
             | Markdown dialect. Why switch away from reST, which offers
             | all of the things necessary for a good technical
             | documentation, to some Markdown dialect, which has many
             | important features only bolted on?
             | 
             | That said, one thing I noticed happening with this approach
             | is, that people think "Oh, I have my documentation in my
             | comments of the code! I don't need to write anything else!"
             | And then I end up reading documentation like: "def get_a():
             | ..." Docstring or comment: "Get a." Wow, thanks, how
             | helpful!
             | 
             | In short: There is no simple way to have good
             | documentation, except for writing good documentation.
             | Docstrings probably will not be sufficient, unless you
             | write whole novels in your docstrings. A good documentation
             | needs usage examples and rationale of why something was
             | done in a specific way, what kind of gotchas there are and
             | probably other stuff, that does not come to mind right now.
        
               | simonw wrote:
               | "Why switch away from reST, which offers all of the
               | things necessary for a good technical documentation, to
               | some Markdown dialect, which has many important features
               | only bolted on?"
               | 
               | Honestly, the main reason is that I've encountered
               | developers who have an almost allergic reaction to rST -
               | they genuinely hate writing in it, and will be deterred
               | from writing documentation if they have to figure it out.
               | 
               | Custom Markdown flavours are more likely to get buy-in.
        
             | petepete wrote:
             | This is definitely the best approach in my opinion,
             | providing the people writing the docs are capable of
             | contributing directly.
             | 
             | One of my projects[0] builds and deploys a static
             | documentation site[1] on every push to master. The static
             | site generator (Nanoc, in this case) imports the library
             | and uses it to publish its own documentation. All the
             | examples are snippets of code[2] that are both displayed
             | as-is and eval'd into the final output.
             | 
             | The guide can never be out of sync with the library.
             | 
             | [0] https://github.com/dfe-
             | digital/govuk_design_system_formbuild...
             | 
             | [1] https://govuk-form-builder.netlify.app/
             | 
             | [2] https://github.com/DFE-
             | Digital/govuk_design_system_formbuild...
        
       | CraigJPerry wrote:
       | If it's technical docs for developers, you'll get more bang for
       | your buck by making executable documentation first - tests,
       | deployment automation, build automation. Make it so that each
       | logical action needs only one step.
       | 
       | How do I build this? Run the build command.
       | 
       | How do I test this? Run the test command.
       | 
       | How do I run only the unit tests? Run the unit test command.
       | 
       | How do I start this locally? Run the start-local command.
       | 
       | How do these components interact? Run the contract-test command.
       | 
       | ...
       | 
       | That fixes "reference" type docs better than any reference doc.
       | There's still a place for technical guides around a code base,
       | but short screen recordings voiced over by an experienced dev on
       | the project navigating their IDE will beat any written guide on
       | any metric (time to write, usefulness etc.).
       | 
       | If it's customer facing docs, treat them as code and host them
       | inside the application in some way. There are few things worse
       | than reading the wrong version of a doc.
        
         | simonw wrote:
         | GitHub have a pattern for this called "scripts to rule them
         | all" - https://github.com/github/scripts-to-rule-them-all -
         | I've not fully adopted it yet but I probably should, it looks
         | very well thought-out.
        
       | cryptica wrote:
       | I don't agree with this at all. There are many projects such as
       | Node.js which have excellent, up-to-date documentation on their
       | websites (for all past versions too). This is good for the
       | Node.js project because it forces developers to visit the website
       | which gives the open source project an opportunity to connect
       | with their developers, to potentially monetize and stay
       | independent.
       | 
       | On the other hand, in-code documentation is hard to follow
       | because it's scattered all over the source code, relies on
       | special IDEs (more corporate lock-in) and developers often forget
       | to update the documentation anyway (even more easily than they
       | would forget to update the website). Not to mention that it takes
       | up a LOT of space and requires more scrolling; IMO this has a
       | negative impact on the readability of the code. Well written code
       | is simple enough that it doesn't need much in-line documentation.
       | 
       | I don't know why, but these days, when it comes to software
       | development, I find that I disagree with 90% of all the top links
       | that make it to the top of the HN front page. A lot of the
       | practices which are being advocated are inefficient, bureaucratic
       | and they seem to align with corporate interests as opposed to
       | developer interests.
       | 
       | The agenda seems to be about making developers more reliant on
       | proprietary tools, IDEs, subscription SaaS services - All at the
       | expense of free software principles.
       | 
       | There is also an agenda around making developers more reliant on
       | teams and less independent in the software development process. I
       | remember coming across some outrageous claims such as "Good full
       | stack developers don't exist". Also there is a push towards
       | monorepos and other corporate structures which limit the degree
       | of possible decentralization and autonomy of different projects
       | and their dependencies. The shift towards static typing is also
       | part of the trend towards centralization, de-modularization and
       | high inter-dependency with proprietary tools and services.
       | 
       | It's kind of ironic that tight coupling used to be considered one
       | of the main signs of low-quality code but this concept is barely
       | mentioned these days and the agenda is to promote it without
       | saying outright what is going on.
        
         | cryptica wrote:
         | Ever wondered why the most popular package managers which have
         | the most modules are all for dynamically typed languages? e.g.
         | npm, Ruby Gems, pip... It's because dynamically typed languages
         | are more modular since they have less rigid interfaces. With
         | statically typed languages, there is a possibility that the
         | type system of library Y might not correspond very elegantly
         | with the type system of your own project X. Static typing
         | requires stronger coupling between the project and its
         | libraries; it's typical that projects written by different
         | teams will follow completely different typing conventions and
         | names (for many different reasons); this adds friction.
         | 
         | A very common one is when a library was written before some new
         | Type/Interface was introduced as part of the core language and
         | the library had invented its own abstraction which does the
         | same thing... So the interface exposed by the library became
         | redundant. Statically typed libraries require a lot more
         | maintenance and this may also explain why companies are
         | increasingly pushing for a monorepo structure which facilitates
         | this constant maintenance which would have been unnecessary
         | with a dynamically typed language.
        
       | papito wrote:
       | At the _least_ - your repo should be the main gateway to a proper
       | WIKI. The problem with decoupled documentation is that it's the
       | proverbial tree in a forest - no one knows it's there when it
       | "drops".
       | 
       | Docs are like code - the less you write of it, the less you have
       | to maintain. Documentation should be treated as inherently evil.
       | The only worse thing than no documentation is documentation that
       | is not maintained and out of date. There is nothing more
       | infuriating than following the docs only to find out from someone
       | later that it was antiquated. Why is it there?
       | 
       | There are common sense rules. Why would you have the docs on how
       | to set up a fresh checkout NOT live with the checkout? How would
       | I know it lives somewhere else? How would anyone update those
       | steps if those steps were not _code_?
        
       | 0xbadcafebee wrote:
       | There are still unsolved problems with documentation that I'd
       | like to find solutions for.
       | 
       | Everything can be made into code, but at a certain point it's
       | just so complicated to do that you're spending more time and
       | money automating your docs than your applications. So until we
       | have solutions for all that, you will have to maintain some docs
       | manually.
       | 
       | For those manual docs, how do you keep them fresh? I've thought
       | of automatically sending an email to warn that in 30 days the
       | document would be deleted unless someone updated it, but even if
       | people agreed to such a system, they could just update some
       | punctuation and it would remain stale. Even blank pages, people
       | seem to want to keep around rather than fix.
       | 
       | How do you navigate your docs? Search engines actually suck for
       | the most part. Search is a hard problem to solve, and a home
       | rolled search will usually net terrible results. On the other
       | hand, most people don't have the time to maintain a governance
       | structure for their docs, much less an enforcement mechanism, so
       | the docs invariably become terribly organized.
       | 
       | People also seem to need training to learn how to write good
       | docs. I know there are some trendy pages being passed around
       | about some kind of "golden framework for docs" but they don't
       | explain how to write them either. _I_ know how to write docs, but
       | I feel like I'd need to write a whole book to get it across. One
       | thing I found really useful was Atlassian's newer Confluence page
       | templates, which come well organized and primed with examples of
       | how to write the docs.
       | 
       | As a philosophy, I really, _really_ love GitLab's _Handbook
       | First_ model. Their handbook is incredibly detailed and covers
       | pretty much their whole organization, and is fairly easy to
       | update. I feel like this is one of the magical missing links in
       | getting more documentation for the important things that aren't
       | code.
        
       | LadyCailin wrote:
       | I believe this to be one of the success stories of my programming
       | language, MethodScript[0]. Early on I made the strange decision
       | (in the sense that I've never seen it elsewhere) to make the
       | documentation for each api element be part of the code itself.
       | The documentation generator is part of the code as well, so every
       | single build of the software is capable of generating bespoke
       | documentation for that exact version. The website simply hosts
       | the newest version, but you can always generate your own locally.
       | 
       | Of course I also enforce that contributors must add/modify
       | documentation at the same time as the code, but that's easy,
       | because if you modify most of the code, the documentation is also
       | right next to it.
       | 
       | [0] https://methodscript.com
        
       | deeblering4 wrote:
       | Oh cool, so books and websites in general are "bad" now.
        
         | dllthomas wrote:
         | Not, by this metric, if they live in the same repo as the code.
         | When they don't, they have the same problem as any strongly
         | coupled systems maintained across multiple repos, or you are
         | paying the cost of keeping the two uncoupled.
        
       | blowski wrote:
       | An example of how to do this is the documentation for the PHP
       | framework Symfony. Code examples from the documentation are run
       | in the CI server. If a pull request breaks a code example, then
       | that example must also be fixed as part of the pull request.
       | 
       | That's a fantastic feature for a popular open-source framework,
       | as it means the documentation remains up to date.
       | 
       | I'm not sure at what point it's worth the effort for an internal
       | project, though. If you have a cultural problem with incorrect,
       | out-of-date or missing documentation this could make things
       | worse. I'd look for the root cause of that first (training,
       | motivation?), before trying to enforce it with technology.
        
       | MauranKilom wrote:
       | I'm a fan of having code samples in the documentation, and making
       | sure (at e.g. build/test time) that _those samples actually
       | work_. Given the headline, I thought the article would talk about
       | this, but it's more of a general "why and how you should keep
       | your documentation up to date".
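
Checking at test time that documentation samples actually work can be sketched with Python's standard `doctest` module, which runs `>>>` examples embedded in text. The function name `failing_examples` and the two sample snippets are hypothetical; a real setup would read the project's documentation files rather than inline strings.

```python
import doctest


def failing_examples(docs_text: str, name: str = "docs") -> int:
    """Run every >>> example found in docs_text and return how many failed."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docs_text, globs={}, name=name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report text; only the count is used here.
    result = runner.run(test, out=lambda text: None)
    return result.failed


# Hypothetical documentation snippets: one accurate, one that has gone stale.
GOOD = """
Adding works as expected:

>>> 1 + 1
2
"""

STALE = """
The helper returns a tuple:

>>> sorted({3, 1, 2})
(1, 2, 3)
"""

print(failing_examples(GOOD), failing_examples(STALE))
```

A build step could fail whenever the count is non-zero, so a code change that invalidates a documented example has to fix the example in the same change.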
        
         | ycombiswimm wrote:
         | The product actually makes sure code samples stay up-to-date
         | when the code changes.
        
           | Larryreverse wrote:
           | I just saw their demo. Fresh approach.
        
       | pram wrote:
       | Documentation is great and all, but no one ever talks about when
       | there's too much. Maybe because it's rare? At big companies I've
       | seen "architects" churn out page after page of diagrams, design
       | docs, runbooks, checklists, descriptions, etc. There is so much
       | information that it becomes practically useless in aggregate,
       | because no one is reasonably going to read it all. I'm not going
       | to pretend that I have the patience or the attention span to have
       | the gnostic mysteries of our Kubernetes infrastructure revealed
       | to me.
        
         | zelphirkalt wrote:
         | In such cases I wish for a "cookbook" style of additional
         | documentation, that I can search, to find examples for doing,
         | what I want to do.
        
         | 0xbadcafebee wrote:
         | There is no such thing as too much documentation. There is out
         | of date documentation, inaccessible documentation, unindexed
         | documentation, poor documentation, redundant documentation,
         | etc... But what you described is amazingly valuable. Just
         | because it's not valuable _to you, right now,_ doesn't mean
         | there's too much of it.
        
         | qayxc wrote:
         | Nothing a good search engine and proper requirements can't fix.
         | 
         | In theory it should work like this:
         | 
         | I) "User should be able to do X" <- requirement
         | 
         | II) "X can be achieved by performing steps A, B, and C" <-
         | description of the implementation (high-level, user-
         | perspective)
         | 
         | III) "A works by using components 1 and 2" <- technical
         | documentation (design-level, architecture perspective)
         | 
         | etc.
         | 
         | I) generates your index (what can I even do with the software)
         | 
         | II) generates the documentation (how can I do it)
         | 
         | III) and below is for technical use only (extending, modifying,
         | porting)
         | 
         | Stuff like rationales for design decisions can be structured in
         | the same layered way.
         | 
         | I don't know how something like this can be extracted after the
         | fact, but no matter the development model (waterfall/agile), a
         | structure like this should arise naturally anyway and the
         | absolute amount of documentation isn't a problem. Lack of
         | proper structure, however, is.
        
       ___________________________________________________________________
       (page generated 2021-06-06 23:01 UTC)