[HN Gopher] The case for continuous documentation
___________________________________________________________________
The case for continuous documentation
Author : morchen
Score : 78 points
Date : 2021-06-06 08:06 UTC (14 hours ago)
(HTM) web link (www.virtuallifestyle.nl)
(TXT) w3m dump (www.virtuallifestyle.nl)
| ycombiswimm wrote:
| I completely agree that documentation should be part of the CI/CD
| and that it should be part of the code.
| Larryreverse wrote:
| Are you the guy who wrote this? Would love to interview you for
| my podcast.
| systematical wrote:
| The only way for documentation to be part of CI (IMO) is for
| missing documentation to cause failed builds. There are a few
| ways I could think of to enforce this but is there anything off
| the shelf that does this?
| simonw wrote:
| I've been doing this for nearly three years now - it works
| really well.
|
| It's not particularly sophisticated - just some tests which
| introspect the code and then use dumb pattern matching
| against the documentation text to check that different
| concepts from the code are mentioned at least once in the
| docs: https://simonwillison.net/2018/Jul/28/documentation-
| unit-tes...
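A minimal sketch of the "introspect the code, then do dumb pattern matching against the docs" approach described above. It assumes a hypothetical library `mylib` (faked inline here so the snippet runs) and keeps the docs in a string. A real test would import the actual module and read the docs files from disk. This is an illustration of the idea, not the code from the linked post.

```python
import inspect
import re
import types

# Hypothetical stand-in for the library under test; a real test would
# simply `import mylib` instead of building a module by hand.
mylib = types.ModuleType("mylib")
exec(
    "def connect(path): pass\n"
    "def query(sql): pass\n"
    "def _internal(): pass\n",
    mylib.__dict__,
)

# Hypothetical docs text; a real test would read e.g. docs/*.rst.
DOCS = """
Use connect() to open a database, then call query() with your SQL.
"""

def public_functions(module):
    """Names of public (non-underscore) functions exposed by `module`."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )

def undocumented(module, docs_text):
    """Public functions whose name never appears as a whole word in the docs."""
    return [
        name
        for name in public_functions(module)
        if not re.search(r"\b" + re.escape(name) + r"\b", docs_text)
    ]

def test_every_public_function_is_documented():
    missing = undocumented(mylib, DOCS)
    assert not missing, f"Undocumented functions: {missing}"
```

Adding a new public function to `mylib` without mentioning it in the docs makes the test fail, which is the whole point: the build breaks until someone writes at least a line about it.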
| remoquete wrote:
| The article seems to skim over the complexities of docs workflows
| and the role of docs. In fact, docs are nowhere to be found
| throughout the article. What are those "docs" that the OP is
| talking about?
|
| Perhaps it'd be fair to rename that article "The Case for
| Continuous READMEs".
| nwmcsween wrote:
| Literate programming is probably the best documentation
| Aeolun wrote:
| Someone is promoting swimm.io in a sort of sideways way? I sort
| of agree with the author, but the only solution presented is this
| unknown product.
| ycombiswimm wrote:
| I couldn't find any other product that solves this basic
| problem, happy to hear about them if they exist.
| yosefk wrote:
| If you essentially have one live version of your program, like
| you do when it runs on your servers, coupled documentation is
| probably a good idea.
|
| If you have multiple active versions, like you often do with
| software released to run on someone else's machines, coupled
| documentation "works," but has a big downside. Namely, it prompts
| people to "refactor mercilessly"/change everything all the time,
| together with the documentation.
|
| When you need to maintain multiple versions at a time, having a
| _single version of the document explaining the differences
| between all the live versions_ can somewhat curb the enthusiasm
| for gratuitous changes (since whoever does the changes must also
| maintain the increasingly long and ugly description in the single
| document describing all the versions.) And someone needing to
| work with all those versions has these differences nicely laid
| out and _those areas not having differences also clearly
| visible_. Whereas with multiple versions of the document you need
| to "diff" these versions if you want to build a mental model of
| what changed.
|
| Sadly (for those agreeing with this), I presume that the above is
| a minority opinion.
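The "diff these versions" step mentioned above can at least be mechanized. A small sketch using Python's standard `difflib`, with two made-up excerpts of the same doc page as shipped with two releases:

```python
import difflib

# Hypothetical excerpts of one documentation page in two live releases.
DOCS_V1 = """\
Configuration
timeout: seconds to wait (default 30)
retries: number of retries (default 3)
"""

DOCS_V2 = """\
Configuration
timeout: seconds to wait (default 60)
retries: number of retries (default 3)
"""

def doc_diff(old, new, old_label="v1", new_label="v2"):
    """Return a unified diff of two versions of a document as one string."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )

print(doc_diff(DOCS_V1, DOCS_V2))
```

This only shows *what* changed between two copies; it does not give the single, curated "differences between all live versions" document argued for above, which still has to be written by hand.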
| cryptica wrote:
| I think this argument makes perfect sense. Personally I don't
| like in-line documentation. It's mostly popular among people
| who depend on bulky proprietary IDEs. I hate all these
| approaches which try to trick people into using proprietary
| tech.
|
| I enjoy reading a nice documentation website maintained by the
| open source organization; it also gives me a touch-point with
| the organization which created the library. I also agree with
| your nuanced argument concerning incentives. Arguments related
| to incentives are almost always discarded by managers but they
| are very important.
|
| I do think decoupling the documentation encourages people to
| think about documentation more carefully as a distinct and
| important activity. I find that in-line documentation tends to
| be neglected; as a developer, when you're in the middle of
| coding an important feature which requires your full attention,
| you don't want to be distracted with updating the comments all
| the time because it breaks your train of thought. Usually
| developers tell themselves that they will do it later and they
| often forget. Comments are often neglected in the PR review
| process too.
|
| There is no way around it, you need to set aside some time to
| write or update the documentation as a distinct activity. There
| is a time for coding and there is a time for explaining.
| iib wrote:
| Can't you just skip writing the documentation at first, even if it
| resides in the same repository, and then later make a commit
| to update it as a separate task?
| cryptica wrote:
| Yes, but why take up space in the actual source code and
| repo? It increases disk usage (which means longer download
| times for the library) and requires more scrolling when
| reading the code.
|
| Also, if the library is a sub-dependency which the
| developer doesn't interact with directly, why should they
| download the documentation for it? They will never read
| those comments in the code anyway.
| michael1999 wrote:
| I see the value in internalizing the cost of breaking changes.
| But rather than just suffer a doc burden to discourage change,
| why not fix it with something like Stripe's version
| conversions?
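For readers unfamiliar with the reference: Stripe has written publicly about building every API response in the shape of the newest version and then passing it backwards through small "version change" transforms until it matches the version a client is pinned to. The sketch below illustrates that idea only; the registry, dates, and field names are hypothetical and this is not Stripe's code.

```python
from typing import Callable

Response = dict

# (date the change was introduced, function that undoes it), newest first.
VERSION_CHANGES: list[tuple[str, Callable[[Response], Response]]] = []

def version_change(introduced_in: str):
    """Hypothetical decorator registering a backwards transform."""
    def register(fn):
        VERSION_CHANGES.append((introduced_in, fn))
        VERSION_CHANGES.sort(key=lambda change: change[0], reverse=True)
        return fn
    return register

@version_change("2021-03-01")
def rename_amount_field(resp: Response) -> Response:
    # Before 2021-03-01 clients saw `amount` instead of `amount_cents`.
    resp = dict(resp)
    resp["amount"] = resp.pop("amount_cents")
    return resp

@version_change("2021-05-01")
def drop_currency_field(resp: Response) -> Response:
    # `currency` was added on 2021-05-01; older clients never saw it.
    resp = dict(resp)
    resp.pop("currency", None)
    return resp

def render_for_version(latest: Response, client_version: str) -> Response:
    """Undo every change newer than the client's pinned version (ISO dates sort as strings)."""
    resp = latest
    for introduced_in, undo in VERSION_CHANGES:
        if introduced_in > client_version:
            resp = undo(resp)
    return resp
```

Because each change is a small named unit, its comment or docstring could plausibly double as the changelog entry, which keeps the "what differs between versions" documentation next to the code that implements the difference.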
| simonw wrote:
| I'm adamant that the documentation for a project should live in
| the same repository as the code itself. This is crucial for a
| number of reasons:
|
| 1. If the docs are in the same repo, a commit that changes the
| code can update the relevant documentation (in addition to the
| tests) as part of the same unit of work
|
| 2. This means it can be enforced during code review: if a
| developer forgets to update the docs they can be reminded before
| they land their PR
|
| 3. This also provides a version history for the documentation
| which is synchronized with the code history. This is really
| useful when looking at history and trying to figure out what
| changed when.
|
| 4. This also works great with branches, PRs and releases. New
| features can have their documentation developed alongside the
| code in a branch, which makes it easier to understand a proposed
| change. If your software is deployed in multiple places as
| multiple versions (or even just staging vs production) you have a
| way to view the correct documentation for each individual
| deployment.
|
| 5. Added together, all of this builds trust. A common problem
| I've seen with internal documentation is that no-one trusts it to
| be up-to-date. Making it part of the regular code development
| lifecycle can fix this.
|
| 6. If you do this, you can write automated tests that enforce
| aspects of your documentation! I call these documentation unit
| tests, and wrote about them here:
| https://simonwillison.net/2018/Jul/28/documentation-unit-tes... -
| even something as simple as a test that fails if a new API
| endpoint isn't mentioned in a markdown file using simple string
| matching can ensure no-one forgets about the docs when they add a
| new feature.
| dllthomas wrote:
| > a test that fails if a new API endpoint isn't mentioned [...]
|
| That's a great idea. A related thought I had was testing
| assertions about architecture by looking at the graph of
| imports or calls.
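One way the import-graph idea could look in Python, using the standard `ast` module. The layering rule ("models must not import views") and the module sources are hypothetical, and relative imports are ignored to keep the sketch short.

```python
import ast

# Hypothetical architecture rule: the "models" layer must not import "views".
FORBIDDEN = {"models": {"views"}}

def imported_top_level_modules(source: str) -> set[str]:
    """Top-level package names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x.y import z" only; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return names

def layering_violations(files: dict[str, str]) -> list[str]:
    """`files` maps a layer name to its source code; returns readable violations."""
    problems = []
    for layer, source in files.items():
        banned = FORBIDDEN.get(layer, set())
        for imported in sorted(imported_top_level_modules(source) & banned):
            problems.append(f"{layer} imports {imported}")
    return problems
```

A test asserting `layering_violations(...) == []` over the real source tree would turn an architecture claim in the docs into something CI enforces.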
|
| From your blog post,
|
| > if a change doesn't update the relevant documentation, point
| that out in your review!
|
| What you get at but don't seem to state explicitly is that
| finding the relevant documentation is hard, for both the author
| and reviewer. Finding _any_ relevant documentation should be
| easy, but ideally we're finding _all_ relevant documentation,
| or at least missing little enough of it that we can hold back
| entropy and keep the docs useful.
|
| Your tools address this for some cases! Doctest addresses this
| for other cases. From TFA here, it sounds like Swimm.io tries
| to address this for more cases (my gut says the article
| oversells it but I intend to look more closely).
|
| To get further, an idea I've been toying with is to treat
| claims (implicit or explicit) in documentation as requiring
| citation, pointing not at sources but at tests. Ideally the
| test runner, when a test fails, can then surface all references
| to that test. In addition to highlighting portions of the docs
| that may need to change, this seems likely to also provide
| crucial context when fixing the code and/or the test.
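A rough sketch of the "claims cite tests" idea. The `[test: name]` citation syntax is a hypothetical convention invented for this example, and wiring the report into a real test runner (for instance through a plugin hook) is left out.

```python
import re
from collections import defaultdict

# Hypothetical citation syntax: a claim is followed by "[test: <test name>]".
CITATION = re.compile(r"\[test:\s*([\w.]+)\]")

def index_citations(docs: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each cited test name to the (doc file, line number) places citing it."""
    index = defaultdict(list)
    for path, text in docs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for test_name in CITATION.findall(line):
                index[test_name].append((path, lineno))
    return dict(index)

def report_failures(failed_tests, index):
    """Lines a test runner could print next to each failing test."""
    return [
        f"{test} failed; see {path}:{lineno}"
        for test in failed_tests
        for path, lineno in index.get(test, [])
    ]
```

Usage: build the index once per run, then feed it the names of failing tests to list every doc passage whose claim might now be false.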
| tanaypingalkar wrote:
| Is there any lib/way that automatically generates docs for an
| API/endpoint? I think it is possible to create such a generator
| for a GraphQL API.
| wmiel wrote:
| You can check out Swagger and OpenAPI; there are libraries to
| annotate the endpoints in the code and then generate
| interactive docs out of that.
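Framework-specific OpenAPI libraries do this for you, but the underlying idea is small enough to sketch with the standard library alone: annotate each handler, collect the annotations into an OpenAPI 3.0 document, and feed that to a viewer such as Swagger UI. The `endpoint` decorator, API title, and handler are hypothetical.

```python
import json

SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {},
}

def endpoint(method: str, path: str):
    """Hypothetical decorator: record the handler's docstring in SPEC."""
    def register(handler):
        doc = (handler.__doc__ or "").strip()
        summary, _, description = doc.partition("\n")
        SPEC["paths"].setdefault(path, {})[method.lower()] = {
            "summary": summary.strip(),
            "description": description.strip(),
            "responses": {"200": {"description": "OK"}},
        }
        return handler
    return register

@endpoint("GET", "/users/{user_id}")
def get_user(user_id):
    """Fetch a single user.

    Returns 404 if the user does not exist.
    """
    ...

print(json.dumps(SPEC, indent=2))
```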
| daurnimator wrote:
| None of this works if the programmer isn't the same person that
| writes the docs. e.g. if you have a copy-writer come along and
| write/update the docs before each release, then it's not
| captured in the same commit/branch/etc.
| prepend wrote:
| It still works ok, just not as nicely.
|
| The copy editor updates the repo and while their changes
| won't be in the same commit, they should be nearby.
|
| So you still get the benefit of docs history, and being in
| the same place.
| skeeter2020 wrote:
| I think you could also make a case that if you've adopted
| these strict documentation requirements you don't decouple
| the functional commits from the documentation commits. Your
| copy editor could work in the same branch as the code
| changes and then you only approve the PR when the
| documentation is at the same level of QA as the code.
| Otherwise I fear separate commits or branches are the thin
| edge of the wedge.
| simonw wrote:
| That can still work, in a couple of ways.
|
| You can have the programmer write bad documentation and file
| an issue for it to be improved. You can then enforce that
| releases don't go out until those issues have been resolved
| by the copywriter.
|
| You can also implement new features in a branch with multiple
| authors. The branch doesn't get merged until the
| documentation is in good shape.
| monocasa wrote:
| Sure it does, developers work with domain experts on features
| all the time. Perhaps you had SDETs adding some test
| infrastructure, or designers adding assets and layout
| information. Either you can have feature branches, or
| stubs/scaffolding for CI. I've seen both work.
|
| The biggest issue I've seen is management and product, who seem
| allergic to the actual repos for some reason.
| Noumenon72 wrote:
| How do you read the in-repo documentation? Search for all files
| named readme.md? I have never learned about a library from
| documentation scattered about the repo. There's the readme at
| the root, and everything else is on a web page, which is a
| better way to organize and browse documentation.
| klohto wrote:
| Static site generated from the repo. It can be hosted locally or
| online. GitHub Pages is commonly used for this use case.
| brox wrote:
| If using Django there are tools like django-docs
| (https://django-docs.readthedocs.io/en/latest/) and the
| recently released django-sphinx-view
| (https://noumenal.es/django-sphinx-view/).
| navotgil wrote:
| You can use the CI to publish your Markdown files with tools like
| https://docusaurus.io/
| Zababa wrote:
| The webpage can be generated from markdown files which can be
| in the repo.
| iamEAP wrote:
| Check out Backstage, and in particular, TechDocs:
| https://backstage.io/docs/features/techdocs/techdocs-
| overvie...
| aidos wrote:
| In the PostgreSQL codebase they have readmes scattered about
| and it's great to jump into some subfolder and get the
| details you need in the right context.
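If READMEs are scattered through the tree like this, a small build step can still make them discoverable from one place. A minimal sketch with `pathlib`, assuming each README starts with a title line; the `demo()` helper builds a throwaway repo purely for illustration.

```python
import tempfile
from pathlib import Path

def readme_index(repo_root) -> str:
    """Build a Markdown index linking every README.md below repo_root."""
    root = Path(repo_root)
    lines = ["# Documentation index", ""]
    for readme in sorted(root.rglob("README.md")):
        rel = readme.relative_to(root)
        text = readme.read_text(encoding="utf-8").splitlines()
        # Use the first non-empty line (minus Markdown heading marks) as the title.
        title = next(
            (line.lstrip("# ").strip() for line in text if line.strip()),
            str(rel.parent),
        )
        lines.append(f"- [{title}]({rel.as_posix()})")
    return "\n".join(lines) + "\n"

def demo() -> str:
    """Create a throwaway repo with two READMEs and index it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "README.md").write_text("# My project\n", encoding="utf-8")
        (root / "storage").mkdir()
        (root / "storage" / "README.md").write_text(
            "# Storage layer\nDetails...\n", encoding="utf-8"
        )
        return readme_index(root)

print(demo())
```

The generated index could then be the landing page of a static docs site, while each README stays next to the code it describes.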
| zelphirkalt wrote:
| If you write your documentation in Emacs org-mode, you could
| use include [1] to include files on other levels of your
| repository and then you could export it all to a markdown
| file for people who do not know about org-mode and Emacs.
| This would make documentation anywhere in the repository
| discoverable / automatically included in your resulting
| documentation file.
|
| [1] https://orgmode.org/manual/Include-Files.html
| simonw wrote:
| I use documentation systems that publish the documentation
| from the repo to a website. Most of my projects use Sphinx
| and reStructuredText for this, but I recently tried MyST
| (Markdown for Sphinx) and I like that a lot.
|
| Some examples:
|
| - https://docs.datasette.io serves documentation from
| https://github.com/simonw/datasette/tree/main/docs - which
| has documentation unit tests here: https://github.com/simonw/
| datasette/blob/main/tests/test_doc...
|
| - https://sqlite-utils.datasette.io/ serves from
| https://github.com/simonw/sqlite-utils/tree/main/docs - unit
| tests here: https://github.com/simonw/sqlite-
| utils/blob/main/tests/test_...
|
| - https://django-sql-dashboard.datasette.io/ serves from
| markdown in https://github.com/simonw/django-sql-
| dashboard/tree/main/doc... - I don't have documentation unit
| tests for that yet
|
| Those three are all hosted on https://www.readthedocs.org but
| I've also used this trick on web app projects that host their
| own documentation deployed as part of the build process.
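For anyone wanting to try the MyST route mentioned above: MyST-Parser plugs into Sphinx as an extension named `myst_parser`. A minimal `docs/conf.py` sketch, with placeholder project details:

```python
# docs/conf.py: minimal Sphinx configuration (project details are placeholders)
project = "example-project"
author = "Example Author"

extensions = [
    "myst_parser",  # lets Sphinx read MyST Markdown files alongside .rst
]

html_theme = "alabaster"  # Sphinx's default theme

# Build locally with:  sphinx-build -b html docs docs/_build/html
```

Read the Docs can build from the same `conf.py`, so the published site and local builds come from the same files in the repo.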
| zelphirkalt wrote:
| This is a good approach, and reStructuredText is well suited
| for technical documentation. The only downside I see in
| reST is that it is mostly only readable by the standard
| implementation in Python, which is a custom parser, not a
| portable grammar that other languages can implement with any
| parser tool / library. There are some libraries for parsing
| it, but last I checked those were incomplete.
|
| Emacs org-mode would be a great candidate as well, for
| things like runnable code inside the documentation as
| examples and export to many other formats and nice tooling
| for viewing the files. Unfortunately many git hosts do not
| support it well and render crappily.
|
| I would recommend against switching to a non-standard
| Markdown dialect. Why switch away from reST, which offers
| all of the things necessary for good technical
| documentation, to some Markdown dialect, which has many
| important features only bolted on?
|
| That said, one thing I noticed happening with this approach
| is that people think "Oh, I have my documentation in the
| comments of my code! I don't need to write anything else!"
| And then I end up reading documentation like: "def get_a():
| ..." Docstring or comment: "Get a." Wow, thanks, how
| helpful!
|
| In short: There is no simple way to have good
| documentation, except for writing good documentation.
| Docstrings probably will not be sufficient, unless you
| write whole novels in your docstrings. Good documentation
| needs usage examples, the rationale for why something was
| done in a specific way, what kind of gotchas there are, and
| probably other things that don't come to mind right now.
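To make the contrast with "Get a." concrete, here is a hypothetical function whose docstring carries the three things listed above: a rationale, the gotchas, and usage examples. The examples are written as doctests, so they can also be checked automatically.

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as "1h30m" or "45s" into whole seconds.

    Why: config files are written by humans, and "90m" is easier to read
    than 5400, so unit suffixes are accepted instead of raw integers.

    Gotchas:
      * Units must appear in the order h, m, s; "30m1h" is rejected.
      * A trailing number with no unit is treated as seconds.

    >>> parse_duration("1h30m")
    5400
    >>> parse_duration("45")
    45
    """
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?", text.strip())
    if not text.strip() or match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```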
| simonw wrote:
| "Why switch away from reST, which offers all of the
| things necessary for good technical documentation, to
| some Markdown dialect, which has many important features
| only bolted on?"
|
| Honestly, the main reason is that I've encountered
| developers who have an almost allergic reaction to rST -
| they genuinely hate writing in it, and will be deterred
| from writing documentation if they have to figure it out.
|
| Custom Markdown flavours are more likely to get buy-in.
| petepete wrote:
| This is definitely the best approach in my opinion,
| provided the people writing the docs are capable of
| contributing directly.
|
| One of my projects[0] builds and deploys a static
| documentation site[1] on every push to master. The static
| site generator (Nanoc, in this case) imports the library
| and uses it to publish its own documentation. All the
| examples are snippets of code[2] that are both displayed
| as-is and eval'd into the final output.
|
| The guide can never be out of sync with the library.
|
| [0] https://github.com/dfe-
| digital/govuk_design_system_formbuild...
|
| [1] https://govuk-form-builder.netlify.app/
|
| [2] https://github.com/DFE-
| Digital/govuk_design_system_formbuild...
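The project above does this in Ruby with Nanoc. As a rough Python analogue of the same idea (not that project's code), each example snippet can be both displayed verbatim and executed at build time, with its real output rendered next to it. If a snippet stops working, the docs build itself fails. `render_example` is a hypothetical helper.

```python
import contextlib
import html
import io
import textwrap

def render_example(source: str) -> str:
    """Return an HTML fragment showing a snippet and the output it actually produced."""
    code = textwrap.dedent(source).strip()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # run the snippet in a fresh namespace; errors abort the build
    return (
        f"<pre class='source'>{html.escape(code)}</pre>\n"
        f"<pre class='output'>{html.escape(buffer.getvalue())}</pre>"
    )

print(render_example("""
    x = 2
    print(x * 21)
"""))
```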
| CraigJPerry wrote:
| If it's technical docs for developers, you'll get more bang for
| your buck by making executable documentation first - tests,
| deployment automation, build automation. Make it so that each
| logical action takes only one step.
|
| How do I build this? Run the build command.
|
| How do I test this? Run the test command.
|
| How do I run only the unit tests? Run the unit test command.
|
| How do I start this locally? Run the start-local command.
|
| How do these components interact? Run the contract-test command.
|
| ...
|
| That fixes "reference" type docs better than any reference doc.
| There's still a place for technical guides around a code base,
| but short screen recordings voiced over by an experienced dev on
| the project navigating their IDE will beat any written guide on
| any metric (time to write, usefulness etc.)
|
| If it's customer-facing docs, treat them as code and host them
| inside the application in some way. There are few things worse
| than reading the wrong version of a doc.
| simonw wrote:
| GitHub have a pattern for this called "scripts to rule them
| all" - https://github.com/github/scripts-to-rule-them-all -
| I've not fully adopted it yet but I probably should, it looks
| very well thought-out.
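GitHub's "scripts to rule them all" uses a directory of shell scripts. A Python variant of the same "one command per logical action" idea is sketched below. It is a different take, not that project's layout, and the commands are hypothetical placeholders (for example, `python -m build` assumes the `build` package is installed). A side effect is that `--help` lists every available action, which answers most of the "How do I...?" questions above.

```python
import argparse
import subprocess
import sys

# Hypothetical commands; swap in whatever the project actually uses.
COMMANDS = {
    "build": [sys.executable, "-m", "build"],
    "test": [sys.executable, "-m", "pytest"],
    "unit-test": [sys.executable, "-m", "pytest", "tests/unit"],
    "start-local": [sys.executable, "-m", "http.server", "8000"],
}

def command_for(task: str) -> list[str]:
    """The single command that performs one logical action."""
    return COMMANDS[task]

def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="One command per logical action.")
    parser.add_argument("task", choices=sorted(COMMANDS))
    args = parser.parse_args(argv)
    return subprocess.call(command_for(args.task))

if __name__ == "__main__":
    sys.exit(main())
```

Usage: `python dev.py unit-test`, assuming the script is saved as a hypothetical `dev.py` at the repo root.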
| cryptica wrote:
| I don't agree with this at all. There are many projects such as
| Node.js which have excellent, up-to-date documentation on their
| websites (for all past versions too). This is good for the
| Node.js project because it forces developers to visit the website
| which gives the open source project an opportunity to connect
| with their developers, to potentially monetize and stay
| independent.
|
| On the other hand, in-code documentation is hard to follow
| because it's scattered all over the source code, relies on
| special IDEs (more corporate lock-in) and developers often forget
| to update the documentation anyway (even more easily than they
| would forget to update the website). Not to mention that it takes
| up a LOT of space and requires more scrolling; IMO this has a
| negative impact on the readability of the code. Well written code
| is simple enough that it doesn't need much in-line documentation.
|
| I don't know why, but these days, when it comes to software
| development, I find that I disagree with 90% of the links
| that make it to the top of the HN front page. A lot of the
| practices which are being advocated are inefficient, bureaucratic
| and they seem to align with corporate interests as opposed to
| developer interests.
|
| The agenda seems to be about making developers more reliant on
| proprietary tools, IDEs, subscription SaaS services - All at the
| expense of free software principles.
|
| There is also an agenda around making developers more reliant on
| teams and less independent in the software development process. I
| remember coming across some outrageous claims such as "Good full
| stack developers don't exist". Also there is a push towards
| monorepos and other corporate structures which limit the degree
| of possible decentralization and autonomy of different projects
| and their dependencies. The shift towards static typing is also
| part of the trend towards centralization, de-modularization and
| high inter-dependency with proprietary tools and services.
|
| It's kind of ironic that tight coupling used to be considered one
| of the main signs of low-quality code but this concept is barely
| mentioned these days and the agenda is to promote it without
| saying outright what is going on.
| cryptica wrote:
| Ever wondered why the most popular package managers which have
| the most modules are all for dynamically typed languages? e.g.
| npm, Ruby Gems, pip... It's because dynamically typed languages
| are more modular since they have less rigid interfaces. With
| statically typed languages, there is a possibility that the
| type system of library Y might not correspond very elegantly
| with the type system of your own project X. Static typing
| requires stronger coupling between the project and its
| libraries; it's typical that projects written by different
| teams will follow completely different typing conventions and
| names (for many different reasons); this adds friction.
|
| A very common one is when a library was written before some new
| Type/Interface was introduced as part of the core language and
| the library had invented its own abstraction which does the
| same thing... So the interface exposed by the library became
| redundant. Statically typed libraries require a lot more
| maintenance and this may also explain why companies are
| increasingly pushing for a monorepo structure which facilitates
| this constant maintenance which would have been unnecessary
| with a dynamically typed language.
| papito wrote:
| In the _least_ - your repo should be the main gateway to a proper
| WIKI. The problem with decoupled documentation is that it's the
| proverbial tree in a forest - no one knows it's there when it
| "drops".
|
| Docs are like code - the less you write of it, the less you have
| to maintain. Documentation should be treated as inherently evil.
| The only thing worse than no documentation is documentation that
| is not maintained and out of date. There is nothing more
| infuriating than following the docs only to find out from someone
| later that it was antiquated. Why is it there?
|
| There are common sense rules. Why would you have the docs on how
| to set up a fresh checkout NOT live with the checkout? How would
| I know it lives somewhere else? How would anyone update those
| steps if those steps were not _code_?
| 0xbadcafebee wrote:
| There are still unsolved problems with documentation that I'd
| like to find solutions for.
|
| Everything can be made into code, but at a certain point it's
| just so complicated to do that you're spending more time and
| money automating your docs than your applications. So until we
| have solutions for all that, you will have to maintain some docs
| manually.
|
| For those manual docs, how do you keep them fresh? I've thought
| of automatically sending an email to warn that in 30 days the
| document would be deleted unless someone updated it, but even if
| people agreed to such a system, they could just update some
| punctuation and it would remain stale. Even blank pages, people
| seem to want to keep around rather than fix.
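One partial answer to the "just update some punctuation" loophole described above: only reset a page's freshness clock when its *normalized* content changes, ignoring case, whitespace, and punctuation. A sketch with hypothetical names and a 180-day threshold chosen arbitrarily. Someone can still game it by changing a word, so this raises the bar rather than solving the problem.

```python
import hashlib
import re
from datetime import date, timedelta

def fingerprint(text: str) -> str:
    """Hash of the document with case, whitespace and punctuation stripped."""
    normalized = re.sub(r"[\W_]+", "", text.lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def record_edit(state: dict, doc: str, text: str, today: date) -> None:
    """Reset a doc's freshness clock only if its substance changed."""
    fp = fingerprint(text)
    if state.get(doc, {}).get("fingerprint") != fp:
        state[doc] = {"fingerprint": fp, "last_real_change": today}

def stale_docs(state: dict, today: date, max_age_days: int = 180) -> list[str]:
    """Docs with no substantive change in the last `max_age_days` days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(doc for doc, info in state.items() if info["last_real_change"] < cutoff)
```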
|
| How do you navigate your docs? Search engines actually suck for
| the most part. Search is a hard problem to solve, and a home
| rolled search will usually net terrible results. On the other
| hand, most people don't have the time to maintain a governance
| structure for their docs, much less an enforcement mechanism, so
| the docs invariably become terribly organized.
|
| People also seem to need training to learn how to write good
| docs. I know there are some trendy pages being passed around
| about some kind of "golden framework for docs" but they don't
| explain how to write them either. _I_ know how to write docs, but
| I feel like I'd need to write a whole book to get it across. One
| thing I found really useful was Atlassian's newer Confluence page
| templates, which come well organized and primed with examples of
| how to write the docs.
|
| As a philosophy, I really, _really_ love GitLab's _Handbook
| First_ model. Their handbook is incredibly detailed and covers
| pretty much their whole organization, and is fairly easy to
| update. I feel like this is one of the magical missing links in
| getting more documentation for the important things that aren't
| code.
| LadyCailin wrote:
| I believe this to be one of the success stories of my programming
| language, MethodScript[0]. Early on I made the strange decision
| (in the sense that I've never seen it elsewhere) to make the
| documentation for each api element be part of the code itself.
| The documentation generator is part of the code as well, so every
| single build of the software is capable of generating bespoke
| documentation for that exact version. The website simply hosts
| the newest version, but you can always generate your own locally.
|
| Of course I also enforce that contributors must add/modify
| documentation at the same time as the code, but that's easy,
| because if you modify most of the code, the documentation is also
| right next to it.
|
| [0] https://methodscript.com
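A Python sketch of the general pattern described above: each API element carries its documentation next to its implementation, and the build can emit a reference stamped with its own version. MethodScript itself is written in Java and its actual mechanism may differ. The `documented` decorator, `ApiElement` class, and version string are all hypothetical.

```python
from dataclasses import dataclass, field

VERSION = "1.2.3"  # hypothetical build version baked into the generated docs

@dataclass
class ApiElement:
    name: str
    summary: str
    params: list[str] = field(default_factory=list)

REGISTRY: list[ApiElement] = []

def documented(summary: str, params=()):
    """Hypothetical decorator: the docs live right next to the implementation."""
    def register(fn):
        REGISTRY.append(ApiElement(fn.__name__, summary, list(params)))
        return fn
    return register

@documented("Returns the absolute value of a number.", params=["number"])
def abs_value(number):
    return -number if number < 0 else number

def generate_reference() -> str:
    """Markdown reference for exactly the API elements in this build."""
    lines = [f"# API reference (version {VERSION})", ""]
    for element in sorted(REGISTRY, key=lambda e: e.name):
        lines.append(f"## {element.name}({', '.join(element.params)})")
        lines.append(element.summary)
        lines.append("")
    return "\n".join(lines)
```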
| deeblering4 wrote:
| Oh cool, so books and websites in general are "bad" now.
| dllthomas wrote:
| Not, by this metric, if they live in the same repo as the code.
| When they don't, they have the same problem as any strongly
| coupled systems maintained across multiple repos, or you are
| paying the cost of keeping the two uncoupled.
| blowski wrote:
| An example of how to do this is the documentation for the PHP
| framework Symfony. Code examples from the documentation are run
| in the CI server. If a pull request breaks a code example, then
| that example must also be fixed as part of the pull request.
|
| That's a fantastic feature for a popular open-source framework,
| as it means the documentation remains up to date.
|
| I'm not sure at what point it's worth the effort for an internal
| project, though. If you have a cultural problem with incorrect,
| out-of-date or missing documentation this could make things
| worse. I'd look for the root cause of that first (training,
| motivation?), before trying to enforce it with technology.
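Symfony's tooling is PHP and works differently, but the idea described above fits in a few lines of Python: pull the fenced `python` blocks out of a Markdown doc, run each one, and fail CI if any of them raises. The fence is built as a string so the sketch can itself live inside Markdown.

```python
import re
import traceback

FENCE = "`" * 3  # built this way so the pattern doesn't close a Markdown fence
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def extract_examples(markdown: str) -> list[str]:
    """All fenced python code blocks in a Markdown document, in order."""
    return CODE_BLOCK.findall(markdown)

def run_examples(markdown: str) -> list[str]:
    """Execute each example in a fresh namespace; return one message per failure."""
    failures = []
    for number, source in enumerate(extract_examples(markdown), start=1):
        try:
            exec(source, {})
        except Exception:
            failures.append(f"example {number} failed:\n{traceback.format_exc()}")
    return failures
```

In CI this would run over every doc page, with the build failing whenever `run_examples` returns anything.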
| MauranKilom wrote:
| I'm a fan of having code samples in the documentation, and making
| sure (at e.g. build/test time) that _those samples actually
| work_. Given the headline, I thought the article would talk about
| this, but it's more of a general "why and how you should keep
| your documentation up to date".
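In Python this is built into the standard library: `doctest` runs every `>>>` example in a piece of text and compares the result with the output shown. `python -m doctest README.txt` does it for a whole text file. The sketch below does the same programmatically for a hypothetical README string, so a test suite can call it.

```python
import doctest

# Hypothetical README content with runnable examples.
README = """
Usage
-----

>>> sorted({3, 1, 2})
[1, 2, 3]
>>> "docs".upper()
'DOCS'
"""

def check_examples(text: str, name: str = "README") -> doctest.TestResults:
    """Run every >>> example in `text` and compare it against the shown output."""
    test = doctest.DocTestParser().get_doctest(text, {}, name, name, 0)
    runner = doctest.DocTestRunner(verbose=False)
    report = []
    results = runner.run(test, out=report.append)
    if results.failed:
        print("".join(report))  # show which examples drifted from reality
    return results
```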
| ycombiswimm wrote:
| The product actually makes sure code samples stay up-to-date
| when the code changes.
| Larryreverse wrote:
| I just saw their demo. Fresh approach.
| pram wrote:
| Documentation is great and all, but no one ever talks about when
| there's too much. Maybe because it's rare? At big companies I've
| seen "architects" churn out page after page of diagrams, design
| docs, runbooks, checklists, descriptions, etc. There is so much
| information that it becomes practically useless in aggregate,
| because no one is reasonably going to read it all. I'm not going
| to pretend that I have the patience or the attention span to have
| the gnostic mysteries of our Kubernetes infrastructure revealed
| to me.
| zelphirkalt wrote:
| In such cases I wish for a "cookbook" style of additional
| documentation that I can search to find examples of doing
| what I want to do.
| 0xbadcafebee wrote:
| There is no such thing as too much documentation. There is out
| of date documentation, inaccessible documentation, unindexed
| documentation, poor documentation, redundant documentation,
| etc... But what you described is amazingly valuable. Just
| because it's not valuable _to you, right now,_ doesn't mean
| there's too much of it.
| qayxc wrote:
| Nothing that a good search engine and proper requirements can't fix.
|
| In theory it should work like this:
|
| I) "User should be able to do X" <- requirement
|
| II) "X can be achieved by performing steps A, B, and C" <-
| description of the implementation (high-level, user-
| perspective)
|
| III) "A works by using components 1 and 2" <- technical
| documentation (design-level, architecture perspective)
|
| etc.
|
| I) generates your index (what can I even do with the software)
|
| II) generates the documentation (how can I do it)
|
| III) and below is for technical use only (extending, modifying,
| porting)
|
| Stuff like rationales for design decisions can be structured in
| the same layered way.
|
| I don't know how something like this can be extracted after the
| fact, but no matter the development model (waterfall/agile), a
| structure like this should arise naturally anyway and the
| absolute amount of documentation isn't a problem. Lack of
| proper structure, however, is.
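The layered structure described above (I: requirement, II: user-level steps, III: technical components) maps naturally onto nested data, from which the index, the how-to text, and the "what to read before changing this" list can all be generated. A sketch with hypothetical classes and example data:

```python
from dataclasses import dataclass, field

@dataclass
class Component:  # level III: design-level docs for extending/modifying/porting
    name: str
    notes: str

@dataclass
class Step:  # level II: how a user achieves the requirement
    description: str
    components: list[Component] = field(default_factory=list)

@dataclass
class Requirement:  # level I: what the user can do at all
    goal: str
    steps: list[Step] = field(default_factory=list)

def user_index(requirements: list[Requirement]) -> list[str]:
    """Level I becomes the index: what can I even do with the software?"""
    return [req.goal for req in requirements]

def how_to(req: Requirement) -> str:
    """Level II becomes the user-facing guide for one requirement."""
    lines = [req.goal]
    lines += [f"{n}. {step.description}" for n, step in enumerate(req.steps, start=1)]
    return "\n".join(lines)

def components_behind(req: Requirement) -> set[str]:
    """Level III: technical docs to consult before changing this feature."""
    return {c.name for step in req.steps for c in step.components}

# Hypothetical example data.
viewer = Component("ReportViewer", "Renders reports from the query cache.")
csv_writer = Component("CsvWriter", "Streams rows; RFC 4180 quoting.")
export = Requirement(
    "User can export a report as CSV",
    [Step("Open the report", [viewer]), Step("Click Export", [csv_writer, viewer])],
)
```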
___________________________________________________________________
(page generated 2021-06-06 23:01 UTC)