[HN Gopher] Unit Testing PDF Generation
       ___________________________________________________________________
        
       Unit Testing PDF Generation
        
       Author : ingve
       Score  : 44 points
       Date   : 2023-02-27 17:57 UTC (5 hours ago)
        
 (HTM) web link (nibblestew.blogspot.com)
 (TXT) w3m dump (nibblestew.blogspot.com)
        
       | t344344 wrote:
       | PDF may have generation date etc, much better to use OCR and
       | compare strings.
        
         | crazygringo wrote:
         | No, at the end of the day the proposed approach of rendering to
         | an image and comparing pixels is best. Things can go wrong
         | graphically that OCR won't catch, like an entire background
         | color is missing or an image is missing.
         | 
         | If you're worried about a generation date in the margin, then
         | compare inside of a bounding box that includes most of the page
         | but not that margin. Or just use a fixed date for the test,
         | even better -- since otherwise you've got to be careful about
         | running the test within a few seconds of midnight anyways.
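
A minimal sketch of the bounding-box idea above, assuming both pages have already been rendered to rows of RGB tuples. The names `crop` and `same_inside_box` are hypothetical helpers for illustration, not part of any library mentioned in the thread.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (assumed representation)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Return only the pixels that fall inside the box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def same_inside_box(expected: Image, actual: Image, box: Box) -> bool:
    """Compare two renderings pixel for pixel, ignoring everything outside the box."""
    return crop(expected, box) == crop(actual, box)
```

Choosing a box that stops short of the margin holding the date lets the rest of the page stay under an exact comparison.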
        
         | ks2048 wrote:
         | The example here is drawing a red rectangle, so OCR won't do
         | anything.
        
       | zubspace wrote:
       | We do something similar, but from my experience small changes,
       | like fonts or lines rendering a tad different after library
       | changes can be quite frequent. Usually small changes you can't
       | really see, only if you compare them as 2 layers in paint.net or
       | something.
       | 
       | Adding something like an error margin for all pixels or
       | subsections sometimes makes sense, but this can be tricky.
       | Downscaling the image and comparing grayscale values with a small
       | error margin is another option. It all depends on how accurate
       | your tests have to be.
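
One way the downscale-and-tolerate option could look, as a sketch only. It assumes grayscale images stored as rows of 0..255 integers whose dimensions are multiples of the scale factor; `downscale`, `roughly_equal`, and the default tolerance of 8 are illustrative choices, not values from the thread.

```python
from typing import List

Gray = List[List[int]]  # rows of 0..255 grayscale values (assumed representation)


def downscale(image: Gray, factor: int) -> List[List[float]]:
    """Average each factor x factor block into one value."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [
                image[yy][xx]
                for yy in range(y, y + factor)
                for xx in range(x, x + factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def roughly_equal(expected: Gray, actual: Gray,
                  factor: int = 2, tolerance: float = 8.0) -> bool:
    """True if every downscaled block differs by at most `tolerance`."""
    if len(expected) != len(actual) or len(expected[0]) != len(actual[0]):
        return False
    small_expected = downscale(expected, factor)
    small_actual = downscale(actual, factor)
    return all(
        abs(a - b) <= tolerance
        for row_a, row_b in zip(small_expected, small_actual)
        for a, b in zip(row_a, row_b)
    )
```

Averaging blocks absorbs one-pixel antialiasing shifts, at the cost of also hiding small real regressions, which is the trade-off the comment describes.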
        
         | izacus wrote:
         | Well, but those changes are triggered by something aren't they?
         | So when you upgrade your font lib or pdf rendering library,
         | you're warned that you're now generating different output and
         | can update the golden set.
         | 
         | Your dependencies aren't changing without a cause are they?
        
           | zubspace wrote:
           | Yeah sure, it just starts to be a problem when you're having
           | dozens of tests failing because of small rendering changes
           | which can be ignored. Someone still has to look at all the
           | test output, compare it to the old state and update the tests
           | with the new state. In our case this happened quite a lot.
           | 
           | This is not an issue at first, but the more you use tests
           | like this and the more people work with your code, false
           | positives start to drag you down.
        
           | [deleted]
        
       | eddsh1994 wrote:
       | It's interesting how different people's use of testing terminology
       | is across teams/companies/professions. Vocab is standardized by
       | various ISO standards, ASQ, and ISTQB so we could all share the
       | same language, then we don't have to debate about what
       | integration/unit/smoke/component/regression/golden/snapshot
       | testing means.
        
       | jimjimjim wrote:
       | This can get very difficult. Especially with pages that are more
       | than just text and images. Lines, interactive content, optional
       | layers, annotations, embedded content, blend mode transparencies.
       | All of this and more make things complex.
       | 
       | The real problem is that reading a pdf is vastly more complex
       | than writing a pdf.
       | 
       | The spec (1000+ pages) is open to interpretation and different
       | readers interpret it differently. A page that might render
       | perfectly in adobe may look different when viewed in firefox or
       | chrome or ghostscript.
        
       | [deleted]
        
       | flandish wrote:
       | Isn't testing the physical generation of a pdf more aligned with
       | "integration" test not unit testing? Testing the api that makes
       | the pdf is ok, but testing like this post suggests, with bitwise
       | comparison is integration testing, no?
        
         | DSMan195276 wrote:
         | > Isn't testing the physical generation of a pdf more aligned
         | with "integration" test not unit testing? Testing the api that
         | makes the pdf is ok, but testing like this post suggests, with
         | bitwise comparison is integration testing, no?
         | 
         | The fact that it writes the PDF out to a file potentially makes
         | it an integration test, but the rendering aspect I don't think
         | so. The poster is not testing the integration of the tool with
         | ghostscript, rather ghostscript is simply used as an oracle for
         | verifying the result. The only thing actually tested is the
         | original a4pdf API, but some way of verifying the resulting PDF
         | was needed, which is what ghostscript accomplishes. Effectively
         | it's no different from a fancy assertion.
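
A rough sketch of using Ghostscript as the rendering oracle. The thread does not show the exact invocation the blog post uses, so the switches below are an assumption: they are standard Ghostscript options, the executable is assumed to be on the PATH as `gs`, and `ghostscript_command` and `render_first_page` are hypothetical helper names.

```python
import subprocess
from typing import List


def ghostscript_command(pdf_path: str, png_path: str, dpi: int = 72) -> List[str]:
    """Build a command line that renders page 1 of a PDF to a 24-bit PNG."""
    return [
        "gs",                      # assumed executable name (differs on Windows)
        "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m",         # 24-bit RGB PNG output device
        f"-r{dpi}",                # rendering resolution
        "-dFirstPage=1", "-dLastPage=1",
        f"-sOutputFile={png_path}",
        pdf_path,
    ]


def render_first_page(pdf_path: str, png_path: str, dpi: int = 72) -> None:
    """Run Ghostscript; raises CalledProcessError if rendering fails."""
    subprocess.run(ghostscript_command(pdf_path, png_path, dpi), check=True)
```

The test then asserts on the resulting image, so the renderer plays the role of the "fancy assertion" rather than the thing under test.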
        
           | flandish wrote:
           | I reckon so. It could align nice with a mock fs, I suppose.
           | 
           | But if differences in fs or architecture are crucial - the
           | real proof is in the integration.
        
           | kccqzy wrote:
           | I have a much more liberal view of what constitutes a unit
           | test: everything that can be run inside a single container is
           | a unit test. Writing files? Unit test. Using databases? As
           | long as that database is started by the test fixture in the
           | same container and destroyed along with the container, still
           | a unit test.
           | 
           | Of course, if your test needs a database the natural follow-
           | up question is whether it can populate the database with data
           | known at build time, or it needs to reach out to get some
           | realistic looking data. Only the latter makes it an
           | integration test.
        
         | izacus wrote:
         | Is naming these tests a seriously useful thing to bikeshed on?
        
           | PaulStatezny wrote:
           | There is a distinct and meaningful difference between unit
           | tests and integration tests. flandish is not bikeshedding.
           | 
           | Unit tests are about testing a single unit in isolation.
           | Integration tests are about testing the integration of
           | multiple units.
           | 
           | With unit tests, the industry's general attitude is that
           | there should be no side effects, such as reading/writing to
           | databases or the disk. Side effects are generally embraced
           | for integration tests, on the other hand.
           | 
           | As a result, unit tests are mostly useful for "pure"
           | functions, ones where the output is 100% derived from the
           | input, regardless of any state external to the function.
           | (Such as database records.) However, a large portion of the
           | industry hasn't realized this and so you get millions of
           | lines of dependency-injected unit tests that really don't
           | provide much value in terms of catching actual bugs. (If
           | these tests were integration tests, they'd catch actual bugs
           | 10x more often.)
           | 
           | A unit test for generating a PDF will not actually involve
           | writing a PDF to disk. An integration test, however, might.
           | 
           | So as I said, this isn't bikeshedding. ;-)
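
To illustrate the no-disk point, here is a sketch in which the generator writes to any binary stream, so a test can hand it an in-memory buffer. `write_document` is a hypothetical stand-in that emits only a header, a body, and an end marker; it does not produce a valid PDF and is not the a4pdf API.

```python
import io
from typing import BinaryIO


def write_document(out: BinaryIO, body: bytes) -> None:
    """Hypothetical generator: writes to whatever stream it is given."""
    out.write(b"%PDF-1.7\n")
    out.write(body)
    out.write(b"\n%%EOF\n")


def generate_in_memory(body: bytes) -> bytes:
    """Run the generator against a BytesIO buffer instead of a file on disk."""
    buffer = io.BytesIO()
    write_document(buffer, body)
    return buffer.getvalue()
```

Because the generator only depends on a stream, the same function can be pointed at a real file in an integration test.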
        
           | leni536 wrote:
           | Well, the blog post could have just called it "test", and
           | nobody would bikeshed it.
        
           | mardifoufs wrote:
           | Having well defined terms, and using them well, is essential
           | to any type of engineering. I don't know why aiming for
           | precise terminology is only controversial in software
           | engineering.
        
           | flandish wrote:
           | ...yes. Because different energy, documentation, and
           | sometimes entire groups of people are on different phases.
           | 
           | It's not always a single 100x-elite-monster-drinking coder
           | cranking out monoliths in a silo.
           | 
           | I have a hard enough time with project management getting it
           | wrong:
           | 
           | - testing an api's public methods is far "faster" than
           | testing how files are made on diff procs or fstabs..
           | 
           | - that translates to silly gantt charts...
           | 
           | You get the idea.
        
       | gbro3n wrote:
       | This is more of an integration test than a unit test. And if
       | you're going to test for a pixel perfect image match, why not
       | check for full equality with a pre-existing PDF file, byte for
       | byte? And then what are you testing? That something has changed?
       | You'd likely know that the output was going to change, so to fix
       | the broken test you need to use the failure result to create the
       | new comparison file, and if your always going to use the failure
       | output as an input for correcting the test, what is the point.
       | "Don't test that the code is like the code" is a similar
       | principle.
        
         | gbro3n wrote:
         | *you're
        
         | systems_glitch wrote:
         | At a previous job, we created a PDF visual diff tool for this.
         | In automated tests, we could look for either red (present in
         | sample but not test output) or green (not present in sample,
         | but present in test output) to fail a test, or issue an
         | automated change approval request.
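
The red/green scheme could be sketched like this. It assumes each page has been reduced to an ink mask (True where something is drawn), which is a simplification: a colour change where both pages have ink would go unnoticed. `diff_masks` and `has_differences` are hypothetical names, not the tool described above.

```python
from typing import List

Mask = List[List[bool]]  # True where the page has ink (assumed representation)


def diff_masks(sample: Mask, output: Mask) -> List[str]:
    """Mark each pixel: 'R' = only in sample, 'G' = only in output, '.' = same."""
    rows = []
    for sample_row, output_row in zip(sample, output):
        marks = []
        for s, o in zip(sample_row, output_row):
            if s and not o:
                marks.append("R")   # present in sample, missing from test output
            elif o and not s:
                marks.append("G")   # new in test output, absent from sample
            else:
                marks.append(".")
        rows.append("".join(marks))
    return rows


def has_differences(diff: List[str]) -> bool:
    """True if any pixel was marked red or green."""
    return any("R" in row or "G" in row for row in diff)
```

A test can fail on any red or green mark, or route the diff to a human for approval, as the comment suggests.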
        
         | tantalor wrote:
         | It's more like "golden" or "snapshot" testing.
         | 
         | These are _very_ common for web apps, because at the end of the
         | day you don't care about the actual html & CSS, only how they
         | are rendered.
         | 
         | > This is more of an integration test than a unit test.
         | 
         | That's debatable. An integration test generally tests 2 or more
         | systems. This kind of test has 2 systems, the generator and the
         | renderer, and we care about the output of the renderer, so it
         | kind of looks like an integration test. However in an
         | integration test you also have control over the implementation
         | of both systems; a regression can be in any of the systems. But
         | that's not true in snapshot tests: the renderer is a given. If
         | the test fails, it's very unlikely to be due to a regression in
         | the renderer. So in that sense, you are really only testing a
         | single component (the generator) hence it is more like a unit
         | test.
        
           | zoover2020 wrote:
           | Bravo! Excellent summary
        
         | DavidSJ wrote:
         | This sort of test can be useful when you change things under
         | the hood in such a way that the output shouldn't have changed.
        
         | ks2048 wrote:
         | I've never seen a clear distinction between unit tests and
         | integration tests. If you have a black box, "F", with
         | input/output pairs you want it to replicate, you encode these
         | and call them "tests of 'F'". Why have different names for
         | whether "F" is simple or complex?
        
           | simonw wrote:
           | I find the distinction between the two extremely frustrating.
           | 
           | Some people act like there's an obvious definition, and maybe
           | there is if you're doing pure TDD Java as described in one
           | specific text book... but in my experience most developers
           | can't provide a good explanation of what a "unit" is.
           | 
           | And those that do... often write pretty awful tests! They
           | mock almost everything and build tests that do very little to
           | actually demonstrate that the system works as intended.
           | 
           | So I just call things "tests", and try to spend a lot more
           | time on tests that exercise end-to-end functionality (which
           | some people call "integration tests") than tests that operate
           | against one single little function.
        
           | CiaranMcNulty wrote:
           | They both revolve around a coherent concept of what a 'Unit'
           | is - if you have a (shared, project-level) understanding of a
           | Unit then a 'Unit test' is what tests it, and an 'Integration
           | test' involves >1 Unit
        
         | patrickg wrote:
         | This is what I do
         | 
         | I have ca. 190 test cases on which I run my software and
         | compare the md5 sums of the resulting PDF. If they are not the
         | same, I create a PNG for every page and compare visually with
         | imagemagick.
         | 
         | The trick is to remove all random stuff from the PDF (like ID
         | generation or such).
         | 
         | This takes about 3 seconds on the M1 Pro laptop. I think this
         | is very much okay.
         | 
         | Links: https://github.com/speedata/publisher/tree/develop/qa
         | (the tests)
         | https://github.com/speedata/publisher/blob/develop/src/go/sp...
         | (the Go source code for the comparison)
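
A sketch of the hash-first, visual-fallback flow, written in Python rather than the linked Go code and not derived from it. The comment removes the random parts at generation time; this version instead masks them before hashing, on the assumption that `/CreationDate`, `/ModDate`, and the trailer `/ID` appear uncompressed in the file. `normalized_md5` and `outputs_match` are hypothetical names, and `visual_compare` is whatever page-image comparison the caller supplies.

```python
import hashlib
import re
from typing import Callable

# Volatile fields to mask before hashing (assumes they are stored uncompressed).
# The result is only hashed, never reopened, so changed byte offsets do not matter.
_VOLATILE = [
    (re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)"),
     rb"/\1 (D:19700101000000Z)"),
    (re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
     rb"/ID [<00> <00>]"),
]


def normalized_md5(pdf_bytes: bytes) -> str:
    """MD5 of the file with dates and document ID replaced by fixed values."""
    for pattern, replacement in _VOLATILE:
        pdf_bytes = pattern.sub(replacement, pdf_bytes)
    return hashlib.md5(pdf_bytes).hexdigest()


def outputs_match(reference: bytes, candidate: bytes,
                  visual_compare: Callable[[bytes, bytes], bool]) -> bool:
    """Cheap hash check first; only render and compare pages if hashes differ."""
    if normalized_md5(reference) == normalized_md5(candidate):
        return True
    return visual_compare(reference, candidate)
```

Keeping the expensive rendering step behind the hash check is what makes a run over many test cases fast when nothing has changed.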
        
         | mattgreenrocks wrote:
         | These are typically called smoke tests, and can be valuable for
         | regression testing of third party libraries you depend on.
         | 
         | An alternate approach: generate the PDF, then run it through a
         | PDF reader library to scrape the text out and ensure it is
         | there.
        
           | izacus wrote:
           | Your approach will completely miss big changes like missing
           | pictures, broken layout, missing background and other
           | breakages in rendering. Also missing text which isn't
           | embedded as a text layer.
        
             | mattgreenrocks wrote:
             | Of course. It was meant for argument, not as an omnibus to
             | comprehensively testing PDFs. :)
        
         | geraldwhen wrote:
         | It is extremely hard to make two pdfs have the same output
         | binary, especially on CI vs local.
        
       ___________________________________________________________________
       (page generated 2023-02-27 23:01 UTC)