[HN Gopher] Common Pitfalls in Database Performance Testing (2018) [pdf]
       ___________________________________________________________________
        
       Common Pitfalls in Database Performance Testing (2018) [pdf]
        
       Author : eatonphil
       Score  : 40 points
       Date   : 2023-09-07 02:16 UTC (2 days ago)
        
 (HTM) web link (mytherin.github.io)
 (TXT) w3m dump (mytherin.github.io)
        
       | eatonphil wrote:
       | Appendix A is a thoughtful checklist for anyone doing benchmarks:
       | 
       |   * Choosing your Benchmarks.
       |     # Benchmark covers whole evaluation space
       |     # Justify picking benchmark subset
       |     # Benchmark stresses functionality in the evaluation space
       |   * Reproducible. Available shall be:
       |     # Hardware configuration
       |     # DBMS parameters and version
       |     # Source code or binary files
       |     # Data, schema & queries
       |   * Optimization.
       |     # Compilation flags
       |     # System parameters
       |   * Apples vs Apples.
       |     # Similar functionality
       |     # Equivalent workload
       |   * Comparable tuning.
       |     # Different data
       |     # Various workloads
       |   * Cold/warm/hot runs.
       |     # Differentiate between cold and hot runs
       |     # Cold runs: Flush OS and CPU caches
       |     # Hot runs: Ignore initial runs
       |   * Preprocessing.
       |     # Ensure preprocessing is the same between systems
       |     # Be aware of automatic index creation
       |   * Ensure correctness.
       |     # Verify results
       |     # Test different data sets
       |     # Corner cases work
       |   * Collecting Results.
       |     # Do several runs to reduce interference
       |     # Check standard deviation for multiple runs
       |     # Report robust metrics (e.g., median and confidence
       |       intervals) -- see the sketch after this list
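       | 
       | As a rough illustration of that last point (my own sketch, not
       | from the paper), reporting a median plus a bootstrap confidence
       | interval over repeated runs could look something like this in
       | Python; run_benchmark() is a hypothetical stand-in for whatever
       | workload you actually measure:
       | 
       |     import random
       |     import statistics
       |     import time
       | 
       |     def run_benchmark():
       |         # Hypothetical stand-in: replace with the query or
       |         # workload under test.
       |         time.sleep(0.01)
       | 
       |     def timed_runs(n=30):
       |         # Several runs to reduce interference from other
       |         # processes on the machine.
       |         samples = []
       |         for _ in range(n):
       |             start = time.perf_counter()
       |             run_benchmark()
       |             samples.append(time.perf_counter() - start)
       |         return samples
       | 
       |     def bootstrap_ci(samples, reps=10_000, alpha=0.05):
       |         # Percentile bootstrap of the median: resample with
       |         # replacement and take the alpha/2 and 1-alpha/2
       |         # quantiles of the resampled medians.
       |         medians = sorted(
       |             statistics.median(random.choices(samples,
       |                                              k=len(samples)))
       |             for _ in range(reps)
       |         )
       |         lo = medians[int(reps * (alpha / 2))]
       |         hi = medians[int(reps * (1 - alpha / 2))]
       |         return lo, hi
       | 
       |     samples = timed_runs()
       |     lo, hi = bootstrap_ci(samples)
       |     print(f"median={statistics.median(samples):.4f}s "
       |           f"stdev={statistics.stdev(samples):.4f}s "
       |           f"95% CI=[{lo:.4f}s, {hi:.4f}s]")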
        
         | zX41ZdbW wrote:
         | It's interesting that the list reads like a recipe for
         | ClickBench. I haven't read the article, though, so it is just a
         | coincidence.
        
           | eatonphil wrote:
           | Maybe ClickBench followed it! This paper was in 2018 and
           | ClickBench was published last year.
           | 
           | https://news.ycombinator.com/item?id=32084571
        
       | zX41ZdbW wrote:
       | Also, I recommend reading the methodology here:
       | https://github.com/ClickHouse/ClickBench/blob/main/README.md
       | 
       | Making an open-source, easily reproducible benchmark with clear
       | rules made it possible to cover more than 30 different DBMSs in a
       | single benchmark. Many of the results were submitted or improved
       | by the vendors of the corresponding systems.
       | 
       | There is also a comparison section that lists its limitations and
       | drawbacks relative to other benchmarks.
        
       | crabbone wrote:
       | Here's my little contribution / anecdote. I worked on automated
       | testing of a product in the general category of software-defined
       | storage (SDS) block devices. We wanted to see some "realistic"
       | benchmarks (instead of totally synthetic FIO-based ones), so we
       | decided to run, among other things, some benchmarks of PostgreSQL
       | on top of our block device.
       | 
       | At that time we'd already developed a convention for what would
       | be the "optimal" way to configure a filesystem on top of our
       | block device -- which is a whole other story, but I won't
       | reproduce it here. The most tempting aspect of PostgreSQL was the
       | existence of the pgbench tool, which I assumed would offer some
       | sort of standardized workload that would let me compare my
       | results to results produced by other testers on other hardware...
       | 
       | Little did I know how, and in how many ways, PostgreSQL
       | performance depends on configuration. The most important pieces
       | are the memory buffers and the fsync-related settings.
       | Eventually, by reading more documentation and some educational
       | material about PostgreSQL, I realized that I could make the exact
       | same benchmark on the exact same hardware vary by hundreds if not
       | thousands of times in performance _for the same exact test_.
       | 
       | I also realized that the storage's performance had very little to
       | do with the benchmark's scores: to game the benchmark, one would
       | simply add more memory.
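       | 
       | To make that concrete, here's a rough sketch of the kind of
       | comparison I mean (not our actual harness; the ports, scale
       | factor and settings in the comments are made up): the same
       | pgbench workload against two clusters on the same hardware that
       | differ only in shared_buffers and the fsync/synchronous_commit
       | settings.
       | 
       |     import re
       |     import subprocess
       | 
       |     CLUSTERS = {
       |         # started with shared_buffers=128MB, fsync=on,
       |         # synchronous_commit=on
       |         "tiny-buffers": "5432",
       |         # started with shared_buffers=8GB, fsync=off,
       |         # synchronous_commit=off
       |         "big-buffers-no-fsync": "5433",
       |     }
       | 
       |     def run_pgbench(port):
       |         # -i -s 100: load a scale-100 dataset;
       |         # -c/-j/-T: clients, threads, duration in seconds.
       |         subprocess.run(
       |             ["pgbench", "-i", "-s", "100", "-p", port, "bench"],
       |             check=True)
       |         out = subprocess.run(
       |             ["pgbench", "-c", "16", "-j", "4", "-T", "60",
       |              "-p", port, "bench"],
       |             check=True, capture_output=True, text=True).stdout
       |         return float(re.search(r"tps = ([0-9.]+)", out).group(1))
       | 
       |     for name, port in CLUSTERS.items():
       |         print(name, run_pgbench(port), "tps")
       | 
       | In a setup like this the second configuration tends to be bounded
       | by RAM rather than storage, which is the effect I was describing
       | above.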
        
         | BrentOzar wrote:
         | > I also realized that storage's performance had very little to
         | do with the benchmark's scores as in order to game the
         | benchmark one would simply add more memory.
         | 
         | Reminds me of the time [1] Fusion-IO showed how good their
         | drives were for SQL Server by running benchmarks with just 16GB
         | of RAM enabled, even though the servers had 384GB RAM.
         | 
         | Under direct questioning, they admitted that if you just used
         | the rest of the RAM in the server, their drives didn't have any
         | effect, and that's why they limited the server so badly for the
         | presentation. (sigh)
         | 
         | [1] https://ozar.me/2014/06/fact-checking-24-hours-of-pass/
        
       | zzzeek wrote:
       | the most common pitfall is: I ran the query, but didn't fetch all
       | the rows. Query is so fast with library A but not library B!!
       | (library B buffers the rows in some way, so they are pre-fetched
       | and the fetch cost shows up in its query time).
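       | 
       | A toy sqlite3 version of the mistake (the shape of the problem is
       | the same with any driver):
       | 
       |     import sqlite3
       |     import time
       | 
       |     conn = sqlite3.connect(":memory:")
       |     conn.execute("CREATE TABLE t (x INTEGER)")
       |     conn.executemany("INSERT INTO t VALUES (?)",
       |                      ((i,) for i in range(500_000)))
       | 
       |     # Looks blazing fast: the query is only started, no rows are
       |     # consumed yet.
       |     start = time.perf_counter()
       |     cur = conn.execute("SELECT x FROM t")
       |     print("execute only:      ", time.perf_counter() - start)
       | 
       |     # The honest number: run it again and pull every row back.
       |     start = time.perf_counter()
       |     rows = conn.execute("SELECT x FROM t").fetchall()
       |     print("execute + fetchall:", time.perf_counter() - start,
       |           len(rows), "rows")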
        
       | twoodfin wrote:
       | Looking forward to reading this. My "favorite" database
       | benchmarking bugaboo is unrealistic random data distribution in
       | the test set, typically caused by relying on the uniform output
       | of the default random() function in your favorite language's
       | library.
       | 
       | Application data is almost never uniformly random. As a result,
       | you'll miss the real-world effects of a wide range of system
       | behavior: CPU branch prediction, data compression, adaptive query
       | optimization, ...
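       | 
       | As a minimal sketch of the difference (numpy's Zipf sampler as a
       | stand-in for whatever skew your application actually has):
       | 
       |     import numpy as np
       | 
       |     rng = np.random.default_rng(42)
       |     n = 1_000_000
       | 
       |     # What most test-data generators do: every key equally
       |     # likely.
       |     uniform_keys = rng.integers(0, 100_000, size=n)
       | 
       |     # Closer to real application data: a few hot keys dominate.
       |     # a=1.2 gives a heavy-tailed distribution.
       |     zipf_keys = rng.zipf(a=1.2, size=n)
       | 
       |     for name, keys in (("uniform", uniform_keys),
       |                        ("zipf", zipf_keys)):
       |         _, counts = np.unique(keys, return_counts=True)
       |         print(f"{name}: {len(counts)} distinct keys, "
       |               f"hottest key = {counts.max() / n:.2%} of rows")
       | 
       | Compression ratios, branch predictability, and optimizer
       | estimates all come out very differently for those two inputs.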
        
         | eatonphil wrote:
         | I think you'd at least get a reasonable worst-case behavior by
         | working with (cryptographically secure) random data, no?
         | 
         | I agree you wouldn't see many optimizations that likely would
         | take effect in the real world.
         | 
         | But worst-case scenarios are helpful to know too.
        
           | convolvatron wrote:
           | uniform distribution isn't always the worst case. just for an
           | example, if you stuff a btree with pages and pages of the
           | same key it starts to get funny. it also doesn't really
           | exercise the join ordering machine correctly if every key has
           | at most one correlate.
        
             | eatonphil wrote:
             | Yeah that's another good worst-case to think about.
        
           | o11c wrote:
           | Some useful distributions:
           | 
           | * Generate a random number in [1, NUMBITS], then generate a
           | random number with that many bits.
           | 
           | * Generate a random number and count its trailing (or
           | leading, if easier in your environment) zeroes, then generate
           | a random number with that many bits. Be sure to correctly
           | handle the case where the first random number is zero.
           | 
           | * Generate two random numbers and take the minimum.
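            | 
            | Roughly, in Python (NUMBITS and the use of the stdlib RNG
            | are arbitrary choices for the sketch):
            | 
            |     import random
            | 
            |     NUMBITS = 32  # arbitrary word width for the sketch
            | 
            |     def uniform_bit_length():
            |         # Pick a bit length uniformly in [1, NUMBITS], then
            |         # a value with that many random bits.
            |         return random.getrandbits(random.randint(1, NUMBITS))
            | 
            |     def geometric_bit_length():
            |         # Count trailing zeroes of a random word to pick the
            |         # bit length (length k with probability ~2**-(k+1)),
            |         # treating an all-zero word as NUMBITS zeroes.
            |         r = random.getrandbits(NUMBITS)
            |         zeros = NUMBITS if r == 0 else (r & -r).bit_length() - 1
            |         return random.getrandbits(zeros) if zeros else 0
            | 
            |     def min_of_two():
            |         # Minimum of two uniform draws biases toward small
            |         # values.
            |         return min(random.getrandbits(NUMBITS),
            |                    random.getrandbits(NUMBITS))
            | 
            |     for f in (uniform_bit_length, geometric_bit_length,
            |               min_of_two):
            |         print(f.__name__, [f() for _ in range(5)])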
        
       ___________________________________________________________________
       (page generated 2023-09-09 23:01 UTC)