[HN Gopher] Common Pitfalls in Database Performance Testing (2018) [pdf]
___________________________________________________________________
Common Pitfalls in Database Performance Testing (2018) [pdf]
Author : eatonphil
Score : 40 points
Date : 2023-09-07 02:16 UTC (2 days ago)
(HTM) web link (mytherin.github.io)
(TXT) w3m dump (mytherin.github.io)
| eatonphil wrote:
| Appendix A is a thoughtful checklist for anyone doing benchmarks:
| * Choosing your Benchmarks.
|   # Benchmark covers whole evaluation space
|   # Justify picking benchmark subset
|   # Benchmark stresses functionality in the evaluation space
| * Reproducible. Available shall be:
|   # Hardware configuration
|   # DBMS parameters and version
|   # Source code or binary files
|   # Data, schema & queries
| * Optimization.
|   # Compilation flags
|   # System parameters
| * Apples vs Apples.
|   # Similar functionality
|   # Equivalent workload
| * Comparable tuning.
|   # Different data
|   # Various workloads
| * Cold/warm/hot runs.
|   # Differentiate between cold and hot runs
|   # Cold runs: Flush OS and CPU caches
|   # Hot runs: Ignore initial runs
| * Preprocessing.
|   # Ensure preprocessing is the same between systems
|   # Be aware of automatic index creation
| * Ensure correctness.
|   # Verify results
|   # Test different data sets
|   # Corner cases work
| * Collecting Results.
|   # Do several runs to reduce interference
|   # Check standard deviation for multiple runs
|   # Report robust metrics (e.g., median and confidence intervals)
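|
| As a minimal sketch of the "Collecting Results" item (my own code,
| not the paper's): do several runs, then report the median with a
| bootstrap confidence interval rather than a single number. The
| timings below are made-up illustrative values.
|
|     import random
|     import statistics
|
|     def summarize(timings, iters=10_000, alpha=0.05):
|         """Median of the runs plus a bootstrap confidence interval."""
|         medians = sorted(
|             statistics.median(random.choices(timings, k=len(timings)))
|             for _ in range(iters)
|         )
|         lo = medians[int(iters * (alpha / 2))]
|         hi = medians[int(iters * (1 - alpha / 2))]
|         return statistics.median(timings), (lo, hi)
|
|     # ten measured runs of the same query, in seconds (illustrative)
|     runs = [1.92, 2.01, 1.97, 2.40, 1.95, 1.99, 2.02, 1.96, 2.00, 1.98]
|     med, (lo, hi) = summarize(runs)
|     print(f"median {med:.2f}s, 95% CI [{lo:.2f}, {hi:.2f}]")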
| zX41ZdbW wrote:
| It's interesting that the list reads like a recipe for
| ClickBench. I haven't read the article, though, so it may just be
| a coincidence.
| eatonphil wrote:
| Maybe ClickBench followed it! This paper is from 2018 and
| ClickBench was published last year.
|
| https://news.ycombinator.com/item?id=32084571
| zX41ZdbW wrote:
| I also recommend reading the methodology here:
| https://github.com/ClickHouse/ClickBench/blob/main/README.md
|
| Making an open-source, easily reproducible benchmark with clear
| rules made it possible to include more than 30 different DBMSs in
| a single benchmark. Many of the results were submitted or improved
| by the vendors of the corresponding systems.
|
| There is also a comparison section that lists ClickBench's
| limitations and drawbacks relative to other benchmarks.
| crabbone wrote:
| Here's my little contribution / anecdote. I worked on automated
| testing of a product in the general category of software-defined
| storage (SDS) block devices. We wanted to see some "realistic"
| benchmarks (instead of totally synthetic FIO-based ones), so we
| decided to run, among other things, some benchmarks of PostgreSQL
| on top of our block device.
|
| By that time we had already developed a convention for the
| "optimal" way to configure a filesystem on top of our block
| device -- which is a whole other story that I won't reproduce
| here. The most tempting aspect of PostgreSQL was the existence of
| the pgbench tool, which I assumed would offer some sort of
| standardized workload that would allow me to compare my results
| to those produced by other testers on other hardware...
|
| Little did I know in how many ways PostgreSQL performance depends
| on configuration. The most important pieces are the memory
| buffers and the fsync-related settings. Eventually, by reading
| more documentation and some educational material about
| PostgreSQL, I realized that I could make the same exact benchmark
| on the same exact hardware vary by hundreds if not thousands of
| times in performance _for the same exact test_.
|
| I also realized that the storage's performance had very little to
| do with the benchmark's scores: to game the benchmark, one would
| simply add more memory.
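|
| To illustrate the point (a sketch, not our original harness):
| pgbench speaks libpq, so a per-session setting such as
| synchronous_commit can be flipped through the PGOPTIONS
| environment variable, while shared_buffers and fsync need a
| server restart. Assumes a database named "bench" already
| initialized with pgbench -i.
|
|     import os
|     import subprocess
|
|     def run_pgbench(session_opts: str) -> str:
|         """Run a short pgbench with per-session settings injected
|         via libpq's PGOPTIONS environment variable."""
|         env = dict(os.environ, PGOPTIONS=session_opts)
|         result = subprocess.run(
|             ["pgbench", "-c", "8", "-T", "30", "bench"],
|             env=env, capture_output=True, text=True, check=True,
|         )
|         return result.stdout  # the "tps = ..." line is the score
|
|     # Same test, same hardware -- only one durability knob changes.
|     for opts in ["-c synchronous_commit=on",
|                  "-c synchronous_commit=off"]:
|         print(opts)
|         print(run_pgbench(opts))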
| BrentOzar wrote:
| > I also realized that the storage's performance had very little
| to do with the benchmark's scores: to game the benchmark, one
| would simply add more memory.
|
| Reminds me of the time [1] Fusion-IO showed how good their
| drives were for SQL Server by running benchmarks with just 16GB
| of RAM enabled, even though the servers had 384GB RAM.
|
| Under direct questioning, they admitted that if you just used
| the rest of the RAM in the server, their drives didn't have any
| effect, and that's why they limited the server so badly for the
| presentation. (sigh)
|
| [1] https://ozar.me/2014/06/fact-checking-24-hours-of-pass/
| zzzeek wrote:
| The most common pitfall is: I ran the query, but didn't fetch all
| the rows. "The query is so fast with library A but not with
| library B!!" (Library B buffers the rows in some way, so they are
| pre-fetched during execution.)
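|
| A minimal illustration with Python's sqlite3 (many drivers behave
| similarly): timing execute() alone makes the query look nearly
| free, because most of the work happens while the rows are fetched.
|
|     import sqlite3
|     import time
|
|     conn = sqlite3.connect(":memory:")
|     conn.execute("CREATE TABLE t (x)")
|     conn.executemany("INSERT INTO t VALUES (?)",
|                      ((i,) for i in range(1_000_000)))
|
|     cur = conn.cursor()
|     t0 = time.perf_counter()
|     cur.execute("SELECT x FROM t")   # only prepares/starts the scan
|     t1 = time.perf_counter()
|     rows = cur.fetchall()            # the real work happens here
|     t2 = time.perf_counter()
|
|     print(f"execute: {t1 - t0:.4f}s, fetchall: {t2 - t1:.4f}s, "
|           f"rows: {len(rows)}")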
| twoodfin wrote:
| Looking forward to reading this. My "favorite" database
| benchmarking bugaboo is unrealistic random data distribution in
| the test set, typically from relying on the uniform output of the
| default random() function in your favorite language's library.
|
| Application data is almost never uniformly random. As a result,
| you'll miss the real-world effects of a wide range of system
| behavior: CPU branch prediction, data compression, adaptive query
| optimization, ...
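|
| A quick stdlib-only sketch of the difference (the 1/(k+1)
| Zipf-like weighting is my assumption, not anything from the
| article):
|
|     import random
|     from collections import Counter
|
|     N, KEYS = 100_000, 1_000
|
|     # Uniform: every key equally likely -- rarely what real apps do.
|     uniform = [random.randrange(KEYS) for _ in range(N)]
|
|     # Skewed: weight key k proportionally to 1/(k+1), so a few hot
|     # keys dominate, as in most application data.
|     weights = [1.0 / (k + 1) for k in range(KEYS)]
|     skewed = random.choices(range(KEYS), weights=weights, k=N)
|
|     for name, data in (("uniform", uniform), ("skewed", skewed)):
|         print(name, "top-3 keys:", Counter(data).most_common(3))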
| eatonphil wrote:
| I think you'd at least get a reasonable worst-case behavior by
| working with (cryptographically secure) random data, no?
|
| I agree you wouldn't see many optimizations that likely would
| take effect in the real world.
|
| But worst-case scenarios are helpful to know too.
| Uniform distribution isn't always the worst case. Just as an
| example, if you stuff a btree with pages and pages of the same
| key, it starts to get funny. Uniform data also doesn't really
| exercise the join-ordering machinery correctly if every key has
| at most one correlate.
| eatonphil wrote:
| Yeah that's another good worst-case to think about.
| o11c wrote:
| Some useful distributions (a Python sketch follows the list):
|
| * Generate a random number in [1, NUMBITS], then generate a
| random number with that many bits.
|
| * Generate a random number and count its trailing (or
| leading, if easier in your environment) zeroes, then generate
| a random number with that many bits. Be sure to correctly
| handle the case where the first random number is zero.
|
| * Generate two random numbers and take the minimum.
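|
| A minimal Python sketch of all three (a 32-bit word size is
| assumed):
|
|     import random
|
|     NUMBITS = 32
|
|     def uniform_bit_length() -> int:
|         """Bit count uniform in [1, NUMBITS], then a random number
|         with that many bits."""
|         bits = random.randint(1, NUMBITS)
|         return random.getrandbits(bits)
|
|     def trailing_zero_bits() -> int:
|         """Count trailing zeros of a random word, then draw a number
|         with that many bits; the all-zero word is handled
|         explicitly."""
|         r = random.getrandbits(NUMBITS)
|         tz = NUMBITS if r == 0 else (r & -r).bit_length() - 1
|         return random.getrandbits(tz) if tz > 0 else 0
|
|     def min_of_two() -> int:
|         """Minimum of two uniform draws, biased toward small values."""
|         return min(random.getrandbits(NUMBITS) for _ in range(2))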
___________________________________________________________________
(page generated 2023-09-09 23:01 UTC)