hngopher.com

       [HN Gopher] No, QuestDB is not Faster than ClickHouse
       ___________________________________________________________________
        
       Author : krnaveen14
       Score  : 89 points
       Date   : 2022-06-16 16:21 UTC (6 hours ago)
        
 (HTM) web link (telegra.ph)
        
       | bluestreak wrote:
       | Our article in question can be found here:
       | https://questdb.io/blog/2022/05/26/query-benchmark-questdb-v...
       | 
       | The intent of the article was to showcase JIT-optimised WHERE
       | clause and we did not use any indexes on QuestDB.
        
         | PeterZaitsev wrote:
         | If your intent is to showcase the new optimization in the
         | product, it is best to compare it to your own old version.
        
           | olluk wrote:
           | A comparison with the old version is actually in the article,
           | for the patient reader. It could go at the top, but I don't
           | think it would make a difference. At the end of the day, it is
           | an article on the official QuestDB website, which already gives
           | the reader a spoiler about the bias.
           | 
           | I am intrigued what Timescale is going to publish next.
        
           | qoega wrote:
           | Agree. And for a blog post it can even have a story like: "We
           | compared with ClickHouse and we were 10x slower, then we
           | looked at this case and made it 100x faster. Thank you,
           | benchmark and ClickHouse developers, for showing us a use case
           | where we could do better."
           |
           | For me, benchmarking usually goes like this: "Why does this
           | query take so long? We need to improve it. Sometimes by 1000x."
        
           | untitaker_ wrote:
           | Right? How do the folks at QuestDB know that their new JIT
           | engine is actually responsible for those performance
           | improvements? My understanding is that, index or not, data is
           | still sorted by time in QuestDB, which is exactly what the
           | ClickHouse engineers are replicating in the new schema.
        
             | bluestreak wrote:
             | The query ClickHouse picked on does not actually leverage
             | time order. Perhaps the ClickHouse vendors on this thread can
             | comment on the relevance of date partitioning for this
             | query. My best guess is that it might help the execution
             | logic create data chunks for a parallel scan.
             |
             | QuestDB also uses partitions for this purpose, but we also
             | calculate chunks dynamically based on the available CPUs to
             | distribute load across cores more evenly.
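
bluestreak describes combining partitions with chunks sized from the number of available CPUs. A minimal sketch of that idea in Python, assuming the row count of each partition is known up front; `plan_scan_tasks` is a hypothetical helper for illustration, not QuestDB's actual scheduler:

```python
import math
import os


def plan_scan_tasks(partition_rows, workers=None):
    """Return (partition_index, start, end) half-open row ranges for a parallel scan.

    Partitions are never merged, but any partition larger than the per-worker
    target is cut into smaller pieces, so work still spreads across all cores
    when partitions are unevenly sized.
    """
    if workers is None:
        workers = os.cpu_count() or 1  # available CPUs drive the chunk size
    workers = max(1, workers)
    total = sum(partition_rows)
    if total == 0:
        return []
    target = math.ceil(total / workers)  # rows per task we aim for
    tasks = []
    for p, rows in enumerate(partition_rows):
        for start in range(0, rows, target):
            tasks.append((p, start, min(start + target, rows)))
    return tasks


# Hypothetical example: one large and one small partition, four cores.
print(plan_scan_tasks([10, 2], workers=4))
# [(0, 0, 3), (0, 3, 6), (0, 6, 9), (0, 9, 10), (1, 0, 2)]
```

Splitting by partition alone would hand one core 10 rows and another 2; sizing chunks from the worker count keeps each task at roughly the same size.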
        
         | datalopers wrote:
         | Please don't post competitor benchmarks until you can hire
         | someone who has a slight clue what they're doing. All you're
         | demonstrating is the sheer incompetence at QuestDB.
        
           | bluestreak wrote:
           | I am in fact very proud of my team, who worked very hard on
           | both implementation and the article. It is disappointing to
           | read unfounded insults where we made every effort to be fair.
        
             | hodgesrm wrote:
             | I appreciate your benchmark and was interested to learn
             | about how QuestDB processes TSBS queries efficiently. I
             | work extensively with ClickHouse and it's always
             | enlightening to learn about how other databases achieve
             | high performance. Your descriptions of the internals are
             | clear and easy to follow, especially since you included
             | comparisons with older versions of QuestDB.
             | 
             | That said, I think I can understand how some users might be
             | a little put off by the comparisons. Your article
             | effectively says "ClickHouse is really slow" without giving
             | readers any easy way to judge what was happening under the
             | covers. I was personally a bit frustrated not to have the
             | time to set up TSBS and dig into what was going on. I
             | therefore appreciated Geoff's effort to look up the results
             | and show that the default index choices didn't make a lot
             | of sense for this particular case. That does not detract
             | from QuestDB's performance, at least from my perspective.
             | 
             | Anyway congratulations on the performance improvement. As a
             | famous character in Star Wars said, "we will watch your
             | career with great interest."
             | 
             | edit: correct typo
        
             | Dzugaru wrote:
             | "while QuestDB utilizes its full indexing strategy to read
             | just a tiny fraction of the actual data"
             | 
             | Can you please elaborate on this?
        
               | bluestreak wrote:
               | Full disclosure: I am the CTO of QuestDB and I took part in
               | the JIT implementation. The quote above is not mine; it was
               | written by ClickHouse staff. The "utilizes its full indexing
               | strategy" statement is false and is news to me.
        
               | Dzugaru wrote:
               | So you do a full scan and it's ~50 CPU cycles per row (48
               | CPUs at 4 GHz), correct? This is possible, I guess? And in
               | that case ClickHouse is wrong.
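
The ~50-cycle figure comes from dividing the total cycles available during the query by the number of rows scanned. The thread gives only the core count, the clock speed and the result, not the query time or row count, so those stay as parameters in this sketch; it assumes perfect parallelism and ignores memory bandwidth, and both function names are hypothetical:

```python
def cycles_per_row(cores: int, clock_hz: float, seconds: float, rows: int) -> float:
    """CPU cycles spent per row, assuming every core is busy for the whole query."""
    return cores * clock_hz * seconds / rows


def rows_per_second(cores: int, clock_hz: float, cycles: float) -> float:
    """Scan rate implied by a per-row cycle budget, under the same perfect-parallelism assumption."""
    return cores * clock_hz / cycles


# Figures from the comment above: 48 cores at 4 GHz, ~50 cycles per row.
print(rows_per_second(48, 4e9, 50))  # 3840000000.0
```

Read the other way, ~50 cycles per row on 48 cores at 4 GHz corresponds to roughly 3.84 billion rows per second.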
        
               | pepemon wrote:
               | So, QuestDB is faster or not? I'm puzzled now!
        
               | olluk wrote:
               | It looks like QuestDB is faster if you don't optimize your
               | table storage for one query.
               |
               | But if you are okay with only a limited number of columns
               | being scanned faster than the others, ClickHouse comes first.
        
             | PeterZaitsev wrote:
             | I wonder what "every effort to be fair" means? The first
             | thing you could have done is reach out to the ClickHouse
             | community to ask for optimization suggestions.
        
               | bluestreak wrote:
               | "Fair" means that we are comparing apples to apples: an
               | ad-hoc, unindexed predicate compiled by QuestDB into AVX2
               | assembly (using AsmJIT) vs the same predicate compiled by
               | ClickHouse (I'm assuming by LLVM). One can perhaps view
               | this as comparing SIMD-based scans from both databases.
               | Perhaps we generate better assembly, which incidentally
               | offers better IO.
               | 
               | We all understand that creating a very specific index might
               | improve specific query performance. Great, Clickhouse
               | geared the entire table storage model to be ultra
               | specific for latitude search. What if you search by
               | longitude, or other column? Back to the beginning.
               | 
               | JIT-compiled predicates offer arbitrary query
               | optimisation with zero impact on ingestion. This is
               | sometimes useful.
               | 
               | What would you offer assuming that we reached out, other
               | than creating an index?
               | 
               | ClickHouse does better than we do in other areas. It JITs
               | more complicated expressions, such as some date
               | functions. It optimises count() queries specifically: for
               | example, we collect "found" row IDs in an array, whereas
               | ClickHouse does not do that for count(). We still
               | have work to do. On the other hand, we ingested this very
               | dataset about 5x quicker than ClickHouse, which we left
               | out because the article is not about "QuestDB is faster than
               | ClickHouse".
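
bluestreak contrasts an ad-hoc predicate compiled at query time with a count() path that skips collecting row IDs. A rough Python sketch of both ideas, with Python's built-in `compile()` standing in for a real JIT such as AsmJIT (so no SIMD and no machine code); the helper names and the toy `lat`/`lon` table are hypothetical:

```python
from typing import Callable, Dict, List, Sequence


def compile_predicate(expr: str, columns: Sequence[str]) -> Callable[..., bool]:
    """Compile a filter such as "lat > 40 and lat < 41" into a function of the named columns.

    Python's compile() emits bytecode, not AVX2 machine code; it only stands in
    for a JIT here. Do not feed it untrusted input.
    """
    src = f"lambda {', '.join(columns)}: {expr}"
    return eval(compile(src, "<predicate>", "eval"), {"__builtins__": {}})


def scan_row_ids(table: Dict[str, list], columns: Sequence[str], pred) -> List[int]:
    """Full scan that materialises the IDs of matching rows."""
    cols = [table[c] for c in columns]
    return [i for i, values in enumerate(zip(*cols)) if pred(*values)]


def scan_count(table: Dict[str, list], columns: Sequence[str], pred) -> int:
    """count()-only scan: same predicate, but only a counter is kept, no row-ID array."""
    cols = [table[c] for c in columns]
    return sum(1 for values in zip(*cols) if pred(*values))


table = {"lat": [39.5, 40.2, 40.9, 41.3], "lon": [-74.0, -73.9, -73.8, -73.7]}
in_band = compile_predicate("lat > 40 and lat < 41", ["lat"])
print(scan_row_ids(table, ["lat"], in_band))  # [1, 2]
print(scan_count(table, ["lat"], in_band))    # 2
```

Both scans share the compiled predicate; the only difference is whether matches are materialised into a list or merely counted, which is the kind of count()-specific shortcut described above.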
        
               | olluk wrote:
               | What if the purpose of the article is to compare queries
               | without indexes?
        
               | jsnell wrote:
               | Doesn't matter, since that clearly wasn't the purpose of
               | the article. After all, they were totally happy to add an
               | index for another competing DB as long as they happened
               | to win that comparison. Then they crow about how they
               | beat having an index.
               | 
               | Pretty sleazy.
        
               | xenator wrote:
               | So maybe don't create specific scenarios for corner
               | cases and then generalize the outcome? And write articles
               | about common scenarios that are important for people who
               | will use the technology on a daily basis.
        
               | olluk wrote:
               | My personal view is that having fast queries without
               | indexes is quite a general outcome.
        
           | avianlyric wrote:
           | What an extremely unfair comment. Having read QuestDB's blog,
           | it's quite clear they've taken great pains to point out that
           | a single specific benchmark isn't the be all and end all of
           | DB analysis.
           | 
           | They quite clearly start out by saying they're only looking
           | to demonstrate the impact of a specific new DB feature
           | they've created, and are using benchmarks that illustrate the
           | difference. They make zero claims that QuestDB is faster than
           | Clickhouse overall, and quite carefully point out that
           | prospective users need to run their own benchmarks on their
           | own data to figure out what DB will work for them.
        
             | dimgl wrote:
             | > They make zero claims that QuestDB is faster than
             | Clickhouse overall
             | 
             | Are you sure? Just one look at their website says
             | differently.
             | 
             | https://questdb.io/time-series-benchmark-suite/
             | 
             | I don't use these tools. I just wanted to point out that
             | what you're saying is disingenuous.
        
       | thegeomaster wrote:
       | Sounds like they didn't re-do the QuestDB benchmark with the same
       | change to the indexes, and so their claim is that ClickHouse is
       | 27x faster with a specific index than QuestDB without that index,
       | which is not a fair comparison.
       | 
       | Also, the tone of the post sounds really arrogant. They try to
       | hide it a bit, I feel, but it just seeps through.
        
         | axlee wrote:
         | I didn't really read it as arrogant, more as annoyed about a
         | mischaracterization that was disparaging their product.
        
           | SOLAR_FIELDS wrote:
           | It's also part of a longer trend of saber rattling between
           | these vendors - there's a history of these types of posts
           | also from TimescaleDB:
           | https://news.ycombinator.com/item?id=29096541
        
             | qoega wrote:
             | There is a small list of vendors that do not forbid running
             | benchmarks on their systems.
             | https://cube.dev/blog/dewitt-clause-or-can-you-benchmark-a-d...
             |
             | That is why a small subset of vendors keeps being
             | 'attacked' by these comparisons.
        
               | bombcar wrote:
               | More and more we start to see _why_ these prohibitions are
               | in place.
        
         | Dzugaru wrote:
         | Well, I don't know how QuestDB works, and I couldn't find
         | anything in the original benchmark, but probably they already
         | have some sort of (geo)index in place? It's really strange to
         | search geo-data by scanning the whole surface of the Earth. The
         | point that ClickHouse outperforms this by just sorting on one
         | axis (without even using any fancy 2D indices) is reasonable.
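
The ClickHouse rebuttal reorganised the table so rows are sorted by latitude, which is why a latitude range filter no longer needs a full scan. A minimal sketch of that effect with Python's `bisect`, assuming a column already stored in sorted order and a sparse index holding the first value of every block ("granule") of rows; the granule size of 4 and the helper names are hypothetical, and real ClickHouse MergeTree tables default to 8192-row granules:

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per granule; hypothetical, far smaller than a real database would use


def build_sparse_index(sorted_col, granule=GRANULE):
    """Keep only the first value of each granule, in the spirit of a sparse primary index."""
    return sorted_col[::granule]


def range_scan_sorted(sorted_col, sparse, lo, hi, granule=GRANULE):
    """Return (positions with lo <= v <= hi, rows actually read) for a column sorted by v."""
    # Start one granule before the first index entry >= lo: that granule may still hold lo.
    first = max(bisect_left(sparse, lo) - 1, 0)
    # Granules whose first value is already > hi cannot contain matches.
    last = bisect_right(sparse, hi)
    start = first * granule
    end = max(start, min(last * granule, len(sorted_col)))
    hits = [i for i in range(start, end) if lo <= sorted_col[i] <= hi]
    return hits, end - start


lat = list(range(1, 13))  # toy column already sorted by latitude
hits, rows_read = range_scan_sorted(lat, build_sparse_index(lat), 6, 7)
print(hits, rows_read)  # [5, 6] 4  (a full scan would read all 12 rows)
```

The filter on the sort key touches one granule instead of the whole column; a filter on any other column, such as longitude, gets no help from this ordering, which is the trade-off bluestreak raises.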
        
           | olluk wrote:
           | No, there are no indexes in QuestDB in the article. None.
           | Zero. That's a bold mistake in the ClickHouse article. It should
           | be named "Yes, QuestDB is Faster".
        
             | Dzugaru wrote:
             | Yeah, I've read more carefully and it seems they're doing
             | full scan.
        
           | tomhallett wrote:
           | I was curious to hear more details about this statement -
           | "while QuestDB utilizes its full indexing strategy to read
           | just a tiny fraction of the actual data". Did QuestDB create
           | indexes in their QuestDB benchmark but just not mention it?
           | Are there geoindexes which are automatically enabled and which
           | do help (but are of less value in the general sense from
           | ClickHouse's perspective)?
        
             | twoodfin wrote:
             | I don't know how QuestDB is implemented in any detail, but
             | this statement struck me as confused. My understanding is
             | that for this query, QuestDB is performing a full scan of
             | the relevant columns, and the point of the blog post was
             | how fast their JIT engine for filtering makes this.
        
         | olluk wrote:
         | There were 2 queries in the QuestDB benchmark over the same
         | table. ClickHouse didn't even try to match both of them,
         | choosing one as a victim. I guess that's what happens when you
         | optimise the data storage for one query.
        
       | gauravphoenix wrote:
       | I have always felt that DB benchmarks are useless; always use
       | your own dataset.
       | 
       | https://gauravkumar.blog/performance-benchmarks-are-useless....
        
         | nojito wrote:
         | This is why the commercial offerings do not allow you to publish
         | benchmarks.
        
           | PeterZaitsev wrote:
           | Which is a horrible thing. Even bad benchmarks often create
           | discussions.
        
             | nojito wrote:
             | Not true at all. Most people take benchmarks as gospel
             | because they value their time.
        
         | capableweb wrote:
         | All benchmarks are always useless, in 90% of the cases. They
         | could maybe give some baseline understanding, but it's
         | important to always do your own benchmarks as your performance
         | can be very different from what the benchmark showed, simply
         | because the data/data structures are slightly different.
         | 
         | Do your own benchmarks people!
        
       | PeterZaitsev wrote:
       | This response illustrates an important point: if you're an expert
       | in technology A and compare it to technology B, which you're not
       | an expert in, the comparison is very likely to be unfair.
       |
       | I would very much like to see vendors at least follow
       | journalistic ethics and reach out to their competition for
       | optimization comments and suggestions before publishing, so
       | others are given a chance to suggest optimizations.
        
         | hodgesrm wrote:
         | Agree. Or just load test on your own software, publish how you
         | did it, and let other vendors respond for themselves.
        
         | klysm wrote:
         | Yeah this happens a lot. I like it when people maintain a repo
         | that accepts changes for the comparison
        
           | snikolaev wrote:
           | Then you should like https://db-benchmarks.com/
        
             | PeterZaitsev wrote:
             | Great idea.
        
       | noxvilleza wrote:
       | Is there an existing named adage for something like "if one
       | creates a benchmark in order to rank general performance of some
       | products, some of those products will ultimately sacrifice
       | general performance in order to optimize for that benchmark"?
        
         | nathanwh wrote:
         | https://en.wikipedia.org/wiki/Goodhart's_law
         | 
         | You stated it almost directly.
        
           | noxvilleza wrote:
           | Oh dear. I did a brief search for 'adage on benchmarking' and
           | only saw Rugg/Feldman benchmarks.
        
             | bombcar wrote:
             | It's also why the only true benchmark _is using the thing
             | as it needs to be used_ - but this is hard to compare
             | because often you need code to work with the tool and vice-
             | versa.
        
               | gfody wrote:
               | there are the TPC benchmarks which try to cover a wide
               | variety of use cases and scenarios and are designed
               | independently from any one engine:
               | https://www.tpc.org/information/benchmarks5.asp
               | 
               | you post the results for your own product, others do the
               | same, customers can compare:
               | https://www.singlestore.com/blog/tpc-benchmarking-results/
        
               | qoega wrote:
               | It is partially true, but these benchmarks force a schema.
               | You can't, for example, reorganise data into a wide table or
               | add indices. So they don't actually show you how to use
               | the system to solve this type of problem in the best way
               | possible; they check unoptimised results, as if you never
               | learn and never utilise the best practices of the DBMS you
               | choose for production.
        
       ___________________________________________________________________
       (page generated 2022-06-16 23:00 UTC)