[HN Gopher] No, QuestDB is not Faster than ClickHouse
___________________________________________________________________
Author : krnaveen14
Score : 89 points
Date : 2022-06-16 16:21 UTC (6 hours ago)
(HTM) web link (telegra.ph)
| bluestreak wrote:
| Our article in question can be found here:
| https://questdb.io/blog/2022/05/26/query-benchmark-questdb-v...
|
| The intent of the article was to showcase our JIT-optimised WHERE
| clause, and we did not use any indexes in QuestDB.
| PeterZaitsev wrote:
| If your intent is to showcase the new optimization in the
| product, it is best to compare it to your own old version.
| olluk wrote:
| A comparison with the old version is actually in the article, for
| the patient reader. It could go at the top, but I don't think
| it would make a difference. At the end of the day, it is an
| article on the official QuestDB website, which gives the
| reader a spoiler about the bias.
|
| I am intrigued what Timescale is going to publish next.
| qoega wrote:
| Agree. And a blog post could even tell a story like: "We
| compared with ClickHouse and we were 10x slower, then we
| looked at this case and made it 100x faster. Thank you,
| benchmark and ClickHouse developers, for showing us a use case
| where we could do better."
|
| For me, benchmarking usually goes: "Why does this query take so
| long? We need to improve it. Sometimes by 1000x."
| untitaker_ wrote:
| Right? How do the folks at QuestDB know that their new JIT
| engine is actually responsible for those performance
| improvements? My understanding is that, index or not, data is
| still sorted by time in questdb, which is exactly what the
| ClickHouse engineers are replicating in the new schema.
| bluestreak wrote:
| The query ClickHouse picked on does not actually leverage
| time order. Perhaps ClickHouse vendors on this thread can
| comment on the relevance of date partitioning for this
| query. My best guess is that it might help the execution
| logic create data chunks for a parallel scan.
|
| QuestDB also uses partitions for this purpose, but we
| also calculate chunks dynamically based on the available CPUs
| to distribute load across cores more evenly.
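A minimal sketch of the idea described above: split a column scan into chunks sized from the available CPU count and filter each chunk on its own worker. The function names and the chunks-per-worker factor are hypothetical, and this is not QuestDB's implementation (which compiles the filter to native code); it only illustrates the work split.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def make_chunks(n_rows, n_workers, chunks_per_worker=4):
    """Split [0, n_rows) into contiguous half-open ranges.

    A few chunks per worker (a hypothetical tuning factor) lets faster
    workers pick up extra chunks, which evens out the load.
    """
    n_chunks = max(1, min(n_rows, n_workers * chunks_per_worker))
    size, rem = divmod(n_rows, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `rem` chunks get one extra row each.
        end = start + size + (1 if i < rem else 0)
        chunks.append((start, end))
        start = end
    return chunks


def parallel_filter(column, predicate, n_workers=None):
    """Return indices of rows in `column` for which `predicate` is true."""
    n_workers = n_workers or os.cpu_count() or 1
    chunks = make_chunks(len(column), n_workers)

    def scan(bounds):
        lo, hi = bounds
        return [i for i in range(lo, hi) if predicate(column[i])]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves chunk order, so the merged result stays sorted.
        results = list(pool.map(scan, chunks))
    return [i for part in results for i in part]
```

In CPython, threads running a pure-Python predicate are serialised by the GIL, so this shows how the rows are partitioned rather than delivering real multi-core speedup; a database engine would run the chunks on native threads.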
| datalopers wrote:
| Please don't post competitor benchmarks until you can hire
| someone who has the slightest clue what they're doing. All you're
| demonstrating is the sheer incompetence at QuestDB.
| bluestreak wrote:
| I am in fact very proud of my team, who worked very hard on
| both implementation and the article. It is disappointing to
| read unfounded insults where we made every effort to be fair.
| hodgesrm wrote:
| I appreciate your benchmark and was interested to learn
| about how QuestDB processes TSBS queries efficiently. I
| work extensively with ClickHouse and it's always
| enlightening to learn about how other databases achieve
| high performance. Your descriptions of the internals are
| clear and easy to follow, especially since you included
| comparisons with older versions of QuestDB.
|
| That said, I think I can understand how some users might be
| a little put off by the comparisons. Your article
| effectively says "ClickHouse is really slow" without giving
| readers any easy way to judge what was happening under the
| covers. I was personally a bit frustrated not to have the
| time to set up TSBS and dig into what was going on. I
| therefore appreciated Geoff's effort to look up the results
| and show that the default index choices didn't make a lot
| of sense for this particular case. That does not detract
| from QuestDB's performance at least from my perspective.
|
| Anyway congratulations on the performance improvement. As a
| famous character in Star Wars said, "we will watch your
| career with great interest."
|
| edit: correct typo
| Dzugaru wrote:
| "while QuestDB utilizes its full indexing strategy to read
| just a tiny fraction of the actual data"
|
| Can you please elaborate on this?
| bluestreak wrote:
| Full disclosure: I am CTO of QuestDB and I took part in
| JIT implementation. The quote above is not mine, it was
| written by Clickhouse staff. "utilizes its full indexing
| strategy" statement is false and is news to me.
| Dzugaru wrote:
| So you do a full scan and it's ~50 CPU cycles per row (48
| CPUs at 4 GHz), correct? I guess this is possible? And in
| this case ClickHouse is wrong.
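The estimate above divides the total cycles available during the query by the number of rows scanned. A minimal Python sketch of that arithmetic, assuming every core runs flat out at a fixed clock for the whole query; the function name is hypothetical, and the row count and timing in the example are placeholders, not figures from either benchmark.

```python
def cycles_per_row(rows, seconds, cpus, ghz):
    """Upper bound on CPU cycles spent per row during a full scan.

    Assumes all `cpus` cores are busy for the whole query at a constant
    `ghz` clock, so the real per-row cost is at most this figure.
    """
    total_cycles = cpus * ghz * 1e9 * seconds
    return total_cycles / rows


# Hypothetical example: one billion rows scanned in 0.25 s on 48 cores at 4 GHz.
print(cycles_per_row(1_000_000_000, 0.25, 48, 4.0))  # 48.0
```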
| pepemon wrote:
| So, is QuestDB faster or not? I'm puzzled now!
| olluk wrote:
| Looks like QuestDB is faster if you don't optimize your
| table storage for one query.
|
| But if you are okay with only a limited number of columns
| being scanned faster than others, ClickHouse comes first.
| PeterZaitsev wrote:
| I wonder what "every effort to be fair" means. The first
| thing you could have done is reach out to the ClickHouse
| community to ask for optimization suggestions.
| bluestreak wrote:
| "Fair" means that we are comparing apples to apples: an ad-hoc,
| unindexed predicate compiled by QuestDB into AVX2
| assembly (using AsmJIT) vs. the same predicate compiled by
| ClickHouse (I'm assuming by LLVM). One can perhaps view
| this as comparing SIMD-based scans from both databases.
| Perhaps we generate better assembly, which incidentally
| offers better IO.
|
| We all understand that creating a very specific index might
| improve specific query performance. Great, ClickHouse
| geared the entire table storage model to be ultra-specific
| for latitude search. What if you search by
| longitude, or another column? Back to the beginning.
|
| JIT-compiled predicates offer arbitrary query
| optimisation with zero impact on ingestion. This is
| sometimes useful.
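The point here is that a predicate compiled once per query can filter on any column without an index. QuestDB does this by emitting AVX2 machine code with AsmJIT; the Python sketch below only illustrates the general idea with a different, much simpler technique (closure compilation): the predicate tree is turned into nested callables once, and the full scan then calls them for every row. The tuple-based tree format and the function names are hypothetical.

```python
import operator

# Hypothetical predicate tree format:
#   ("and", left, right), ("or", left, right), or (cmp_op, column, constant)
_CMP = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def compile_predicate(node):
    """Turn a predicate tree into one Python callable, once per query.

    The tree is walked only here; per-row evaluation calls the pre-built
    closures instead of re-inspecting the tree for every row.
    """
    op = node[0]
    if op in ("and", "or"):
        left = compile_predicate(node[1])
        right = compile_predicate(node[2])
        if op == "and":
            return lambda row: left(row) and right(row)
        return lambda row: left(row) or right(row)
    if op in _CMP:
        fn, col, const = _CMP[op], node[1], node[2]
        return lambda row: fn(row[col], const)
    raise ValueError(f"unsupported operator: {op!r}")


def scan(rows, predicate_tree):
    """Full scan with no index: every row is tested against the predicate."""
    pred = compile_predicate(predicate_tree)
    return [i for i, row in enumerate(rows) if pred(row)]
```

Because nothing depends on how the data is laid out, the same code filters on longitude, latitude, or any other column equally well, at the cost of touching every row.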
|
| What would you offer assuming that we reached out, other
| than creating an index?
|
| ClickHouse does better than we do in other areas. It JITs
| more complicated expressions, such as some date
| functions. It optimises count() queries specifically. For
| example, we collect "found" row_ids in an array, while
| ClickHouse does not do this for count(). We still
| have work to do. On the other hand, we ingested this very
| dataset about 5x quicker than ClickHouse, which we left
| out because the article is not about "QuestDB is faster than
| ClickHouse".
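The count() difference described above can be sketched in a few lines of Python: one path materialises every matching row id and takes the length, the other keeps only a running total. Both return the same number, but the second needs no memory proportional to the number of matches. The function names are hypothetical, and neither function is either database's actual code.

```python
def count_via_row_ids(column, predicate):
    # Generic filter path: collect every matching row id, then count them.
    row_ids = [i for i, v in enumerate(column) if predicate(v)]
    return len(row_ids)


def count_direct(column, predicate):
    # Specialised count() path: keep a running total, no id list.
    return sum(1 for v in column if predicate(v))
```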
| olluk wrote:
| What if the purpose of the article is to compare queries
| without indexes?
| jsnell wrote:
| Doesn't matter, since that clearly wasn't the purpose of
| the article. After all, they were totally happy to add an
| index for another competing DB as long as they happened
| to win that comparison. Then they crow about how they
| beat having an index.
|
| Pretty sleazy.
| xenator wrote:
| So, maybe don't create specific scenarios for corner
| cases and then generalize the outcome? And write articles
| about common scenarios that are important to people who
| will use the technology on a daily basis.
| olluk wrote:
| My personal view is that having fast queries without
| indexes is quite a general outcome.
| avianlyric wrote:
| What an extremely unfair comment. Having read QuestDB's blog,
| it's quite clear they've taken great pains to point out that
| a single specific benchmark isn't the be all and end all of
| DB analysis.
|
| They quite clearly start out by saying they're only looking
| to demonstrate the impact of a specific new DB feature
| they've created, and are using benchmarks that illustrate the
| difference. They make zero claims that QuestDB is faster than
| Clickhouse overall, and quite carefully point out that
| prospective users need to run their own benchmarks on their
| own data to figure out what DB will work for them.
| dimgl wrote:
| > They make zero claims that QuestDB is faster than
| Clickhouse overall
|
| Are you sure? Just one look at their website says
| differently.
|
| https://questdb.io/time-series-benchmark-suite/
|
| I don't use these tools. I just wanted to point out that
| what you're saying is disingenuous.
| thegeomaster wrote:
| Sounds like they didn't redo the QuestDB benchmark with the same
| change to the indexes, so their claim is that ClickHouse is
| 27x faster with a specific index than QuestDB without that index.
| Which is not a fair comparison.
|
| Also, the tone of the post sounds really arrogant. They try to
| hide it a bit, I feel, but it just seeps through.
| axlee wrote:
| I didn't really read it as arrogant, more as annoyed about a
| mischaracterization that was disparaging their product.
| SOLAR_FIELDS wrote:
| It's also part of a longer trend of saber rattling between
| these vendors - there's a history of these types of posts
| also from TimescaleDB:
| https://news.ycombinator.com/item?id=29096541
| qoega wrote:
| Only a small number of vendors do not forbid running
| benchmarks against their systems.
| https://cube.dev/blog/dewitt-clause-or-can-you-benchmark-a-d...
|
| That is why only a small subset of vendors are
| being 'attacked' by these comparisons.
| bombcar wrote:
| More and more we start to see _why_ these prohibitions are in
| place.
| Dzugaru wrote:
| Well, I don't know how QuestDB works, and I couldn't find
| anything in the original benchmark, but probably they already
| have some sort of (geo)index in place? It's really strange to
| search geo-data by scanning the whole surface of the Earth. The
| point that Clickhouse outperforms this by just sorting on one
| axis (and not even using any fancy 2D indices) is reasonable.
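The reasoning here is that sorting a table on one column lets a range filter on that column skip most rows. A simplified Python sketch, assuming points stored as (lat, lon) tuples sorted by latitude: a binary search finds the latitude band, and only that slice is checked for longitude. ClickHouse's MergeTree actually uses a sparse primary index over granules of rows rather than a per-row binary search, and all names here are hypothetical.

```python
import bisect


def build_sorted(points):
    """Sort (lat, lon) points by latitude, like a table ordered by lat."""
    pts = sorted(points)  # tuples compare by lat first
    lats = [p[0] for p in pts]
    return pts, lats


def range_query_sorted(pts, lats, lat_lo, lat_hi, lon_lo, lon_hi):
    """Binary-search the latitude band, then scan only that slice."""
    start = bisect.bisect_left(lats, lat_lo)
    end = bisect.bisect_right(lats, lat_hi)
    hits = [p for p in pts[start:end] if lon_lo <= p[1] <= lon_hi]
    return hits, end - start  # matches, rows actually examined


def range_query_full_scan(points, lat_lo, lat_hi, lon_lo, lon_hi):
    """Unordered data: every row has to be examined."""
    hits = [p for p in points
            if lat_lo <= p[0] <= lat_hi and lon_lo <= p[1] <= lon_hi]
    return hits, len(points)
```

The shortcut only helps the sort column: a query that filters on longitude alone gets no benefit from this layout and falls back to a full scan, which is the trade-off bluestreak raises in the thread.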
| olluk wrote:
| No, there are no indexes in QuestDB in the article. None.
| Zero. That's a bold mistake in the ClickHouse article. It should
| be named "Yes, QuestDB is Faster".
| Dzugaru wrote:
| Yeah, I read it more carefully and it seems they're doing
| a full scan.
| tomhallett wrote:
| I was curious to hear more details about this statement -
| "while QuestDB utilizes its full indexing strategy to read
| just a tiny fraction of the actual data". Did QuestDB create
| indexes in their QuestDB benchmark but just not mention it?
| Are there geoindexes which are automatically enabled which do
| help (but are of less value in the general sense from
| ClickHouse's perspective)?
| twoodfin wrote:
| I don't know how QuestDB is implemented in any detail, but
| this statement struck me as confused. My understanding is
| that for this query, QuestDB is performing a full scan of
| the relevant columns, and the point of the blog post was
| how fast their JIT engine for filtering makes this.
| olluk wrote:
| There were 2 queries in the QuestDB benchmark over the same
| table. ClickHouse didn't even try to match both of them,
| choosing one as a victim. I guess that's what happens when you
| optimise the data storage for one query.
| gauravphoenix wrote:
| I have always felt that DB benchmarks are useless; always use
| your own dataset.
|
| https://gauravkumar.blog/performance-benchmarks-are-useless....
| nojito wrote:
| This is why the commercial offerings do not allow you to publish
| benchmarks.
| PeterZaitsev wrote:
| Which is a horrible thing. Even bad benchmarks often
| create discussions.
| nojito wrote:
| Not true at all. Most people take benchmarks as gospel
| because they value their time.
| capableweb wrote:
| All benchmarks are useless in 90% of cases. They
| could maybe give some baseline understanding, but it's
| important to always do your own benchmarks, as your performance
| can be very different from what the benchmark showed, simply
| because the data/data structures are slightly different.
|
| Do your own benchmarks people!
| PeterZaitsev wrote:
| This response illustrates an important point: if you're an expert
| in technology A and compare it to technology B, which you're not
| an expert in, the comparison is very likely to be unfair.
|
| I would very much like to see vendors at least follow
| journalistic ethics and reach out to their competition for
| optimization comments and suggestions before publishing, so
| others are given a chance to suggest optimizations.
| hodgesrm wrote:
| Agree. Or just load test on your own software, publish how you
| did it, and let other vendors respond for themselves.
| klysm wrote:
| Yeah this happens a lot. I like it when people maintain a repo
| that accepts changes for the comparison
| snikolaev wrote:
| Then you should like https://db-benchmarks.com/
| PeterZaitsev wrote:
| Great idea.
| noxvilleza wrote:
| Is there an existing named adage for something like "if one
| creates a benchmark in order to rank general performance of some
| products, some of those products will ultimately sacrifice
| general performance in order to optimize for that benchmark"?
| nathanwh wrote:
| https://en.wikipedia.org/wiki/Goodhart's_law
|
| You stated it almost directly.
| noxvilleza wrote:
| Oh dear. I did a brief search for 'adage on benchmarking' and
| only saw Rugg/Feldman benchmarks.
| bombcar wrote:
| It's also why the only true benchmark _is using the thing
| as it needs to be used_ - but this is hard to compare
| because often you need code to work with the tool and
| vice versa.
| gfody wrote:
| there are the TPC benchmarks which try to cover a wide
| variety of use cases and scenarios and are designed
| independently from any one engine:
| https://www.tpc.org/information/benchmarks5.asp
|
| you post the results for your own product, others do the
| same, customers can compare:
| https://www.singlestore.com/blog/tpc-benchmarking-results/
| qoega wrote:
| It is partially true, but these benchmarks force a schema.
| You can't reorganise data, for example into a wide table, or
| add indices. So they don't actually show you how to use
| the system to solve this type of problem in the best way
| possible; instead they check unoptimised results, as if you
| never learn and never use the best practices of the DBMS you
| choose for production.
___________________________________________________________________
(page generated 2022-06-16 23:00 UTC)