[HN Gopher] How bloom filters made SQLite 10x faster
___________________________________________________________________
How bloom filters made SQLite 10x faster
Author : avinassh
Score : 130 points
Date : 2024-12-22 14:44 UTC (8 hours ago)
(HTM) web link (avi.im)
(TXT) w3m dump (avi.im)
| dang wrote:
| Related:
|
| _SQLite: Past, Present, and Future_ -
| https://news.ycombinator.com/item?id=32675861 - Sept 2022 (143
| comments)
| ncruces wrote:
| [flagged]
| gpcz wrote:
| Even if true, it seems like they're doing a pretty good job on
| their own.
| jpalawaga wrote:
| SQLite describes itself as not open-contribution. So yes, by
| their own measure they've made it more difficult to get
| features mainlined (and intentionally so).
| steve_gh wrote:
| I submitted a bug report on SQLite a year or so back (a simple
| test case only, not a solution). The folks were super nice, and
| their patch went into the next release.
| binary132 wrote:
| Open contribution isn't a good in and of itself.
| DaveMcMartin wrote:
| SQLite is getting better and better. I'm using it in production
| for a bunch of websites and have never had a problem.
| immibis wrote:
| It should be fine for read-only data. If you want to write, be
| aware that only one process can write at a time. If you forget
| to set busy_timeout when you open the connection, it defaults
| to zero milliseconds, and you'll get an error immediately if
| another process has locked the database for writing while you
| try to read or write it. Client-server databases tend to handle
| concurrent writers better.
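A minimal sketch of the busy_timeout pitfall described above, using Python's standard sqlite3 module. One assumption worth flagging: Python's sqlite3.connect() already installs a 5-second wait by default (its timeout parameter), so the sketch passes timeout=0 to reproduce the C-level default of no waiting and then sets the wait explicitly with PRAGMA busy_timeout. The helper names open_conn and concurrent_write are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_conn(path: str, busy_ms: int) -> sqlite3.Connection:
    # timeout=0 disables the Python module's own default wait (5 seconds),
    # so the PRAGMA below is what governs waiting on a locked database.
    # isolation_level=None means autocommit; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    conn.execute(f"PRAGMA busy_timeout = {int(busy_ms)}")
    return conn


def concurrent_write(busy_ms: int) -> str:
    """Attempt a write while another connection holds the write lock."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.db")
        writer = open_conn(path, busy_ms)
        other = open_conn(path, busy_ms)
        try:
            writer.execute("CREATE TABLE t (x INTEGER)")
            writer.execute("BEGIN IMMEDIATE")  # acquire the write lock now
            writer.execute("INSERT INTO t VALUES (1)")
            try:
                other.execute("INSERT INTO t VALUES (2)")
                return "ok"
            except sqlite3.OperationalError as exc:
                # Raised once busy_ms elapses (immediately when busy_ms is 0).
                return str(exc)
            finally:
                writer.execute("COMMIT")
        finally:
            other.close()
            writer.close()


if __name__ == "__main__":
    print(concurrent_write(0))
```

Setting a nonzero busy_timeout only makes the second writer wait up to that many milliseconds for the lock; it does not allow two writers at once. Switching to PRAGMA journal_mode=WAL lets readers proceed while a writer is active, but writes are still serialized.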
| bingaweek wrote:
| What do you mean it "should be fine"? It obviously is fine.
| It sounds like you read a blog post on sqlite and couldn't
| wait to share it with us.
| PartiallyTyped wrote:
| Just a thought: just because a general problem is NP-hard
| doesn't mean we can't find specific solutions quickly, or that
| any given input is hard to search. If the downstream effect is
| an order of magnitude less work, it makes sense; it's just a
| tradeoff.
| bawolff wrote:
| Well, yes: heuristics for query planning are a very
| well-researched field.
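The point about NP-hard problems and heuristics can be made concrete with a toy join-ordering problem. Everything here is an assumption for illustration: the cost model (sum of intermediate result sizes), the table cardinalities and per-pair selectivities, and the function names cost_of_order, greedy_order, and optimal_order. Exhaustive search scores all n! orders; the greedy heuristic (start from the smallest table, then repeatedly add whichever table keeps the running cost lowest) scores only O(n^2) candidates and is not guaranteed to be optimal. This is a textbook heuristic, not SQLite's planner.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Sequence


def cost_of_order(order: Sequence[str], card: Dict[str, int],
                  sel: Dict[FrozenSet[str], float]) -> float:
    """Sum of intermediate result sizes for a left-deep join order."""
    size = float(card[order[0]])
    total = 0.0
    joined = [order[0]]
    for t in order[1:]:
        factor = float(card[t])
        for j in joined:
            # Missing pairs have no join predicate: selectivity 1.0 (cross product).
            factor *= sel.get(frozenset((j, t)), 1.0)
        size *= factor
        total += size
        joined.append(t)
    return total


def greedy_order(card: Dict[str, int], sel: Dict[FrozenSet[str], float]) -> list:
    """Smallest table first, then the cheapest next table at each step."""
    remaining = sorted(card)  # sorted for deterministic tie-breaking
    first = min(remaining, key=lambda t: card[t])
    order = [first]
    remaining.remove(first)
    while remaining:
        best = min(remaining, key=lambda t: cost_of_order(order + [t], card, sel))
        order.append(best)
        remaining.remove(best)
    return order


def optimal_order(card: Dict[str, int], sel: Dict[FrozenSet[str], float]) -> tuple:
    """Exhaustive search over all n! orders."""
    return min(permutations(card), key=lambda o: cost_of_order(o, card, sel))


if __name__ == "__main__":
    card = {"A": 1000, "B": 100, "C": 10}
    sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
    g, o = greedy_order(card, sel), optimal_order(card, sel)
    print("greedy", g, cost_of_order(g, card, sel))
    print("optimal", o, cost_of_order(o, card, sel))
```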
| datadeft wrote:
| Next should be this ->
| https://x.com/lemire/status/1869752213402157131
|
| What progress we're making with these. Amazing times.
| mkonecny wrote:
| > At the start of the join operation, we go over all the rows of
| dimension tables and set the bits in the Bloom filter which match
| the query predicate.
|
| Can someone explain this? Seems to me it's just as expensive as
| iterating over the tables (the previous implementation), since
| you still need to visit each row to build the cache?
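A minimal sketch of why the build pass is cheap, assuming a star-schema join where a large fact table (F rows) references a small dimension table (D rows) by key, and assuming the earlier plan did an index lookup into the dimension table once per fact row. The BloomFilter class, bloom_join, the sizing constants, and the dict standing in for an indexed dimension table are hypothetical illustrations, not SQLite's implementation. The build pass visits each dimension row once (D visits total, not D per fact row); afterwards, each fact row pays for one cheap filter check, and only keys that might match pay for a real lookup.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


class BloomFilter:
    """Bit array plus k hash positions per key: no false negatives, rare false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: int) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(str(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: int) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def bloom_join(
    fact_rows: Iterable[Tuple[int, object]],
    dim_rows: Dict[int, dict],
    predicate: Callable[[dict], bool],
) -> Tuple[List[tuple], int]:
    """Join fact rows to dimension rows on key; return (matches, real lookups)."""
    # Hypothetical sizing: ~8 bits per dimension row, an upper bound on inserts.
    bloom = BloomFilter(num_bits=8 * len(dim_rows) + 64, num_hashes=4)

    # Build phase: ONE pass over the small dimension table, D row visits total.
    for key, row in dim_rows.items():
        if predicate(row):
            bloom.add(key)

    # Probe phase: one cheap filter check per fact row.
    matches: List[tuple] = []
    lookups = 0
    for fk, payload in fact_rows:
        if not bloom.might_contain(fk):
            continue  # definitely no qualifying dimension row: skip the lookup
        lookups += 1  # stands in for a B-tree search in a real database
        row = dim_rows.get(fk)
        if row is not None and predicate(row):  # weeds out false positives
            matches.append((fk, payload, row))
    return matches, lookups


if __name__ == "__main__":
    # 100 dimension rows, 10 of which satisfy the predicate; 10,000 fact rows.
    dims = {k: {"region": "EU" if k < 10 else "US"} for k in range(100)}
    facts = [(i % 100, i) for i in range(10_000)]
    rows, n_lookups = bloom_join(facts, dims, lambda r: r["region"] == "EU")
    print(len(rows), "matches,", n_lookups, "lookups for", len(facts), "fact rows")
```

In this toy run the build costs 100 row visits, while the probe phase skips the lookup for every fact row whose key was rejected by the predicate. The filter only pays off when the predicate rejects many dimension keys; if nearly every key qualifies, nearly every probe passes and the lookups still happen.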
___________________________________________________________________
(page generated 2024-12-22 23:00 UTC)