[HN Gopher] The Untold Story of SQLite (2021)
___________________________________________________________________
The Untold Story of SQLite (2021)
Author : xrayarx
Score : 209 points
Date : 2023-01-28 16:45 UTC (6 hours ago)
(HTM) web link (corecursive.com)
(TXT) w3m dump (corecursive.com)
| throwaway29303 wrote:
| Shane Harrelson did this for us about 10 years ago. He came up
| with this huge corpus of SQL statements, and he ran them against
| every database engine that he could get his hands on. We wanted
| to make sure everybody got the same answer, and he managed to
| segfault every single database engine he tried, including SQLite,
| except for Postgres. Postgres always ran and gave the correct
| answer. We were never able to find a fault in that. The Postgres
| people tell me that we just weren't trying hard enough. It is
| possible to fault Postgres, but we were very impressed.
| We crashed Oracle, including commercial versions of Oracle. We
| crashed DB2. Anything we could get our hands on, we tried it and
| we managed to crash it, but the point was that we wanted to make
| sure that SQLite got the same answers for all of these queries,
| or equivalent answers, because a lot of these queries, they're
| indeterminate and the rows might come out in a different order
| because you [crosstalk 00:25:10] order by clause, so we wanted to
| make sure that all the database engines got equivalent answers.
| Mostly, we wanted to make sure that SQLite was getting the same
| answers everybody else is. That's another test suite,
| and then we have lots of smaller ones, as well. Between them all,
| it's a lot of testing code, and it takes a long time to run.
|
| I hope some people are/were paying attention to this. ;)
| marcosdumay wrote:
| You mean fixing their databases? I would expect Postgres dev to
| act on it, but not much from the other ones.
| elteto wrote:
| "Shane Harrelson [...] came up with this huge corpus of SQL
| statements, and he ran them against every database engine that he
| could get his hands on. We wanted to make sure everybody got the
| same answer, and he managed to segfault every single database
| engine he tried, including SQLite, except for Postgres. Postgres
| always ran and gave the correct answer. We were never able to
| find a fault in that. The Postgres people tell me that we just
| weren't trying hard enough. It is possible to fault Postgres, but
| we were very impressed."
|
| Did not expect to find such a cool anecdote about Postgres here!
| fijiaarone wrote:
| "Freedom means taking care of yourself."
| posharma wrote:
| There's an industry wide fad (like ChatGPT) and then there's a
| hacker news wide fad which is SQLite :-). Not sure how it all
| started. Last few months (or my be even an year) there has been
| an unprecedented number of SQLite posts. What gives?
| jonas21 wrote:
| Last few months? I feel like SQLite has been extremely popular
| on HN for at least the last 5 years (and the data seems to
| confirm this [1]). In which case, maybe it's not so much of a
| fad?
|
| [1] https://hn.algolia.com/?q=sqlite
| ok123456 wrote:
| Calling SQLite a fad is like calling Linux a fad.
| nindalf wrote:
| He doesn't mean using it. He means posting about SQLite on
| HN. The story of SQLite, how it came to be developed, how
| it's tested, how it really works under the hood, funky ways
| to use it on the server for storage (fly.io, Cloudflare do
| this). All of these have become more common in the last year.
| jestarray wrote:
| how is sqlite a fad? its used pretty much everywhere, is
| simple, and yet surprisingly scalable to some
| posharma wrote:
| SQLite is not a fad. It's just that it never gained so much
| attention here before.
| cdcarter wrote:
| Google Trends [0] indicates that sqlite is about as popular
| as usual. There's been some recent growth in the last year,
| but that's after a bit of a dip. A regression of the time
| series would be fairly flat.
|
| I suspect popularity here comes in waves through a Katamari
| Damacy effect. People start reading about a topic, and
| start posting, thus more people read, research, post,
| etc... until a saturation point, a cooling off period, and
| then a rebuild.
|
| [0]: obviously not the same as whats happening on HN, I'd
| love to see someone pull these numbers from the HN API!
| TheRealPomax wrote:
| They were talking about it being a fad here on HN, in
| terms of submitted (and upvoted to FP) posts per
| week/month/year. It's always been a topic getting the
| incidental FP attention, but the number of times
| something relating to SQLite makes it to the front page
| here seems to have increased considerably in recent
| months.
| lukevp wrote:
| I think as single computers get more and more power, and
| engineers work in these super slow, eventually consistent cloud
| infrastructures, there's a visceral appeal of an architecture
| that can run on my laptop and also scale to a reasonable user
| base. The only limiting factor for most applications at this
| point that would prevent a single machine from serving them is
| having HA/DR. So a lot of effort is being made at the
| replication side of SQLite so it can be part of this new stack.
| I am seeing it emerge as a reaction to the crazy amount of
| microservices and distributed computing just to solve small
| problems that could fit on a machine. And the inability to work
| locally as well.
| sangnoir wrote:
| SQLite (like curl and libcurl) have been pervasive and awesome
| for a long time and you'd always find posts about both on HN
| since forever, because of their ubiquity.
|
| SQLite has been appearing a lot more often on HN because of a
| different more recent fad: edge computing.
| jskulski wrote:
| Haha yeah I feel like SQLite is a very _told_ story on HN.
| Great story and tech for real, inspirational but also very
| trendy right now!
| posharma wrote:
| Yeah, exactly. May be I should've used the word trendy as
| some ppl take offense to the word fad for SQLite.
| geysersam wrote:
| It might be more than a year actually. I'm pretty sure it was
| trendy 1.5 years ago. Remember I read a really cool blog post
| around then where someone used range requests to query SQLite
| statically. It used WASM + SQLite and hyped hard here.
| Seattle3503 wrote:
| For me, it is always the best place to start when I need a DB
| for personal projects. The serverless nature makes it very easy
| to use. It is easy to fall in love with the convience it
| provides, because lets you focus on the fun parts of your
| project.
| lliamander wrote:
| I think most programmers would like to be remembered for having
| written some really cool, useful piece of software - ideally by
| themselves or with a small team of people, where they have
| creative control over the requirements and design.
|
| SQLite (and it's founder, Richard Hipp) are an inspirational
| example of such success.
| ravi-delia wrote:
| I feel like it's just that it's one of the coolest pieces of
| software that every developer- web, backend, systems, or
| otherwise, has at least seen before. It runs in the browser, it
| runs on your phone, and unlike other databases (all of which
| are also very cool) it is completely within the domain of even
| a single inexperienced developer. It's just a little file on
| disk, just a library for accessing it, there's no magic!
| Especially if you're used to taking a database for granted it's
| just an extremely cool thing to think about.
| ethbr0 wrote:
| The brilliance of SQLite is the brilliance of interfaces.
|
| Value (in the abstract, not just $ sense) accrued around SQL.
|
| At some point, so much value accrued that people were using
| it for things it wasn't designed to do.
|
| SQLite provided a solution for "people who want to use a
| database, but don't look like traditional database
| operators." Turns out there's a lot of those.
|
| That this large userbase existed was a brilliant observation,
| combined with brilliant execution in shepherding and evolving
| SQLite since.
|
| And none of the above would've been possible if the SQL
| interface hadn't been standardized and adopted over the last
| few decades*.
|
| * Turns out, SQL's 50th anniversary will be 2024
| therealdrag0 wrote:
| SQLite has excited HN for years
| vagrantJin wrote:
| I doubt ChatGPT is a fad. Feels like the second coming of the
| gun and it will have an outsized impact.
| jacob019 wrote:
| The underlying tech anyway.
| [deleted]
| jacob019 wrote:
| Should I be using file extension .db or .sqlite3 ?
| simonw wrote:
| My experience is that it doesn't matter at all, but .db appears
| to me a lot more common than .sqlite3
| panda_fish wrote:
| Thanks for that amazing trip down memory lane...
|
| > Wow, I've got an SQL database running on my Palm Pilot.
|
| Weirdly, my thesis in 2000 was to write an SQL parser for Palm
| (Palm V - still have it). I remember there being such a massive
| gap in the market for a pervasive standards-compliant data
| storage solution. I used javacc, which is still around I think -
| I can't imagine it covered 1% of the features of SQLite though.
| Bravo!
| rompic wrote:
| My favourite random sqlite story: The company I once worked for
| used an outdated version of sqlite (3.8.6) in one of their
| products. The databases used got bigger and bigger and in a very
| big project one of the "already known to be slow"-queries took
| more than an hour on my laptop making the tool unusable.
|
| On a quiet day, I was able to save the temporary table used as
| part of the process and run the problematic query against it in
| an isolated fashion.
|
| The query returned an extremely high number of results and when I
| discovered this I questioned my SQL-fu, my sanity and my trust in
| computers.
|
| I found that we were hit by a bug that was fixed 6 years before I
| discovered it (https://sqlite.org/src/info/6f2222d550f5b0ee7ed).
| Sqlite's query planner assumed that a field with a not null
| constraint can never be null, which isn't the case for the right
| hand table in a left join.
|
| I fixed it by adding a not null check in the query and then later
| by updating the library. After that the 1 hour query ran in ~700
| ms.
|
| This faster run time also helped with smaller projects and in the
| end allowed extending our test suite considerably.
|
| Tldr: Keep your dependencies up to date.
| EGreg wrote:
| Now what about the untold _true_ story of SQLite?
| dmitriid wrote:
| A Netflix documentary series
| [deleted]
| fijiaarone wrote:
| I'm interested in looking at fossil SCM - not the built in web
| server, wiki, and issue tracking, but the auto sync and
| simplified interface appeal for personal projects, and maybe an
| embedded versioned file based CMS.
| maxk42 wrote:
| I've used git since it was first released to the public and
| recently I feel like I know less about it than ever before. I
| switched to fossil for personal projects around a year ago and
| haven't regretted it one bit!
| thunderbong wrote:
| Previously -
|
| https://news.ycombinator.com/item?id=27718701 (2 years ago, 95
| comments)
| counttheforks wrote:
| [flagged]
| bobobob420 wrote:
| marked as duplicate
| https://news.ycombinator.com/item?id=34504085
| https://news.ycombinator.com/item?id=34511831
|
| These were not taken down. Anyways who cares what the BBC has
| to say about India. They are not exactly a known unbiased
| source when it comes to coverage of India and never have been.
|
| I slightly agree with some of your thoughts on speaking your
| mind freely on this board however its probably for the best.
| Dang knows this place turning into a cesspool is not useful. I
| will say to other hacker news commentators please do not
| downvote if you disagree with a comment. Discourse is important
| but I do agree anonymous discourse is much more nuanced
|
| I do not understand how this post is number one. Are you saying
| Hacker News can promote certain posts to the top and it is not
| organically curated?
| counttheforks wrote:
| A dupe of 3 and 4 day old posts, which got nearly 200
| comments? No. It is not reasonable to just remove that. This
| is censorship. Especially when you consider that dang tends
| to instead just post a comment saying: "Previously discussed
| at ..."
|
| > I do not understand how this post is number one. Are you
| saying Hacker News can promote certain posts to the top and
| it is not organically curated?
|
| Absolutely, yes. The mods can influence the ranking directly
| and they have openly talked about this. If you search
| https://hn.algolia.com/ for "penalized", "demoted" or
| "promoted" you will find discussions regarding this.
|
| ---
|
| Since dang banned me from posting new replies:
|
| > Maybe you haven't been here long enough to have read the
| FAQ or know that generally all dupes from 3-4 days ago are
| marked, and only allowed after at least several months?
|
| I've been here for many years. I just make a new account
| periodically. Maybe you haven't been paying enough attention
| to see the difference in behavior that dang exhibits when he
| wants to censor something? Notice how he's usually all over
| the comments explaining why something was removed as dupe,
| but now there is nothing except him linking to the older
| posts (https://news.ycombinator.com/item?id=34559614),
| waiting for an hour or two, and then silently deleting the
| new one.
|
| Not to mention that the subject of the post is completely
| different. The latest post is about the documentary being
| removed from archive.org at the request of the BBC. None of
| the earlier posts cover this.
| dahart wrote:
| Maybe you haven't been here long enough to have read the
| FAQ or know that generally all dupes from 3-4 days ago are
| marked, and only allowed after at least several months?
|
| https://news.ycombinator.com/newsfaq.html
|
| "If a story has not had significant attention in the last
| year or so, a small number of reposts is ok. Otherwise we
| bury reposts as duplicates."
| dahart wrote:
| I'm confused by your comment and edits. You both asked for
| and complained about being given a reason why the article
| is a dupe. Which is it? The comment explains why using
| those two links. Why has this particular article gotten you
| so riled up?
|
| I come here in part for the moderation, it increases the
| quality of discussions. It's not surprising that if you
| keep making a lot of noise and trying to generate off-topic
| outrage, it might be met with some active moderation,
| that's exactly what I would expect, and I'm sorry it's
| bothering you so much, but FWIW I don't mind that it's
| happening, and it does seem reasonable and in-bounds from
| my perspective. BTW to me it seems like you might be making
| some incorrect assumptions about what's happening and why;
| the news topic does seem to be covered adequately with the
| first one or two articles.
| counttheforks wrote:
| > You both asked for and complained about being given a
| reason why the article is a dupe. Which is it?
|
| Dang posted the links to the previous stories, didn't
| mark the post as a dupe or say anything about it being a
| dupe. Because it's not. Then two hours later it silently
| got deleted.
| dahart wrote:
| It is marked as dupe, and not deleted, we can still see
| it. Why does it matter if there was a time delay between
| the two things you saw? What does that have to do with
| anything?
| croes wrote:
| It's not about what the BBC says but what India does to
| prevent others seeing what the BBC says.
| codetrotter wrote:
| Elaborate
| putna wrote:
| wow, amazing story.
|
| > it's the old joke of, you get 95% of the functionality with the
| first 95% of your budget, and the last 5% on the second 95% of
| your budget.
|
| repeats every time
| aidenn0 wrote:
| I think that's overly optimistic. It implies you can have a
| complete project with less than double your budget.
| pflanze wrote:
| I thought that the implication was that the budget grew by a
| factor of ~20, not 2. As in, the first 95% finished in 95% of
| the original budget, the rest finished in 95% of the final
| budget, so the first 95% took 5% of the final budget.
|
| Hmm, maybe I didn't detect the sarcasm in your reply.
| breck wrote:
| One thing not emphasized enough in this interview: SQLite is
| public domain. So was the web, which also won against its better
| funded licensed competitors at the time.
|
| The lesson is clear: if you want to win in the long run, you need
| not just great skills and tech, but you have to go public domain.
| Or to put it bluntly: #LicensesAreForLosers.
|
| https://breckyunits.com/how-the-public-domain-can-win.html
| simonw wrote:
| I think I've heard SQLite creator Richard D Hipp say in the
| past that he thinks using the public domain declaration rather
| than a regular open source license was a mistake.
|
| Public domain raises all sorts of challenges for potential
| adopters of the software that aren't an issue with a more
| deliberately designed license.
|
| UPDATE: I misremembered this. He does talk about some of the
| surprise challenges in this interview, but does not go as far
| as saying that he regretted it:
| https://changelog.com/podcast/201#transcript-215
| buttspelunker wrote:
| Are you talking about CERN's decision to put their web client
| and server source code into the public domain in 1993? It seems
| like a stretch - to me - to attribute the success of the web to
| that decision.
| andylynch wrote:
| It was probably a big factor. Gopher was a real competitor
| initially but the University of Minnesota which owned the IP
| started trying to charge license fees in Feb '93- I suspect
| CERN's decision was a response to this as well as MOSAIC's
| similar efforts around the same time.
| breck wrote:
| You couldn't predict that CERN's protocol would win, but you
| COULD predict that a public domain protocol would win.
|
| Proof:
|
| For every pair {protocolX,protocolY} where
| functionality(protocolX) = functionality(protocolY) &&
| isPublicDomain(protocolX) == true &&
| isPublicDomain(protocolY) == false, then
| speedAndUtility(protocolX) >> speedAndUtility(protocolY).
|
| https://breckyunits.com/how-the-public-domain-can-win.html
| revskill wrote:
| Interview question: Given a 2GB csv file, write a script to do
| calculation with it.
|
| Me: Put all in SQLite and write a SQL query.
| cauthon wrote:
| how/where do y'all suggest hosting a shared SQLite db?
| simonw wrote:
| That's exactly what my https://datasette.io project aims to
| solve!
|
| Recent examples:
|
| - https://simonwillison.net/2023/Jan/27/exploring-musiccaps/
|
| - https://simonwillison.net/2022/Aug/21/scotrail/
|
| - https://simonwillison.net/2022/Sep/5/laion-aesthetics-
| weekno...
| dfinninger wrote:
| Yeah, there are actually a number of scripts I have at my
| company that use Python's baked in SQLite3 module to do
| something complicated very easily.
|
| There's even a lib for Parquet making analysis on a small
| number of problematic files quite easy.
| pletnes wrote:
| Have a go with duckdb next time - you can query csv files
| without loading them first.
| simonw wrote:
| You can do that with SQLite too:
| https://til.simonwillison.net/sqlite/one-line-csv-operations
|
| (DuckDB is a lot more ergonomic for that kind of thing though
| - it's really fantastic tech)
| sam_lowry_ wrote:
| Unlike sqlite, DuckDB is a very complex and much less
| polished.
| deaddodo wrote:
| The spirit/answer being sought of the question is to not have
| to load all of the data in memory first.
| papito wrote:
| The origins of SQLite are pretty wild. This was one of my
| favorite episodes.
___________________________________________________________________
(page generated 2023-01-28 23:00 UTC)