[HN Gopher] How SQLite Is Tested
___________________________________________________________________
How SQLite Is Tested
Author : thunderbong
Score : 145 points
Date : 2021-12-06 14:41 UTC (8 hours ago)
(HTM) web link (www.sqlite.org)
(TXT) w3m dump (www.sqlite.org)
| uvdn7 wrote:
| In SQlite's source code, it has the following comment.
|
| 2001 September 15
|
| The author disclaims copyright to this source code. In place of a
| legal notice, here is a blessing:
|
| - May you do good and not evil.
|
| - May you find forgiveness for yourself and forgive others.
|
| - May you share freely, never taking more than you give.
|
| SQLite is _the_ most amazing database in my opinion. (Not the
| most amazing distributed database for obvious reasons). It strips
| away all the connect and network IO management, and focuses on
| the actual database. It's _the_ most deployed and used database
| in the world (it's even running outside of the Earth). Because of
| its wide distribution, it has to be very well tested. It's much
| harder to fix a bug on client (comparing to on server).
|
| Dr. Hipp is amazing. I wish more people are like him.
| postalrat wrote:
| Then one day a better embedded db comes along and people forget
| about sqlite.
| karmasimida wrote:
| That DB better inherit those test cases however, those are
| incredible assets
| luto wrote:
| It can't. Most of that test code is proprietary to the SQLite
| team and not available to others.
| jayd16 wrote:
| Interesting. Why do you think that is?
| simonw wrote:
| It's a great business decision. My understanding is that
| SQLite's big consulting income comes when some company
| somewhere produces a new piece of embedded hardware and
| needs SQLite to work on it. They can work directly with
| the company to get that to happen, using the company's
| proprietary testing suite.
| avhon1 wrote:
| A large amount of the testing is performed by their
| proprietary test harness.
|
| > TH3 consists of about 71.5 MB or 978.3 KSLOC of C code
| implementing 46622 distinct test cases. TH3 tests are
| heavily parameterized, though, so a full-coverage test
| runs about 1.9 million different test instances. The
| cases that provide 100% branch test coverage [of SQLite]
| constitute a subset of the total TH3 test suite. [1]
|
| > SQLite itself is in the public domain and can be used
| for any purpose. But TH3 is proprietary and requires a
| license.
|
| > Even though open-source users do not have direct access
| to TH3, all users of SQLite benefit from TH3 indirectly
| since each version of SQLite is validated running TH3 on
| multiple platforms (Linux, Windows, WinRT, Mac, OpenBSD)
| prior to release. So anyone using an official release of
| SQLite can deploy their application with the confidence
| of knowing that it has been tested using TH3. They simply
| cannot rerun those tests themselves without purchasing a
| TH3 license. [2]
|
| [1] https://sqlite.org/testing.html
|
| [2] https://sqlite.org/th3.html
| kevin_thibedeau wrote:
| They don't have an open development model. Controlling
| the test harness prevents a viable fork. That is their
| moat that ensures a revenue stream.
| avhon1 wrote:
| They do sell licenses to their proprietary test harness.
|
| https://sqlite.org/prosupport.html
| lifeisstillgood wrote:
| Maybe. But we learn from the sheer hard work put into this
| level of testing. And having this level of testing makes you
| put a bit more effort into each new line.
|
| The bar has been raised. If someone is to make a better
| embedded DB, they will have to maintain it at least as well as
| this.
|
| (PS I feel embarrassed by my own testing failures when reading
| this).
| henning wrote:
| Since SQLite is in every iOS and every Android device along
| with the Python standard library, I don't think it will be
| forgotten quickly.
| [deleted]
| zepto wrote:
| Such as?
| kingcharles wrote:
| I would say this sums it up:
|
| > the SQLite library consists of approximately 143.4 KSLOC of C
| code. (KSLOC means thousands of "Source Lines Of Code" or, in
| other words, lines of code excluding blank lines and comments.)
| By comparison, the project has 640 times as much test code and
| test scripts - 91911.0 KSLOC.
|
| How does SQLite compare for bugs in comparison to other DBs? How
| much are the other players paying out in bug bounties?
| egeozcan wrote:
| I remember reading someone here explain the extensive number of
| tests the oracle database has and how implementing a feature is
| more of a waiting for tests to fail game rather than writing
| code. I couldn't find it now, maybe I'm misremembering.
|
| Especially after reading the MySQL post yesterday[0], I don't
| know what to expect anymore.
|
| Does anyone have anything to say about the testing story of
| Postgres?
|
| [0]: https://news.ycombinator.com/item?id=29455852
|
| edit: found it https://news.ycombinator.com/item?id=18442941
| nmz wrote:
| He talks about it in the corecursive[1] podcast. No mention of
| code bounties anywhere.
|
| https://corecursive.com/066-sqlite-with-richard-hipp/
| Aeolun wrote:
| 90 million lines of test code?
|
| That seems ridiculous to me. I can't imagine what would lead to
| so many lines, but I strongly doubt it's all actually source
| code.
| Banana699 wrote:
| A lot of the mentioned tools are fuzzers[1], these are tools
| that start from certain input(s), say "SELECT * FROM TABLE
| WHERE some_column = some_value", and repeatedly
|
| 1- mutates[2] the input,
|
| 2- measures some metric of the database's performance (e.g.
| how many non-syntax errors it spat out)
|
| 3- and introduces more or less mutations to increase or
| decrease that metric.
|
| In short, an evolutionary computation where the population
| being generated is test code, the fitness function is some
| metric summarizing how the generated test code tested the db
| code, and mutation operators gradually pushes the generated
| test code towards a certain optima that we want our tests to
| have.
|
| This is an extremely general approach that can be used to
| generate anything, and it's uncannily effective at finding
| bugs especially in structured-format consumers like
| compilers[3] and db engines. You can use it to generate
| queries as above and the fitness function would be something
| like "Did the database gave an unholy error squeak or violate
| some key invariants in the db file?", you can use it to
| generate general test harnesses that exercise the whole C
| code and measure coverage via instrumentation, the fitness
| function would be something like "How much did this mutation
| covered of the application's instructions?". Not only do they
| take _both_ of those approaches and many others, but they do
| it multuple times and independently, so for example it 's
| mentioned that Google and the core development team both have
| an independent query fuzzer. They mention about 4 or 5
| fuzzers doing similar thing.
|
| And off course fuzzing is just one way to generate tests,
| there are plenty others.
|
| >I strongly doubt it's all actually source code
|
| I mean, why not? source code is not necessarily hand-written
| source code.
|
| [1]: skim https://www.fuzzingbook.org/ for a fairly good
| overview
|
| [2]: Generally, mutation is either completely blind and
| general, i.e. byte-level , or structure-aware, i.e. has some
| notion of a grammar or a schema governing the thing it
| mutates, it wouldn't just change a random letter of "SELECT",
| because the result would almost cerainly be invalid SQL.
|
| [3]: https://embed.cs.utah.edu/csmith/
| xeromal wrote:
| Databases are extremely complex.
| tmp_anon_22 wrote:
| And how many of those lines of tests are simply autogenerated?
| xmaayy wrote:
| I hope most of them, 91 million lines of code is insane
| hi41 wrote:
| >> the project has 640 times as much test code and test scripts -
| 91911.0 KSLOC.
|
| This is amazing. One of the reasons some of the developers don't
| write extensive test code is because it can cause delays in the
| deliverables and miss deadlines. How is the sqlite team able to
| write such a large number of test cases and not miss deadlines?
| hawk_ wrote:
| One reason I suspect is that it's a well defined product. The
| "spec" is quite well specified at this point. When changes are
| made existing tests just need to catch regressions, not be
| changed extensively due to changed meaning of existing
| primitives. Projects with significant spec/feature churn can't
| do the same.
| layer8 wrote:
| > not miss deadlines
|
| SQLite doesn't really have market pressure, so they can choose
| any deadline such that it provides adequate time for writing
| test code -- assuming they set deadlines at all.
| sophacles wrote:
| Tangential: I've seen a couple SQLite articles today. Did I miss
| some cool related news?
| rolandog wrote:
| I got the impression that this may have been posted because
| other fellow HN'ers mentioned it in the "How GNU coreutils are
| tested" thread.
| ckastner wrote:
| It's a great read, but it got posted quite often here. Some of
| the previous posts have longer comment threads.
|
| https://hn.algolia.com/?q=how+sqlite+is+tested
| dang wrote:
| Past related threads. I feel like there must have been
| something else since 2016?
|
| _How SQLite Is Tested_ -
| https://news.ycombinator.com/item?id=11936435 - June 2016 (57
| comments)
|
| _How SQLite Is Tested_ -
| https://news.ycombinator.com/item?id=9737754 - June 2015 (1
| comment)
|
| _How SQLite Is Tested_ -
| https://news.ycombinator.com/item?id=9095836 - Feb 2015 (17
| comments)
|
| _How SQLite is tested_ -
| https://news.ycombinator.com/item?id=6815321 - Nov 2013 (37
| comments)
|
| _How SQLite is tested_ -
| https://news.ycombinator.com/item?id=4799878 - Nov 2012 (6
| comments)
|
| _How SQLite is tested_ -
| https://news.ycombinator.com/item?id=4616548 - Oct 2012 (40
| comments)
|
| _How SQLite Is Tested_ -
| https://news.ycombinator.com/item?id=633151 - May 2009 (28
| comments)
| zzzeek wrote:
| exactly, what's the mechanism by which this keeps getting
| posted individually (a time limit I assume)? it almost seems
| like marketing spam at this point.
| dang wrote:
| The mechanism is simple: people keep running across it and
| finding it interesting. It's not spam, it's just the
| statistical cloud at work.
|
| On HN, reposts are fine when a story hasn't had a
| significant thread in a year or so (see
| https://news.ycombinator.com/newsfaq.html). So it's
| certainly ok after 5 years! Though I'd swear we'd seen more
| threads than that...
| ckastner wrote:
| > _I feel like there must have been something else since
| 2016?_
|
| Funny, that was my feeling as well.
|
| I'm 99% sure to have read a discussion on this here, but all
| submissions after 2016 (apart from today's) have no comments
| at all, hence also no comments linking to the older
| discussion.
|
| _Edit: Removed the remark on missing comments. The default
| search settings do not search for comments._
| Dangsucks2 wrote:
| Unfortunately SQLite was developed by a religious extremist for
| use on war ships. (Try to reconcile those two facts...)
| Lhiw wrote:
| SQLite is the most underrated database of all time.
|
| I know it's popular, but nowhere near to the level it should be.
| dnautics wrote:
| > nowhere near to the level it should be.
|
| I get the impression that it's used exactly as it should be, in
| systems that need a database but don't need clustering or
| failover strategies.
| pletnes wrote:
| As far as I can tell, there's far too many file formats out
| there. I'm sure sqlite should be used more than it currently
| is.
| bob1029 wrote:
| > but don't need clustering or failover strategies.
|
| Even for those cases where you do need it, there are emerging
| options. You can handle replication 100% in business logic
| (something I personally enjoy), or you can use a path like
| dqlite to replicate the physical WAL log.
| jerodsanto wrote:
| "SQLite is likely used more than all other database engines
| combined"
|
| So, like, even more popular than that?!
|
| https://www.sqlite.org/mostdeployed.html
| wenc wrote:
| SQLite is a file format with a SQL engine which makes it a
| great embedded database.
|
| It is not an RDBMS in the true sense so in my opinion it is
| correctly rated.
|
| I tried using SQLite to solve problems which normally would
| require an RDBMS and ran into so many issues like the lack of
| enforced static types, built in date or decimal types (and fast
| aggregations on dates), concurrent writes, etc. It's only when
| you've gone through this process that you'll realize that
| SQLite it not the database you think it is -- and SQLite itself
| is up front about that: https://www.sqlite.org/aff_short.html
| simonw wrote:
| Enforced static types are now available, as of SQLite 3.37.0
| which came out last week:
| https://www.sqlite.org/stricttables.html
| rmbyrro wrote:
| Why so? I have the impression every developer relies on it for
| a lot of local stuff, and many libraries also use it locally in
| users' machines for persistence needs.
|
| What do you think it " _should be_ " and is missing?
| chaxor wrote:
| I think they have this perspective because SQLite is almost
| never mentioned when DBs are discussed. Almost everyone
| pushes postgres, which I think is often a bit insane.
| Postgres or (often rarely) MySQL is given as the answer to
| most DB questions, even when coming from clearly new devs.
| It's not often that the dev needs any of the large amount of
| setup/permissions required for postgres/MySQL. The fact that
| SQLite comes as part of python's standard library should be
| used as second nature for most devs. I feel the same way -
| that SQLite should be the default answer, and one should
| reach towards postgres or MySQL unless you _really_ * _have*_
| to.
| cortesoft wrote:
| What? SQLite is rarely the answer to replace Postgres or
| MySQL... it is not designed for network access nor high
| availability.
| rmbyrro wrote:
| SQLite is not meant to _compete_ directly with MySQL or
| Postgres. It 's meant for local usage only, where you don't
| need high concurrency, managing multiple connections,
| replication, etc.
|
| I mean, you could probably use SQLite in many simple cases
| which currently use Postgres (personal servers, small
| apps), but advocating it as default solution is inadequate
| at best, if not misleading to people starting out in the
| field.
| Lhiw wrote:
| This view is the view I was talking about in the op.
|
| Inadequate for what? Most persistence layers I've ever
| seen did not need MySQL, postgres or MongoDB.
|
| > high concurrency, managing multiple connections,
| replication
|
| In the grand scheme of things, basically nobody needs
| these things. Sorry, your app is overengineered.
|
| Your app actually suffered in areas such as performance,
| security, portability, complexity and cost.
|
| Developers love to prematurely optimise everything. Most
| SaaS products today don't need a dedicated database
| server. Most apps that require a database server backend
| would be better with sqlite, most Dev/test/staging/prod
| pipelines would be better served with sqlite files.
| yung_steezy wrote:
| I use sqlite every day at work as a web dev. I do a lot of work
| with file uploaders and being able to import CSV files as tables
| I can query is a killer feature.
___________________________________________________________________
(page generated 2021-12-06 23:02 UTC)