hngopher.com

       [HN Gopher] How SQLite Is Tested
       ___________________________________________________________________
        
       How SQLite Is Tested
        
       Author : thunderbong
       Score  : 145 points
       Date   : 2021-12-06 14:41 UTC (8 hours ago)
        
 (HTM) web link (www.sqlite.org)
 (TXT) w3m dump (www.sqlite.org)
        
       | uvdn7 wrote:
       | In SQlite's source code, it has the following comment.
       | 
       | 2001 September 15
       | 
       | The author disclaims copyright to this source code. In place of a
       | legal notice, here is a blessing:
       | 
       | - May you do good and not evil.
       | 
       | - May you find forgiveness for yourself and forgive others.
       | 
       | - May you share freely, never taking more than you give.
       | 
       | SQLite is _the_ most amazing database in my opinion. (Not the
       | most amazing distributed database for obvious reasons). It strips
       | away all the connect and network IO management, and focuses on
       | the actual database. It's _the_ most deployed and used database
       | in the world (it's even running outside of the Earth). Because of
       | its wide distribution, it has to be very well tested. It's much
       | harder to fix a bug on client (comparing to on server).
       | 
       | Dr. Hipp is amazing. I wish more people are like him.
        
       | postalrat wrote:
       | Then one day a better embedded db comes along and people forget
       | about sqlite.
        
         | karmasimida wrote:
         | That DB better inherit those test cases however, those are
         | incredible assets
        
           | luto wrote:
           | It can't. Most of that test code is proprietary to the SQLite
           | team and not available to others.
        
             | jayd16 wrote:
             | Interesting. Why do you think that is?
        
               | simonw wrote:
               | It's a great business decision. My understanding is that
               | SQLite's big consulting income comes when some company
               | somewhere produces a new piece of embedded hardware and
               | needs SQLite to work on it. They can work directly with
               | the company to get that to happen, using the company's
               | proprietary testing suite.
        
               | avhon1 wrote:
               | A large amount of the testing is performed by their
               | proprietary test harness.
               | 
               | > TH3 consists of about 71.5 MB or 978.3 KSLOC of C code
               | implementing 46622 distinct test cases. TH3 tests are
               | heavily parameterized, though, so a full-coverage test
               | runs about 1.9 million different test instances. The
               | cases that provide 100% branch test coverage [of SQLite]
               | constitute a subset of the total TH3 test suite. [1]
               | 
               | > SQLite itself is in the public domain and can be used
               | for any purpose. But TH3 is proprietary and requires a
               | license.
               | 
               | > Even though open-source users do not have direct access
               | to TH3, all users of SQLite benefit from TH3 indirectly
               | since each version of SQLite is validated running TH3 on
               | multiple platforms (Linux, Windows, WinRT, Mac, OpenBSD)
               | prior to release. So anyone using an official release of
               | SQLite can deploy their application with the confidence
               | of knowing that it has been tested using TH3. They simply
               | cannot rerun those tests themselves without purchasing a
               | TH3 license. [2]
               | 
               | [1] https://sqlite.org/testing.html
               | 
               | [2] https://sqlite.org/th3.html
        
               | kevin_thibedeau wrote:
               | They don't have an open development model. Controlling
               | the test harness prevents a viable fork. That is their
               | moat that ensures a revenue stream.
        
             | avhon1 wrote:
             | They do sell licenses to their proprietary test harness.
             | 
             | https://sqlite.org/prosupport.html
        
         | lifeisstillgood wrote:
         | Maybe. But we learn from the sheer hard work put into this
         | level of testing. And having this level of testing makes you
         | put a bit more effort into each new line.
         | 
         | The bar has been raised. If someone is to make a better
         | embedded DB, they will have to maintain it at least as well as
         | this.
         | 
         | (PS I feel embarrassed by my own testing failures when reading
         | this).
        
         | henning wrote:
         | Since SQLite is in every iOS and every Android device along
         | with the Python standard library, I don't think it will be
         | forgotten quickly.
        
         | [deleted]
        
         | zepto wrote:
         | Such as?
        
       | kingcharles wrote:
       | I would say this sums it up:
       | 
       | > the SQLite library consists of approximately 143.4 KSLOC of C
       | code. (KSLOC means thousands of "Source Lines Of Code" or, in
       | other words, lines of code excluding blank lines and comments.)
       | By comparison, the project has 640 times as much test code and
       | test scripts - 91911.0 KSLOC.
       | 
       | How does SQLite compare for bugs in comparison to other DBs? How
       | much are the other players paying out in bug bounties?
        
         | egeozcan wrote:
         | I remember reading someone here explain the extensive number of
         | tests the oracle database has and how implementing a feature is
         | more of a waiting for tests to fail game rather than writing
         | code. I couldn't find it now, maybe I'm misremembering.
         | 
         | Especially after reading the MySQL post yesterday[0], I don't
         | know what to expect anymore.
         | 
         | Does anyone have anything to say about the testing story of
         | Postgres?
         | 
         | [0]: https://news.ycombinator.com/item?id=29455852
         | 
         | edit: found it https://news.ycombinator.com/item?id=18442941
        
         | nmz wrote:
         | He talks about it in the corecursive[1] podcast. No mention of
         | code bounties anywhere.
         | 
         | https://corecursive.com/066-sqlite-with-richard-hipp/
        
         | Aeolun wrote:
         | 90 million lines of test code?
         | 
         | That seems ridiculous to me. I can't imagine what would lead to
         | so many lines, but I strongly doubt it's all actually source
         | code.
        
           | Banana699 wrote:
           | A lot of the mentioned tools are fuzzers[1], these are tools
           | that start from certain input(s), say "SELECT * FROM TABLE
           | WHERE some_column = some_value", and repeatedly
           | 
           | 1- mutates[2] the input,
           | 
           | 2- measures some metric of the database's performance (e.g.
           | how many non-syntax errors it spat out)
           | 
           | 3- and introduces more or less mutations to increase or
           | decrease that metric.
           | 
           | In short, an evolutionary computation where the population
           | being generated is test code, the fitness function is some
           | metric summarizing how the generated test code tested the db
           | code, and mutation operators gradually pushes the generated
           | test code towards a certain optima that we want our tests to
           | have.
           | 
           | This is an extremely general approach that can be used to
           | generate anything, and it's uncannily effective at finding
           | bugs especially in structured-format consumers like
           | compilers[3] and db engines. You can use it to generate
           | queries as above and the fitness function would be something
           | like "Did the database gave an unholy error squeak or violate
           | some key invariants in the db file?", you can use it to
           | generate general test harnesses that exercise the whole C
           | code and measure coverage via instrumentation, the fitness
           | function would be something like "How much did this mutation
           | covered of the application's instructions?". Not only do they
           | take _both_ of those approaches and many others, but they do
           | it multuple times and independently, so for example it 's
           | mentioned that Google and the core development team both have
           | an independent query fuzzer. They mention about 4 or 5
           | fuzzers doing similar thing.
           | 
           | And off course fuzzing is just one way to generate tests,
           | there are plenty others.
           | 
           | >I strongly doubt it's all actually source code
           | 
           | I mean, why not? source code is not necessarily hand-written
           | source code.
           | 
           | [1]: skim https://www.fuzzingbook.org/ for a fairly good
           | overview
           | 
           | [2]: Generally, mutation is either completely blind and
           | general, i.e. byte-level , or structure-aware, i.e. has some
           | notion of a grammar or a schema governing the thing it
           | mutates, it wouldn't just change a random letter of "SELECT",
           | because the result would almost cerainly be invalid SQL.
           | 
           | [3]: https://embed.cs.utah.edu/csmith/
        
           | xeromal wrote:
           | Databases are extremely complex.
        
         | tmp_anon_22 wrote:
         | And how many of those lines of tests are simply autogenerated?
        
           | xmaayy wrote:
           | I hope most of them, 91 million lines of code is insane
        
       | hi41 wrote:
       | >> the project has 640 times as much test code and test scripts -
       | 91911.0 KSLOC.
       | 
       | This is amazing. One of the reasons some of the developers don't
       | write extensive test code is because it can cause delays in the
       | deliverables and miss deadlines. How is the sqlite team able to
       | write such a large number of test cases and not miss deadlines?
        
         | hawk_ wrote:
         | One reason I suspect is that it's a well defined product. The
         | "spec" is quite well specified at this point. When changes are
         | made existing tests just need to catch regressions, not be
         | changed extensively due to changed meaning of existing
         | primitives. Projects with significant spec/feature churn can't
         | do the same.
        
         | layer8 wrote:
         | > not miss deadlines
         | 
         | SQLite doesn't really have market pressure, so they can choose
         | any deadline such that it provides adequate time for writing
         | test code -- assuming they set deadlines at all.
        
       | sophacles wrote:
       | Tangential: I've seen a couple SQLite articles today. Did I miss
       | some cool related news?
        
         | rolandog wrote:
         | I got the impression that this may have been posted because
         | other fellow HN'ers mentioned it in the "How GNU coreutils are
         | tested" thread.
        
       | ckastner wrote:
       | It's a great read, but it got posted quite often here. Some of
       | the previous posts have longer comment threads.
       | 
       | https://hn.algolia.com/?q=how+sqlite+is+tested
        
         | dang wrote:
         | Past related threads. I feel like there must have been
         | something else since 2016?
         | 
         |  _How SQLite Is Tested_ -
         | https://news.ycombinator.com/item?id=11936435 - June 2016 (57
         | comments)
         | 
         |  _How SQLite Is Tested_ -
         | https://news.ycombinator.com/item?id=9737754 - June 2015 (1
         | comment)
         | 
         |  _How SQLite Is Tested_ -
         | https://news.ycombinator.com/item?id=9095836 - Feb 2015 (17
         | comments)
         | 
         |  _How SQLite is tested_ -
         | https://news.ycombinator.com/item?id=6815321 - Nov 2013 (37
         | comments)
         | 
         |  _How SQLite is tested_ -
         | https://news.ycombinator.com/item?id=4799878 - Nov 2012 (6
         | comments)
         | 
         |  _How SQLite is tested_ -
         | https://news.ycombinator.com/item?id=4616548 - Oct 2012 (40
         | comments)
         | 
         |  _How SQLite Is Tested_ -
         | https://news.ycombinator.com/item?id=633151 - May 2009 (28
         | comments)
        
           | zzzeek wrote:
           | exactly, what's the mechanism by which this keeps getting
           | posted individually (a time limit I assume)? it almost seems
           | like marketing spam at this point.
        
             | dang wrote:
             | The mechanism is simple: people keep running across it and
             | finding it interesting. It's not spam, it's just the
             | statistical cloud at work.
             | 
             | On HN, reposts are fine when a story hasn't had a
             | significant thread in a year or so (see
             | https://news.ycombinator.com/newsfaq.html). So it's
             | certainly ok after 5 years! Though I'd swear we'd seen more
             | threads than that...
        
           | ckastner wrote:
           | > _I feel like there must have been something else since
           | 2016?_
           | 
           | Funny, that was my feeling as well.
           | 
           | I'm 99% sure to have read a discussion on this here, but all
           | submissions after 2016 (apart from today's) have no comments
           | at all, hence also no comments linking to the older
           | discussion.
           | 
           |  _Edit: Removed the remark on missing comments. The default
           | search settings do not search for comments._
        
       | Dangsucks2 wrote:
       | Unfortunately SQLite was developed by a religious extremist for
       | use on war ships. (Try to reconcile those two facts...)
        
       | Lhiw wrote:
       | SQLite is the most underrated database of all time.
       | 
       | I know it's popular, but nowhere near to the level it should be.
        
         | dnautics wrote:
         | > nowhere near to the level it should be.
         | 
         | I get the impression that it's used exactly as it should be, in
         | systems that need a database but don't need clustering or
         | failover strategies.
        
           | pletnes wrote:
           | As far as I can tell, there's far too many file formats out
           | there. I'm sure sqlite should be used more than it currently
           | is.
        
           | bob1029 wrote:
           | > but don't need clustering or failover strategies.
           | 
           | Even for those cases where you do need it, there are emerging
           | options. You can handle replication 100% in business logic
           | (something I personally enjoy), or you can use a path like
           | dqlite to replicate the physical WAL log.
        
         | jerodsanto wrote:
         | "SQLite is likely used more than all other database engines
         | combined"
         | 
         | So, like, even more popular than that?!
         | 
         | https://www.sqlite.org/mostdeployed.html
        
         | wenc wrote:
         | SQLite is a file format with a SQL engine which makes it a
         | great embedded database.
         | 
         | It is not an RDBMS in the true sense so in my opinion it is
         | correctly rated.
         | 
         | I tried using SQLite to solve problems which normally would
         | require an RDBMS and ran into so many issues like the lack of
         | enforced static types, built in date or decimal types (and fast
         | aggregations on dates), concurrent writes, etc. It's only when
         | you've gone through this process that you'll realize that
         | SQLite it not the database you think it is -- and SQLite itself
         | is up front about that: https://www.sqlite.org/aff_short.html
        
           | simonw wrote:
           | Enforced static types are now available, as of SQLite 3.37.0
           | which came out last week:
           | https://www.sqlite.org/stricttables.html
        
         | rmbyrro wrote:
         | Why so? I have the impression every developer relies on it for
         | a lot of local stuff, and many libraries also use it locally in
         | users' machines for persistence needs.
         | 
         | What do you think it " _should be_ " and is missing?
        
           | chaxor wrote:
           | I think they have this perspective because SQLite is almost
           | never mentioned when DBs are discussed. Almost everyone
           | pushes postgres, which I think is often a bit insane.
           | Postgres or (often rarely) MySQL is given as the answer to
           | most DB questions, even when coming from clearly new devs.
           | It's not often that the dev needs any of the large amount of
           | setup/permissions required for postgres/MySQL. The fact that
           | SQLite comes as part of python's standard library should be
           | used as second nature for most devs. I feel the same way -
           | that SQLite should be the default answer, and one should
           | reach towards postgres or MySQL unless you _really_ * _have*_
           | to.
        
             | cortesoft wrote:
             | What? SQLite is rarely the answer to replace Postgres or
             | MySQL... it is not designed for network access nor high
             | availability.
        
             | rmbyrro wrote:
             | SQLite is not meant to _compete_ directly with MySQL or
             | Postgres. It 's meant for local usage only, where you don't
             | need high concurrency, managing multiple connections,
             | replication, etc.
             | 
             | I mean, you could probably use SQLite in many simple cases
             | which currently use Postgres (personal servers, small
             | apps), but advocating it as default solution is inadequate
             | at best, if not misleading to people starting out in the
             | field.
        
               | Lhiw wrote:
               | This view is the view I was talking about in the op.
               | 
               | Inadequate for what? Most persistence layers I've ever
               | seen did not need MySQL, postgres or MongoDB.
               | 
               | > high concurrency, managing multiple connections,
               | replication
               | 
               | In the grand scheme of things, basically nobody needs
               | these things. Sorry, your app is overengineered.
               | 
               | Your app actually suffered in areas such as performance,
               | security, portability, complexity and cost.
               | 
               | Developers love to prematurely optimise everything. Most
               | SaaS products today don't need a dedicated database
               | server. Most apps that require a database server backend
               | would be better with sqlite, most Dev/test/staging/prod
               | pipelines would be better served with sqlite files.
        
       | yung_steezy wrote:
       | I use sqlite every day at work as a web dev. I do a lot of work
       | with file uploaders and being able to import CSV files as tables
       | I can query is a killer feature.
        
       ___________________________________________________________________
       (page generated 2021-12-06 23:02 UTC)