[HN Gopher] Why Is SQLite Coded in C (2017)
       ___________________________________________________________________
        
       Why Is SQLite Coded in C (2017)
        
       Author : piyushsthr
       Score  : 99 points
       Date   : 2021-08-23 17:04 UTC (5 hours ago)
        
 (HTM) web link (www.sqlite.org)
 (TXT) w3m dump (www.sqlite.org)
        
       | mikece wrote:
       | While I know it's a joke it's still funny on so many levels
       | because it _could_ be true: an interview with Stroustrup claiming
       | that he invented C++ to be purposely hard to preserve the
       | mystique that programming was hard and keep programming salaries
       | up:
       | 
       | https://webhome.phy.duke.edu/~rgb/Beowulf/c++_interview/c++_...
        
         | pjmlp wrote:
         | The real reason why Bjarne invented C++ was not having to deal
         | with BCPL or C.
         | 
         | After his experience getting his thesis ready in Simula, only
         | to rewrite it in BCPL, he swore never to put himself through
         | similar pain again.
         | 
         | So after working for a while at AT&T, C with Classes was his
         | way to avoid dealing with raw C, while having some of that
         | Simula productivity.
        
       | ChrisArchitect wrote:
       | (2017)
       | 
       | Previous discussion:
       | https://news.ycombinator.com/item?id=16585120
        
       | butterisgood wrote:
       | "Go hates assert" explain?
       | 
       | What is panic but a reaction to an assertion failure?
        
         | steveklabnik wrote:
         | https://golang.org/doc/faq#assertions
        
       | xiphias2 wrote:
       | "Rust needs a mechanism to recover gracefully from OOM errors."
       | 
       | It seems that Linux needs the same requirements from Rust as
       | SQLite to compete with C as a systems level language. It's going
       | in the right direction.
        
       | faustocarva wrote:
       | "old and boring" made my day more fun. Thanks!
        
       | juice_bus wrote:
       | How is Rust in regards to the SQLite's requirements now?
        
         | tptacek wrote:
         | Pretty much the same. Arch support in particular would be a
         | clear dealbreaker, I think.
        
           | geofft wrote:
           | Won't rustc_codegen_gcc effectively solve arch support? It's
           | not done and shipped, but it exists and is usable and is
           | landing into rustc, which is a fair amount of progress.
           | 
           | (Or is SQLite portable to architectures that GCC does not
           | support?)
        
             | marcosdumay wrote:
             | > architectures that GCC does not support
             | 
             | Do those exist?
             | 
             | (Yeah, there are some microcontrollers that can only be
             | programmed by the manufacturer's C compiler, which doesn't
             | even support the entire language and is full of bugs. But I
             | would be very surprised if SQLite ran on a PIC.)
        
               | masklinn wrote:
               | > Do those exist?
               | 
               | IIRC a possible issue is the use of forks of gcc (or
               | something else), I think it used to be common for console
               | toolchains though maybe less so these days.
        
             | tptacek wrote:
             | Yes, I imagine it will.
        
           | tyingq wrote:
           | For some more detail, sqlite has makefiles for things like
           | Windows CE and VxWorks, and the generic configure process
           | builds on almost anything else sufficiently POSIXY, like QNX.
        
             | estebank wrote:
             | When I see projects like this, with such an impressive
             | range of supported platforms, I am always a bit wary of
             | _how well tested_ those platforms effectively are. I know
             | that some bugs have been caught in OpenBSD due to some of
             | the more esoteric platforms making evident some incorrect
             | assumptions, but I also remember Debian boasting a huge
             | number of packages for, let's say, ARM that would build,
             | but any attempt to use them would show they had never been
             | tried out and were wholly unsupported.
        
               | tyingq wrote:
               | True, though sqlite is pretty universally lauded for
               | their approach to testing. Perhaps not all platforms are
               | tested by the Sqlite team, but if you run their tests on
               | your platform, the coverage is pretty good.
        
             | steveklabnik wrote:
             | rustc has seven supported vxworks targets, incidentally.
        
         | masklinn wrote:
         | > Rust needs to mature a little more, stop changing so fast,
         | and move further toward being old and boring.
         | 
         | Didn't make any sense at any point post 1.0. Rust 1.0 code
         | still works today modulo BC breaks required to fix
         | unsoundnesses.
         | 
         | > Rust needs to demonstrate that it can be used to create
         | general-purpose libraries that are callable from all other
         | programming languages.
         | 
         | Also didn't make much sense at any point as the story there was
         | always straightforward. But efforts like librsvg have pretty
         | much demonstrated that (interestingly the librsvg conversion
         | effort started around the time that page was created, circa
         | 2017).
         | 
         | > Rust needs to demonstrate that it can produce object code
         | that works on obscure embedded devices, including devices that
         | lack an operating system.
         | 
         | Rust is used in a bunch of embedded contexts. Whether Rust can
         | produce object code that works on _your_ embedded device is a
         | more debatable question, depends on the existence (and quality)
         | of the proper llvm backend.
         | 
         | I think there's also a gcc frontend in the works, but I expect
         | it's essentially nowhere yet as it was only just started (few
         | months old I think?). Though I believe it has financial support
         | and a fair amount of manpower. I believe there's also an even
         | more recent effort for a gcc backend in rustc.
         | 
         | So yeah this one I'd say there's limited progress yet but
         | things seem to be moving in the right direction _and picking
         | up_.
         | 
         | > Rust needs to pick up the necessary tooling that enables one
         | to do 100% branch coverage testing of the compiled binaries.
         | 
         | Unclear what the issue is there so no idea.
         | 
         | > Rust needs a mechanism to recover gracefully from OOM errors.
         | 
         | That was always possible by working no_std, though of course
         | required reimplementing your own abstractions.
         | 
         | With the linux kernel integration effort, a lot more work is
         | going into "fallible allocation" APIs, and thus the ability to
         | gracefully recover from allocation failures.
         | 
         | > Rust needs to demonstrate that it can do the kinds of work
         | that C does in SQLite without a significant speed penalty.
         | 
         | ¯\_(ツ)_/¯
        
           | tick_tock_tick wrote:
           | > depends on the existence (and quality) of the proper llvm
           | backend
           | 
           | LLVM's support is so lacking that this issue alone is enough
           | of a justification for a project like SQLite to be written
           | in C. The rust-gcc effort you mentioned will hopefully solve
           | this.
           | 
           | Quite a few of the others really are they just don't want to
           | be pioneers which is really quite fair.
        
           | [deleted]
        
           | tptacek wrote:
           | The branch coverage thing is a weird, artificial-seeming
           | requirement that all the branches in the compiled code ---
           | not the code as written, but the code ultimately produced by
           | the compiler --- be testable. In other words: if the compiler
           | generates a bounds check _anywhere_, it should be possible
           | to test what happens when that specific bounds check fails.
           | The problem is that sane Rust code doesn't give you all the
           | tools you'd need to deliberately trip all the checks the
           | compiler generates, because that is part of the point of
           | being a safe language.
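
To make the bounds-check point above concrete, here is a minimal Rust sketch. The names `nth` and `nth_panics` are hypothetical and exist only for illustration. It shows the easy case, where the failing side of a bounds check can be reached on purpose because the index comes straight from the caller. It does not show that every compiler-inserted check can be reached this way, which is the harder case the comment describes.

```rust
use std::panic;

/// Plain slice indexing: the compiler emits a bounds check whose
/// failing side panics.
fn nth(values: &[u32], i: usize) -> u32 {
    values[i]
}

/// Returns true when `nth` panics for the given index, meaning the
/// failing side of the bounds check was actually executed.
fn nth_panics(values: &[u32], i: usize) -> bool {
    // catch_unwind turns the panic into an Err value (assuming the
    // default panic=unwind strategy; with panic=abort nothing is caught).
    panic::catch_unwind(|| nth(values, i)).is_err()
}
```

Calling `nth_panics(&[1, 2, 3], 3)` takes the failing side of the check, while `nth(&[1, 2, 3], 1)` takes the passing side. The caught panic still prints its message to stderr.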
        
             | infogulch wrote:
             | > The branch coverage thing is a weird, artificial-seeming
             | requirement
             | 
             | Yes that's because it's a requirement designed by a
             | standards body. I found a paper "Is 100% Test Coverage a
             | Reasonable Requirement? Lessons Learned from a Space
             | Software Project" (2017) that mentions that 100% branch
             | coverage is a requirement in European Cooperation for Space
             | Standardization (ECSS) for Class A software (where failure
             | could result in loss of life, etc). The paper concluded:
             | 
             | > Our findings include that there seems to be a break-even
             | point between 80% and 95%, and everything beyond this
             | points is increasingly costly and could introduce new
             | project risks--which confirms findings reported so far in
             | literature (Section 5). However, the interview revealed
             | that, still, 100% coverage can be a reasonable quality
             | requirement; even though a 100% requirement is not a good
             | indicator for the software quality as such.
             | 
             | https://www.researchgate.net/publication/319141355_Is_100_T
             | e...
             | 
             | It doesn't follow that _because_ rust is a safe language
             | that it _cannot_ expose test harnesses that would enable
             | 100% branch coverage. Personally I'm ambivalent on this,
             | maybe it's not useful, but it doesn't seem bad either. But
             | your reaction to this requirement seems weird, like the
             | "fox and the grapes" fable... You can't get 100%-branch
             | coverage, so you give up and claim that 100% branch
             | coverage is dumb, why would anyone want that anyway? Do you
             | really think that 100% branch coverage testing should be
             | unavailable to rust programmers if that's what they need
             | (for whatever reason, including meeting some admittedly
             | arbitrary standard)?
        
               | tptacek wrote:
               | A couple of people have brought this up here, and it's an
               | argument that makes sense. I'll just note that the sqlite
               | page, which is what I'm critiquing, isn't written this
               | way; the project doesn't say "we use C because ECSS
               | requires us to build software in an 100%-branch-coverage
               | language", but rather speaks to the important benefit of
               | literal 100% branch coverage. It's that important benefit
               | I question, not the logistics problems they face, which I
               | concede.
        
               | infogulch wrote:
               | That might just be me twice, I added a new reference this
               | time at least ^_^ I agree that 100% branch coverage as a
               | goal in and of itself is generally dubious.
               | 
               | I seem to recall an interview where the SQLite creator,
               | Richard Hipp, described the reasons behind the branch
               | coverage testing in a bit more detail and mentioned that
               | it was a requirement from one of their customers, which
               | is where I got that idea. Sorry I don't have a specific
               | reference.
        
             | masklinn wrote:
             | Right so really more of an expressivity issue: the compiler
             | is not smart enough to remove _all_ branches which can not
             | happen in a given program, so some of those branches will
             | be completely untestable despite being in the final object
             | code e.g. have a vec![_;4], structurally use it such that
             | the index can only be in-bounds, the compiler may not be
             | able to elide the OOB checks because it might not
             | understand they're unnecessary for real-world code.
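
A small Rust sketch of the situation described above, using hypothetical function names. In `masked_lookup` the index is structurally limited to 0..=3, so if the optimizer leaves a bounds check in the object code, no input can reach its panic side. Whether the check survives depends on the compiler version and optimization level, so treat this as an illustration rather than a statement about what rustc emits. `checked_lookup` shows the alternative of making the out-of-range case an ordinary, testable source-level branch.

```rust
/// Looks up one of four values using a masked index.
fn masked_lookup(i: usize) -> u32 {
    let table: Vec<u32> = vec![10, 20, 30, 40];
    // `i & 3` is always in 0..=3, so the panic side of the bounds check
    // on this index expression is unreachable for every possible `i`.
    table[i & 3]
}

/// Uses `get`, which returns None instead of panicking, so both
/// outcomes are visible in the source and can be covered by tests.
fn checked_lookup(table: &[u32], i: usize) -> Option<u32> {
    table.get(i).copied()
}
```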
        
               | tptacek wrote:
               | Frankly, I think it's a pretty silly concern.
        
               | cormacrelf wrote:
               | It's substantially less silly than a small-time
               | JavaScript component library adding 100% branch coverage
               | testing requirements as a blocker to accepting a PR. But
               | they do it for the same reason, to be able to advertise
               | it and demonstrate reliability. This is how the sqlite
               | project makes money. I guess someone's got to build the
               | instrumentation tools that let them keep doing this,
               | sounds like it won't be them. (Edit: not sure who else
               | would have the motivation, to be honest. If they had to
               | pioneer one thing, it should be that.)
        
         | opheliate wrote:
         | With the discussion of getting Rust into the Linux kernel, I
         | think there's more interest in graceful recovery from OOM
         | errors. The new Allocator API will (maybe?) help this.
        
           | masklinn wrote:
           | > The new Allocator API will (maybe?) help this.
           | 
           | The Allocator API is about the ability to mix allocators and
           | provide "precise" (per-object) allocation strategies. That's
           | orthogonal to fallible allocation, which is mostly about
           | adding fallible versions of possibly-allocating APIs, and
           | being able to statically remove access to the non-failing one
           | (and being able to implicitly reject any dependency relying
           | on those APIs) (and / or providing alternate implementations
           | which expose a fallible API which is what
           | fallible_collections does, but the stdlib seems to have gone
           | with adding fallible APIs and probably adding a compiler
           | feature / flag to be able to disable the non-failing ones)
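
As an illustration of what a "fallible version of a possibly-allocating API" looks like, here is a minimal sketch using `Vec::try_reserve` and `Vec::try_reserve_exact`, which are available in the standard library on stable Rust today. The helper names `copy_fallible` and `reserve_too_much` are hypothetical. This is not the API the kernel effort uses; it only shows the general shape, where allocation failure is returned as an error value.

```rust
use std::collections::TryReserveError;

/// Copies `data` into a new buffer, reporting allocation failure to
/// the caller instead of aborting the process.
fn copy_fallible(data: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Returns Err on allocator failure or capacity overflow.
    buf.try_reserve_exact(data.len())?;
    // Capacity was reserved above, so this does not allocate again.
    buf.extend_from_slice(data);
    Ok(buf)
}

/// Requests an impossibly large buffer to demonstrate the error path.
fn reserve_too_much() -> Result<Vec<u64>, TryReserveError> {
    let mut buf: Vec<u64> = Vec::new();
    // usize::MAX elements of 8 bytes exceeds the maximum allocation
    // size, so this reports an error rather than aborting.
    buf.try_reserve(usize::MAX)?;
    Ok(buf)
}
```

A caller can match on the `Result` and fall back to a smaller buffer or return its own out-of-memory error, which is the kind of graceful recovery the SQLite page asks for.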
        
         | petters wrote:
         | There is a C backend for LLVM. If/when that works, that should
         | solve most of the portability issues.
        
           | masklinn wrote:
           | It's been dead for years, though the julia folks are
           | apparently trying to resurrect it.
        
       | zelphirkalt wrote:
       | > 2. Safe programming languages solve the easy problems: memory
       | leaks, use-after-free errors, array overruns, etc. Safe languages
       | provide no help beyond ordinary C code in solving the rather more
       | difficult problem of computing a correct answer to an SQL
       | statement.
       | 
       | Uhm ... If those were project wide "easy problems", then how come
       | vulnerabilities in a project like Chromium are 70% caused by
       | these "easy problems"?
       | 
       | I'd say with that category of bugs on board, you cannot mistrust
       | yourself enough and that there is nothing easy about preventing
       | these bugs all the time over the life-time of a project.
        
         | [deleted]
        
         | nmstoker wrote:
         | My interpretation is that they mean "easy" within the context
         | of the problems raised within that question. See the final
         | sentence of your quote.
        
       | Scarbutt wrote:
       | _Safe programming languages solve the easy problems: memory
       | leaks, use-after-free errors, array overruns, etc._
       | 
       | At least with visual studio I agree these are trivial to check
       | for and solve.
        
         | junon wrote:
         | They're even easier to solve on Linux with appropriate tools
         | (e.g Valgrind).
        
       | tptacek wrote:
       | Less than a year after this was published, Tencent released the
       | Magellan series of sqlite RCEs.
       | 
       | I think this is a fine page and it is eminently reasonable that
       | sqlite remains a C codebase. In particular, I think he's right
       | that rewriting sqlite in a memory-safe language would introduce a
       | bunch of bugs and likely result in a couple of years of
       | instability.
       | 
       | But the "security" paragraphs in this page do the rest of the
       | argument a disservice. The fact is, C is a demonstrable security
       | liability for sqlite. The real position of the project is that
       | memory safety security vulnerabilities are an acceptable tradeoff
       | for an otherwise reliable database engine; in practice, people
       | will deal with the exposure either by treating it as an
       | externality (ie: baking sqlite into products where it is directly
       | exposed as part of attack surface, and then throwing up their
       | hands and issuing patches when RCEs are discovered) or by
       | carefully positioning sqlite so it isn't a meaningful part of the
       | attack surface.
       | 
       | Both of these approaches are suboptimal --- that's why we call
       | them "tradeoffs" --- and it is the case that if you held
       | everything else equal (and you can't, but bear with me), sqlite
       | would be a better piece of software written in, I guess, Rust;
       | memory corruption wouldn't be one of the problems you need to
       | consider (or blow off uncomfortably).
       | 
       | Again: the argument as a whole, and this page --- fine! I use and
       | like sqlite.
        
         | eloff wrote:
         | Presumably people would want to rewrite sqlite into Rust. But
         | it's still a database, a low level, high performance software
         | system. Even if you write it in rust, some percentage of the
         | code will be unsafe rust or rust that calls a library written
         | in C. It will be safer, but still not a panacea. It would be
         | more viable in my opinion to improve the safety of the C code
         | that currently makes up sqlite.
        
           | tptacek wrote:
           | I do not agree with this at all, for what it's worth.
        
             | eloff wrote:
             | Can you clarify why?
        
               | masklinn wrote:
               | The amount of code which would have to be `unsafe` in a
               | library like sqlite would be absolutely minuscule if it
               | would exist at all, and much easier to check for than
               | _literally all of the codebase_?
        
               | tptacek wrote:
               | I don't think we understand enough about computer science
               | or computer engineering (or something in the middle of
               | those two things) to deliver large-scale C projects with
               | rich, flexible interfaces safely. The last 20 years has
               | just been a sequence of events where we've been surprised
               | by new oversights, from buffer overflows to integer
               | mishandling to uninitialized variables to UB. These
               | problems compound; they're never solved, but rather
               | beaten into development teams (at least the conscientious
               | ones) and so even the decades-old problems recur, because
               | it's not enough to know about the general pattern of a
               | problem, you also have to viscerally understand all the
               | combinatorics of those problems _and all the new code
               | that you write_, all of which has the potential to
               | create some new scenario that allows a well-known
               | vulnerability to re-emerge. And even if you manage to
               | clean the Augean Stables this way, you're still S.O.L.
               | when the next new memory corruption bug class is
               | discovered. A mug's game, a bad bet.
               | 
               | You can use formal methods to sidestep this! But my
               | argument would be that at that point, you're really
               | writing C In Name Only. By all means, if the aesthetics
               | of Rust are that painful to you (I get it!), write C
               | against a formal verifier. :)
        
             | wk_end wrote:
             | That's not a useful or constructive comment. At best it is,
             | indeed, worthless; at worst, you're using your stature on
             | HN to bully another commenter by implying that their
             | opinion can be dismissed out of hand, because you, tptacek,
             | say so.
             | 
             | I don't really agree with it either - it overstates the
             | issues with occasionally using escape hatches in Rust by
             | implying that they're at all comparable with the problems
             | inherent with using C. But yours was still an unnecessary
             | reply.
        
           | vlovich123 wrote:
           | Let's be generous and assume that 5% of the overall code
           | remains unsafe. That's 95% of the code that doesn't need to
           | be checked extra. Additionally, that 5% is likely to remain
           | static. With C any "securing" effort is a snapshot effort
           | that bitrots quickly as more code is added.
        
             | tptacek wrote:
             | sqlite is renowned as a project that tests meticulously;
             | they claim 100% branch test coverage, to the point where a
             | reason they reject current Rust is that they can't achieve
             | similar coverage against the branches the compiler itself
             | inserts as safety checks (that is: they can't use Rust
             | because they can't test safety checks that don't even exist
             | in the language they use today).
             | 
             | And the track record of that approach is, well, right there
             | for you to see.
             | 
             | The idea that what's needed for sqlite's C to be safe is
             | more of the approach sqlite already uses seems pre-
             | falsified, doesn't it?
        
               | vlovich123 wrote:
               | > sqlite is renowned as a project that tests
               | meticulously; they claim 100% branch test coverage
               | 
               | You realize that there are lots of test coverage metrics
               | and none of them tell you anything about the correctness
               | or safety of the code or about how thoroughly the edge
               | cases have been tested, right? 100% branch coverage is
               | neat and may indicate that the testing is thorough, or it
               | might just indicate the project is chasing it as a
               | metric. I'll bet on compile time lifetime and ownership
               | analysis guaranteed by the language every time over
               | metrics of how good the test suite is (a test suite is
               | important but classes of bugs are just impossible in Rust
               | and don't need testing/coverage in the first place).
               | 
               | > that is: they can't use Rust because they can't test
               | safety checks that don't even exist in the language they
               | use today
               | 
               | Can you back up this claim? This feels like a very FUD
               | statement as any high level language (including C) has a
               | risk that the compiler inserts branches that aren't
               | present in the code. Regardless, AFAIK, coverage
               | instrumentation happens at a level below where the
               | distinction between C and Rust matters, so any coverage
               | should be the same. I buy the argument that replicating
               | the test suite may be a bit time consuming, but the claim
               | that there's something inherent about Rust preventing
               | branch coverage feels extraordinary to me. Also, I'm not
               | even sure where the extra branches are being inserted.
               | Are you referring to drop statements the compiler injects
               | for lifetimes?
               | 
               | > And the track record of that approach is, well, right
               | there for you to see.
               | 
               | Constant CVEs and a fundamental inability to handle
               | maliciously crafted files? I'm not shitting on the SQLite
               | team. The project is amazing and a marvel. I'm just
               | saying that its C heritage has some inescapable
               | realities. Also remember that studies tend to show that
               | the bug rate is pretty constant across languages in terms
               | of LOC. Thus you want the language and stdlib doing as
               | much as possible for you if you're prioritizing
               | correctness and security.
        
               | tptacek wrote:
               | I'm simply citing the article we're commenting on.
        
               | ansible wrote:
               | > _sqlite is renowned as a project that tests
               | meticulously..._
               | 
               | Any language that can produce a C library interface
               | identical to the existing one for sqlite3 would be a good
               | candidate for a reimplementation.
               | 
               | If you show me a Rust library that passes all the sqlite3
               | tests flawlessly on my platform (typically x86-64) then
               | I'd include that in my project, and sleep soundly at
               | night afterwards. There _might_ be problems, but the
               | chances are very low.
               | 
               | Most other libraries don't have nearly as good coverage,
               | and it is much riskier to switch over an implementation.
        
               | infogulch wrote:
               | I think 100% machine code branch test coverage is
               | _legally required_ in some of the environments that
               | SQLite targets. That is, there must be a test that
               | exercises both sides of every machine code branch
               | instruction emitted in the final binary. Basically, every
               | branch must have a justification for why it exists. I
               | feel like that's not such a bad target for rust to aim
               | for.
        
             | eloff wrote:
             | It is better, definitely safer. But it's not perfect. I
             | write rust on a daily basis. I've written a lot of C++ in
             | the past. It's safer, but I have memory safety issues in
             | both. It's more a matter of degree.
             | 
             | For a large existing code base that is well written, I
             | think it's easier to improve the safety of the existing
             | code than rewrite it completely in a safer language.
        
               | tptacek wrote:
               | People talk about having memory safety issues in Rust and
               | Go, and my general reaction is that these claims tend to
               | be pretty artificial (for instance: they've managed to
               | introduce concurrency bugs that abort their program). If
               | you've got a war story about a security-relevant memory
               | corruption vulnerability you managed to introduce into a
               | Rust codebase (that wasn't in straight-up `unsafe` code),
               | I'd be interested in hearing more about it.
               | 
               | My position right now, before hearing that war story, is
               | that Rust vs. C is more than a difference of degree. It's
               | a difference of degree in _security writ large_, to be
               | sure. But for memory corruption? I'd say the distinction
               | is close to categorical.
        
               | pjmlp wrote:
               | It was already like that in the Pascal/Modula/Ada vs C
               | days.
               | 
               | Just because it isn't 100% bulletproof, just 95% better,
               | the C team always advocates it isn't worth the effort.
        
               | eloff wrote:
               | > If you've got a war story about a security-relevant
               | memory corruption vulnerability you managed to introduce
               | into a Rust codebase (that wasn't in straight-up `unsafe`
               | code), I'd be interested in hearing more about it.
               | 
               | No, they're in unsafe code. Except Go where you don't get
               | memory safety with some kinds of race conditions. I'm
               | debugging a segfault in Rust today as it turns out. I
               | guarantee it's in unsafe code. But I have some unsafe
               | code. A database would also definitely have quite a bit
               | of unsafe code.
               | 
               | It's a lot better than the situation in C. But rewrites
               | always introduce bugs. And sqlite is so well written and
               | tested, it's not a low quality code base. I don't think a
               | rewrite makes sense here, it'd be better to improve the
               | existing code to make it safer. That's my opinion anyway.
        
           | CraigJPerry wrote:
           | >> some percentage of the code will be unsafe rust
           | 
           | What would be the significance of that? Unsafe blocks in rust
           | must still satisfy the borrow checker and all the compiler's
           | safety rules. The only things unsafe gives you are:
           |     Dereference a raw pointer
           |     Call an unsafe function or method
           |     Access or modify a mutable static variable
           |     Implement an unsafe trait
           |     Access fields of unions
           | 
           | From https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html
           | 
           | The use of unsafe blocks does not automatically mean unsafe
           | code?
        
             | eloff wrote:
             | Those guarantees are checked by the programmer, not the
             | compiler. Mistakes and bugs can happen, just like in C.
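
A minimal sketch of the exchange above, using a hypothetical helper `read_at`. It exercises one of the listed abilities (dereferencing a raw pointer). The surrounding code is still ordinary Rust and is checked as usual, but the claim that the pointer is valid rests on a bounds test written by hand. If that test were wrong, the compiler would not complain.

```rust
/// Reads the element at `index` through a raw pointer.
fn read_at(values: &[i32], index: usize) -> Option<i32> {
    // Hand-written bounds test: the programmer, not the compiler, is
    // responsible for this being correct.
    if index >= values.len() {
        return None;
    }
    let ptr = values.as_ptr();
    // SAFETY: `index < values.len()` was checked above, so
    // `ptr.add(index)` stays inside the slice and points at an
    // initialized i32.
    Some(unsafe { *ptr.add(index) })
}
```

Changing the test to `index > values.len()` would still compile and would read one element past the end for `index == values.len()`, which is the kind of mistake the reply is pointing at.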
        
         | ksec wrote:
         | >But the "security" paragraphs in this page do the rest of the
         | argument a disservice.
         | 
         | I mean the whole security and safe language paragraphs were
         | _added_ after people literally started either harassing sqlite or
         | claiming SQLite is unsafe.
         | 
         | The paragraphs were an extremely polite way of saying No. We
         | don't want to switch to another language.
        
         | cormacrelf wrote:
         | > in practice, people will deal with the exposure either by
         | treating it as an externality ... or by carefully positioning
         | sqlite so it isn't a meaningful part of the attack surface.
         | 
         | I think you're missing one of the ways. The one where people
         | make no deliberate attempt to engage with the risk at all. No
         | big throwing up of the hands. Sqlite is deeply embedded
         | software, it takes years for a whole generation of smart TVs
         | and security cameras and cars to find themselves in landfill
         | and put a decade of vulnerabilities to bed.
        
         | nine_k wrote:
         | I suppose that a version of SQLite written in a safe(r)
         | language will eventually appear, and hopefully become popular.
         | 
         | But it will take a long time to mature. SQLite the project
         | cannot _switch_ languages, sadly. The only way to migrate is
         | for another project to grow beside it in the meantime, mature,
         | and become a viable and compatible alternative.
        
           | tptacek wrote:
           | I don't like C (correction: I _love_ writing C; I don't like
           | running code written in C), but I'd probably stick with
           | sqlite for several years after the introduction of a
           | competing Rust sqlike. But I also wouldn't expose sqlite
           | directly to untrusted users (for instance, in the hellworld
           | where I'm a designer on a major browser, I wouldn't make
           | sqlite part of the Javascript interface of that browser).
        
             | nine_k wrote:
             | I see; I share much of the sentiment.
             | 
             | But, tangentially, if not SQLite, what would you expose as
             | a DB interface in a browser? SQL, while large and hairy, is
             | both powerful and logical (in the relational part, not
             | syntax). A plain KV store like dbm is no match to it. A
             | Redis-like store is better but still more limited. Kdb's
             | approach is sort of wonderful but geared towards time
             | series and not much general-purpose. Is there an existing
             | interface / language you would reuse to give the browser a
             | _rich_ database interface?
        
           | masklinn wrote:
           | > SQLite the project cannot switch languages, sadly.
           | 
           | I'm sensitive to the issues of tradeoffs (e.g. platform
           | support which is a completely fair issue) and manpower
           | requirements and whatnot, but I don't see why it _can not_
           | switch language. Converting libraries "inside out" _has been
           | done_. Adding Rust support inside sqlite's build system and
           | migrating modules is technically feasible (again, not opining
           | on whether it would be _worth it_).
        
             | nine_k wrote:
             | It cannot "switch" languages in a way like "we stop
             | development in C and switch to development in XYZ only;
             | C-based code will only get security updates".
             | 
             | It can switch the officially endorsed "primary"
             | implementation, but only after an alternative
             | implementation has been around for a long time and was
             | battle-tested, all the while the original C implementation
             | continued to exist and develop.
        
               | masklinn wrote:
               | > It cannot "switch" languages in a way like "we stop
               | development in C and switch to development in XYZ only;
               | C-based code will only get security updates".
               | 
               | At a purely technical level (ignoring issues of platform
               | support and all) it pretty much can do that, actually:
               | integrate XYZ into the build system, build new features
               | in XYZ, start converting old features to XYZ, end up with
               | an sqlite in XYZ.
        
             | Jtsummers wrote:
             | Adding Rust support would reduce (presently) the number of
             | architectures and platforms that SQLite can target. That
             | would greatly reduce its utility for a _lot_ of customers.
             | Once Rust supports the same variety of architectures that C
             | presently supports this will become a non-issue, but that's
             | unlikely to happen in the near term.
        
               | masklinn wrote:
               | > Adding Rust support would reduce (presently) the number
               | of architectures and platforms that SQLite can target.
               | 
               | I mentioned that but I don't think that's the subject as
               | it does not warrant such a drastic assertion that sqlite
               | _can not_ switch language.
        
               | Jtsummers wrote:
               | Ok, yes. SQLite _can_ switch languages _if_ the objective
               | is to cut off a large number of customers. Until Rust is
               | well-supported on the variety of architectures that C is
               | supported on, that will be the result.
        
               | justin66 wrote:
               | > Until Rust is well-supported on the variety of
               | architectures that C is supported on
               | 
               | There's no particular reason to think that this will ever
               | happen. SQLite runs on a lot of crazy stuff that Rust
               | won't want to bother with.
        
           | CraigJPerry wrote:
           | I think https://gitlab.com/cznic/sqlite has become pretty
           | popular. It's a pure go implementation
        
             | masklinn wrote:
             | It is popular not because it's a good idea but because go:
             | using cgo is a huge imposition, so recoding everything in
             | go so it can be used from go is basically the norm.
             | 
             | That is useless to sqlite itself, this exists only because
             | of go's issues and is essentially unusable outside of the
             | go ecosystem.
        
               | pjmlp wrote:
               | Dislike for cgo is similar to the old JNI displeasure;
               | nowadays plenty of Java libraries don't have any issues
               | making use of JNI.
               | 
               | The only issue I see with cgo is that they decided to
               | follow such path instead of a proper FFI declaration
               | support like most languages.
               | 
               | Even Java now has such a capability thanks to Project
               | Panama.
        
               | CraigJPerry wrote:
               | I've noticed cgo dependencies massively slow down
               | compilation.
               | 
               | E.g. the hugo project built with and without extended
               | features. Build without and it's all go, it compiles in
               | the blink of an eye. The tooling in go, from a devops
               | point of view, is surprisingly good. It's not just
               | compile speed.
               | 
               | Build with all the c deps and well you're compiling more
               | code so of course it's going to be slower but it's
               | disproportionately so.
        
           | dathinab wrote:
           | I don't think a rust version of sqlite would make sense,
           | rewriting a very mature and well tested library rarely makes
           | sense.
           | 
           | Only if either:
           | 
           | - there are some _major_ changes coming anyway
           | 
           | - or major problems/improvements in maintainability and
           | better external dev commitment/support
           | 
           | could rewriting it make sense IMHO.
           | 
           | Let's be honest, many (most?) of the "recent" (~3 years)
           | security bugs sqlite had would most likely not have been
           | prevented by using rust, go or similar.
           | 
           | EDIT:
           | 
           | I guess the biggest drawback of C is that it keeps
           | contributors away, but it might also keep away some
           | contributors that projects like sqlite would rather not
           | have. So depending on the maintainer it might be seen as a
           | benefit, not a drawback.
        
             | PaulDavisThe1st wrote:
             | >I guess the biggest drawback of C is that it keeps
             | contributors away
             | 
             | There's this idea floating around in some circles that if
             | you could just adopt technology X, more people would
             | contribute to a project.
             | 
             | I myself was guilty of this ... believing that if we added
             | a web frontend to the DAW I've worked on for 21 years, all
             | those JS/webtech developers would show up.
             | 
             | There's no platform or language you could choose for SQLite
             | that would increase the number of contributors. C is
             | already a wildly popular programming language, with far,
             | far more practicing users than Rust. Rust may be The Cool
             | New Thing at present, and in time it may possibly grow to
             | something much bigger, bigger even than the C/C++ universe.
             | But that's not true right now, and it also wouldn't be true
             | even if SQLite was implemented using some impossibly
             | performant JS or its cousin.
        
             | tptacek wrote:
             | The problem with the first sentence of this comment is that
             | it also justifies keeping stuff like ImageMagick around.
        
             | woodruffw wrote:
             | > Let's be honest, many (most?) of the "recent" (~3 years)
             | security bugs sqlite had would most likely not have been
             | prevented by using rust, go or similar.
             | 
             | Using SQLite's CVE list from 2020[1], we see 12
             | vulnerabilities:
             | 
             | * Two NULL pointer dereferences; it's impossible to produce
             | a NULL reference in safe Rust.
             | 
             | * Two integer overflows; Rust makes these harder, but not
             | impossible.
             | 
             | * Three UAFs; these are impossible in safe Rust.
             | 
             | * One uncategorized segfault; these are impossible in safe
             | Rust barring environmental constraints (like a stack
             | overflow from unchecked recursion).
             | 
             | * One segmentation fault from incorrect object
             | initialization; this is impossible in safe Rust.
             | 
             | * 3 SQL and table-level bugs; Rust is unlikely to have
             | helped with these.
             | 
             | By my count, that's 6/12 bugs that would be impossible in
             | plain old safe Rust, and another 3/12 that would _likely_
             | be prevented by normal best practices in Rust. I still
             | don't think this _necessarily_ means that we should drop
             | everything and rewrite SQLite in Rust, but the raw numbers
             | don't back up the claim that doing so _wouldn't_
             | eliminate the actual security bugs that SQLite is seeing.
             | 
             | [1]: https://www.cvedetails.com/vulnerability-list/vendor_id-9237...
        
               | PaulDavisThe1st wrote:
               | The 3/12 that safe Rust would have prevented would likely
               | have been prevented by enforcing the use of current tools
               | for C. Since neither Rust safety nor lint-ish safety are
               | enforced, I think this is a reasonable comparison.
        
               | [deleted]
        
               | woodruffw wrote:
               | Show me a project that claims to consistently use static
               | analysis tools for C and _doesn't_ ignore them, and I'll
               | show you a liar!
               | 
               | But more seriously: Rust's toolchain comes with linting
               | built in, and the community as a whole is _much_ better
               | about applying and _responding_ to static analysis
               | results than the C ecosystem is. And that's even before
               | we get to false positives, which (subjectively) C static
               | analysis tools seem to spit out a great deal more often.
               | 
               | And I say all of that as someone who's currently doing
               | whole-program static analysis of C and C++! It's not that
               | you _can't_ do it, it's degrees of ease and a culture of
               | stringency that's lacking.
        
               | dathinab wrote:
               | Impossible in safe rust.
               | 
               | But would a sqlite port limit itself to safe rust?
               | 
               | (As a side note wrt integer overflows, rust only makes it
               | easier to detect them during tests and provides neat
               | methods for integer overflow aware code, but just that.)
               | 
               | Anyway I'm surprised that there were more "preventable"
               | bugs than I expected.
        
               | woodruffw wrote:
               | > But would a sqlite port limit itself to safe rust?
               | 
               | I suppose they wouldn't have to, but why would they
               | bother with unsafe? They proudly announce how few system
               | APIs and syscalls they depend on, so they have no need
               | for that (assuming they chose to not use any number of
               | safe wrappers). Complicated self-referential data
               | structures, perhaps, but that again is the kind of thing
               | that could be exhaustively tested and tucked into a safe
               | interface.
               | 
               | And yes, you're absolutely right about integers: Rust
               | _itself_ is not going to save you in release builds. But
               | it _does_ avoid a major source of overflows in C
               | (implicit conversions and promotions), and has explicit,
               | fallible APIs that are easy to enforce as a lint.
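               | 
               | A small sketch of those explicit, fallible APIs, using only
               | standard-library integer methods. The helper names and the
               | 8-byte header are assumptions made up for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: computes a buffer size and returns None on
/// overflow instead of silently wrapping.
fn buffer_size(n_cells: u32, cell_size: u32) -> Option<u32> {
    // The 8-byte header is an assumed value for this example.
    n_cells.checked_mul(cell_size)?.checked_add(8)
}

/// Narrowing conversions are never implicit; `try_from` fails
/// instead of truncating.
fn to_u16(n: u32) -> Option<u16> {
    u16::try_from(n).ok()
}

fn main() {
    println!("{:?}", buffer_size(100, 16));
    println!("{:?}", buffer_size(u32::MAX, 2));
    println!("{:?}", to_u16(70_000));
}
```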
        
       | geofft wrote:
       | > _Safe programming languages solve the easy problems: memory
       | leaks, use-after-free errors, array overruns, etc. Safe languages
       | provide no help beyond ordinary C code in solving the rather more
       | difficult problem of computing a correct answer to an SQL
       | statement._
       | 
       | I see where the author is coming from, but I don't think this is
       | quite true. The way that safe programming languages work is that
       | they have a richer type system that knows about the semantic
       | context of variables, which in turn is a tool that helps a lot
       | with the "more difficult problems".
       | 
       | For instance, one of the tools Rust uses for enforcing memory
       | safety (data races and use-after-free, in particular) is that
       | there's a distinction between "mutable" and "constant"
       | references. But this is, really, a distinction between unique and
       | shared references. If I am statically guaranteed to be the only
       | holder of a reference to X, I can modify it; if some other part
       | of the code might have a reference to X, I cannot.
       | 
       | This is essentially a readers/writer lock enforced at compile
       | time, and it therefore is a pattern that makes it much easier to
       | use actual readers/writer locks: the lock-for-read function gets
       | you a shared reference and the lock-for-write function gets you
       | a mutable one. And Rust makes it easy to say, you cannot
       | unlock the lock (in either variant) until you return the
       | reference, and you cannot accidentally leak the reference out of
       | the scope of the lock.
       | 
       | If you're using a readers-writer lock on, say, the schema of a
       | table (many simultaneous readers can use a table, but only one
       | task can alter the table and nothing else can touch the table
       | while it's being altered), having the tools to meaningfully
       | distinguish the cases and enforce that your mutable references
       | don't get copied does actually make it easier to compute the
       | correct answer to a SQL statement that's running at roughly the
       | same time as an ALTER TABLE.
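       | 
       | A minimal sketch of that pattern with the standard library's
       | RwLock. The Schema type and both helpers are hypothetical
       | stand-ins, not SQLite's actual data structures.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for a table schema.
struct Schema {
    columns: Vec<String>,
}

/// Readers: `read()` returns a guard that derefs to a shared `&Schema`.
fn column_count(schema: &RwLock<Schema>) -> usize {
    let guard = schema.read().unwrap();
    guard.columns.len()
    // The guard is dropped at the end of the function, releasing the
    // read lock; the reference cannot outlive it.
}

/// Writer: `write()` returns a guard that derefs to a unique `&mut Schema`.
fn add_column(schema: &RwLock<Schema>, name: &str) {
    let mut guard = schema.write().unwrap();
    guard.columns.push(name.to_string());
}

fn main() {
    let schema = RwLock::new(Schema { columns: vec!["id".to_string()] });
    add_column(&schema, "name");
    println!("{}", column_count(&schema));
}
```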
       | 
       | Another of the tools Rust uses is tagged unions: a C construct
       | like union U {char *x; int y;} would be memory-unsafe, so Rust is
       | obligated to forbid it in safe code. Instead, you get an enum
       | type that C would describe something like struct U {int tag;
       | union {char *x; int y;};}. If tag == 0, then you're on the first
       | variant; if tag == 1, then you're on the second variant, and the
       | compiler ensures that (in safe code) tag is never equal to
       | anything else, never uninitialized, etc. And there are a bunch of
       | language constructs to assist with this - for instance, the
       | 'match' keyword lets you write cases for each variant, allowing
       | access to x only if tag == 0 and y only if tag == 1. You're not
       | allowed to access x or y directly at all outside of a match
       | keyword or equivalent (like 'if let'), because that would defeat
       | memory safety.
       | 
       | But because you've got good syntax and compiler-checked support
       | for handling tagged unions, you may as well use it for problems
       | even if you don't care about memory safety / security. Take this
       | union from the SQLite source code, for instance:
       | https://www.sqlite.org/cgi/src/file?ci=trunk&name=src/sqlite...
       | 
       | There are three types of tables (eTabType), normal, virtual, or
       | view. There are three union variants with different data. Even
       | ignoring the fact that some of the variables are pointers and
       | others are non-pointers, the data _doesn't make semantic sense_
       | when interpreted as the wrong variant. If you have code that
       | handles just a normal table, and you extend it to handle virtual
       | tables or views too, you will need to make sure you're not
       | unconditionally accessing addColOffset, pFKey, or pDfltList,
       | because the information you get will be wrong. Rust's enforcement
       | of memory safety means it also prevents you from making this
       | logical mistake.
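       | 
       | A rough sketch of how the three table kinds could look as a
       | Rust enum. The variant and field names here are illustrative
       | assumptions, not a translation of SQLite's actual struct.

```rust
/// Hypothetical model of the three table kinds; each variant carries
/// only the data that makes sense for it.
enum TableKind {
    Normal { add_col_offset: i32 },
    Virtual { module_args: Vec<String> },
    View { select_sql: String },
}

/// `match` must cover every variant, and a variant's fields are only
/// reachable inside the arm for that variant.
fn describe(kind: &TableKind) -> String {
    match kind {
        TableKind::Normal { add_col_offset } => {
            format!("normal table, offset {}", add_col_offset)
        }
        TableKind::Virtual { module_args } => {
            format!("virtual table, {} args", module_args.len())
        }
        TableKind::View { select_sql } => format!("view: {}", select_sql),
    }
}

fn main() {
    let t = TableKind::View { select_sql: "SELECT 1".to_string() };
    println!("{}", describe(&t));
}
```

       | Code written for Normal alone cannot read add_col_offset from a
       | View, and adding a fourth variant makes every non-exhaustive
       | match fail to compile until it is handled.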
       | 
       | I think we've been selling memory-safe languages as a tool for
       | security, and I don't mean to detract from that argument at all -
       | but there's also the fact that Rust and Go (and D and Vala and
       | even modern C++) are newer languages that have been able to
       | implement more things than C can, which in turn makes it easier
       | to write correct programs in general.
        
         | pjmlp wrote:
         | We already had languages like that 10 years before C was
         | invented.
         | 
         | https://en.m.wikipedia.org/wiki/Burroughs_large_systems
         | 
         | https://en.m.wikipedia.org/wiki/JOVIAL
         | 
         | There are plenty of other examples when one looks for what was
         | happening outside AT&T.
         | 
         | C apologists like to pretend C is some kind of special gift to
         | mankind in systems programming languages, and nothing was
         | happening before C and UNIX came into scene.
        
       | greenyoda wrote:
       | Note: Article is from 2017.
       | 
       | Original HN discussion (from 2018) for those who are interested:
       | https://news.ycombinator.com/item?id=16585120
        
         | lukeschlather wrote:
         | Worth noting it looks like the section on Rust was added in
         | reply to some of the offhand suggestions of rust in that
         | thread.
         | 
         | https://web.archive.org/web/20180317194408/https://sqlite.or...
        
       | wudangmonk wrote:
       | It's a well-known cliche at this point for pretty much every
       | program to be rewritten in the "safe" language of Rust. But it
       | makes sense for C++ to be in there too: before it was the Rust
       | people wanting to rewrite everything, it was the C++ people
       | wanting to rewrite everything.
        
         | pjmlp wrote:
         | It has been like that since the 80's as UNIX was gaining market
         | share, but as Rust is gaining ground on that effort, it is easy
         | to shit on the effort.
         | 
         | Yes, even with its C underpinnings, C++ is better than plain
         | old C, provided the safer types for arrays, strings and RAII
         | resource management are used.
         | 
         | Hoare was already complaining about C in his 1980 Turing Award
         | speech.
        
       | dathinab wrote:
       | Wrt. what rust needs:
       | 
       | A) I don't think this is needed; you could just pin an older
       | rust version. But if it's needed we are not quite there yet.
       | 
       | B) I think that point has been met.
       | 
       | C) Depending on the definition of "obscure" this has been met,
       | but given that sqlite requires only _very_ little to make it
       | run, I guess that for the appropriate definition of "obscure" it
       | is not quite fulfilled yet.
       | 
       | D) This still needs a lot of work. Even just for line coverage
       | the tooling isn't quite up to my standards, tbh. While my
       | standards are pretty high, I fear the sqlite standards might be
       | even higher.
       | 
       | E) It's kinda met but needs a bit more time to mature.
       | 
       | F) I think this was more or less already met in 2017; if not
       | then, it is by now.
        
         | zinekeller wrote:
         | > A) I don't think this is needed; you could just pin an older
         | rust version. But if it's needed we are not quite there yet.
         | 
         | The problem I think is that SQLite is also used in embedded
         | systems, they need to be predictable (for example, for all of
         | its flaws C89 is still used in SQLite). So unless there is a
         | subset that is that stable, they won't move yet.
        
       | dathinab wrote:
       | While there are a lot of good points there are some which are
       | strange IMHO (wrt. the safe language section):
       | 
       | > import complete binary SQLite database files from untrusted
       | sources
       | 
       | Standard sqlite3 contains features _which make opening databases
       | from untrusted sources quite dangerous_ and as far as I know
       | there is no "un-trusted" open mode, though you might be able to
       | compile a hardened/restricted sqlite. Either way it's true that
       | using a "safe" language would not have helped here.
       | 
       | > Safe languages insert additional machine branches to do things
       | like verify that array accesses are in-bounds. In correct code,
       | those branches are never taken. That means that the machine code
       | cannot be 100% branch tested, which is an important component of
       | SQLite's quality strategy.
       | 
       | But sqlite favors the use of asserts and what these languages
       | insert are basically asserts...
       | 
       | > Safe languages usually want to abort if they encounter an out-
       | of-memory (OOM) situation.
       | 
       | This might be true about go (idk.) but isn't this generalizing a
       | bit too much?
       | 
       | Just to be clear, that sqlite is and stays in C makes total
       | sense; I just feel the author tries a bit too hard to find
       | arguments beyond the necessary ones.
       | 
       | I mean:
       | 
       | - When it was written there were no choosable "safe" language
       | alternatives.
       | 
       | - Sqlite is _extremely_ well tested, many (most?) of the recent
       | bugs were not of the kind any safe language would have
       | prevented.
       | 
       | - Safe languages don't necessarily help you with avoiding logic
       | bugs. And while many do have additional abstractions/tooling to
       | help with preventing logic bugs, rewriting sqlite is more likely
       | to introduce logic bugs than to prevent them in the long run.
       | Rewrites are always a good chance to fix bugs but also to
       | introduce new ones.
       | 
       | - Sqlite is extremely portable; none of the safe languages have
       | quite the level of portability sqlite wants to provide.
        
       ___________________________________________________________________
       (page generated 2021-08-23 23:02 UTC)