[HN Gopher] immudb - world's fastest immutable database, built on a zero trust model
___________________________________________________________________
immudb - world's fastest immutable database, built on a zero trust
model
Author : dragonsh
Score : 124 points
Date : 2021-12-27 15:00 UTC (8 hours ago)
| timdaub wrote:
| I went on their website and tried to understand how immutability
| is enforced but I couldn't find anything.
|
| I'm sceptical, particularly because they make a deliberate
| comparison to blockchain that I doubt they'll be able to
| deliver on.
|
| The PoW immutability of e.g. BTC and ETH is strong as it yields
| the following guarantees for stored data:
|
| - Immutability of the BTC blockchain is protected through all
| cumulative work that has happened on a specific branch of the
| chain. Even if someone replayed BTC, it'd take millennia to
| recompute the work on an average machine
|
| - The immutability isn't enforced on a file level, as I suspect
| it is with immudb. Immutability is enforced through the
| network, which has additionally shown itself to have
| conservative political views too. You can go, sync a BTC node
| and change the underlying LevelDB. Still, that won't change the
| network state. Immutability on a single system is physically
| impossible if e.g. you consider deleting the file as mutation.
|
| - immudb says "it's immutable like a blockchain but less
| complicated", but Bitcoin isn't more complicated than some
| sophisticated enterprise db solution.
|
| - I think immudb should be maximally upfront about what they
| mean by immutability: It seems they want to communicate that
| they're doing event sourcing - that's different from
| immutability
|
| Finally there's a rather esoteric argument. If you run an
| immutable database as an organization where one individual node
| cannot alter the network state but you have (in)direct control
| over all nodes: Isn't it always mutable, as you could e.g.
| choose to swap out consensus?
|
| So from a philosophical perspective, immutability can truly
| only occur if mutability is out of an individual's control.
|
| Why do I have the authority to say this? Because I too once
| worked on a database with blockchain characteristics called
| https://www.bigchaindb.com
|
| Edit: The best solution that also has a theoretically unlimited
| throughput is this toy project:
| https://github.com/hoytech/quadrable
|
| Conceptually, it computes a merkle tree over all data and
| regularly commits to Ethereum. Even with this commitment the
| data may still change locally, but then it would at least be
| provably tampered with. So I guess for databases, the
| attribute we can really implement is "tamper-proof".
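|
| A minimal sketch of that commitment step in Go (quadrable
| itself is C++; the leaf layout and pair hashing here are my
| own assumptions for illustration, not quadrable's actual
| format):
|
|   package main
|
|   import (
|       "crypto/sha256"
|       "fmt"
|   )
|
|   // hashPair combines two child hashes into a parent hash.
|   func hashPair(l, r [32]byte) [32]byte {
|       return sha256.Sum256(append(l[:], r[:]...))
|   }
|
|   // merkleRoot folds leaf hashes into a single root.
|   // Committing this one hash externally (e.g. to Ethereum)
|   // later lets anyone prove whether the local data still
|   // matches it.
|   func merkleRoot(leaves [][32]byte) [32]byte {
|       for len(leaves) > 1 {
|           var next [][32]byte
|           for i := 0; i < len(leaves); i += 2 {
|               if i+1 == len(leaves) {
|                   // odd leaf is carried up unchanged
|                   next = append(next, leaves[i])
|               } else {
|                   next = append(next,
|                       hashPair(leaves[i], leaves[i+1]))
|               }
|           }
|           leaves = next
|       }
|       return leaves[0]
|   }
|
|   func main() {
|       rows := []string{"k1=v1", "k2=v2", "k3=v3"}
|       var leaves [][32]byte
|       for _, r := range rows {
|           leaves = append(leaves, sha256.Sum256([]byte(r)))
|       }
|       fmt.Printf("root to commit: %x\n", merkleRoot(leaves))
|   }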
| jandrese wrote:
| The big question is: if someone gets on your DB server and
| wants to change a record, how does the software prevent them
| from altering it and then recomputing the remainder of the
| chain?
| layer8 wrote:
| I'd say the attribute is "tamper-proof history", not "tamper-
| proof data (current content)".
| YogurtFiend wrote:
| I'm not sure that this is a _useful_ tool. Let's talk about the
| threat model or the attacks that this defends against.
|
| If a Client is malicious, they might try to manipulate the data
| in the database in an untoward way. In a "normal" database, this
| might cause data loss, if the database isn't being continuously
| backed up. But immudb does continuous backups (effectively, since
| it's immutable) so, if a malicious client has been detected, it's
| possible to restore an older version of the database. The real
| problem is how would you know that a client has tampered with
| your database? Well, because this database is "tamper-proof,"
| duh! But the issue lies in the definition of tamper-proof. From
| my reading of the source code and documentation, the "proof that
| no tampering has occurred" is a proof that the current state of
| the database can be reached by applying some database operations
| to a previous state. As a result, a malicious client could simply
| ask the database to "delete everything and insert this new data,"
| to make the database look like whatever it wanted. This is a
| valid way to transition the state of the database from its old
| state to the new state, and so shouldn't be rejected by the
| tamper detection mechanism.
|
| "Ah," but you say, "it would look super sus [as the kids say] to
| just delete the entire database. We'd know that something was
| up!" The problem with this solution is how are you going to
| automate "looking super sus?" You could enact a policy to flag
| any update that updates more than N records at a time, but that's
| not really a solution. The "right" solution is to trace the
| provenance of database updates. Rather than allowing arbitrary
| database updates, you want to allow your database to be changed
| only by updates that are sensible for your application. The
| _actual_ statement you want to prove is that "the current state
| of the database is a known past state of the database updated by
| operations that my application ought to have issued." Of course
| what are "operations that my application ought to have issued?"
| Well, it depends how deep you want to go with your threat model.
| A simple thing you could do is have a list of all the queries
| that your application issues, and check to make sure all
| operations come from that list. This still allows other attacks
| through, and you could go even more in depth if you wanted to.
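|
| (A toy version of that allowlist check in Go; the query list
| and hashing scheme are invented for illustration, not
| anything immudb ships:)
|
|   package main
|
|   import (
|       "crypto/sha256"
|       "encoding/hex"
|       "fmt"
|   )
|
|   func hashQuery(template string) string {
|       h := sha256.Sum256([]byte(template))
|       return hex.EncodeToString(h[:])
|   }
|
|   // allowed holds hashes of the query templates the
|   // application is known to issue; anything else found in
|   // the audit trail gets flagged for human review.
|   var allowed = map[string]bool{
|       hashQuery("INSERT INTO orders VALUES (?, ?)"):         true,
|       hashQuery("UPDATE orders SET total = ? WHERE id = ?"): true,
|   }
|
|   func audit(trail []string) {
|       for _, q := range trail {
|           if !allowed[hashQuery(q)] {
|               fmt.Println("suspicious operation:", q)
|           }
|       }
|   }
|
|   func main() {
|       audit([]string{
|           "UPDATE orders SET total = ? WHERE id = ?",
|           "DELETE FROM orders", // not in the allowlist
|       })
|   }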
|
| Importantly, immudb doesn't appear to contend with any of this.
| They claim that their database is "tamper-proof," when in reality
| you'd need a complicated external auditing system to make it
| meaningfully tamper-proof for your application. (Again, a threat
| model ought to include a precise definition of "tamper-proof,"
| which would help clear up these issues.)
|
| It's also worth comparing this to
| https://en.wikipedia.org/wiki/Certificate_Transparency, which is
| an append-only database. Compared to immudb, the _exposed data
| model_ for certificate transparency logs is an append-only set,
| which means that it doesn't have any of these same problems. The
| problem with immudb is that the data model it exposes is more
| complicated, but its built-in verification tools haven't been
| upgraded to match.
|
| (Also, for context, I've tried to obtain a copy of their white
| paper, but after an hour the email with the link to it never
| arrived.)
| layer8 wrote:
| Regarding backups, note that you still need separate backups
| with immudb.
| gigatexal wrote:
| So is this a useful alternative to blockchains or just hype?
| newtonapple wrote:
| Has anyone tried immudb in production? What are some of immudb's
| performance characteristics? It'd be nice to know how it performs
| under various conditions: query per sec, database / table sizes,
| SQL join performance etc.
|
| Also, what are the system requirements for immudb? What kind of
| machine would I need to run a medium to large website (say, 1TB
| of data, 5-25K qps, e.g. Wikipedia)?
|
| The documentation mentions that it can use S3 as its storage.
| Are there performance implications if you do this?
| tarr11 wrote:
| Previous HN thread about immutable databases:
|
| https://news.ycombinator.com/item?id=23290769
| artemonster wrote:
| Can someone ELI5 how immutability applies to databases and
| what advantages it brings? Thank you!
| gopalv wrote:
| > immutability ... which advantages it brings
|
| Immutability brings a bunch of perf short-cuts which are
| usually impossible to build with a mutable store.
|
| You'll find a lot of metric stores optimized for fast ingest
| that take advantage of immutability as a core assumption,
| though they don't tend to do what immudb does with the
| cryptographic signatures to check for tampering.
|
| Look at GE Historian or Apache Druid for most of what I'm
| talking about here.
|
| You can build out a tiered storage system which pushes the data
| to a remote cold store and keep only immediate writes or recent
| reads locally.
|
| You can run a filter condition once on an immutable
| block/tablet and never run it again: a query like count(*)
| where rpm > X and plane_id = ? can be remembered as compressed
| bitsets of each column, rather than as final row selection
| masks, and half of that work can be reused when you change the
| plane_id = ? parameter.
|
| The fact that the data will never be updated makes it
| incredibly fast to query as you stream more data constantly
| while refreshing the exact same dashboard every 3 seconds for a
| monitoring screen - every 3s, it will only actually process the
| data that arrived in those 3 seconds, not repeat the query over
| the last 24h all over again.
|
| The moment you allow even a DELETE operation, all of this
| becomes a complex mess of figuring out how to adjust for
| changes (you can invalidate the bit-vectors of the updated cols
| etc, but it is harder).
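|
| A rough sketch of that bitset trick (boolean slices standing
| in for compressed bitsets; not how Druid or any particular
| engine actually lays this out):
|
|   package main
|
|   import "fmt"
|
|   // blockIndex caches per-column predicate results for one
|   // immutable block. Because the block never changes, these
|   // can be computed once and kept forever.
|   type blockIndex struct {
|       rpmAbove map[float64][]bool // rpm > X, keyed by X
|       planeEq  map[string][]bool  // plane_id = ?, keyed by id
|   }
|
|   // count ANDs two cached bitsets instead of rescanning the
|   // block's rows.
|   func count(a, b []bool) int {
|       n := 0
|       for i := range a {
|           if a[i] && b[i] {
|               n++
|           }
|       }
|       return n
|   }
|
|   func main() {
|       idx := blockIndex{
|           rpmAbove: map[float64][]bool{
|               3000: {true, false, true, true},
|           },
|           planeEq: map[string][]bool{
|               "A320": {true, true, false, true},
|               "B737": {false, false, true, false},
|           },
|       }
|       // The rpm bitset is reused as plane_id changes:
|       rpm := idx.rpmAbove[3000]
|       fmt.Println(count(rpm, idx.planeEq["A320"])) // 2
|       fmt.Println(count(rpm, idx.planeEq["B737"])) // 1
|   }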
| jandrese wrote:
| If the data is being added or updated continually, how do you
| prevent the database from growing without bound?
| mjh2539 wrote:
| You don't. You just keep throwing disks at it.
| throwaway984393 wrote:
| Immutability is probably the most powerful concept that applies
| to how modern technology can be used. Versioned, immutable, and
| cryptographically-signed artifacts do a bunch of things for
| you.
|
| From an operational standpoint, it allows you to roll out a
| change in exactly the way you tested, confident that it will
| work the way it's intended. It also allows you to roll back or
| forward to any change with the same confidence. It also means
| you can restore a database _immediately_ to the last known good
| state. Changes essentially cannot fail; no monkey-patching a
| schema or dataset, no "migrations" that have to be
| meticulously prepared and tested to make sure they won't
| accidentally break in production.
|
| From a security and auditing standpoint, it ensures that a
| change is exactly what it's supposed to be. No random changes
| by who-knows-who at who-knows-when. You see a reliable history
| of all changes.
|
| From a development standpoint, it allows you to see the full
| history of changes and verify the source or integrity of data,
| which is important in some fields like research.
| bob1029 wrote:
| There is also a performance advantage if you can build
| everything under these constraints. A pointer to something
| held in an immutable log will never become invalid or
| otherwise point to garbage data in the future. At worst,
| whatever is pointed to has since been updated or compensated
| for in some _future_ transaction which is held further
| towards the end of the log. Being able to make these
| assumptions allows for all kinds of clever tricks.
|
| The inability to mutate data pointed to in prior areas of the
| log does come with tradeoffs regarding other performance
| optimizations that expressly rely on mutability, but in my
| experience constraining the application to work with an
| immutable log (i.e. dealing with stale reads & compensating
| transactions) usually results in substantial performance
| uplift compared to solutions relying on mutability. One
| recent factor that widens this difference is NAND storage,
| where there _may_ be a substantial cost to be paid if one
| wants to rewrite prior blocks of data (depending on the type
| of controller/algorithm used by the device).
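|
| A tiny illustration of why such pointers stay valid (a
| generic append-only log, not immudb's actual layout):
|
|   package main
|
|   import "fmt"
|
|   // In an append-only log a record's offset is a permanent
|   // pointer: nothing before the tail is ever rewritten, and
|   // updates are simply later records.
|   type alog struct{ records []string }
|
|   func (l *alog) add(r string) int {
|       l.records = append(l.records, r)
|       return len(l.records) - 1 // stable offset
|   }
|
|   func main() {
|       var l alog
|       p := l.add("balance=100")
|       l.add("balance=120")      // "update" = new record
|       fmt.Println(l.records[p]) // old pointer still valid
|   }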
| fouc wrote:
| > A pointer to something held in an immutable log will
| never become invalid or otherwise point to garbage data in
| the future.
|
| Now I'm wondering if we can have immutable versioned APIs
| pharmakom wrote:
| Is it possible to delete data for compliance reasons? Not as a
| frequent operation, but say on a monthly batch?
| jeroiraz wrote:
| logical deletion is in place, physical deletion is already on
| the roadmap
| cabalamat wrote:
| Would it be possible to have something like this that works by
| writing to a PROM? That would make it immutable at the hardware
| level.
| ShamelessC wrote:
| > Data stored in immudb is cryptographically coherent and
| verifiable. Unlike blockchains, immudb can handle millions of
| transactions per second, and can be used both as a lightweight
| service or embedded in your application as a library. immudb runs
| everywhere, on an IoT device, your notebook, a server, on-premise
| or in the cloud.
|
| Seems pretty useful actually. Can anyone with a relevant
| background comment on when this would be a bad idea to use?
| KarlKemp wrote:
| The data that is at risk of being changed with malicious intent
| is certainly not insignificant, but still just a fraction of
| all data. Switching to this adds a new and complicated system,
| replacing whatever you're currently using, which will have seen
| far better testing and is known by the people working with it.
| staticassertion wrote:
| If you can trust your writers there's likely no need for this.
| A modern approach tends to have databases owned by a single
| service, which exposes the model via RPCs. So you generally
| don't have more than one writer, which means you're pretty much
| de-facto "zero trust" if that single writer follows a few rules
| (ie: mutual auth, logging, etc).
|
| But in some cases you don't have that same constraint. For
| example, databases that store logs (Elastic, Splunk, etc) might
| have many readers and writers, including humans.
|
| In that case enforced immutability might be a nice property to
| have. Attackers who get access to your Splunk/ES cluster
| certainly will have fun with it.
| imglorp wrote:
| There are a few properties to be aware of. Although it might
| be a KV store, you're probably going to want sensible queries
| on something other than the primary key, e.g. time series or
| secondary keys. So in addition to the KV store, there is
| probably a need
| for an external index and query mechanism. Another issue is
| obtaining consistent hashing, where multiple documents might
| have the same content but vary by order or by date format.
| Finally, do you have to go to the beginning and hash everything
| to get a proof of one transaction, or is there some shortcut
| aggregation possible?
|
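| On the last question: Merkle-tree based stores generally do
| have a shortcut. An inclusion proof only needs the O(log n)
| sibling hashes on the path to the root, not a rescan from the
| beginning. A rough sketch of verifying one (a generic scheme,
| not immudb's exact proof format):
|
|   package main
|
|   import (
|       "bytes"
|       "crypto/sha256"
|       "fmt"
|   )
|
|   type step struct {
|       sibling []byte
|       left    bool // sibling is the left child
|   }
|
|   // verify recomputes the root from one leaf plus its
|   // sibling hashes and compares it to the trusted root.
|   func verify(leaf []byte, path []step, root []byte) bool {
|       h := sha256.Sum256(leaf)
|       cur := h[:]
|       for _, s := range path {
|           var combined []byte
|           if s.left {
|               combined = append(s.sibling, cur...)
|           } else {
|               combined = append(cur, s.sibling...)
|           }
|           sum := sha256.Sum256(combined)
|           cur = sum[:]
|       }
|       return bytes.Equal(cur, root)
|   }
|
|   func main() {
|       // Two-leaf tree built by hand to exercise verify.
|       a := sha256.Sum256([]byte("tx-a"))
|       b := sha256.Sum256([]byte("tx-b"))
|       root := sha256.Sum256(append(a[:], b[:]...))
|       fmt.Println(verify([]byte("tx-a"),
|           []step{{sibling: b[:], left: false}}, root[:]))
|   }
|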
| We evaluated AWS QLDB for these things in our application as a
| financial ledger and were impressed at their progress with a
| novel data store. They invented some of the tech in house for
| this product instead of grabbing an off the shelf open product.
| Lockin would be a downside here.
|
| Immudb looks promising because it's not locked to a cloud host.
|
| https://aws.amazon.com/qldb/faqs/
| jaboutboul wrote:
| note that it does KV and SQL
| mistrial9 wrote:
| > Eg time series or secondary keys
|
| not an "all or nothing" question.. for example, a fast-enough
| "return the most recent in a time series" is not exactly
| time-series, but solves many use cases
| rattlesnakedave wrote:
| Seems like it would still be vulnerable to rollback attacks.
| Signed rows would probably get you farther with less novel tech
| involved if you want immutability.
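|
| A minimal sketch of what signed rows could look like (scheme
| invented for illustration):
|
|   package main
|
|   import (
|       "crypto/ed25519"
|       "crypto/rand"
|       "fmt"
|   )
|
|   func main() {
|       // The application holds the private key; the DB only
|       // stores rows plus signatures, so a DBA editing a row
|       // in place breaks verification.
|       pub, priv, _ := ed25519.GenerateKey(rand.Reader)
|
|       row := []byte(`{"id":42,"balance":100}`)
|       sig := ed25519.Sign(priv, row)
|
|       evil := []byte(`{"id":42,"balance":999}`)
|       fmt.Println(ed25519.Verify(pub, row, sig))  // true
|       fmt.Println(ed25519.Verify(pub, evil, sig)) // false
|   }
|
| A version number inside the signed payload would still be
| needed to catch rollback of an individual row.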
| throwaway984393 wrote:
| Don't forget to star this repo if you like immudb!
|
| I didn't realize GitHub had "Like and subscribe" culture now. : /
| willcipriano wrote:
| > You can add new versions of existing records, but never change
| or delete records. This lets you store critical data without fear
| of it being tampered.
|
| > immudb can be used as a key-value store or relational data
| structure and supports both transactions and blobs, so there are
| no limits to the use cases.
|
| This is game changing. Use it as, say, a secondary data store
| for high-value audit logs. I'll consider using it in the
| future.
| voidfunc wrote:
| What happens if you have some data that absolutely _must_
| change or be deleted? For example, a record gets committed with
| something sensitive by mistake.
| pmontra wrote:
| Or customers ask for their personal data to be deleted:
| GDPR, right to be forgotten, etc.
|
| I guess we must consider what can go in an immutable storage
| and what must not.
| rch wrote:
| You should be storing potentially GDPR-covered data
| encrypted with entity specific keys, which are destroyed
| when necessary.
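|
| A sketch of that "crypto-shredding" pattern (AES-GCM with
| one key per data subject; details are illustrative):
|
|   package main
|
|   import (
|       "crypto/aes"
|       "crypto/cipher"
|       "crypto/rand"
|       "fmt"
|   )
|
|   func main() {
|       // One key per user, held in a separate mutable
|       // key store.
|       key := make([]byte, 32)
|       rand.Read(key)
|
|       block, _ := aes.NewCipher(key)
|       gcm, _ := cipher.NewGCM(block)
|       nonce := make([]byte, gcm.NonceSize())
|       rand.Read(nonce)
|
|       ct := gcm.Seal(nil, nonce, []byte("pii"), nil)
|
|       // "Erase" the user: destroy the key; the immutable
|       // ciphertext stays but is permanently unreadable.
|       for i := range key {
|           key[i] = 0
|       }
|       fmt.Printf("ciphertext kept: %x\n", ct)
|   }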
| gnufx wrote:
| Right, regardless of the storage, but in the research
| computing circles I see, it's just not done. The promises
| of "data destruction" that get demanded are basically
| accompanied by fingers crossed behind the back (is that
| an international thing to "cover" for lying?) considering
| the filesystem and backup mechanisms etc.
| jaboutboul wrote:
| There is a data expiration and also logical deletion
| feature for exactly this use case.
| dillondoyle wrote:
| I don't totally understand the value of the second, but
| doesn't the first already exist in things like BigQuery?
| zimpenfish wrote:
| > doesn't the first already exist in things like BigQuery?
|
| You can truncate a BQ table and reload it if you want to
| change things. Had to do this at a previous gig (twice a
| day!) because the data warehouse people would only take data
| from BQ but the main data was in Firebase (yes, it was an
| insane place.)
| mathnmusic wrote:
| Or a traditional database with read-only credentials and a
| function that adds "ORDER BY version DESC LIMIT 1".
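|
| i.e. roughly (a sketch; the table layout is invented):
|
|   package main
|
|   import "fmt"
|
|   // Append-only emulation on a plain SQL database: the
|   // app's credentials allow INSERT and SELECT only,
|   // updates become new versions, and reads resolve the
|   // current value per key.
|   const latestQuery = `
|     SELECT value FROM kv
|     WHERE key = $1
|     ORDER BY version DESC
|     LIMIT 1`
|
|   func main() {
|       // With database/sql this would be passed to
|       // db.QueryRow(latestQuery, "theme").
|       fmt.Println(latestQuery)
|   }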
| [deleted]
| lojack wrote:
| but is a traditional database cryptographically secure? if a
| super user with write permissions (or, say, direct access to
| the physical data store) modifies records, are users able to
| validate the integrity of the data?
| jayd16 wrote:
| You could use permissions and stored procedures that
| ensure append only.
| jeroiraz wrote:
| The difference is that client applications do not need to
| trust that proper "append-only" permissions were enforced on
| the server side; they will have the chance to detect any
| tampering, while in the former approach it won't be
| noticeable.
| ledgerdev wrote:
| Does this have, or are there any plans for a change-feed? Has
| anyone used this as an event sourcing db?
| chalcolithic wrote:
| > millions of transactions per second
|
| I wonder, if I wanted to survey the landscape of all
| databases that claim such numbers, how could I possibly find
| them?
| furstenheim wrote:
| GDPR compliance will be tricky. How does one delete data?
| KarlKemp wrote:
| Pruning is on the roadmap.
| jquery wrote:
| How is it immutable if you can prune it?
| jeroiraz wrote:
| Several solutions may be possible. The simplest would be to
| delete the payloads associated with entries. While the actual
| data won't be there, it will still be possible to build
| cryptographic proofs. Then it's possible to prune by
| physically deleting entire transaction data, which may or may
| not affect proof generation. However, tampering will still be
| subject to detection.
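|
| i.e. roughly (illustrative, not immudb's internal layout):
|
|   package main
|
|   import (
|       "crypto/sha256"
|       "fmt"
|   )
|
|   // The hash chain is built over payload hashes, so the
|   // payload bytes can be physically deleted later without
|   // breaking proof generation.
|   type entry struct {
|       payloadHash [32]byte
|       payload     []byte // nil after pruning
|   }
|
|   func main() {
|       e := entry{payload: []byte("secret")}
|       e.payloadHash = sha256.Sum256(e.payload)
|
|       e.payload = nil // prune: data gone, proof intact
|       fmt.Printf("proof hash kept: %x\n", e.payloadHash)
|   }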
| jcims wrote:
| Are records atomically immutable or is there a set
| concept such that the lack of mutation can be verified
| over a set of records?
| jeroiraz wrote:
| currently it's logical deletion and time-based expiration.
| Actual values associated with expired entries are not
| fetched. Physical deletion is already on the roadmap.
| ledgerdev wrote:
| My preferred method is to tokenize sensitive data before
| storing in the immutable logs/database.
| [deleted]
| fragmede wrote:
| Store the data encrypted, then delete the keys when requested.
| endisneigh wrote:
| This isn't really deleting it though. What happens if in the
| future technology changes and current cryptography is moot?
| nowherebeen wrote:
| Then fire up a new database with the latest customer data
| every 18 months. And completely delete the old database
| once you confirm it no longer has value.
| cookiengineer wrote:
| Or just store the customer database in /tmp and reboot
| the server every 18 months. /s
| endisneigh wrote:
| I thought the point of this is to have an exhaustive
| record for audit purposes.
| okr wrote:
| You clone the database and remove/update the corresponding
| lines. GDPR does not mean you have to fix it right away, imho.
| gnabgib wrote:
| Article 17 does include the term "without undue delay"[0],
| but such vague language seems ripe for some court precedent.
|
| A clone and remove/update per GDPR request seems like undue
| delay, certainly one that could be avoided by alternative
| architecture choices (keep the personally identifiable
| information (PII) in a mutable store)
|
| [0]: https://gdpr-info.eu/art-17-gdpr/
| sigzero wrote:
| No, it's not undue delay. That's just how it currently
| works and that is a fine argument.
| [deleted]
| peoplefromibiza wrote:
| But you have to do it in a pretty short timeframe
|
| > _Under Article 12.3 of the GDPR, you have 30 days to
| provide information on the action your organization will
| decide to take on a legitimate erasure request. This
| timeframe can be extended up to 60 days depending on the
| complexity of the request._
|
| even if they ask for more time, first communication has to
| come within 30 days
| 1cvmask wrote:
| Words like immutable make me allergic
| abc_lisper wrote:
| See a doctor then. It isn't expected. Could be a lack of CS
| education, in which case, read some books. If that doesn't fix
| it, see a psychiatrist - something could be wrong with your
| brain.
___________________________________________________________________
(page generated 2021-12-27 23:00 UTC)