[HN Gopher] In-Memory C++ Leap in Blockchain Analysis
___________________________________________________________________
In-Memory C++ Leap in Blockchain Analysis
Hey HN! We're the core engineering team at Caudena (which is used
globally by investigative and intelligence agencies, including
Europol, Interpol, BKA, DHS, IRS-CI, FBI, NPA and others), and we
just released the technical details behind Prism - our real-time,
in-memory C++ database for blockchain analysis.

To tackle the massive scale and complexity of blockchain data, we
had to get creative with low-level engineering:

- We utilize barebone servers with 2TB RAM and 48 cores.
- Implemented lock-free concurrent data structures
- Developed a custom memory management system
- Leveraged CPU-level vectorization
- Built a custom in-memory columnar/graph database from scratch

We'd love to AMA about:

- the engineering choices we made
- crazy optimizations that paid off
- pitfalls we hit

Ask us anything about scaling, memory trade-offs, building real-time
analytics on immutable data, or the crypto-forensics space. Looking
forward to a great convo!
Author : caudena
Score : 64 points
Date : 2025-06-18 20:40 UTC (1 day ago)
(HTM) web link (caudena.com)
(TXT) w3m dump (caudena.com)
| rubenvanwyk wrote:
| This should be a Show HN?
| caudena wrote:
| We've been thinking about it, but according to Show HN rules,
| blog posts are considered off-topic.
| Snoozus wrote:
| We built something very similar back in 2016, on the JVM with
| unsafe memory and garbage-free data structures to avoid GC
| pauses. The dynamic clustering is not too hard. Are you able to
| dynamically undo a cluster when new information shows up?
|
| Are you running separate instances per customer to separate the
| information they have access to?
| caudena wrote:
| Assuming by undoing you mean splitting the cluster:
|
| A linked list can be split in two in O(1). When it comes to
| updating the roots for all the removed nodes, there is no easy
| way out, but luckily:
|
| - This process can be parallelized.
|
| - It could be done just once for multiple clustering changes.
|
| - This is a multi-level disjoint set, not all the levels or
| sub-clusters are usually affected. Upper level clustering,
| which is based on lower confidence level, can be rebuilt more
| easily.
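
A minimal sketch of the split described above, assuming each cluster is kept as an intrusive singly linked list and every address caches its cluster id. The names (`ClusterSet`, `split_after`, `kNone`) are hypothetical and not taken from Prism. Cutting the list is O(1); relabelling the removed members is the O(k) part, and it is the loop that could be run in parallel or batched across several clustering changes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::int32_t kNone = -1;  // marks "no node" / "empty cluster"

// Hypothetical layout: a cluster is an intrusive singly linked list of
// addresses, and every address caches the id of the cluster it belongs to.
struct ClusterSet {
    struct Cluster { std::int32_t head; std::int32_t tail; };

    std::vector<std::int32_t> root;  // root[a]: cluster id of address a
    std::vector<std::int32_t> next;  // next[a]: next member of a's cluster
    std::vector<Cluster> clusters;   // clusters[id]: both ends of the member list

    explicit ClusterSet(std::int32_t n) : root(n), next(n, kNone) {
        for (std::int32_t a = 0; a < n; ++a) {
            root[a] = a;
            clusters.push_back({a, a});
        }
    }

    // Append cluster `from` to cluster `into`, then relabel the moved members.
    void merge(std::int32_t into, std::int32_t from) {
        assert(into != from);
        assert(clusters[into].head != kNone && clusters[from].head != kNone);
        next[clusters[into].tail] = clusters[from].head;
        clusters[into].tail = clusters[from].tail;
        for (std::int32_t a = clusters[from].head; a != kNone; a = next[a]) {
            root[a] = into;
        }
        clusters[from] = {kNone, kNone};
    }

    // Cut the member list right after `last_kept`; returns the new cluster id.
    std::int32_t split_after(std::int32_t last_kept) {
        const std::int32_t old_id = root[last_kept];
        const std::int32_t first_removed = next[last_kept];
        assert(first_removed != kNone);  // nothing to split off otherwise
        const std::int32_t old_tail = clusters[old_id].tail;

        // O(1): relink the two list ends.
        const auto new_id = static_cast<std::int32_t>(clusters.size());
        clusters.push_back({first_removed, old_tail});
        clusters[old_id].tail = last_kept;
        next[last_kept] = kNone;

        // O(k): every removed member needs its cached root updated.
        for (std::int32_t a = first_removed; a != kNone; a = next[a]) {
            root[a] = new_id;
        }
        return new_id;
    }
};
```

After merging addresses 0..3 into one cluster, `split_after(1)` leaves 0 and 1 in the old cluster and moves 2 and 3 into a fresh one. A multi-level variant would presumably keep one such structure per confidence level.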
|
| If by undoing you mean reverting the changes, we don't use a
| persistent data structure. When we need historical clustering,
| we use a patched forest with concurrent hash maps to track the
| changes, and then apply or throw them away.
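
One way to picture "track the changes, then apply or throw them away" is an overlay of overrides on top of an unchanged base forest. The sketch below is an illustration under assumptions, not Caudena's code: `PatchedForest` and its methods are hypothetical names, and a plain `std::unordered_map` stands in for the concurrent hash map mentioned above, so this version is single-threaded only.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical overlay: reads consult the patch first and fall back to the
// shared base forest; the base is only modified when the patch is applied.
class PatchedForest {
public:
    explicit PatchedForest(std::vector<std::int32_t>& base) : base_(base) {}

    // Cluster id of address `a` as seen through this patch.
    std::int32_t root_of(std::int32_t a) const {
        auto it = overrides_.find(a);
        return it != overrides_.end() ? it->second : base_[a];
    }

    // Record a tentative reassignment without touching the base.
    void reassign(std::int32_t a, std::int32_t new_root) { overrides_[a] = new_root; }

    // Make the tentative changes permanent.
    void apply() {
        for (const auto& kv : overrides_) base_[kv.first] = kv.second;
        overrides_.clear();
    }

    // Drop the tentative changes; the base was never modified.
    void discard() { overrides_.clear(); }

private:
    std::vector<std::int32_t>& base_;
    std::unordered_map<std::int32_t, std::int32_t> overrides_;
};
```

Because a patch never writes to the base until `apply()`, several patches can sit over the same base and each present a different view of it.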
|
| We use a single instance for all clients, but when one CFD
| server processes new block data, it becomes fully blocked for
| read access. To solve this, we built a smart load balancer that
| redirects user requests to a secondary CFD server. This ensures
| there's always at least two servers running, and more if we
| need additional throughput.
| joshstrange wrote:
| Do you see crypto as anything more than scams/crime/speculation?
|
| Most people involved in crypto pretend it's the future and their
| business models depend on pumping up crypto. That might be the
| same for you all but I figure of anyone in the space, a group
| dedicated to tracking down where coins are moving for government
| agencies (I assume for scams/crime reasons) might not have the
| wool so pulled over their eyes.
| caudena wrote:
| First of all, at Caudena, we are not involved in crypto
| projects or investments ourselves. Our expertise lies in
| analyzing blockchains and providing deep technical insights
| into how various blockchains operate. We focus on tracking and
| understanding the flow of digital assets, often in support of
| government agencies investigating scams, fraud, and other
| illicit activities.
|
| That said, we absolutely believe that blockchain and
| cryptocurrency will shape the future of the financial system.
| When you look beyond the noise of scam tokens, speculative
| NFTs, and high-profile scandals, there is significant and
| meaningful financial innovation happening. This extends beyond
| DeFi to include the tokenization of RWA, where major
| institutions like BlackRock and JPM Chase are actively
| exploring and implementing blockchain-based solutions. Numerous
| projects are driving real progress, and there's a slow but
| steady movement toward a more decentralized and transparent
| financial ecosystem.
| jnkl wrote:
| Can you be a bit more specific about the practical aspects of
| block chain technology regarding RWA?
| germandiago wrote:
| Who says that crypto is exclusively scams? There is that of
| course, but not only that. I do not find Bitcoin to be a scam.
| newswasboring wrote:
| There are attempts at non-scam projects, but none of
| them get any traction and they usually end up closing. What, in
| your opinion, is a success story in this space?
| seviu wrote:
| Caudena's post above yours mentioned quite a few successful use
| cases, all built on top of Ethereum or copycats (Ethereum is by
| itself a successful use case).
|
| Without thinking too hard, Aave is shaping up to be a giant in
| its own right as a lending protocol.
|
| Circle recently had a very successful IPO.
|
| Farcaster and Lens are attempting to compete as social
| network platforms (surprisingly they lack much of the
| toxicity that comes with the best-known ones).
|
| And lastly don't forget Polymarket, which is pretty well
| known beyond the crypto space.
|
| The list goes on and on if you care to dig a bit deeper
| newswasboring wrote:
| All of these are at best nice prospects and most of these
| are just services which are only useful if the space
| itself is useful. I'm sorry but I'm not convinced of
| utility by layers upon layers of "successful" protocols.
| seviu wrote:
| I wouldn't call polymarket a nice prospect though.
| IshKebab wrote:
| Apart from Bitcoin is there anything successful that isn't a
| scam? I never heard of any.
| drdrey wrote:
| Stripe would like to have a word
| IshKebab wrote:
| What successful crypto products do they have?
| plq wrote:
| When implementing the lock-free stuff, was portability (across
| processors) a goal? If yes, did you have to deal with anything
| specific? Do you notice any difference in behavior of correct
| implementations when run on different processors? How do you test
| for correctness of lock-free stuff?
|
| EDIT: Oh and did you implement from scratch? Why not use e.g. the
| RCU implementation from folly?
| caudena wrote:
| We never targeted weakly-ordered architectures like ARM, only
| x86. We never used a wide variety of different processors. We
| are not developing the Linux kernel and are not into control
| dependencies, just relying on the fences and the memory model.
| There may be some CPU-dependent performance differences, like
| discrepancy because of NUMA or false sharing being noticeable
| on one processor, but not on another. RCU and hazard pointers
| are nothing new. For the disjoint sets we don't need them. For
| the forest patches and the tries we do. We are using TBB and
| OpenMP whenever possible and trying to keep things simple.
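
The false-sharing remark above can be illustrated with a small sketch. It assumes 64-byte cache lines and C++17 (so that `std::vector` honors the over-alignment); `PaddedCounter` and `count_in_parallel` are hypothetical names, and plain `std::thread` is used instead of TBB or OpenMP to keep the example self-contained. Each worker writes only to its own cache line, so the counters do not bounce between cores.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines, the common size on current x86 parts.
constexpr std::size_t kCacheLine = 64;

// One counter per cache line, so two workers never write to the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Each worker bumps only its own slot; slots are summed after all joins.
inline std::uint64_t count_in_parallel(unsigned workers, std::uint64_t per_worker) {
    std::vector<PaddedCounter> slots(workers);
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&slots, w, per_worker] {
            for (std::uint64_t i = 0; i < per_worker; ++i) {
                // Relaxed is enough: no other data is published via the counter.
                slots[w].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : pool) t.join();  // join() synchronizes with each worker's writes

    std::uint64_t total = 0;
    for (const auto& s : slots) total += s.value.load(std::memory_order_relaxed);
    return total;
}
```

Removing the `alignas` keeps the result identical but packs several counters into one line, which is where a processor-dependent slowdown of the kind mentioned above could show up.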
| folk111 wrote:
| is it true that XMR / monero is untraceable?
| caudena wrote:
| No.
| lossolo wrote:
| "No" what? There are cases where it is traceable if someone
| uses it in a certain way, but if you do everything by the
| book, it's untraceable.
| layer8 wrote:
| > barebone servers
|
| You mean bare-metal servers?
| caudena wrote:
| Ohh, you're absolutely right!
| dboreham wrote:
| barebone server is a thing fwiw: A product that comprises a
| motherboard installed in a case with PSU. Customer adds CPU,
| memory and storage devices to make a complete usable server. We
| typically buy servers in this way because figuring out what
| motherboard fits in which case is a pita, conversely buying
| complete servers is more expensive and potentially runs into
| inventory issues at the vendor. So possibly they are running
| bare metal servers that were also barebone.
| generalenvelope wrote:
| Curious why you chose C++? Were there aspects of other
| languages/ecosystems like Rust that were lacking? Would choosing
| Rust be advantageous for blockchains that natively support it
| (like Solana)?
|
| To be clear: I don't mean to imply you should have done it any
| other way. I'm interested mainly in gaps in existing ecosystems
| and whether popular suggestions to "deprecate C++ for memory safe
| languages" (like one made by Azure CTO years ago) are realistic.
| kanbankaren wrote:
| What is wrong with C++?
|
| With POSIX semaphores, mutexes, and shared pointers, it is very
| rare to hit upon a memory issue in modern C++.
|
| Source: Writing code in C/C++ for 30 years.
| wat10000 wrote:
| What a terrifying statement.
|
| Edit: to be less glib, this is like saying "our shred-o-matic
| is perfectly safe due to its robust and thoroughly tested off
| switch." An off switch is essential but not nearly enough. It
| only provides acceptable safety if the operator is perfect,
| and people are not. You need guards and safety interlocks
| that ensure, for example, that the machine can't be turned on
| while Bob is inside lubricating the bearings.
|
| Mutexes and smart pointers are important constructs but they
| don't provide safety. Safety isn't the presence of safe
| constructs, but the absence of unsafe ones. Smart pointers
| don't save you when you manage to escape a reference beyond
| the lifetime of the object because C++ encourages passing
| parameters by reference all over the place. Mutexes and
| semaphores don't save you from failing to realize that some
| shared state can be mutated on two threads simultaneously.
| And none of this saves you from indexing off the end of a
| vector.
|
| You can probably pick a subset of C++ that lets you write
| reasonably safe code. But the presence of semaphores,
| mutexes, and shared pointers isn't what does it.
|
| Source: also writing C and C++ for 30 years.
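
Two of the hazards named above have checked counterparts in the standard library, which this short sketch shows; the helper names `read_checked` and `read_or` are made up for illustration. The point of the comment still stands: nothing forces a caller to use these instead of `v[i]` or a raw reference.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// v[i] with i >= v.size() is undefined behavior; v.at(i) throws instead.
inline bool read_checked(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// A weak_ptr must be locked before use, so an expired object is detected
// rather than read through a dangling reference.
inline int read_or(const std::weak_ptr<int>& observer, int fallback) {
    if (auto alive = observer.lock()) return *alive;
    return fallback;
}
```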
| lisper wrote:
| > Safety isn't the presence of safe constructs, but the
| absence of unsafe ones.
|
| Exactly. Here is a data point:
| https://spinroot.com/spin/Doc/rax.pdf
|
| TL;DR: This was software that ran on a spacecraft.
| Specifically designed to be safe, formally analyzed, and
| tested out the wazoo, but nonetheless failed in flight
| because someone did an end-run around the safe constructs
| to get something to work, which ended up producing a race
| condition.
| FpUser wrote:
| >"What a terrifying statement."
|
| The statement may not be correct but calling it terrifying
| is way melodramatic.
| nesarkvechnep wrote:
| The worst code is usually written by someone who's been doing
| it for 30 years and can't find a problem with their technology
| of choice.
|
| Especially with shared pointers you can encounter pretty
| terrible memory issues.
| kanbankaren wrote:
| Dude, provide examples of "terrible" memory issues.
| Otherwise, you are just repeating the folklore which is
| outdated.
| CharlesW wrote:
| > _With POSIX semaphores, mutexes, and shared pointers, it is
| very rare to hit upon a memory issue in modern C++._
|
| There is a mountain of evidence (two examples follow) that
| this is not true. Roughly two-thirds of serious security bugs
| in large C++ products are still memory-safety violations.
|
| (1) https://msrc.microsoft.com/blog/2019/07/we-need-a-safer-
| syst... (2) https://www.chromium.org/Home/chromium-
| security/memory-safet...
| kanbankaren wrote:
| Show me a memory issue that was caused by proper usage of
| POSIX concurrency primitives.
| jenadine wrote:
| Proper usage is fine. The problem is that it is easy to
| make mistakes. The compiler won't tell you and you may
| not notice until too late in production, and it will take
| forever to debug.
| CharlesW wrote:
| Here's two: CVE-2021-33574, CVE-2023-6705. The former had
| to be fixed in glibc, illustrating that proper usage of
| POSIX concurrency primitives does nothing when the rest
| of the ecosystem is a minefield of memory safety issues.
| There are some good citations on page 6 of this NSA
| Software Memory Safety overview in case you're interested:
| https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
| treyd wrote:
| You're right, if you use the concurrency primitives
| properly you won't have data races. But the issue is when
| people don't use the concurrency primitives properly,
| which there is ample evidence for (posted in this thread)
| happening all the time.
|
| But with this argument, the response is "well they didn't
| use the primitives properly so the problem is them",
| which shifts the blame onto the developer and away from
| the tools which are too easy to silently misuse.
|
| This also ignores memory safety issues that aren't data
| races, like buffer overflows, UAF, etc.
| wat10000 wrote:
| Any reasonable meaning of "proper" would include not
| causing memory issues, so you've just defined away any
| problems. Note that this is substantially different from
| not having any problems.
|
| The great lesson in software security of the past few
| decades is that you can't just document "proper usage,"
| declare all other usage to be the programmer's fault, and
| achieve anything close to secure software. You must have
| systems that either disallow unsafe constructs (e.g. rust
| preventing references from escaping at compile time) or
| can handle "improper usage" without allowing it to become
| a security vulnerability (e.g. sandboxing).
|
| Correctly use your concurrency primitives and you won't
| have thread safety bugs, hooray! And when was the last
| time you found a bug in C-family code caused by someone
| who didn't correctly use concurrency primitives because
| the programmer incorrectly believed that a certain piece
| of mutable data would only be accessed on a single
| thread? I'll give you my answer: it was yesterday. Quite
| likely the only reason it's not today is because I have
| the day off.
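
The "disallow the unsafe construct" idea can be approximated in C++ by making the mutex own the data, so that there is no way to reach the value without taking the lock. This is a minimal sketch; `Guarded` and `with_lock` are hypothetical names. Unlike Rust, the compiler will not stop a callback from leaking a reference to the value, so it narrows the hazard rather than removing it.

```cpp
#include <mutex>
#include <utility>

// The value is private and only reachable through with_lock(), so "forgot
// to take the mutex" cannot be written by accident.
template <typename T>
class Guarded {
public:
    explicit Guarded(T initial) : value_(std::move(initial)) {}

    // Runs `f` on the value while holding the lock and returns its result.
    template <typename F>
    auto with_lock(F&& f) -> decltype(f(std::declval<T&>())) {
        std::lock_guard<std::mutex> lock(mutex_);
        return f(value_);
    }

private:
    std::mutex mutex_;
    T value_;
};
```

For example, `Guarded<int> hits(0); hits.with_lock([](int& h) { ++h; });` increments under the lock, and there is no unlocked accessor to call by mistake.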
| kanbankaren wrote:
| > And when was the last time you found a bug in C-family
| code caused by someone who didn't correctly use
| concurrency primitives because the programmer incorrectly
| believed that a certain piece of mutable data would only
| be accessed on a single thread? I'll give you my answer:
| it was yesterday.
|
| You answered my question. My original argument was using
| concurrency primitives "properly" in C++ prevents memory
| issues and Rust isn't strictly necessary.
|
| I have nothing against Rust. I will use it when they
| freeze the language and publish an ISO spec and multiple
| compilers are available.
| bobmcnamara wrote:
| dozens caused by folks thinking pthread_cancel() was the
| right tool for the job
| FpUser wrote:
| I write high performance backends in C++. Works
| approximately as described in article and all data are in
| RAM and in structures specialized for access patterns.
| Works like a charm and runs 24x7 without a trace of
| problem.
|
| I've never had a single complaint from my customers. Well I
| do have bugs in logic during development but those are
| found and eliminated after testing. And every new backend I
| do I base on already battle-tested C++ foundation code. Why
| FFS would I ever want to change it (rewrite in Rust)? As a
| language Rust has far fewer of the features that I am
| accustomed to using, and this safety of Rust does not provide
| me any business benefits. It is quite the opposite. I would just
| lose time, money and still have those same logical bugs to
| iron out.
| acdha wrote:
| How many other programmers have you trained up to that
| level of results? Can you get them to work on Windows,
| Chrome, etc. so users stop getting exposed to bugs which
| are common in C-like languages but not memory-safe
| languages?
| FpUser wrote:
| I do not train programmers. I hire subcontractors when I
| need help. They're all same level as myself or better.
| Easy to find amongst East Europeans and does not cost
| much. Actually cheaper than some mediocre programmer from
| North America who can only program using single language
| / framework and has no clue about architecture and how
| various things work together in general.
| npalli wrote:
| Rust is the future of systems programming and will always be
| for the foreseeable future. The memory issue will mostly be
| addressed as needed, see from John Carmack yesterday[1], the
| C++ ecosystem advantage (a broad sense of how problems whether
| DS, Storage, OS, Networking, etc. have been solved) will be
| very hard to overcome for newer programming languages. I think
| it is ironic how modern C++ folks just keep chugging along
| releasing products while Rust folks are generally haranguing
| everyone about "memory safety" and generally leaving half
| finished projects (turns out writing Rust code is more fun than
| reading someone else's, who would have guessed).
|
| [1] https://x.com/ID_AA_Carmack/status/1935353905149341968
| wgjordan wrote:
| > The memory issue will mostly be addressed as needed
|
| I have no allegiance to either lang ecosystem, but I think
| it's an overly optimistic take to consider memory safety a
| solved problem from a tweet about fil-c, especially
| considering "the performance cost is not negligible" (about
| 2x according to a quick search?)
| npalli wrote:
| Performance drop of 2x for memory safety critical sections
| vs Rust rewrite taking years/decades, not even a contest.
| Now, if that drop was 10x maybe, but at 2x it is a no-brainer
| to continue with C++. I'm not certain Fil-C totally works
| in all cases, but it is an example of how the ecosystem
| will evolve to solve this issue and not migrate to Rust.
| hexaga wrote:
| What would you consider to be a non memory safety
| critical section? I tried to answer this and ended up in
| a chain of 'but wait, actually memory issues here would
| be similarly bad...', mainly because UB and friends tend
| to propagate and make local problems very non-local.
| secondcoming wrote:
| What's this Rust thing?
| caudena wrote:
| Because we are in 'unsafe' territory. And Rust doesn't even
| have a defined memory model. Rust is a little bit immature. We
| have some other services written in Rust though.
| wslh wrote:
| Thank you for the AMA. A few initial questions:
|
| - Would it be possible to open source your DB in the future? I
| think there are challenges in blockchain analysis (e.g. internal
| transactions) that go beyond the specific DB.
|
| - Having used Chainalysis and others, your product seems superior
| based on your presentation. Which blockchains do you support?
|
| - Is there a "HN Code" to test Prism?
| caudena wrote:
| Thanks for the questions! We don't currently have plans to
| open-source it. For anything else, feel free to reach out at
| pa@caudena.com - happy to discuss further there. We'd like to
| keep this thread focused on the technical side rather than
| product discussions :)
| Snoozus wrote:
| If the FBI tells you wallet A and wallet B belong to the same
| actor, how do you use that information, so that they can see it
| on their view, without leaking it to Europol?
| caudena wrote:
| Are you from CA or CT? :)
|
| FBI and Europol will work with the same forest (unless they are
| using on-premise setup), but with different "patches".
| BiraIgnacio wrote:
| I didn't even know there were companies doing work in the
| "blockchain services" space. Kinda cool, tech begets tech, begets
| tech.
|
| Love the C++ work, btw
| canyp wrote:
| You really had to call it Prism (PRISM), didn't you?
|
| It's great to see C++ resulting in orders of magnitude cost
| reduction anyway. Do you have more details on the various C++
| tricks done for optimization?
| caudena wrote:
| Yeah, we figured people would compare it to PRISM :)
|
| There are many possible optimizations, but they're all highly
| specific to the particular problems you're trying to solve.
| CharlesW wrote:
| > _" Built a custom in-memory columnar/graph database from
| scratch"_
|
| This seems like an odd place to spend your resources. What do
| Prism's benchmarks look like vs Memgraph, KX kdb+, Apache Ignite,
| TigerGraph, etc.?
| actionfromafar wrote:
| Couldn't it be related to the fact that data field sizes are
| fixed and never change?
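
To make the fixed-width argument concrete, here is a hedged sketch of a column-per-field layout. `TransferColumns`, its field names, and `total_sent` are hypothetical and say nothing about how Prism actually stores data. With fixed-size fields, each column is a dense array, and a filter-and-sum scan touches only the columns it needs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical transfer table stored column by column (struct of arrays).
struct TransferColumns {
    std::vector<std::uint32_t> sender;    // dense internal address ids
    std::vector<std::uint32_t> receiver;
    std::vector<std::uint64_t> amount;    // in the smallest currency unit

    void append(std::uint32_t from, std::uint32_t to, std::uint64_t value) {
        sender.push_back(from);
        receiver.push_back(to);
        amount.push_back(value);
    }

    std::size_t size() const { return amount.size(); }
};

// Branch-free scan over two contiguous columns; loops of this shape are
// typical candidates for compiler auto-vectorization.
inline std::uint64_t total_sent(const TransferColumns& t, std::uint32_t who) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < t.size(); ++i) {
        sum += (t.sender[i] == who) ? t.amount[i] : 0;
    }
    return sum;
}
```

A general-purpose engine has to handle variable-length and mutable values, which is one plausible reason a purpose-built store could come out ahead on this kind of data.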
| kayamon wrote:
| why you spyin on folks
___________________________________________________________________
(page generated 2025-06-19 23:01 UTC)