hngopher.com

       [HN Gopher] In-Memory C++ Leap in Blockchain Analysis
       ___________________________________________________________________
        
       In-Memory C++ Leap in Blockchain Analysis
        
       Hey HN
        
       We're the core engineering team at Caudena (which is used
       globally by investigative and intelligence agencies, including:
       Europol, Interpol, BKA, DHS, IRS-CI, FBI, NPA and others), and we
       just released the technical details behind Prism - our real-time,
       in-memory C++ database for blockchain analysis.
        
       To tackle the massive scale and complexity of blockchain data, we
       had to get creative with low-level engineering:
        
       - We utilize barebone servers with 2TB RAM and 48 Cores.
       - Implemented lock-free concurrent data structures
       - Developed a custom memory management system
       - Leveraging CPU-level vectorization
       - Built a custom in-memory columnar/graph database from scratch
        
       We'd love to AMA about:
        
       - the engineering choices we made
       - crazy optimizations that paid off
       - pitfalls we hit
        
       Ask us anything about scaling, memory trade-offs, building
       real-time analytics on immutable data, or the crypto-forensics
       space.
        
       Looking forward to a great convo!
        
       Author : caudena
       Score  : 64 points
       Date   : 2025-06-18 20:40 UTC (1 day ago)
        
 (HTM) web link (caudena.com)
 (TXT) w3m dump (caudena.com)
        
       | rubenvanwyk wrote:
       | This should be a Show HN?
        
         | caudena wrote:
         | We've been thinking about it, but according to Show HN rules,
         | blog posts are considered off-topic.
        
       | Snoozus wrote:
       | We built something very similar back in 2016, on the JVM with
       | unsafe memory and garbage-free data structures to avoid GC
       | pauses. The dynamic clustering is not too hard, are you able to
       | dynamically undo a cluster when new information shows up?
       | 
       | Are you running separate instances per customer to separate the
       | information they have access to?
        
         | caudena wrote:
         | Assuming by undoing you mean splitting the cluster:
         | 
         | A linked list can be split in two in O(1). When it comes to
         | updating the roots for all the removed nodes, there is no easy
         | way out, but luckily:
         | 
         | - This process can be parallelized.
         | 
         | - It could be done just once for multiple clustering changes.
         | 
         | - This is a multi-level disjoint set, so usually not all
         | levels or sub-clusters are affected. Upper-level clustering,
         | which is based on a lower confidence level, can be rebuilt
         | more easily.
         | 
         | If by undoing you mean reverting the changes, we don't use a
         | persistent data structure. When we need historical clustering,
         | we use a patched forest with concurrent hash maps to track the
         | changes, and then apply or throw them away.
         | 
         | We use a single instance for all clients, but when one CFD
         | server processes new block data, it becomes fully blocked for
         | read access. To solve this, we built a smart load balancer that
         | redirects user requests to a secondary CFD server. This ensures
         | there are always at least two servers running, and more if we
         | need additional throughput.
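
The split described in the reply above (an O(1) cut of the member list, followed by a linear relabeling of roots for the detached nodes) can be sketched roughly as follows. This is a minimal single-threaded illustration, not Prism's implementation: the `ClusterForest` type, its field names, and the cached `root` array are assumptions made for the example, and a real disjoint set would normally use parent pointers rather than relabel on every link.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel meaning "no neighbour" (hypothetical name).
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Hypothetical layout: every address is a node in a doubly linked member
// list and caches the id of the cluster it currently belongs to.
struct ClusterForest {
    std::vector<std::size_t> prev, next, root;

    explicit ClusterForest(std::size_t n)
        : prev(n, kNone), next(n, kNone), root(n) {
        for (std::size_t i = 0; i < n; ++i) root[i] = i;  // all singletons
    }

    // Append the list starting at b_head to the list ending at a_tail.
    // Simplification: roots are relabeled eagerly here.
    void link(std::size_t a_tail, std::size_t b_head) {
        assert(next[a_tail] == kNone && prev[b_head] == kNone);
        next[a_tail] = b_head;
        prev[b_head] = a_tail;
        for (std::size_t v = b_head; v != kNone; v = next[v])
            root[v] = root[a_tail];
    }

    // Step 1: cut the list just before first_removed. O(1).
    // Roots of the detached part are stale until relabel() runs.
    std::size_t cut_before(std::size_t first_removed) {
        const std::size_t p = prev[first_removed];
        if (p != kNone) next[p] = kNone;
        prev[first_removed] = kNone;
        return first_removed;  // head of the detached list
    }

    // Step 2: give every detached node its new root. O(k) in the size of
    // the detached part; this is the step that could be batched or
    // parallelized. Returns the number of relabeled nodes.
    std::size_t relabel(std::size_t head) {
        std::size_t count = 0;
        for (std::size_t v = head; v != kNone; v = next[v]) {
            root[v] = head;
            ++count;
        }
        return count;
    }
};
```

Keeping the cut and the relabel as separate steps mirrors the point made in the reply: several cuts could be queued first and the roots fixed up once afterwards.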
        
       | joshstrange wrote:
       | Do you see crypto as anything more than scams/crime/speculation?
       | 
       | Most people involved in crypto pretend it's the future and their
       | business models depend on pumping up crypto. That might be the
       | same for you all but I figure of anyone in the space, a group
       | dedicated to tracking down where coins are moving for government
       | agencies (I assume for scams/crime reasons) might not have the
       | wool so pulled over their eyes.
        
         | caudena wrote:
         | First of all, at Caudena, we are not involved in crypto
         | projects or investments ourselves. Our expertise lies in
         | analyzing blockchains and providing deep technical insights
         | into how various blockchains operate. We focus on tracking and
         | understanding the flow of digital assets, often in support of
         | government agencies investigating scams, fraud, and other
         | illicit activities.
         | 
         | That said, we absolutely believe that blockchain and
         | cryptocurrency will shape the future of the financial system.
         | When you look beyond the noise of scam tokens, speculative
         | NFTs, and high-profile scandals, there is significant and
         | meaningful financial innovation happening. This extends beyond
         | DeFi to include the tokenization of RWA, where major
         | institutions like BlackRock and JPM Chase are actively
         | exploring and implementing blockchain-based solutions. Numerous
         | projects are driving real progress, and there's a slow but
         | steady movement toward a more decentralized and transparent
         | financial ecosystem.
        
           | jnkl wrote:
           | Can you be a bit more specific about the practical aspects of
           | blockchain technology regarding RWA?
        
         | germandiago wrote:
         | Who says that crypto is exclusively scams? There is that of
         | course, but not only that. I do not find Bitcoin to be a scam.
        
           | newswasboring wrote:
           | There are like attempts at non scam projects, but none of
           | them get any traction and usually end up closing. What, in
           | your opinion, is a success story in this space?
        
             | seviu wrote:
             | Caudena's post above yours mentioned quite a few
             | successful use cases, all built on top of Ethereum or
             | copycats (Ethereum is by itself a successful use case)
             | 
             | Without thinking too hard, Aave is shaping up to be a giant
             | on its own as a lending protocol.
             | 
             | Circle recently had a very successful IPO.
             | 
             | Farcaster and Lens are attempting to compete as social
             | network platforms (surprisingly they lack much of the
             | toxicity found on the best-known ones)
             | 
             | And lastly don't forget Polymarket, which is pretty well
             | known beyond the crypto space.
             | 
             | The list goes on and on if you care to dig a bit deeper
        
               | newswasboring wrote:
               | All of these are at best nice prospects and most of these
               | are just services which are only useful if the space
               | itself is useful. I'm sorry but I'm not convinced of
               | utility by layers upon layers of "successful" protocols.
        
               | seviu wrote:
               | I wouldn't call polymarket a nice prospect though.
        
           | IshKebab wrote:
           | Apart from Bitcoin is there anything successful that isn't a
           | scam? I never heard of any.
        
             | drdrey wrote:
             | Stripe would like to have a word
        
               | IshKebab wrote:
               | What successful crypto products do they have?
        
       | plq wrote:
       | When implementing the lock-free stuff, was portability (across
       | processors) a goal? If yes, did you have to deal with anything
       | specific? Do you notice any difference in behavior of correct
       | implementations when run on different processors? How do you test
       | for correctness of lock-free stuff?
       | 
       | EDIT: Oh and did you implement from scratch? Why not use e.g. the
       | RCU implementation from folly?
        
         | caudena wrote:
         | We never targeted weakly-ordered architectures like ARM, only
         | x86. We never used a wide variety of different processors. We
         | are not developing the Linux kernel and are not into control
         | dependencies, just relying on the fences and the memory model.
         | There may be some CPU-dependent performance differences, like
         | discrepancy because of NUMA or false sharing being noticeable
         | on one processor, but not on another. RCU and hazard pointers
         | are nothing new. For the disjoint sets we don't need them. For
         | the forest patches and the tries we do. We are using TBB and
         | OpenMP whenever possible and trying to keep things simple.
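
As an illustration of the false-sharing point in the reply above, one common mitigation is to give each worker thread its own cache-line-sized slot so that independent counters never share a line. The sketch below assumes 64-byte cache lines (typical for current x86 parts, but still an assumption), and the names `kCacheLine`, `PaddedCounter`, and `PerThreadCounters` are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// alignas pads each counter out to a full cache line, so two workers
// updating neighbouring slots do not invalidate each other's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");
static_assert(alignof(PaddedCounter) == kCacheLine, "line-aligned");

// One slot per worker; each worker is expected to touch only its own slot.
template <std::size_t N>
struct PerThreadCounters {
    PaddedCounter slots[N];

    void add(std::size_t worker, std::uint64_t n) {
        // Relaxed is enough for a pure statistic with no ordering needs.
        slots[worker].value.fetch_add(n, std::memory_order_relaxed);
    }

    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const auto& s : slots)
            sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }
};
```

Whether the padding pays off depends on the processor and the access pattern, which matches the remark that the effect can be noticeable on one CPU and not on another.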
        
       | folk111 wrote:
       | is it true that XMR / monero is untraceable?
        
         | caudena wrote:
         | No.
        
           | lossolo wrote:
           | "No" what? There are cases where it is traceable if someone
           | uses it in a certain way, but if you do everything by the
           | book, it's untraceable.
        
       | layer8 wrote:
       | > barebone servers
       | 
       | You mean bare-metal servers?
        
         | caudena wrote:
         | Ohh, you're absolutely right!
        
         | dboreham wrote:
         | barebone server is a thing fwiw: A product that comprises a
         | motherboard installed in a case with PSU. Customer adds CPU,
         | memory and storage devices to make a complete usable server. We
         | typically buy servers in this way because figuring out what
         | motherboard fits in which case is a pita, conversely buying
         | complete servers is more expensive and potentially runs into
         | inventory issues at the vendor. So possibly they are running
         | bare metal servers that were also barebone.
        
       | generalenvelope wrote:
       | Curious why you chose C++? Were there aspects of other
       | languages/ecosystems like Rust that were lacking? Would choosing
       | Rust be advantageous for blockchains that natively support it
       | (like Solana)?
       | 
       | To be clear: I don't mean to imply you should have done it any
       | other way. I'm interested mainly in gaps in existing ecosystems
       | and whether popular suggestions to "deprecate C++ for memory safe
       | languages" (like one made by Azure CTO years ago) are realistic.
        
         | kanbankaren wrote:
         | What is wrong with C++?
         | 
         | With POSIX semaphores, mutexes, and shared pointers, it is very
         | rare to hit upon a memory issue in modern C++.
         | 
         | Source: Writing code in C/C++ for 30 years.
        
           | wat10000 wrote:
           | What a terrifying statement.
           | 
           | Edit: to be less glib, this is like saying "our shred-o-matic
           | is perfectly safe due to its robust and thoroughly tested off
           | switch." An off switch is essential but not nearly enough. It
           | only provides acceptable safety if the operator is perfect,
           | and people are not. You need guards and safety interlocks
           | that ensure, for example, that the machine can't be turned on
           | while Bob is inside lubricating the bearings.
           | 
           | Mutexes and smart pointers are important constructs but they
           | don't provide safety. Safety isn't the presence of safe
           | constructs, but the absence of unsafe ones. Smart pointers
           | don't save you when you manage to escape a reference beyond
           | the lifetime of the object because C++ encourages passing
           | parameters by reference all over the place. Mutexes and
           | semaphores don't save you from failing to realize that some
           | shared state can be mutated on two threads simultaneously.
           | And none of this saves you from indexing off the end of a
           | vector.
           | 
           | You can probably pick a subset of C++ that lets you write
           | reasonably safe code. But the presence of semaphores,
           | mutexes, and shared pointers isn't what does it.
           | 
           | Source: also writing C and C++ for 30 years.
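
Two of the failure modes named in the comment above (indexing past the end of a vector, and a reference outliving its object) can be made concrete with a short sketch. The helper names `checked_get` and `Observer` are hypothetical, and the code assumes C++17 for `std::optional`. It shows opt-in safe alternatives, which is exactly the commenter's point: nothing in the language forces their use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <vector>

// v[i] with i >= v.size() is undefined behavior and compiles silently.
// This helper returns an empty optional instead of reading out of bounds.
std::optional<int> checked_get(const std::vector<int>& v, std::size_t i) {
    if (i >= v.size()) return std::nullopt;
    return v[i];
}

// Holding `const int& r = *owner;` and then calling owner.reset() leaves r
// dangling even though a shared_ptr was involved. A weak_ptr has to be
// locked before use, so the expired case is at least observable.
struct Observer {
    std::weak_ptr<int> target;

    std::optional<int> read() const {
        if (auto sp = target.lock()) return *sp;  // object still alive
        return std::nullopt;                      // object already destroyed
    }
};
```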
        
             | lisper wrote:
             | > Safety isn't the presence of safe constructs, but the
             | absence of unsafe ones.
             | 
             | Exactly. Here is a data point:
             | https://spinroot.com/spin/Doc/rax.pdf
             | 
             | Tl;DR: This was software that ran on a spacecraft.
             | Specifically designed to be safe, formally analyzed, and
             | tested out the wazoo, but nonetheless failed in flight
             | because someone did an end-run around the safe constructs
             | to get something to work, which ended up producing a race
             | condition.
        
             | FpUser wrote:
             | >"What a terrifying statement."
             | 
             | The statement may not be correct but calling it terrifying
             | is way melodramatic.
        
           | nesarkvechnep wrote:
           | The worst code is usually written by someone who's been doing
           | it for 30 years and can't find a problem with their technology
           | of choice.
           | 
           | Especially with shared pointers you can encounter pretty
           | terrible memory issues.
        
             | kanbankaren wrote:
             | Dude, provide examples of "terrible" memory issues.
             | Otherwise, you are just repeating the folklore which is
             | outdated.
        
           | CharlesW wrote:
           | > _With POSIX semaphores, mutexes, and shared pointers, it is
           | very rare to hit upon a memory issue in modern C++._
           | 
           | There is a mountain of evidence (two examples follow) that
           | this is not true. Roughly two-thirds of serious security bugs
           | in large C++ products are still memory-safety violations.
           | 
           | (1) https://msrc.microsoft.com/blog/2019/07/we-need-a-safer-
           | syst... (2) https://www.chromium.org/Home/chromium-
           | security/memory-safet...
        
             | kanbankaren wrote:
             | Show me a memory issue that was caused by proper usage of
             | POSIX concurrency primitives.
        
               | jenadine wrote:
               | Proper usage is fine. The problem is that it is easy to
               | make mistakes. The compiler won't tell you and you may
               | not notice until too late in production, and it will take
               | forever to debug.
        
               | CharlesW wrote:
               | Here's two: CVE-2021-33574, CVE-2023-6705. The former had
               | to be fixed in glibc, illustrating that proper usage of
               | POSIX concurrency primitives does nothing when the rest
               | of the ecosystem is a minefield of memory safety issues.
               | There are some good citations on page 6 of this NSA
               | Software Memory Safety overview in case you're interested:
               | https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
        
               | treyd wrote:
               | You're right, if you use the concurrency primitives
               | properly you won't have data races. But the issue is when
               | people don't use the concurrency primitives properly,
               | which there is ample evidence for (posted in this thread)
               | happening all the time.
               | 
               | But with this argument, the response is "well they didn't
               | use the primitives properly so the problem is them",
               | which shifts the blame onto the developer and away from
               | the tools which are too easy to silently misuse.
               | 
               | This also ignores memory safety issues that aren't data
               | races, like buffer overflows, UAF, etc.
        
               | wat10000 wrote:
               | Any reasonable meaning of "proper" would include not
               | causing memory issues, so you've just defined away any
               | problems. Note that this is substantially different from
               | not having any problems.
               | 
               | The great lesson in software security of the past few
               | decades is that you can't just document "proper usage,"
               | declare all other usage to be the programmer's fault, and
               | achieve anything close to secure software. You must have
               | systems that either disallow unsafe constructs (e.g. rust
               | preventing references from escaping at compile time) or
               | can handle "improper usage" without allowing it to become
               | a security vulnerability (e.g. sandboxing).
               | 
               | Correctly use your concurrency primitives and you won't
               | have thread safety bugs, hooray! And when was the last
               | time you found a bug in C-family code caused by someone
               | who didn't correctly use concurrency primitives because
               | the programmer incorrectly believed that a certain piece
               | of mutable data would only be accessed on a single
               | thread? I'll give you my answer: it was yesterday. Quite
               | likely the only reason it's not today is because I have
               | the day off.
        
               | kanbankaren wrote:
               | > And when was the last time you found a bug in C-family
               | code caused by someone who didn't correctly use
               | concurrency primitives because the programmer incorrectly
               | believed that a certain piece of mutable data would only
               | be accessed on a single thread? I'll give you my answer:
               | it was yesterday.
               | 
               | You answered my question. My original argument was using
               | concurrency primitives "properly" in C++ prevents memory
               | issues and Rust isn't strictly necessary.
               | 
               | I have nothing against Rust. I will use it when they
               | freeze the language and publish an ISO spec and multiple
               | compilers are available.
        
               | bobmcnamara wrote:
               | dozens caused by folks thinking pthread_cancel() was the
               | right tool for the job
        
             | FpUser wrote:
             | I write high performance backends in C++. Works
             | approximately as described in article and all data are in
             | RAM and in structures specialized for access patterns.
             | Works like a charm and runs 24x7 without a trace of
             | problem.
             | 
             | I've never had a single complaint from my customers. Well I
             | do have bugs in logic during development but those are
             | found and eliminated after testing. And every new backend I
             | do I base on already battle tested C++ foundation code. Why
             | FFS would I ever want to change it (rewrite in Rust)? As a
             | language Rust has far fewer of the features that I am
             | accustomed to using, and this safety of Rust does not
             | provide me any business benefits. It is quite the opposite.
             | I would just lose time, money and still have those same
             | logical bugs to iron out.
        
               | acdha wrote:
               | How many other programmers have you trained up to that
               | level of results? Can you get them to work on Windows,
               | Chrome, etc. so users stop getting exposed to bugs which
               | are common in C-like languages but not memory-safe
               | languages?
        
               | FpUser wrote:
               | I do not train programmers. I hire subcontractors when I
               | need help. They're all same level as myself or better.
               | Easy to find amongst East Europeans and does not cost
               | much. Actually cheaper than some mediocre programmer from
               | North America who can only program using single language
               | / framework and has no clue about architecture and how
               | various things work together in general.
        
         | npalli wrote:
         | Rust is the future of systems programming and will always be
         | for the foreseeable future. The memory issue will mostly be
         | addressed as needed, see from John Carmack yesterday[1], the
         | C++ ecosystem advantage (a broad sense of how problems whether
         | DS, Storage, OS, Networking, etc. have been solved) will be
         | very hard to overcome for newer programming languages. I think
         | it is ironic how modern C++ folks just keep chugging along
         | releasing products while Rust folks are generally haranguing
         | everyone about "memory safety" and generally leaving half
         | finished projects (turns out writing Rust code is more fun than
         | reading someone else's, who would have guessed).
         | 
         | [1] https://x.com/ID_AA_Carmack/status/1935353905149341968
        
           | wgjordan wrote:
           | > The memory issue will mostly be addressed as needed
           | 
           | I have no allegiance to either lang ecosystem, but I think
           | it's an overly optimistic take to consider memory safety a
           | solved problem from a tweet about fil-c, especially
           | considering "the performance cost is not negligible" (about
           | 2x according to a quick search?)
        
             | npalli wrote:
             | Performance drop of 2x for memory safety critical sections
             | vs Rust rewrite taking years/decades, not even a contest.
             | Now, if that drop was 10x maybe, but at 2x it is no brainer
             | to continue with C++. I'm not certain Fil-C totally works
             | in all cases, but it is an example of how the ecosystem
             | will evolve to solve this issue and not migrate to Rust.
        
               | hexaga wrote:
               | What would you consider to be a non memory safety
               | critical section? I tried to answer this and ended up in
               | a chain of 'but wait, actually memory issues here would
               | be similarly bad...', mainly because UB and friends tend
               | to propagate and make local problems very non-local.
        
         | secondcoming wrote:
         | What's this Rust thing?
        
         | caudena wrote:
         | Because we are in 'unsafe' territory. And Rust doesn't even
         | have a defined memory model. Rust is a little bit immature. We
         | have some other services written in Rust though.
        
       | wslh wrote:
       | Thank you for the AMA. A few initial questions:
       | 
       | - Would it be possible to open source your DB in the future? I
       | think there are challenges in blockchain analysis (e.g. internal
       | transactions) that go beyond the specific DB.
       | 
       | - Having used Chainalysis and others, your product seems superior
       | based on your presentation. Which blockchains do you support?
       | 
       | - Is there a "HN Code" to test Prism?
        
         | caudena wrote:
         | Thanks for the questions! We don't currently have plans to
         | open-source it. For anything else, feel free to reach out at
         | pa@caudena.com - happy to discuss further there. We'd like to
         | keep this thread focused on the technical side rather than
         | product discussions :)
        
       | Snoozus wrote:
       | If the FBI tells you wallet A and wallet B belong to the same
       | actor, how do you use that information, so that they can see it
       | on their view, without leaking it to Europol?
        
         | caudena wrote:
         | Are you from CA or CT? :)
         | 
         | FBI and Europol will work with the same forest (unless they are
         | using an on-premise setup), but with different "patches".
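
One way to picture "same forest, different patches" is an overlay map that is consulted before the shared data. The sketch below is an assumption-laden illustration, not Caudena's design: `PatchedView`, `Address`, and `ClusterId` are invented names, and it uses a plain `std::unordered_map` in a single thread where the thread describes concurrent hash maps.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Address = std::uint64_t;    // hypothetical address key
using ClusterId = std::uint64_t;  // hypothetical cluster label

// A per-client view: reads fall through to the shared base mapping unless
// this client's private patch overrides the address.
class PatchedView {
public:
    explicit PatchedView(const std::unordered_map<Address, ClusterId>& base)
        : base_(base) {}

    // Record client-specific knowledge without touching the shared base.
    void assign(Address a, ClusterId c) { patch_[a] = c; }

    ClusterId cluster_of(Address a) const {
        auto p = patch_.find(a);
        if (p != patch_.end()) return p->second;  // patched entry wins
        auto b = base_.find(a);
        // Assumption: an unknown address forms its own singleton cluster.
        return b != base_.end() ? b->second : a;
    }

    // Throw the patch away; the shared base was never modified.
    void discard() { patch_.clear(); }

private:
    const std::unordered_map<Address, ClusterId>& base_;
    std::unordered_map<Address, ClusterId> patch_;
};
```

With two views over one base, an assignment made through one view is invisible through the other, which is the isolation property the question was asking about.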
        
       | BiraIgnacio wrote:
       | I didn't even know there were companies doing work in the
       | "blockchain services" space. Kinda cool, tech begets tech, begets
       | tech.
       | 
       | Love the C++ work, btw
        
       | canyp wrote:
       | You really had to call it Prism (PRISM), didn't you?
       | 
       | It's great to see C++ resulting in orders of magnitude cost
       | reduction anyway. Do you have more details on the various C++
       | tricks done for optimization?
        
         | caudena wrote:
         | Yeah, we figured people would compare it to PRISM :)
         | 
         | There are many possible optimizations, but they're all highly
         | specific to the particular problems you're trying to solve.
        
       | CharlesW wrote:
       | > _" Built a custom in-memory columnar/graph database from
       | scratch"_
       | 
       | This seems like an odd place to spend your resources. What do
       | Prism's benchmarks look like vs Memgraph, KX kdb+, Apache Ignite,
       | TigerGraph, etc.?
        
         | actionfromafar wrote:
         | Couldn't it be because the data field sizes are fixed and never
         | change?
        
       | kayamon wrote:
       | why you spyin on folks
        
       ___________________________________________________________________
       (page generated 2025-06-19 23:01 UTC)