[HN Gopher] Build C++ Graph Analytics Without Worrying About Memory
       ___________________________________________________________________
        
       Build C++ Graph Analytics Without Worrying About Memory
        
       Author : taubek
       Score  : 54 points
       Date   : 2022-10-06 13:49 UTC (9 hours ago)
        
        
       | worthless443 wrote:
       | Automatic memory management is indeed the first thing to look
       | for when writing performance-critical software, and it's first
       | on my checklist. But
       | 
       | > in-memory storage of databases
       | 
       | Doesn't large-capacity memory sound a bit expensive? Granted,
       | read/write I/O is far cheaper for in-memory analysis. Is such a
       | trade-off worth it?
        
         | mbuda wrote:
         | Excellent observation/question :D It depends; sometimes, it's
         | worth it, and sometimes it is not (as always with tradeoffs).
         | Graphs are a bit specific because most of the traversals or
         | expensive graph analytics like PageRank touch the whole graph
         | (even multiple times) -> the entire graph will end up in memory
         | -> why not keep it in memory for faster performance?
         | 
         | But for a vast dataset, the hardware cost might be too much. I
         | think we are aware of the tradeoff. We'll probably provide a
         | disk-first storage option at some point because that's
         | definitely a valid setup (sometimes the only possible setup).
         | Ofc, we'll invest time in making it as performant as possible.
         | 
         | Do you have some specific workload in mind? :D
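
To make the "touches the whole graph, multiple times" point concrete, here is a minimal power-iteration PageRank sketch in C++. It is an illustration only, not Memgraph's implementation: the `Graph` alias, the `pagerank` function, and the 0.85 damping default are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-memory graph: out_edges[v] lists the targets of v's edges.
using Graph = std::vector<std::vector<std::size_t>>;

// Plain power-iteration PageRank. Every iteration visits every vertex and
// every edge, so the whole graph is read once per iteration.
std::vector<double> pagerank(const Graph& out_edges, int iterations,
                             double damping = 0.85) {
    const std::size_t n = out_edges.size();
    assert(n > 0);
    std::vector<double> rank(n, 1.0 / n);
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> next(n, (1.0 - damping) / n);
        double dangling = 0.0;  // rank held by vertices with no out-edges
        for (std::size_t v = 0; v < n; ++v) {
            if (out_edges[v].empty()) {
                dangling += rank[v];
                continue;
            }
            const double share = damping * rank[v] / out_edges[v].size();
            for (std::size_t target : out_edges[v]) next[target] += share;
        }
        // Spread dangling rank evenly so the ranks keep summing to 1.
        for (std::size_t v = 0; v < n; ++v) next[v] += damping * dangling / n;
        rank = std::move(next);
    }
    return rank;
}
```

With 20 iterations the full edge list is scanned 20 times, which is the access pattern that makes keeping the graph resident in memory attractive.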
        
           | worthless443 wrote:
           | If a large graph needs to be read multiple times, then sure,
           | keeping it in memory will give the best performance possible
           | for a workload like PageRank (and optimizing memory
           | allocation and management will boost performance even
           | further).
           | 
           | So to my understanding (and a novice one at that), the graph
           | should be stored on disk first, and on initialization the
           | objects have to be copied once into volatile memory. But my
           | question is: memory regions are more likely to fault and get
           | corrupted, so wouldn't a graph stored in memory be lost
           | completely (unless the results are saved to disk at regular
           | intervals)? Does that make any sense?
        
             | mbuda wrote:
             | I'm not sure I understand the part about corruption. How
             | would data in memory become corrupted?
             | 
             | How Memgraph currently works: it stores data in memory and
             | asynchronously starts writing data to disk in small chunks
             | called deltas; later these chunks are deleted and replaced
             | with a whole-graph snapshot. (There is also a sync option,
             | but that's slower in terms of committing a transaction and
             | letting the user know the data is written; RocksDB, e.g.,
             | works similarly.) All disk-related stuff is purely for
             | durability (recovery after the Memgraph process restarts),
             | and all interactions with the disk are made automatically
             | in the background during standard system runtime and
             | startup time.
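
The delta-then-snapshot scheme described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Memgraph's actual code: `Delta`, `Disk`, and `Store` are hypothetical names, "disk" is modeled with plain containers instead of files, and writes happen synchronously to keep the example short (the comment describes them as asynchronous by default).

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One recorded change: set (or erase) the label of a vertex.
struct Delta {
    std::uint64_t vertex;
    std::string label;  // new label; ignored when erase is true
    bool erase;
};

// Stand-in for durable storage. A real system would write files; plain
// containers keep the sketch self-contained.
struct Disk {
    std::map<std::uint64_t, std::string> snapshot;  // last full snapshot
    std::vector<Delta> deltas;                      // changes since then
};

class Store {
public:
    // Constructing a Store models process startup: state is rebuilt from disk.
    explicit Store(Disk& disk) : disk_(disk) { recover(); }

    void set_label(std::uint64_t v, const std::string& label) {
        vertices_[v] = label;                       // reads are served from memory
        disk_.deltas.push_back({v, label, false});  // logged only for durability
    }

    void erase(std::uint64_t v) {
        vertices_.erase(v);
        disk_.deltas.push_back({v, "", true});
    }

    // Replace the accumulated deltas with one full snapshot.
    void snapshot() {
        disk_.snapshot = vertices_;
        disk_.deltas.clear();
    }

    const std::map<std::uint64_t, std::string>& vertices() const {
        return vertices_;
    }

private:
    // Recovery: load the snapshot, then replay newer deltas in order.
    void recover() {
        vertices_ = disk_.snapshot;
        for (const Delta& d : disk_.deltas) {
            if (d.erase) vertices_.erase(d.vertex);
            else vertices_[d.vertex] = d.label;
        }
    }

    Disk& disk_;
    std::map<std::uint64_t, std::string> vertices_;
};
```

Creating a second `Store` over the same `Disk` plays the role of a restart: it ends up with the snapshot plus whatever deltas were logged after it, which is the recoverability property asked about in the parent comment.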
        
               | worthless443 wrote:
               | > it stores data in memory, and async starts writing data
               | to disk in small data chunks called deltas, later these
               | chunks are deleted and replaced with the whole graph
               | snapshot
               | 
               | Thanks, that pretty much answers my question about the
               | recoverability of in-memory graphs.
        
               | mbuda wrote:
               | Perfect!
        
       | timmy777 wrote:
       | Awesome. But how is this different from dgraph?
        
         | mbuda wrote:
         | If you are asking about Memgraph in general, overall it's a
         | graph storage + analytics system. Dgraph is probably more on
         | the pure storage side, while Memgraph is more about graph
         | analytics (in-memory graph storage, but it also stores data on
         | disk). In terms of the API, Dgraph exposes GraphQL, while
         | Memgraph uses Cypher + the Bolt protocol. There is much more;
         | which aspect are you most interested in? :D
        
       | mbuda wrote:
       | Is there any interest in a detailed comparison between C++ and
       | Rust, covering the different tradeoffs when implementing/using
       | the query modules?
        
         | ncmncm wrote:
         | Differences would be about staffing. For any given specialty,
         | having C++ skills too is common.
         | 
         | Finding somebody with the needed skills and also Rust
         | experience will be impossible, so you would either need to
         | plan on training up some Ruster on the specialty, or hire
         | somebody already up on it with C++ skills and expect them to
         | pick up enough Rust to get by.
        
           | mbuda wrote:
           | Yep, from the business perspective that's by far the biggest
           | concern :D
        