[HN Gopher] JanusGraph - Distributed, open source, scalable grap...
       ___________________________________________________________________
        
       JanusGraph - Distributed, open source, scalable graph database
        
       Author : patternexon
       Score  : 68 points
       Date   : 2021-07-07 15:19 UTC (7 hours ago)
        
 (HTM) web link (janusgraph.org)
 (TXT) w3m dump (janusgraph.org)
        
       | mrdoops wrote:
       | A lot of comments seem unsure about what Graph DBs are good for:
       | 
       | * Flexible knowledge association i.e. Knowledge Graphing
       | 
       | * Modeling and querying associations / models with many-steps-
       | removed requirements
       | 
       | * Expert Systems / Inference Engines
       | 
       | * Lazy traversal for complex job scheduling
       | 
       | Graph DBs are not good as a general-purpose database for 95% of
       | use cases. Just use Postgres/MySQL if you're not sure. We
       | use Neptune (AWS managed GraphDB) to model cybersecurity
       | dependencies between many companies and report on supply chain
       | vulnerabilities many steps removed. Those kinds of queries are
       | non-trivial and expensive on anything but a Graph Database.
       | 
       | Because GraphDBs meet niche query requirements, you usually have
       | other databases involved in the full application. If you want to
       | tractably manage many databases in a system, you ideally want to
       | be in streaming / event-sourced semantics. If you're already in
       | an imperative CRUD-around-data / batch pipeline, you'll find
       | greater maintenance costs in adopting a GraphDB, or any
       | additional DB for that matter.
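       The "many steps removed" supply-chain query described above boils
       down to a bounded multi-hop traversal. A minimal in-memory sketch,
       assuming the dependency data fits in a Python dict (the company
       names and the `within_hops` helper are hypothetical, and this is
       not Neptune's or Gremlin's API): a breadth-first search that
       records each vendor's hop distance and stops expanding at
       `max_hops`.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop_distance} for every node reachable from start
    in at most max_hops edges, using breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:  # first visit in BFS = shortest hop count
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[start]  # report only the other nodes
    return dist


# Hypothetical supplier graph: company -> vendors it depends on.
suppliers = {
    "AcmeCorp": ["CloudHost", "PayrollCo"],
    "CloudHost": ["DNSProvider"],
    "PayrollCo": ["CloudHost", "PrintShop"],
    "DNSProvider": ["RegistrarX"],
}

print(within_hops(suppliers, "AcmeCorp", 2))
```

       A graph database expresses the same traversal in its own query
       language; the sketch only shows the shape of the work involved.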
        
         | jshen wrote:
         | I have yet to see streaming/event sourcing work well. Every
         | time I've seen it used, it's caused more problems than it's
         | solved. The main problems: out-of-order events and/or
         | slow-to-propagate events.
        
           | mrdoops wrote:
           | From a technical perspective there are plenty of ways to
           | serialize / order a stream of events in a reasonable way,
           | whether that's implementing your event store on top of a
           | transactionally secure database (e.g. Postgres, not Mongo) or
           | using a higher-throughput, less persistently secure solution
           | like Kafka. There are also ways to deal with eventual
           | consistency and distributed transactions (SAGAs) depending on
           | the need.
           | 
           | The hard part is that ES/streaming systems work best with, and
           | almost necessitate, a clean and clear domain model. A clean
           | and clear domain model requires a lot of discussion and
           | consensus with domain experts and product owners. Getting
           | buy-in for the kind of discussions needed is the source of the
           | issues I've experienced with these kinds of systems. CRUD can
           | paint over a lot of cloudy abstract concepts, for better or
           | for worse. These kinds of discussions are energy-intensive,
           | and casting light on the cloudy thoughts is mentally painful.
           | 
           | There isn't great streaming/ES support at the language level
           | outside of the robust actor-model systems (e.g.
           | Erlang/Elixir). There are systems like Akka that simulate
           | that to some extent on runtimes like the JVM, but a
           | cooperative scheduler and an actor model don't mix well. For
           | non-actor-model aspects, I've been seeing more service-level
           | dataflow systems like KSQL / MaterializeDB gain traction, but
           | they are nevertheless a solution for read models, not
           | application logic.
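       One common way to cope with the out-of-order delivery raised
       above, assuming the event store assigns each stream gap-free
       sequence numbers starting at 1, is for the consumer to buffer
       early arrivals, apply events strictly in sequence order, and drop
       duplicates. The `OrderedApplier` class is a hypothetical sketch of
       that idea, not part of Kafka or any particular event store.

```python
class OrderedApplier:
    """Apply events for a single stream strictly in sequence-number
    order, buffering any that arrive early (hypothetical helper)."""

    def __init__(self):
        self.next_seq = 1   # sequence number expected next
        self.pending = {}   # early arrivals, keyed by sequence number
        self.applied = []   # events applied so far, in order

    def receive(self, seq, event):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate delivery: ignore it (idempotent consumer)
        self.pending[seq] = event
        # Drain every contiguous event starting at next_seq.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


consumer = OrderedApplier()
deliveries = [(2, "ItemAdded"), (1, "CartCreated"), (2, "ItemAdded"), (3, "CheckedOut")]
for seq, event in deliveries:
    consumer.receive(seq, event)
print(consumer.applied)  # ['CartCreated', 'ItemAdded', 'CheckedOut']
```

       The trade-off is latency: a missing event holds back everything
       after it, which is one source of the "slow to propagate" symptom.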
        
             | jshen wrote:
             | In short, making streaming architectures work requires great
             | central planning, a grand architect in the sky, or they fail
             | miserably. I think that is an argument against streaming
             | architectures ;)
        
               | mrdoops wrote:
               | I think that's a bit of a reduction/straw man to say they
               | need a lot of architecture/central-planning. An
               | individual developer can do it so long as they have
               | access to domain experts to ask the right questions. Lack
               | of proper planning will result in any architecture
               | failing miserably - best not to code until we know what
               | to code.
               | 
               | The implementation overhead as far as code-to-write for
               | streaming architectures is comparable to CRUD. But there
               | is less knowledge dispersed about practices on how to do
               | it, so there is the cost of learning. It is more cutting
               | edge after all.
        
               | jshen wrote:
               | It is admittedly an oversimplification for effect, but
               | it's more or less what you said. Maybe I've worked at
               | bigger companies; we have over 1000 devs in my division,
               | and trying to get them all aligned is very very
               | difficult. I think central planning is a good analogy for
               | that. The architecture that requires less central
               | planning/coordination will likely do better in such a
               | dynamic.
        
               | mrdoops wrote:
               | I expect streaming will be adopted like anything - first
               | in smaller more mobile teams working in greenfield
               | contexts, then later in large organizations when the idea
               | is less novel and lower risk/cost to apply at scale.
               | 
               | Regardless I feel streaming has been around long enough
               | now it should be urgently considered anywhere where the
               | words "data pipeline" and "scale" are thrown around
               | together frequently.
        
               | jshen wrote:
               | I've implemented it several times, and it hasn't been
               | clearly better. Don't get me wrong, the alternatives
               | haven't been great either. I think it's just hard when
               | you have large organizations, large number of systems
               | some of which are decades old, and you have high volumes
               | of interactions/transactions.
        
       | Graphguy wrote:
       | Detailed read from two contributors to the project:
       | https://www.ibm.com/cloud/blog/database-deep-dives-janusgrap...
        
       | AtNightWeCode wrote:
       | No
        
         | amitport wrote:
         | Elaborate?
        
           | AtNightWeCode wrote:
           | You elaborate. Graph databases cause expensive writes and
           | cheap reads. Something anybody typically never ever needs. I
           | worked with neo4j and cypher. Same garbage.
        
             | staticassertion wrote:
             | First of all, "expensive writes and cheap reads" is the
             | common case. Second, nothing about graph databases implies
             | expensive writes and cheap reads.
        
               | AtNightWeCode wrote:
               | This is the ONLY reason to use a graph database... Go
               | ahead and downvote instead of learning something. :)
        
               | staticassertion wrote:
               | That is not the only reason to use a graph database, and
               | you're not exactly making a compelling argument, or a
               | coherent one. I doubt I have much to learn from you on
               | the subject.
        
             | bryanrasmussen wrote:
             | >Graph databases cause expensive writes and cheap reads.
             | Something anybody typically never ever needs.
             | 
             | I find this statement really surprising. Just about every
             | application not dealing with money that I've ever been on
             | has had far more reads than writes and would therefore
             | benefit if reads were cheap. Obviously nobody wants
             | expensive writes, but if the benefit is cheap reads and the
             | cost of writes can be dealt with by batching etc., I guess
             | it's an acceptable tradeoff.
        
               | AtNightWeCode wrote:
               | An Orc like you will never understand the difference
               | between a fruit and a berry. You better get off the
               | Internet before you break it all with your hairy feet.
               | Surprising? It is basic db design.
        
             | arthurcolle wrote:
             | How large were your datasets?
        
             | boxed wrote:
             | That's a bit much. I have a hobby project that is the graph
             | of all taxons. It is _not_ write heavy, to say the least.
             | (I use mysql for it but still :))
        
               | AtNightWeCode wrote:
               | Pretty much common knowledge...
               | 
               | Doc db -> very cheap reads / very cheap writes
               | 
               | SQL db -> cheap reads / expensive writes
               | 
               | Graph db -> cheap reads / very expensive writes
               | 
               | ...for the type of data.
        
               | spinningslate wrote:
               | If it were as simple as that, then surely the whole world
               | would be using document DBs exclusively.
               | 
               | The whole world isn't doing that. So maybe there's more
               | to it.
               | 
               | I'm doing some work right now where a graph is a good
               | conceptual fit to the problem space. Writes are much less
               | common than reads. A graph-theoretic approach is a good
               | fit for the queries it needs to support; transitive
               | closure and topological sort for example.
               | 
               | Would I use a graph DB for a heavily transactional system
               | like banking? No. Different problem, different
               | requirements, different tech choice.
               | 
               | But you seem to be suggesting that there are no problem
               | scenarios where the characteristics of a graph DB are a
               | good fit. That seems naive at best.
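       Both queries named above are classic graph algorithms. A minimal
       in-memory sketch, assuming the graph is a dict mapping each node to
       the nodes that must come after it (the build-step names are made
       up): Kahn's algorithm for a topological order, plus a depth-first
       reachability pass per node for the transitive closure.

```python
from collections import deque


def topological_sort(graph):
    """Kahn's algorithm. graph maps node -> nodes that depend on it.
    Raises ValueError if the graph contains a cycle."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    indegree = {n: 0 for n in nodes}
    for targets in graph.values():
        for v in targets:
            indegree[v] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for v in graph.get(n, ()):
            indegree[v] -= 1
            if indegree[v] == 0:  # all prerequisites of v are now placed
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order


def transitive_closure(graph):
    """Map each key of graph to the set of all nodes reachable from it."""
    closure = {}
    for start in graph:
        seen, stack = set(), list(graph[start])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(graph.get(n, ()))
        closure[start] = seen
    return closure


steps = {
    "fetch": ["compile"],
    "compile": ["test", "package"],
    "test": ["package"],
    "package": [],
}
print(topological_sort(steps))                        # ['fetch', 'compile', 'test', 'package']
print(sorted(transitive_closure(steps)["compile"]))   # ['package', 'test']
```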
        
               | AtNightWeCode wrote:
               | I am a senior IT architect. I'll give you the junior card,
               | but on the whole, you don't know at all what you are
               | babbling about. ALL databases have tradeoffs.
        
               | spinningslate wrote:
               | Please refrain from personal attacks, it doesn't
               | contribute to the discussion. And it's not consistent
               | with the community guidelines [0].
               | 
               | > ALL databases have tradeoffs
               | 
               | That was exactly my point.
               | 
               | [0] https://news.ycombinator.com/newsguidelines.html
        
               | staticassertion wrote:
               | This is practically parody...
               | 
               | I'm the founder of a company building a graph product.
               | I've talked to numerous researchers in the field, read
               | papers on the subject, and I've even reviewed PhD
               | candidates' theses for universities. I routinely field
               | offers for consulting explicitly on graph database technology.
               | 
               | You haven't said anything of substance and the little you
               | have said has only served to convey your own ignorance on
               | the subject. I've flagged a few of your posts since they
               | break HN rules.
        
       | rektide wrote:
       | There were a couple of years where it was mostly abandoned. Good
       | to see this solid graph database being well maintained. The
       | milestones[1] mostly show a lot of library upgrades, with some
       | enhancements/features sprinkled in, but for a while Janus was
       | nearly abandoned.
       | 
       | Maintenance re-started in 2017, with IBM & Google stepping up to
       | back it[2].
       | 
       | [1]
       | https://github.com/JanusGraph/janusgraph/milestones?state=cl...
       | 
       | [2] https://architecht.io/google-ibm-back-new-open-source-
       | graph-...
        
       | speedgoose wrote:
       | Sorry to ask about it, but while deciding the name of your Java
       | graph database, how did you end up with "anus"?
        
         | CameronNemo wrote:
         | https://en.m.wikipedia.org/wiki/Janus
        
           | speedgoose wrote:
           | Yes, but a lot of Java projects start with a J, and they must
           | have thought about it.
        
       | freewilly1040 wrote:
       | Graph databases have been the great white whale at my org for a
       | number of years. We took a crack at Janus a while back. It (like
       | a few attempts at Neo4J) failed to deliver on the promise of
       | unlocking queries with more than a hop or two, while dramatically
       | underperforming on those one- or two-hop queries compared to a
       | graph implemented in MySQL.
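       A "graph implemented in MySQL" is commonly an edge (adjacency)
       table indexed on both endpoints; how this particular org built
       theirs is not stated. The sketch assumes that layout, with
       hypothetical table and column names, and uses Python's built-in
       `sqlite3` as a stand-in for MySQL so it runs anywhere (the `:name`
       parameter style is sqlite3's). MySQL 8.0 also supports the
       `WITH RECURSIVE` syntax used for the n-hop query. One- and two-hop
       lookups are plain index probes, which may be part of why the
       relational version is hard to beat on short traversals.

```python
import sqlite3

# Hypothetical schema: one row per directed edge, indexed in both directions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, PRIMARY KEY (src, dst))")
conn.execute("CREATE INDEX edges_by_dst ON edges (dst)")
conn.executemany(
    "INSERT INTO edges (src, dst) VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")],
)

# All nodes reachable from :start in at most :max_hops edges, with the
# shortest hop count for each. The hops < :max_hops guard also keeps the
# recursion from running forever on cyclic data.
N_HOP_QUERY = """
WITH RECURSIVE reach(node, hops) AS (
    SELECT :start, 0
    UNION
    SELECT e.dst, r.hops + 1
    FROM reach AS r
    JOIN edges AS e ON e.src = r.node
    WHERE r.hops < :max_hops
)
SELECT node, MIN(hops) AS min_hops
FROM reach
WHERE node <> :start
GROUP BY node
ORDER BY node
"""

rows = conn.execute(N_HOP_QUERY, {"start": "alice", "max_hops": 2}).fetchall()
print(rows)  # [('bob', 1), ('carol', 2), ('erin', 1)]
```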
        
         | tschellenbach wrote:
         | I also don't see many valid use cases for graph databases.
        
           | freewilly1040 wrote:
           | Marketing and fraud detection are pretty valid imo. Just
           | inherently hard to scale.
           | 
           | I do think there's a valid question of how useful n-hop
           | queries are for an N that is greater than 2 or 3.
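       A rough way to see why N beyond 2 or 3 gets expensive, assuming
       (hypothetically) that every node has out-degree at most d and
       ignoring overlap between neighborhoods: the number of nodes an
       N-hop traversal may touch grows geometrically.

```latex
\text{nodes touched within } N \text{ hops} \;\le\; \sum_{k=1}^{N} d^{k}
  \;=\; \frac{d^{N+1} - d}{d - 1} \qquad (d > 1)
```

       For example, with d = 100 and N = 3 the bound is
       100 + 10,000 + 1,000,000 = 1,010,100 nodes.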
        
           | tejtm wrote:
           | Have you tried getting grants promising they will deliver?
           | </s>
        
           | alexott wrote:
           | There are a lot: customer support (aka customer 360), fraud
           | detection, some maintenance cases, inventory,
           | recommendations, etc. But it's heavily dependent on
           | requirements, such as what the response time should be. Most
           | graph databases are good at fast response times, but only
           | for 2-3 hops from known starting points. For many other
           | things, graph analytics with Spark or something like it
           | could be better.
        
             | pphysch wrote:
             | 2-3 hops (JOINs) is well within RDBMS territory, no?
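       The point about JOINs can be made concrete: with an edge table, a
       fixed two-hop query is a single self-join, and each further hop
       adds another join (or a recursive CTE when the depth varies). A
       sketch using Python's built-in `sqlite3`, with a hypothetical
       `follows` table, finding friends-of-friends that a user does not
       already follow:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT NOT NULL, dst TEXT NOT NULL, PRIMARY KEY (src, dst))")
conn.executemany(
    "INSERT INTO follows (src, dst) VALUES (?, ?)",
    [
        ("alice", "bob"), ("alice", "erin"), ("bob", "carol"),
        ("erin", "carol"), ("erin", "frank"), ("carol", "dave"),
    ],
)

# Two hops = one self-join: accounts followed by the accounts alice
# follows, excluding alice herself and anyone she already follows.
TWO_HOP = """
SELECT DISTINCT f2.dst
FROM follows AS f1
JOIN follows AS f2 ON f2.src = f1.dst
WHERE f1.src = ?
  AND f2.dst <> f1.src
  AND f2.dst NOT IN (SELECT dst FROM follows WHERE src = f1.src)
ORDER BY f2.dst
"""

rows = [row[0] for row in conn.execute(TWO_HOP, ("alice",))]
print(rows)  # ['carol', 'frank']
```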
        
           | jshen wrote:
           | They are great for data exploration, data science, analytics,
           | etc. I would NOT put one as a dependency on a user experience
           | though.
        
           | rektide wrote:
           | > I also don't see many valid use cases for graph databases.
           | 
           | It's the most general purpose means I can see to model
           | entities. I can't see many invalid uses.
        
         | im_down_w_otp wrote:
         | We kind of ran into a similar problem. Our core data (multi-
         | context execution/event traces) is fundamentally graph shaped,
         | but it's also among the pathologically poor graph structures
         | for most general purpose graph databases and their associated
         | query DSLs and execution planners/optimizers, so we had to
         | build a solution that was tailored more for our domain
         | (verifying system properties of complex interacting
         | components).
         | 
         | Which I suppose kind of typifies the problem. Graph databases
         | are fantastic because they let you flexibly and coherently
         | model practically anything. But, perhaps principally because of
         | this, they can become an impediment once you better understand
         | the nuances and idiosyncrasies of your domain, and thus need
         | something that has more optimal (or perhaps predictable)
         | performance for the kinds of questions you know you need to ask
         | over a representation of your domain/data that you know is
         | sufficient.
        
       ___________________________________________________________________
       (page generated 2021-07-07 23:01 UTC)