[HN Gopher] JanusGraph - Distributed, open source, scalable grap...
___________________________________________________________________
JanusGraph - Distributed, open source, scalable graph database
Author : patternexon
Score : 68 points
Date : 2021-07-07 15:19 UTC (7 hours ago)
(HTM) web link (janusgraph.org)
(TXT) w3m dump (janusgraph.org)
| mrdoops wrote:
| A lot of comments seem unsure what Graph DBs are good for:
|
| * Flexible knowledge association i.e. Knowledge Graphing
|
| * Modeling and querying associations / models with many-steps-
| removed requirements
|
| * Expert Systems / Inference Engines
|
| * Lazy traversal for complex job scheduling
|
| Graph DBs are not good at being a general purpose 95% of use
| cases database. Just use Postgres/MySQL if you're not sure. We
| use Neptune (AWS managed GraphDB) to model cybersecurity
| dependencies between many companies and report on supply chain
| vulnerabilities many steps removed. Those kinds of queries are
| non-trivial and expensive on anything but a Graph Database.
|
| As GraphDBs meet niche query requirements you usually have other
| databases involved in the full application. If you want to
| tractably manage many databases in a system you ideally want to
| be in streaming / event sourced semantics. If you're already in
| an imperative crud-around-data / batch pipeline you'll find
| greater maintenance costs in adopting a GraphDB or any additional
| DB for that matter.
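The "many steps removed" supply chain query in the comment above can be sketched as a breadth-first traversal. This is only an illustration in plain Python, not how Neptune or any graph database executes it; the company names, the `DEPENDS_ON` map, and the `exposure` function are all hypothetical.

```python
from collections import deque

# Hypothetical supplier graph: company -> companies it depends on.
DEPENDS_ON = {
    "acme": ["parts_co", "cloud_host"],
    "parts_co": ["chip_fab"],
    "cloud_host": ["chip_fab", "dns_vendor"],
    "chip_fab": [],
    "dns_vendor": [],
}


def exposure(graph, start, vulnerable):
    """Return {vendor: hops} for every vulnerable vendor reachable from start."""
    hops = {start: 0}          # shortest hop count discovered so far
    queue = deque([start])
    found = {}
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in hops:  # first visit is the shortest path in BFS
                hops[dep] = hops[node] + 1
                if dep in vulnerable:
                    found[dep] = hops[dep]
                queue.append(dep)
    return found


result = exposure(DEPENDS_ON, "acme", {"chip_fab", "dns_vendor"})
```

Each extra hop in a relational schema is typically another self-join on the edge table, which is one reason deep traversals like this are often cited as a graph database use case.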
| jshen wrote:
| I have yet to see streaming/event sourcing work well. Every
| time I've seen it used it's caused more problems than it's
| resolved. The main problems: out-of-order events and/or
| slow-to-propagate events.
| mrdoops wrote:
| From a technical perspective there are plenty of ways to
| serialize / order some stream of events in a reasonable way.
| Whether that's implementing your event store on top of a
| transactionally secure database (e.g. Postgres, not Mongo) or
| using a higher throughput, less persistently secure solution
| like Kafka. There are also ways to deal with eventual
| consistency and distributed transactions (SAGAs) depending on
| the need.
|
| The hard part is that ES/Streaming systems work best / almost
| necessitate a clean and clear domain model. A clean and clear
| domain model requires a lot of discussion and consensus with
| domain experts and product owners. Buy-in to have the kind of
| discussions needed is the source of the issues I've
| experienced with these kinds of systems. CRUD can paint over a
| lot of cloudy abstract concepts for better or for worse.
| These kinds of discussions are energy intensive, and it is
| mentally painful to cast light on the cloudy thoughts.
|
| There isn't great streaming/ES support on a language level
| outside of the robust actor model systems (e.g.
| Erlang/Elixir). There are systems like Akka that simulate
| that to some extent on runtimes like the JVM, but a cooperative
| scheduler and an actor model don't mix well. For non-actor
| model aspects I've been seeing more service-level dataflow
| systems like KSQL / MaterializeDB gain traction, but they are
| nevertheless a solution for read models, not application
| logic.
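One common way to handle the out-of-order events raised earlier in this subthread is a reorder buffer on the consumer side. The sketch below assumes the producer stamps each event with a gapless, per-stream sequence number (for example from a database sequence or a log offset); the `ReorderBuffer` class is hypothetical and not taken from any particular library.

```python
class ReorderBuffer:
    """Release events in sequence order, holding back any that arrive early."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # sequence number we are waiting for
        self.pending = {}          # early arrivals, keyed by sequence number

    def accept(self, seq, payload):
        """Take one event; return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate delivery, ignore it
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
delivered = []
for seq, payload in [(2, "b"), (1, "a"), (4, "d"), (3, "c"), (3, "c")]:
    delivered.extend(buf.accept(seq, payload))
```

This trades latency for order: an event is held until every earlier one has arrived, so a lost event stalls the stream unless a timeout or replay mechanism is added.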
| jshen wrote:
| In short, making streaming architectures work requires great
| central planning, a grand architect in the sky, or they fail
| miserably. I think that is an argument against streaming
| architectures ;)
| mrdoops wrote:
| I think that's a bit of a reduction/straw man to say they
| need a lot of architecture/central-planning. An
| individual developer can do it so long as they have
| access to domain experts to ask the right questions. Lack
| of proper planning will result in any architecture
| failing miserably - best not to code until we know what
| to code.
|
| The implementation overhead as far as code-to-write for
| streaming architectures is comparable to CRUD. But there
| is less knowledge dispersed about practices on how to do
| it, so there is the cost of learning. It is more cutting
| edge after all.
| jshen wrote:
| It is admittedly an oversimplification for effect, but
| it's more or less what you said. Maybe I've worked at
| bigger companies; we have over 1000 devs in my division,
| and trying to get them all aligned is very very
| difficult. I think central planning is a good analogy for
| that. The architecture that requires less central
| planning/coordination will likely do better in such a
| dynamic.
| mrdoops wrote:
| I expect streaming will be adopted like anything - first
| in smaller more mobile teams working in greenfield
| contexts, then later in large organizations when the idea
| is less novel and lower risk/cost to apply at scale.
|
| Regardless I feel streaming has been around long enough
| now it should be urgently considered anywhere where the
| words "data pipeline" and "scale" are thrown around
| together frequently.
| jshen wrote:
| I've implemented it several times, and it hasn't been
| clearly better. Don't get me wrong, the alternatives
| haven't been great either. I think it's just hard when
| you have large organizations, a large number of systems,
| some of which are decades old, and high volumes of
| interactions/transactions.
| Graphguy wrote:
| Detailed read from two contributors on the project:
| https://www.ibm.com/cloud/blog/database-deep-dives-janusgrap...
| AtNightWeCode wrote:
| No
| amitport wrote:
| Elaborate?
| AtNightWeCode wrote:
| You elaborate. Graph databases cause expensive writes and
| cheap reads. Something anybody typically never ever needs. I
| worked with neo4j and cypher. Same garbage.
| staticassertion wrote:
| First of all, "expensive writes and cheap reads" is the
| common case. Second, nothing about graph databases implies
| expensive writes and cheap reads.
| AtNightWeCode wrote:
| This is the ONLY reason to use a graph database... Go
| ahead and downvote instead of learning something. :)
| staticassertion wrote:
| That is not the only reason to use a graph database, and
| you're not exactly making a compelling argument, or a
| coherent one. I doubt I have much to learn from you on
| the subject.
| bryanrasmussen wrote:
| >Graph databases cause expensive writes and cheap reads.
| Something anybody typically never ever needs.
|
| I find this statement really surprising. Just about every
| application not dealing with money I've ever been on has
| had lots more reads than writes and would thereby benefit
| if the reads were cheap. Obviously nobody wants expensive
| writes, but if the benefit is cheap reads and the
| expensiveness of writes can be dealt with by batching etc.,
| I guess it's an acceptable tradeoff.
| AtNightWeCode wrote:
| An Orc like you will never understand the difference
| between a fruit and a berry. You better get off the
| Internet before you break it all with your hairy feet.
| Surprising? It is basic db design.
| arthurcolle wrote:
| How large were your datasets?
| boxed wrote:
| That's a bit much. I have a hobby project that is the graph
| of all taxons. It is _not_ write heavy, to say the least.
| (I use mysql for it but still :))
| AtNightWeCode wrote:
| Pretty much common knowledge...
|
| Doc db -> very cheap reads / very cheap writes
|
| SQL db -> cheap reads / expensive writes
|
| Graph db -> cheap reads / very expensive writes
|
| ...for the type of data.
| spinningslate wrote:
| If it were as simple as that, then surely the whole world
| would be using document DBs exclusively.
|
| The whole world isn't doing that. So maybe there's more
| to it.
|
| I'm doing some work right now where a graph is a good
| conceptual fit to the problem space. Writes are much less
| common than reads. A graph-theoretic approach is a good
| fit for the queries it needs to support; transitive
| closure and topological sort for example.
|
| Would I use a graph DB for a heavily transactional system
| like banking? No. Different problem, different
| requirements, different tech choice.
|
| But you seem to be suggesting that there are no problem
| scenarios where the characteristics of a graph DB are a
| good fit. That seems naive at best.
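The two graph-theoretic queries named in the comment above, transitive closure and topological sort, can be illustrated in a few lines of Python. This is a minimal in-memory sketch that assumes the graph is acyclic; the `DEPS` task graph and the `transitive_closure` helper are hypothetical, while `graphlib.TopologicalSorter` is part of the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: node -> set of nodes it directly depends on.
DEPS = {
    "deploy": {"test", "package"},
    "test": {"build"},
    "package": {"build"},
    "build": set(),
}


def transitive_closure(graph):
    """Map each node to everything it depends on, directly or indirectly.

    Assumes the graph is a DAG; a cycle would give incomplete results.
    """
    closure = {}

    def visit(node):
        if node not in closure:
            closure[node] = set()
            for dep in graph.get(node, ()):
                closure[node] |= {dep} | visit(dep)
        return closure[node]

    for node in graph:
        visit(node)
    return closure


closure = transitive_closure(DEPS)
# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(DEPS).static_order())
```

Here `closure["deploy"]` contains `build` even though there is no direct edge between them, and `order` always starts with `build` and ends with `deploy`; the relative position of `test` and `package` is not fixed.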
| AtNightWeCode wrote:
| I am a senior IT architect. I give you the junior card
| but in the whole, you don't know at all what you are
| babbling about. ALL databases have tradeoffs.
| spinningslate wrote:
| Please refrain from personal attacks, it doesn't
| contribute to the discussion. And it's not consistent
| with the community guidelines [0].
|
| > ALL databases have tradeoffs
|
| That was exactly my point.
|
| [0] https://news.ycombinator.com/newsguidelines.html
| staticassertion wrote:
| This is practically parody...
|
| I'm the founder of a company building a graph product.
| I've talked to numerous researchers in the field, read
| papers on the subject, I've even reviewed PhD candidates'
| theses for universities. I routinely field offers for
| consulting explicitly on graph database technology.
|
| You haven't said anything of substance and the little you
| have said has only served to convey your own ignorance on
| the subject. I've flagged a few of your posts since they
| break HN rules.
| rektide wrote:
| There were a couple of years where it was mostly abandoned. Good
| to see this solid graph database being well maintained. The
| milestones[1] mostly show a lot of library upgrades, with some
| enhancements/features sprinkled in, but for a while Janus was
| nearly abandoned.
|
| Maintenance re-started in 2017, with IBM & Google stepping up to
| back it[2].
|
| [1]
| https://github.com/JanusGraph/janusgraph/milestones?state=cl...
|
| [2] https://architecht.io/google-ibm-back-new-open-source-
| graph-...
| speedgoose wrote:
| Sorry to ask about it, but while deciding the name of your Java
| graph database, how did you end up with anus?
| CameronNemo wrote:
| https://en.m.wikipedia.org/wiki/Janus
| speedgoose wrote:
| Yes but a lot of Java projects start with a J and they must
| have thought about it.
| freewilly1040 wrote:
| Graph databases have been the great white whale at my org for a
| number of years. We took a crack at Janus a while back. It (like
| a few attempts at Neo4J) failed to deliver on the promise of
| unlocking queries with more than a hop or two, while dramatically
| underperforming on those one or two hop queries vs a graph
| implemented in MySQL.
| tschellenbach wrote:
| I also don't see many valid use cases for graph databases.
| freewilly1040 wrote:
| Marketing and fraud detection are pretty valid imo. Just
| inherently hard to scale.
|
| I do think there's a valid question of how useful n-hop
| queries are for an N that is greater than 2 or 3.
| tejtm wrote:
| Have you tried getting grants promising they will deliver?
| </s>
| alexott wrote:
| There are a lot: customer support (aka customer 360), fraud
| detection, some maintenance cases, inventory,
| recommendations, etc. But it's heavily dependent on
| requirements, such as what the response time should be. Most
| graph databases are good at fast response times, but only
| for 2-3 hops from known start points. For many other things,
| graph analytics with Spark or something like it could be
| better.
| pphysch wrote:
| 2-3 hops (JOINs) is well within RDBMS territory, no?
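For reference, a bounded-hop query over a plain edge table can be written with a recursive common table expression. The sketch below uses SQLite through Python's `sqlite3` module only to stay self-contained; the `edge` table, its sample rows, and the `within_hops` helper are hypothetical, and query plans and performance will differ on MySQL or Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c")],
)


def within_hops(conn, start, max_hops):
    """Return {node: fewest hops} for nodes reachable within max_hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node, hops) AS (
            SELECT ?, 0
            UNION
            SELECT edge.dst, reach.hops + 1
            FROM reach JOIN edge ON edge.src = reach.node
            WHERE reach.hops < ?          -- stop expanding at the hop limit
        )
        SELECT node, MIN(hops) FROM reach
        WHERE hops > 0                    -- leave out the start row
        GROUP BY node
        ORDER BY node
        """,
        (start, max_hops),
    ).fetchall()
    return dict(rows)


two_hops = within_hops(conn, "a", 2)
```

The hop limit in the recursive step is what keeps the query bounded; each additional hop multiplies the intermediate rows by the average fan-out, which is where relational plans tend to get expensive.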
| jshen wrote:
| They are great for data exploration, data science, analytics,
| etc. I would NOT put one as a dependency on a user experience
| though.
| rektide wrote:
| > I also don't see many valid use cases for graph databases.
|
| It's the most general purpose means I can see to model
| entities. I can't see many invalid uses.
| im_down_w_otp wrote:
| We kind of ran into a similar problem. Our core data (multi-
| context execution/event traces) is fundamentally graph shaped,
| but it's also among the pathologically poor graph structures
| for most general purpose graph databases and their associated
| query DSLs and execution planners/optimizers, so we had to
| build a solution that was tailored more for our domain
| (verifying system properties of complex interacting
| components).
|
| Which I suppose kind of typifies the problem. Graph databases
| are fantastic because they let you flexibly and coherently
| model practically anything. But, perhaps principally because of
| this, they can become an impediment once you better understand
| the nuances and idiosyncrasies of your domain, and thus need
| something that has more optimal (or perhaps predictable)
| performance for the kinds of questions you know you need to ask
| over a representation of your domain/data that you know is
| sufficient?
___________________________________________________________________
(page generated 2021-07-07 23:01 UTC)