[HN Gopher] Possibly all the ways to get loop-finding in graphs ...
       ___________________________________________________________________
        
       Possibly all the ways to get loop-finding in graphs wrong
        
       Author : matt_d
       Score  : 188 points
       Date   : 2024-09-11 05:47 UTC (17 hours ago)
        
 (HTM) web link (www.chiark.greenend.org.uk)
 (TXT) w3m dump (www.chiark.greenend.org.uk)
        
       | wonnage wrote:
       | Any time you're looking for a graph algorithm there's a pretty
       | good chance Tarjan's name is on it
        
         | 082349872349872 wrote:
         | Tarjan's SCC is elegant with a heavily imperative flavour; has
         | anyone come up with an elegantly pure (yet still linear)
         | functional variation?
         | 
         | EDIT: looks like Okasaki has a linear-time functional; I'm
         | pragmatic enough that 2 passes are (even small constant passes
         | would be) fine for me -- my graphs are all tiny enough that I'm
         | not about to store them on tape.
         | 
         | EDIT2: my bad, not Okasaki himself, King & Launchbury:
         | https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...
         | scc' :: Graph -> Forest Vertex         scc' g = dfs g (reverse
         | (postOrd (transposeG g)))
         | 
         | (but their sol'n requires laziness to avoid generating an
         | infinite intermediate; has anyone done it linearly,
         | functionally, and eagerly?)
        
           | n4r9 wrote:
           | I'm not too deep into CS theory so I'm not sure if this is an
           | answer to your question but... SCC algorithms are useful when
           | building routeable road networks from source data (like OSM)
           | in order to filter out isolated sub-networks (like an
           | airfield). We use dotnet but have something that looks
           | possibly similar to what you wrote(?):
           | StronglyConnectedVertices(Vertex source) => ReachableVertices
           | (source).Intersect(ReverseReachableVertices(source))
           | 
           | The methods ReachableVertices and ReverseReachableVertices
           | are effectively depth-first searches on the graph.
           | 
           | The challenge - in terms of performance - is actually not
           | finding a single SCC but returning the full set of
           | potentially thousands SCCs in large networks. I originally
           | implemented an iterative (imperative) version of Gabow's
           | "Path-Based" algorithm but it was too slow on continent-sized
           | datasets. We did some experimentation with a parallel variant
           | of Tarjan which didn't improve things much.
           | 
           | Eventually I settled for something "good enough" - randomly
           | pick a vertex, calculate the SCC with the above code, return
           | it if it's large enough (>1k vertices) otherwise filter it
           | out, keep going until I've accounted for >90% of the graph.
           | The remaining <10% gets filtered out. That's basically
           | imperative (using a while loop) but could probably be
           | functionalised.
           | 
           | -----------------------------------------------------
           | 
           | EDIT I've taken a better look at the linked paper and what I
           | wrote is _definitely_ not an answer to your question. But I
           | 'll leave it in case anyone's interested.
        
       | Hendrikto wrote:
       | I think it is strange that there is no mention of either time or
       | space complexity, in an article about graph algorithms.
       | 
       | If you don't need to be efficient, any problem is relatively easy
       | to solve.
        
         | 082349872349872 wrote:
         | If you're not yet correct, complexity is moot.
         | 
         | > _When in doubt, use brute force._ --KLT
        
       | xelxebar wrote:
       | First time I've heard of the disjoint-set data structure. For a
       | DSF implementation, Wikipedia [0] gives the wildest asymptotic
       | complexity I've seen, using the _inverse Ackermann function_!
       | "Almost constant time" indeed.
       | 
       | [0]:https://en.wikipedia.org/wiki/Disjoint-set_data_structure
        
         | kragen wrote:
         | more commonly it's called union-find; it came up on here last
         | thursday: https://news.ycombinator.com/item?id=41450948
        
         | shepherdjerred wrote:
         | Disjoint sets are awesome! You can use them with a _very_
         | simple algorithm to form a minimum spanning tree.
         | 
         | https://en.wikipedia.org/wiki/Kruskal%27s_algorithm
        
         | kevinventullo wrote:
         | What's especially surprising to me is that inverse Ackerman is
         | provably optimal for this problem, regardless of
         | implementation.
        
       | tromp wrote:
       | The problem of finding cycles in (pseudo randomly generated)
       | graphs is actually one of the first examples of an _asymmetric_
       | Proof-of-Work puzzle [1]. Asymmetry means that while finding a
       | cycle takes a lot of memory and effort, a solution can be
       | verified instantly.
       | 
       | Several of the suggested algorithms were in fact implemented in
       | the various Cuckoo Cycle solvers [2]. But by maintaining directed
       | paths, my union-find based algorithm _did_ identify the entire
       | cycle and not just the cycle completing edge [1] (implemented in
       | [3]).
       | 
       | This algorithm suffers from three problems though. One, it only
       | finds one among a set of interconnected cycles. Second, it's
       | rather slow, being latency bound by the random memory accesses.
       | Third, it uses much more memory than needed.
       | 
       | So repeatedly trimming edges with only one endpoint is a much
       | faster approach. And for the random graphs that the PoW
       | generates, it can be done using only 1 bit per edge to record if
       | it's been trimmed.
       | 
       | Once sufficiently trimmed, a loop tracing algorithm (like [4] for
       | the Cuckatoo Cycle variant) can find _all_ remaining (simple)
       | cycles.
       | 
       | [1] https://www.semanticscholar.org/paper/Cuckoo-Cycle%3A-A-
       | Memo...
       | 
       | [2] https://github.com/tromp/cuckoo
       | 
       | [3]
       | https://github.com/tromp/cuckoo/blob/master/src/cuckoo/cycle...
       | 
       | [4]
       | https://github.com/tromp/cuckoo/blob/master/src/cuckatoo/gra...
        
         | pxeger1 wrote:
         | I initially took issue with "first asymmetric proof of work",
         | thinking that in practice any NP-complete problem would surely
         | be fine, even if we can't prove that it can't be solved
         | quickly. But is it difficult to find NP-complete problems which
         | can't also be quickly approximated? And I suppose it might also
         | be hard to find NP-complete problems in a "Goldilocks zone" of
         | difficulty to make for practical proofs of work, since their
         | complexity increases so sharply.
        
           | JohnKemeny wrote:
           | Approximation doesn't help, since you can require optimal
           | solutions, but yes, it's very difficult to come up with hard
           | instances. After all, you need to know the correct answer, so
           | you can't just randomly generate instances. In addition, SAT
           | solvers are ridiculously good these days.
        
           | tromp wrote:
           | For Proof-of-Work applications, at least in blockchains, you
           | need puzzles that
           | 
           | * can be pseudo randomly generated from a seed (usually the
           | hash of a partial blockchain header)
           | 
           | * have a known optimal solution method with a (nearly)
           | constant running time that's at most about one second (for
           | progress freeness this should be a small fraction of the
           | target block time)
           | 
           | NP-hard problems rarely satisfy both these constraints.
           | Finding fixed length cycles in random (bipartite) graphs with
           | billions of edges does.
        
         | JohnKemeny wrote:
         | After trimming, you can also branch on edges: either it is in a
         | cycle, in which you find one, otherwise you can remove it and
         | further reduce the instance by trimming more edges.
        
       | JohnVideogames wrote:
       | I had to tackle a problem much like this one when analysing the
       | polygons in planar graphs (analysis of planar chemical networks).
       | A fun approach I found was to use a Delaunay triangulation (the
       | dual of a Voronoi diagram; draw the Voronoi polygons and join
       | their centres if they share an edge). Then join the triangles
       | into larger polygons by removing edges that are in the Delaunay
       | triangulation but not in your original planar graph. Once you've
       | got no more edges to remove you've got all the polygons in the
       | graph and convenient triangles for rendering / analysis.
       | 
       | For planar graphs on the surface of a torus, like they faced
       | here, I just ended up adding extra images of the central unit
       | cell, and then removing the excess. Not elegant (lots of weird
       | corner cases) but convenient for visualisation.
        
         | JohnKemeny wrote:
         | Can you expand on what you wanted to analyze? Why didn't you
         | just output the polygons immediately?
        
         | gilleain wrote:
         | Chemistry does ring finding in various ways. I've always liked
         | the SSSR (smallest set of smallest rings) just for the name :)
         | 
         | https://depth-first.com/articles/2020/08/31/a-smallest-set-o...
        
           | slashdave wrote:
           | Unfortunately, as noted in your link, the SSSR solution is
           | not unique. Also unfortunate, Daylight (the inventor of
           | SMILES) added SSSR as an optional atom identifier in their
           | SMARTS query specification without realizing this. Because
           | OpenEye doesn't like SSSR, they actually leave out this
           | feature in their implement of SMARTS, breaking from standard.
           | 
           | https://daylight.com/dayhtml/doc/theory/theory.smarts.html
        
       | yohannesk wrote:
       | If this article triggers a desire to work on an actual problem
       | with graphs try https://leetcode.com/problems/freedom-trail
        
       | devit wrote:
       | Seems like a trivial problem to me.
       | 
       | If the graph is directed, do an SCC decomposition in linear time
       | using a graph library and then any SCC with size more than one
       | has at least a loop, trivially extracted by following any edges
       | in the SCC.
       | 
       | If the graph is undirected, compute a spanning tree in linear
       | time using a graph library and then any edge outside the tree
       | forms a loop, again trivially extractable from the edge and the
       | paths from its endpoints to their least common ancestor in the
       | tree.
       | 
       | Since those algorithms are already asymptotically optimal, it's
       | just an engineering problem of finding the solution with the best
       | constant factor given the precise data structures and goal in
       | question.
        
         | zeroonetwothree wrote:
         | The most obvious way is to just run a DFS, if you ever come
         | back to a node you've seen then you have a cycle (and by
         | storing parent pointers you can get the cycle back). That's
         | like CS 101 level.
         | 
         | However in the article it's actually a graph being modified. If
         | you don't want to run a full dfs every time, there are better
         | ways. This problem is called "online cycle detection" and you
         | can google some other algorithms.
        
         | layer8 wrote:
         | You obviously didn't read the article. It's about undirected
         | graphs, and about visualizing the loops, not just testing for
         | their existence. The final solution is indeed based on spanning
         | trees.
        
       | zeroonetwothree wrote:
       | This problem is called "online cycle detection" in the
       | literature. You can search for various better algorithms that
       | have been developed.
        
         | layer8 wrote:
         | The article isn't interested in the "online" aspect, but in
         | finding all (what turns out to be called) non-bridge edges, in
         | order to visualize the loops.
        
           | konstantinua00 wrote:
           | it is interested in both
        
       | karmakaze wrote:
       | I've always thought Floyd's tortoise and hare[0] algorithm was
       | handy as it can find a cycle in a DAG with low memory usage.
       | 
       | [0]
       | https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_tortoi...
        
         | konstantinua00 wrote:
         | how does it work on DAG?
         | 
         | I always thought it's a linked list algo
        
           | tejtm wrote:
           | By never finding a cycle, it proves the graph is a DAG.
        
             | jerf wrote:
             | But you have to run it on a sequence, or perhaps even more
             | accurately, something that is putatively a sequence but may
             | have an infinite loop in it. Tortoise & hare requires an
             | unambiguous "next" function, which a graph does not have.
             | Modifying it to not require that so it can run on a graph
             | may be possible but that would constitute a different
             | algorithm for sure, with different complexity analysis,
             | etc.
        
           | pavon wrote:
           | It is useful for determining whether a given traversal of a
           | graph has encountered a cycle, rather than determining
           | whether the entire graph has any cycles. So different from
           | what the article was about, but still a nice tool to have in
           | the box.
        
         | water-data-dude wrote:
         | Here is where a smartass might point out that by definition you
         | won't find any cycles in your DAG
        
           | pavon wrote:
           | It's short for Directed Accidentally-cyclic Graph.
        
       | vouwfietsman wrote:
       | Funny I've had to do this occasionally and I always do the
       | following: for each edge E, attempt to construct a clockwise loop
       | from that edge by traversing each vertex and picking the edge
       | that is "most clockwise" compared to the incoming edge. Converse
       | for counter clockwise. This gives you, for each edge, the two
       | loops the edge is a part of (if they exist).
       | 
       | It sounds horribly slow, but isn't because the loop finding
       | process for one edge can mark all edges it traverses as either
       | being on the same loop, or not having a loop, for the orientation
       | its checking. These edges can be skipped henceforth.
       | 
       | This does not find all loops, only the smallest disjoint loops
       | (as every edge is at most part of two disjoint loops). But
       | basically results in a similar planar "face" partition the author
       | describes.
       | 
       | Its not clear to me whether my approach does not work (perhaps
       | the angle compares fail on a torus? I've never had to deal with
       | torii), or simply isn't suited because it misses some loops, but
       | I thought I'd share it anyway :-)
        
         | cvoss wrote:
         | Notwithstanding the limitation of assuming the graph is
         | embedded in a coordinate space where you can try to measure
         | angles, I think this algorithm still doesn't work.
         | 
         | I may be missing a detail, but it seems to fail on a graph
         | consisting of a loop with some extra edges branching off the
         | loop toward its interior and some edges branching off toward
         | the exterior. The left hand walk hits a dead end in one branch,
         | and the right hand walk dead ends in the other, no matter where
         | on the loop you start (except for some special cases).
         | 
         | Edit: strengthening the proposed counterexample
        
           | vouwfietsman wrote:
           | Hmm I'm having a hard time visualizing your counter example,
           | I'll try to think of it some more. An important detail I left
           | out though is that in my problem sets edges do not cross,
           | which I suppose simplifies the entire problem a lot :-)
        
             | mrgoldenbrown wrote:
             | I think you are trying to say your method only works if the
             | graph is "planar", ie it can be drawn on a flat piece of
             | paper with no lines crossing.
        
               | vouwfietsman wrote:
               | Indeed, thanks for the clarification!
        
             | eigenket wrote:
             | If I understand your algorithm correctly I think it fails
             | on this graph (drawn on my phone, using an online
             | implementation of Paint).
             | 
             | https://imgur.com/a/ihPrwrX
             | 
             | I think this is roughly the counterexample the person above
             | was suggesting.
        
         | konstantinua00 wrote:
         | how is it different from the "sidewalk" algo in the article?
         | 
         | you'd get the same pitfall of torus topology with 2
         | perpendicular loops
        
       | penguin_booze wrote:
       | I'm curious: If I were to share the same link again now, I'll be
       | forwarded to the latest submission instead. Interestingly, this
       | link was shared twice in the past few days (click the "past" link
       | above - and, yes, one of the submitters was I). How, then, was it
       | that neither of the recent submitters, including myself, were
       | forwarded to the latest at the time, but instead new submissions
       | were created?
        
       | sim7c00 wrote:
       | one of the more interesting reads ive had on here so firstly
       | thanks for that! love all the explanations of the intermediate
       | solutions. ive been working lately with graphs to deduce network
       | topologies and finding loops is one of my problems. excited to
       | learn about the final answer, and happy to see it involves
       | spanning tree :') kind of saves me reading through how that
       | protocol works (or should work) so i can get on with implementing
       | it! saving me likely all sorts of wrong solutions <3 thanks!
        
       ___________________________________________________________________
       (page generated 2024-09-11 23:00 UTC)