hngopher.com

       [HN Gopher] Show HN: Visualize the entropy of a codebase with a ...
       ___________________________________________________________________
        
       Show HN: Visualize the entropy of a codebase with a 3D force-
       directed graph
        
       Hi HN! I'm Gabriel, the author of dep-tree
       (https://github.com/gabotechs/dep-tree), and I wanted to show off
       this tool and explain why it has been really useful at my current
       org for dealing with code complexity.
        
       I work at a startup where business evolves really fast, and
       requirements change frequently, so it's easy to end up with big
       piles of code stacked together without a clear structure,
       especially with tight deadlines. I made dep-tree [1] to help us
       maintain a clean code architecture and a logical separation of
       concerns between parts of the application, which is accomplished
       by:
        
       (1) Visualizing the source files and the dependencies between
           them using a 3D force-directed graph; and
       (2) Enforcing some dependency rules that allow/forbid
           dependencies between different parts of the application.
        
       The 3D force-directed graph visualization works like this:
        
       - It takes an entrypoint to the codebase, usually the main
         executable file or a library's entrypoint (index.js, main.py,
         etc.).
       - It recursively crawls import statements, gathering the other
         source files that are being depended upon.
       - It creates a directed graph out of that, where nodes are source
         files and edges are the dependencies between them.
       - It renders this graph in the browser using a 3D force-directed
         layout, where attraction/repulsion forces are applied to each
         node depending on which other nodes it is connected to.
        
       With this, properly decoupled codebases will tend to form
       clusters of nodes, representing logical parts that live together
       and are clearly separated from other parts, while tightly coupled
       codebases will be rendered without clear clustering or any clear
       structural pattern in the node placement.
        
       Some examples of this visualization for well-known codebases are:
        
       TypeScript: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
       React: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
       Svelte: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
       Langchain: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
       Numpy: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
       Deno: https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
        
       The visualizations are cool, but they're just the first step. The
       dependency rule checking capability is what makes the tool
       actually useful on a daily basis, and it is what keeps us using it
       every day in our CI pipelines for enforcing decoupling. More info
       about this feature is available in the repo:
       https://github.com/gabotechs/dep-tree?tab=readme-ov-file#che....
        
       The code is fully open-source.
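
The crawl described in the post (entrypoint, recursive import discovery, directed graph of files) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not dep-tree's implementation: the helper names `local_imports` and `build_graph` are made up here, only absolute `import x` / `from x import y` statements are handled, and every module is assumed to be a single `.py` file directly resolvable under `root`.

```python
import ast
from pathlib import Path


def local_imports(path: Path, root: Path) -> list[Path]:
    """Return the files under `root` that `path` imports (absolute imports only)."""
    found = []
    tree = ast.parse(path.read_text())
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Map "pkg.mod" to "<root>/pkg/mod.py"; anything else (stdlib,
            # third-party packages) is ignored because no such file exists.
            candidate = root.joinpath(*name.split(".")).with_suffix(".py")
            if candidate.is_file():
                found.append(candidate)
    return found


def build_graph(entrypoint: Path, root: Path) -> dict[str, list[str]]:
    """Crawl imports from `entrypoint`: nodes are files, edges are dependencies."""
    graph: dict[str, list[str]] = {}
    stack = [entrypoint]
    while stack:
        current = stack.pop()
        key = current.relative_to(root).as_posix()
        if key in graph:
            continue  # already visited, which also protects against import cycles
        deps = local_imports(current, root)
        graph[key] = sorted({d.relative_to(root).as_posix() for d in deps})
        stack.extend(deps)
    return graph
```

Calling `build_graph(Path("main.py").resolve(), Path(".").resolve())` in a project directory would return a mapping from each reachable file to the files it imports, which is the kind of structure a layout or a rule checker can then consume.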
        
       Author : gabimtme
       Score  : 48 points
       Date   : 2024-01-31 17:42 UTC (3 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | leetrout wrote:
       | Off topic but...
       | 
       | > I work at a startup where business evolves really fast, and
       | requirements change frequently, so it's easy to end up with big
       | piles of code stacked together without a clear structure,
       | especially with tight deadlines
       | 
       | That smells.
       | 
       | It sounds like the team could benefit from better stack
       | technologies and a bit more discipline in how it is applied to
       | solutioning.
       | 
       | > Enforcing some dependency rules that allow/forbid dependencies
       | between different parts of the application.
       | 
       | What is the alternative to this tool that lowers the cognitive
       | barrier / builds the right muscles for the team to understand
       | what they should / shouldn't depend on?
        
         | gabimtme wrote:
         | > It sounds like the team could benefit from better stack
         | technologies and a bit more discipline in how it is applied to
         | solutioning.
         | 
         | For our specific case it's actually pretty good; we've built a
         | lot of discipline around maintainability. In general, though,
         | this is a recurring problem for tech teams that might not be
         | able to afford the time it takes to gain discipline.
         | 
         | > What is the alternative to this tool that lowers the
         | cognitive barrier / builds the right muscles for the team to
         | understand what they should / shouldn't depend on?
         | 
         | Some language ecosystems let you split the codebase into
         | modular units (npm workspaces, Cargo workspaces, etc.), which
         | forces developers to modularize things, and dependencies
         | between modules need to be explicitly declared.
         | 
         | This is good, but usually not enough, as nothing prevents you
         | from messing things up within a module/workspace.
         | 
         | There is other tooling with functionality similar to dep-tree,
         | but it is language-specific and its visualizations are not
         | suitable for large codebases (.dot files, 2D SVGs...)
        
           | jonmoore wrote:
           | Indeed, and tools like dep-tree provide a combination of 1)
           | making module structure visible, 2) making rules about this
           | structure concrete, and 3) automatically checking for rule
           | violations.
           | 
           | These all help to lower the cognitive barrier to learning and
           | maintaining the code base effectively. For developers new to
           | the code base, they help with learning, and for those more
           | experienced, they help with ongoing design and maintenance.
           | 
           | Most long-lived code bases I've seen have adopted or built
           | such tooling at some point, often with tools customized to
           | the code base. For example in one large code base (c. 250
           | devs) we built tooling that simulated and helped optimize the
           | changes to implement a major refactor of the overall module
           | structure.
        
         | gjgtcbkj wrote:
         | There's such a weird vein of do-nothingness that runs through
         | this comment's attitude. Yeah, of course it's easy to pick
         | dependencies when you don't worry about deadlines. A programmer
         | without a deadline is like a fisherman going to the grocery
         | store to buy fish and claiming it's "best practices": better
         | results, but what was the point?
        
         | crucialfelix wrote:
         | It's extremely common to get things twisted up. Even if there
         | is a good tech lead, that person may not be good at writing
         | documentation, may be too busy writing code, and may not yet
         | have a plan for how to keep things organized.
         | 
         | Maintaining a code base requires communication, PR reviews and
         | discipline. That doesn't always happen.
         | 
         | Having lint check rules is brilliant. Never mind discipline,
         | you just need a friendly error to say don't import services
         | into an ORM model file. I'm going to adopt this right away.
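
The kind of rule mentioned above ("don't import services into an ORM model file") can be sketched as a small check over a dependency graph. The snippet below is an illustration only: the `DENY` table, the `violations` helper, and the file names are hypothetical, and this is not dep-tree's configuration format, which is documented in its README.

```python
from fnmatch import fnmatch

# Hypothetical rule table: files matching the key must not import
# files matching any of the patterns in the value.
DENY = {
    "models/*.py": ["services/*.py"],
}


def violations(graph: dict[str, list[str]],
               deny: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs that break a deny rule."""
    found = []
    for importer, deps in graph.items():
        for source_glob, banned_globs in deny.items():
            if not fnmatch(importer, source_glob):
                continue
            for dep in deps:
                if any(fnmatch(dep, banned) for banned in banned_globs):
                    found.append((importer, dep))
    return found


# Made-up graph: each file maps to the files it imports.
graph = {
    "models/user.py": ["services/email.py", "models/base.py"],
    "services/email.py": ["models/user.py"],
}

for importer, dep in violations(graph, DENY):
    print(f"{importer} must not import {dep}")
```

In a CI job, a non-empty result from `violations` would be the signal to fail the build and print the friendly error.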
        
           | gabimtme wrote:
           | And even with discipline, sometimes introducing tech debt in
           | order to ship something fast is actually desirable in the
           | short term, especially in the startup world, so I don't think
           | that anybody with deadlines is completely free from twisting
           | things up.
        
         | nyrikki wrote:
         | Stack technologies tend to bound contexts based on technologies
         | and not on domain boundaries.
         | 
         | This is why we see all these products targeted at companies
         | with 24 microservices with 26 developers who have to run end to
         | end testing on everything.
         | 
         | Architectural erosion is primarily a cultural issue and any
         | tool that helps people discover and call out architectural
         | violations is potentially useful.
         | 
         | Many companies can't just do the inverse Conway maneuver, and
         | if you look at the State of DevOps report, note how it calls
         | out CAB forums and controls as being problematic even for
         | high-performing companies trying to become elite.
         | 
         | Take this product as an example, which really just means you
         | want to keep k8s but have given up on loose coupling and high
         | cohesion.
         | 
         | https://www.signadot.com/blog/how-uber-and-doordash-enable-d...
         | 
         | Throwing products at structure problems typically doesn't work.
        
       | contravariant wrote:
       | Is this just using the word 'entropy' as a stand-in for
       | complexity or is there some actual definition of entropy
       | involved?
        
         | gabimtme wrote:
         | Nah, nothing like that. It's "entropy" in the colloquial sense
         | of level of disorder. It has proven to be a useful word for
         | helping people understand what the tool is about, even though
         | it's strictly incorrect.
        
           | carrolldunham wrote:
           | Please don't do that, especially when presenting something
           | that uses graphs, as entropy on graphs is an actual technical
           | concept that's currently widely used in very hot fields.
        
       | compacct27 wrote:
       | Love it, I think dependency trees are super underused data for
       | static analysis.
       | 
       | The visualization here is amazing in its own right as well. Can I
       | ask what part of the codebase renders it and handles the
       | force-directed part?
        
         | gabimtme wrote:
         | The portion of the code in charge of rendering lives inside the
         | `internal/entropy` directory
         | (https://github.com/gabotechs/dep-tree/tree/main/internal/ent...).
         | 
         | Force-directed drawing is a family of algorithms for displaying
         | graphs in 2D or 3D space by simulating attraction/repulsion
         | forces based on the dependencies between the nodes. The
         | Wikipedia page explains it really well:
         | https://en.wikipedia.org/wiki/Force-directed_graph_drawing
         | 
         | > Love it, I think dependency trees are super underused data
         | for static analysis.
         | 
         | Definitely, especially for evaluating "the big picture" of a
         | codebase.
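
To make the attraction/repulsion idea concrete, here is a minimal force-directed layout in plain Python. It is a generic textbook-style sketch, not the code in dep-tree: it works in 2D rather than 3D for brevity, and the function name `force_layout`, the force constants, and the step clamp are arbitrary choices made for this example.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, step=0.05, seed=0):
    """Return {node: [x, y]} after a simple force simulation.

    `nodes` is a list of hashable ids, `edges` a list of (a, b) pairs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, falling off as 1 / distance^2.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-3)  # avoid division by zero
                push = 1.0 / dist**2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction along each edge, growing linearly with the distance.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= dx
            force[a][1] -= dy
            force[b][0] += dx
            force[b][1] += dy
        # Move each node along its net force, capping the move at 2 * step.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0.0:
                scale = step * min(1.0, 2.0 / mag)
                pos[n][0] += fx * scale
                pos[n][1] += fy * scale
    return pos


# Two linked files and one unrelated file (made-up names).
layout = force_layout(["a.py", "b.py", "c.py"], [("a.py", "b.py")])
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Linked nodes settle at a short distance from each other while unlinked nodes keep drifting apart, which is the mechanism that makes decoupled parts of a codebase show up as separate clusters.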
        
       | rikroots wrote:
       | I "think" I understand what I'm looking at - it's like a 3d
       | dependency tree with added flow of exports -> imports? It
       | certainly looks very pretty![1]
       | 
       | One piece of feedback, if I may. It's really difficult to read
       | the blue labels against the black background. Is there any way to
       | change the palette colors?
       | 
       | [1] https://dep-tree-explorer.vercel.app/api?repo=https%3A%2F%2F...
        
         | gabimtme wrote:
         | Well, that's one of the drawbacks of the smart color
         | auto-generation... it's not that smart.
         | 
         | That's definitely a point for improvement. I have only
         | calibrated things by looking at my own screen, which might have
         | a high saturation/brightness setting.
         | 
         | Thanks for the feedback!
        
       | sam_bristow wrote:
       | A tangentially related tool you can use to look at a repo over
       | time is Git of Theseus [1]. It shows things like "what percentage
       | of the code in this repo survives 6 months?"
       | 
       | [1] https://erikbern.com/2016/12/05/the-half-life-of-code.html
        
         | gabimtme wrote:
         | That's really interesting!
        
       | daxfohl wrote:
       | I've always felt like instead of public, private, protected,
       | there should be something like security groups and ACLs on
       | classes and functions. That way it's very explicit when you are
       | newly coupling things, and it brings tighter scrutiny to those
       | changes.
       | 
       | Edit: oh, looking at the docs, apparently that's exactly what
       | this tool does. Though it would be nice to have function-level
       | granularity, maybe by annotating the code itself.
        
         | sam_bristow wrote:
         | Build systems like Bazel provide mechanisms for controlling
         | access at the module-level. If you're disciplined about not
         | just making everything "public" it can be really powerful.
         | Bazel is a very big hammer though and might be overkill for
         | your projects.
        
       | MilStdJunkie wrote:
       | This is gonna sound weird as hell, but I would really dig
       | implementing this on a doc repo with CCS (component content),
       | where you re-use document modules[1]. Why do I care? Because some
       | modules support _way_ too much complexity, and entropy is a
       | pretty good measurement of that.
       | 
       | [1] Asciidoc/RsT (include directive for both), XML
       | (DITA/S1000D/DocBook/etc, each with different transclude
       | mechanisms), any markup that supports transclusion.
        
       | SushiHippie wrote:
       | Could it be that this can't check absolute imports? My Python
       | project has many files which depend on each other but are not
       | linked together in the generated graph. One of my modules,
       | however, has an __init__.py with relative imports, and the graph
       | does show links between the files imported in that __init__.py.
       | 
       | Let's say my project looks like this:
       | 
       | src/example/foo.py
       | 
       | src/example/bar.py
       | 
       | If bar.py contains the statement "from example.foo import Foo",
       | there is no link between the files foo and bar. If the statement
       | is "from .foo import Foo", though, it shows a link.
        
         | gabimtme wrote:
         | That's because dep-tree doesn't know it needs to resolve names
         | starting from `src/`, as your imports have that piece of
         | information trimmed. You can solve this by setting the
         | PYTHONPATH env variable like this:
         | 
         | export PYTHONPATH=src
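
Why the extra search root matters can be shown with a small resolver sketch. This is not dep-tree's actual resolution logic, which is not described in this thread; `resolve_absolute_import` is a hypothetical helper that tries the project directory first and then each PYTHONPATH-style entry, looking for either `module.py` or `module/__init__.py`.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_absolute_import(module: str, project_dir: Path,
                            pythonpath: str = "") -> Optional[Path]:
    """Map a dotted module name to a file, trying each search root in order."""
    roots = [project_dir]
    # PYTHONPATH entries are separated by os.pathsep (":" on Unix, ";" on Windows).
    roots += [project_dir / entry for entry in pythonpath.split(os.pathsep) if entry]
    parts = module.split(".")
    for root in roots:
        base = root.joinpath(*parts)
        # A module is either a single file or a package directory.
        for candidate in (base.with_suffix(".py"), base / "__init__.py"):
            if candidate.is_file():
                return candidate
    return None
```

With the layout from the comment above, `resolve_absolute_import("example.foo", project)` finds nothing because there is no `example/` directory at the project root, while passing `pythonpath="src"` adds `src/` as a root and resolves to `src/example/foo.py`.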
        
           | SushiHippie wrote:
           | Perfect, that worked, thank you!
           | 
           | I thought this could be solved by changing the directory to
           | src/ and then executing that command, but this didn't work.
           | 
           | This also seems to be an issue with the web app, e.g. the
           | repository for the formatter black is rendered as a single
           | white dot:
           | https://dep-tree-explorer.vercel.app/api?repo=https://github...
        
             | gabimtme wrote:
             | Yeah, the web app is quite limited; it doesn't accept any
             | kind of configuration. Implementing the Python absolute
             | path resolution mechanism was actually quite challenging,
             | as there are just too many ways you can handle absolute
             | imports.
             | 
             | I've seen people using tricks like
             | `sys.path.extend(["src"])` in the main file to be able to
             | place source code in a `src` folder, but unfortunately
             | dep-tree is not able to take that into account.
        
       | ytjohn wrote:
       | This is really cool. And as OP pointed out, I really like the
       | pipeline integration. It's like when linting catches
       | function-level complexity, but in a cross-functional way. I
       | prefer to think of programs in layers, where the top layers can
       | import lower layers but never the other way around (and I'm also
       | very cautious about horizontal imports). Something like this
       | would help track that. Unfortunately, I'd really need it to
       | support Go. I find it interesting that the code is written in Go
       | but doesn't support Go. But I will watch this project.
       | 
       | From the visualization perspective, it reminds me a lot of
       | Gource. Gource is a cool visualization showing contributions to a
       | repo. You see individual contributors buzzing around updating
       | files on a per-commit and per-merge basis.
       | 
       | https://github.com/acaudwell/Gource
        
         | gabimtme wrote:
         | The visualization is actually inspired by Gource, but taken to
         | 3D space. It's a really cool project.
         | 
         | Golang is very challenging to support, because dependencies
         | between files inside a package are not explicitly declared: you
         | can use any function from any file without importing it, as
         | long as both files belong to the same package. Supporting
         | Golang would probably require spawning an LSP server and
         | resolving symbols.
         | 
         | The reason for implementing dep-tree in Go was that things were
         | going to get algorithmic af, so it was better to choose a
         | language as simple as possible, knowing that it also needed to
         | be performant.
        
       ___________________________________________________________________
       (page generated 2024-02-03 23:00 UTC)