hngopher.com

       [HN Gopher] Show HN: Instantly visualize any codebase as an inte...
       ___________________________________________________________________
        
       Show HN: Instantly visualize any codebase as an interactive diagram
        
       GitDiagram is an open-source micro dev-tool that I made this past
       week. Given any public GitHub repository, it generates diagrams in
       Mermaid.js with Claude 3.5 Sonnet. I extract information from the
       file tree and README for details and interactivity (you can click
       components to be taken to relevant files and directories). Also,
       you can replace "hub" with "diagram" in any repository URL to
       access its diagram. I created this because I wanted to contribute
       to open-source projects but quickly realized their codebases are
       too massive for me to dig through manually, so this helps me get
       started. I do still plan on adding other features like private
       repository access if that becomes a thing people want. This project
       was heavily inspired by https://gitingest.com/ so make sure to
       check that out as well! Hopefully this tool can help you and
       feedback is always welcome!
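
The "hub" to "diagram" URL swap described in the post can be sketched in a few lines of Python. This is only an illustration, not GitDiagram's own code: the function name `to_diagram_url` is hypothetical, and it assumes the diagram lives at the same owner/repo path on gitdiagram.com, as the links posted in this thread suggest.

```python
from urllib.parse import urlsplit, urlunsplit


def to_diagram_url(repo_url: str) -> str:
    """Map a github.com repository URL to the matching gitdiagram.com URL."""
    parts = urlsplit(repo_url)
    host = parts.netloc.lower()
    if host not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {repo_url!r}")
    # Swap the host only; the owner/repo path is carried over unchanged.
    return urlunsplit((parts.scheme, "gitdiagram.com", parts.path, "", ""))


print(to_diagram_url("https://github.com/lorlouis/cedit"))
```

Swapping the host rather than doing a plain string replace avoids touching a repository whose own name happens to contain "hub".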
        
       Author : ahmedkhaleel
       Score  : 101 points
       Date   : 2024-12-27 13:04 UTC (9 hours ago)
        
 (HTM) web link (gitdiagram.com)
        
       | whalesalad wrote:
       | I hope you have a cap set on your billing!
        
         | ahmedkhaleel wrote:
         | yup, i did that, i have it currently on balance mode so its not
         | using anything without my knowledge
        
         | visch wrote:
         | :D , he got a star from me for the ease!
        
           | ahmedkhaleel wrote:
           | haha thanks so much! really went for that with this project.
           | hopefully wont cause any issues
        
       | owenpalmer wrote:
       | Inputted llvm-project, resulted in this error:
       | Repository is too large (>200k tokens) for analysis. Claude 3.5
       | Sonnet's max context length is 200k tokens. Current size: 1448461
       | tokens.
        
         | ahmedkhaleel wrote:
         | 1.4M tokens... even if Claude can process that there would be a
         | crater in my wallet
        
           | d0mine wrote:
           | Deepseek v3 is ~1$ for 1M tokens (cheaper at the moment). It
           | is comparable to Sonnet in performance
           | 
           | https://api-docs.deepseek.com/news/news1226
        
         | karmakaze wrote:
         | I tried with sorbet/sorbet:
         | 
         | > File tree and README combined exceeds token limit (50,000).
         | Current size: 159829 tokens. This GitHub repository is too
         | large for my wallet, but you can continue by providing your own
         | Anthropic API key.
         | 
         | Without having an idea of the output it would produce, I can't
         | tell if it's worth it. I'm not particularly interested in this
         | test example so it's something I might try for an easy win, but
         | probably tweak and maintain whatever it produces--or discard it
         | and make something by hand. Showing something subjectively
         | incorrect is good motivation.
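
The two error messages quoted above imply a two-tier size check: a 50,000-token cap on the hosted key and a hard 200,000-token cap from the model's context length. A minimal sketch of that gate follows. The limits come from the quoted messages; the function name, the return values, and the exact boundary handling (strictly greater than) are assumptions, and how the tool actually counts tokens is not stated in the thread.

```python
FREE_LIMIT = 50_000      # from "exceeds token limit (50,000)"
CONTEXT_LIMIT = 200_000  # from "max context length is 200k tokens"


def check_budget(tokens: int, has_own_key: bool = False) -> str:
    """Classify a repository by the token count of its file tree and README."""
    if tokens > CONTEXT_LIMIT:
        # No API key helps here: the input does not fit in the context window.
        return "too_large"
    if tokens > FREE_LIMIT and not has_own_key:
        # Fits the model, but the user has to supply their own API key.
        return "needs_own_key"
    return "ok"


print(check_budget(1_448_461))                   # llvm-project's reported size
print(check_budget(159_829))                     # sorbet's reported size
print(check_budget(159_829, has_own_key=True))
```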
        
       | lor_louis wrote:
       | I tried it on a personal repo and it never ended up generating a
       | diagram.
       | 
       | Might be a bug, so here's the repo.
       | https://github.com/lorlouis/cedit
        
         | ahmedkhaleel wrote:
         | i see a diagram there now, might've taken a while to load
         | https://gitdiagram.com/lorlouis/cedit
        
           | billyp-rva wrote:
           | I guess to get the discussion going: the diagrams look nice,
           | but I think the erroneous and missed connections limit their
           | usefulness. In this example, there is a connection from "Line
           | manager" (line.h/.c) to "Memory Allocator" (xalloc.h/.c), but
           | it doesn't look like such a dependency exists in the
           | codebase. Meanwhile, "Syntax Highlighter" does directly
           | import _xalloc.h_ , but there is no dependency shown in the
           | diagram.
        
             | lor_louis wrote:
             | As the author of the repo I was surprised how close to my
             | mental model the graph was. (I haven't worked on this repo
             | in a while so my mental model is a bit fuzzy)
             | 
             | And yeah it seems to be missing a few connections but the
             | ones that are there are correct: line.h depends on str.h,
             | which links str.c, which depends on xmalloc.c/h.
             | 
             | But had I been a new contributor to the project I would
             | have found the missing links between modules pretty
             | frustrating.
             | 
             | My guess is that the graph is only good enough to give an
             | overview of what things are dependent on what if I were
             | working on a monorepo and I had to justify to my manager
             | that team B needs to do something for us.
             | 
             | I like the idea though, a lot of direct code to graph tools
             | are too noisy and that tends to scare non technical people
             | away.
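
The disagreement above, over which connections are real, can be checked mechanically for C code by comparing diagram edges against direct `#include "..."` lines. The sketch below is one hypothetical way to do that. The function names are made up, the file names in the example are toy input rather than the contents of any repository mentioned here, and only direct quoted includes are considered, so a transitive dependency such as the one lor_louis describes would still be reported as unsupported.

```python
import re
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]

# Matches local includes such as: #include "xalloc.h" (not <stdio.h>).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def direct_includes(source: str) -> Set[str]:
    """Return the quoted headers a C source text includes directly."""
    return set(INCLUDE_RE.findall(source))


def compare_edges(sources: Dict[str, str], diagram: Set[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (edges missing from the diagram, diagram edges with no include)."""
    actual = {(name, hdr) for name, src in sources.items()
              for hdr in direct_includes(src)}
    return actual - diagram, diagram - actual


# Toy input for illustration only.
toy_sources = {
    "highlighter.c": '#include "alloc.h"\n#include <stdio.h>\n',
    "editor.c": '#include "text.h"\n',
}
toy_diagram = {("editor.c", "alloc.h"), ("editor.c", "text.h")}
missing, unsupported = compare_edges(toy_sources, toy_diagram)
print("missing:", missing)
print("unsupported:", unsupported)
```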
        
       | mparnisari wrote:
       | It never generated a diagram for mine :(
        
         | ahmedkhaleel wrote:
         | thats unfortunate, whats the url? ill check it out
        
       | shahzaibmushtaq wrote:
       | I learn faster with visualization, whether it's a stat or a
       | codebase. Much appreciated!
        
         | ahmedkhaleel wrote:
         | yup, exact same here, glad to see my project helping others
         | alike
        
       | corysama wrote:
       | To save everyone a lot of time and OP some money, here is
       | https://github.com/id-Software/Quake-III-Arena diagrammed:
       | 
       | https://gitdiagram.com/id-Software/Quake-III-Arena
       | 
       | https://imgur.com/a/gwoabtk
       | 
       | Clicking on any box takes you directly to either a file or a
       | folder in the repo. AFAICT, the boxes, wires, groups, labels are
       | all inferred by the AI.
        
         | ahmedkhaleel wrote:
         | yup, if you look at the backend its basically just a pipeline
         | of information extracted from the file tree and readme using
         | Claude 3.5 Sonnet, sick diagram tho
        
       | jesse__ wrote:
       | Tried https://github.com/scallyw4g/bonsai and I believe it hung
       | after the green loading bar completed. Left it for several
       | minutes
        
         | ahmedkhaleel wrote:
         | yea i just checked the db, nothing there. thats weird, ill try
         | loading it myself
        
           | jesse__ wrote:
           | Looks like it works now, although the diagram it generated is
           | pretty watery.
           | 
           | It mis-judged that the "work queue system" is kind of off by
           | itself, when in fact almost all of the important work in the
           | engine goes through the work queue. It did do a good job of
           | at least approximately figuring out the render pipeline
           | stages. Somehow it thinks that "input processing" isn't
           | related to the platform layer, which doesn't make any sense
           | at all.
           | 
           | Seems like a pretty reasonable result for a weekend project,
           | nice work :)
        
       | ComputerGuru wrote:
       | Bug: url checker is case sensitive. For those of us that type out
       | the url from memory on a stupid touch-type device with auto
       | incorrect, you'll get things like Http(s)://Github
       | 
       | Also might want to coalesce https and http
       | 
       | Not sure if it queues jobs for processing so that when I refresh
       | after a failure it is continuing where it left off or if it is
       | starting over anew? "Progress bar" makes it hard to say.
       | 
       | Aside: I dislike the "modern progress bar" that's just a
       | scrolling marquee of pithy quips. One of the difficult problems I
       | worked on for a SW project was adding sane progress to a multi-
       | stage backup tool so that the completed percentage and ETA
       | correctly represented a mix of millions of single kb files and
       | random multi-gb files, backed up across multiple pipelines on
       | multiple cores, asynchronously piping from one stage to the next
       | with buffering. Needed to add a good progress metric without
       | poisoning cpu core caches or hurting the efficiency of how work
       | was being divided. This doesn't seem as hard by comparison!
       | 
       | Sorry for only having tangentially relevant things to report at
       | this time; still waiting for it to finish with the fish-shell
       | codebase so I can give some good feedback!
        
         | layer8 wrote:
         | It's also unclear why you have to enter the prefix
         | https://github.com in the first place.
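
The input issues raised in this sub-thread (case-sensitive checking, http versus https, and the required https://github.com prefix) could all be handled by normalizing the input down to an owner/repo pair first. The following is a sketch under assumptions, not the site's validator: `parse_repo` is a hypothetical name, and the bare "owner/repo" short form is an illustrative extension based on layer8's remark.

```python
import re
from typing import Tuple

# Full URL form; scheme and "www." are optional, matching is case-insensitive.
_URL_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/([^/\s]+)/([^/\s#?]+)",
    re.IGNORECASE,
)
# Bare "owner/repo" form; GitHub usernames are alphanumerics and hyphens.
_SHORT_RE = re.compile(r"^([A-Za-z0-9-]+)/([^/\s#?]+)$")


def parse_repo(text: str) -> Tuple[str, str]:
    """Extract (owner, repo) from a GitHub URL or an owner/repo string."""
    text = text.strip()
    match = _URL_RE.match(text) or _SHORT_RE.match(text)
    if match is None:
        raise ValueError(f"cannot find owner/repo in {text!r}")
    owner, repo = match.group(1), match.group(2)
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


print(parse_repo("Https://Github.com/fish-shell/fish-shell"))
print(parse_repo("rails/rails"))
```

Because the scheme is discarded during parsing, http and https inputs collapse to the same key, which also addresses the request to coalesce them.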
        
       | blondin wrote:
       | love this idea! here's ghostty:
       | https://gitdiagram.com/ghostty-org/ghostty
        
         | ahmedkhaleel wrote:
         | ghostty is sick, and the diagram seems accurate on a higher
         | level
        
       | louthy wrote:
       | Doesn't like my repo [1]
       | 
       | "Failed to generate diagram. Please try again later."
       | 
       | [1] https://github.com/louthy/language-ext
        
       | billyp-rva wrote:
       | Didn't get a response (expected), but I would caution everyone to
       | keep expectations low. Generating system diagrams from code is
       | extremely difficult, if not impossible, even with AI [0].
       | 
       | [0] https://www.ilograph.com/blog/posts/diagrams-ai-can-and-cann...
        
         | mulmboy wrote:
         | This is a poor quality blog. They "upload" source code to
         | chatgpt so who knows if it's in context or ragd plus it looks
         | like they reuse the chat from a previous prose -> code session.
         | They criticise LLM for high level diagram despite using a
         | garbage prompt and rejecting the idea of iterating on the
         | diagram.
         | 
         | Anecdotally I've had great success with code to diagram via LLM
         | including fine details. But as with anything LLM you need to
         | really get the context right. This can not be overemphasized.
         | And iterate with the LLM, goodness.
        
           | billyp-rva wrote:
           | As mentioned in the blog, iterating on generating a diagram
           | from a repo kind of defeats the purpose. The information is
           | in the repo; if the LLM isn't going to analyze it properly,
           | you might as well just tell it exactly what to diagram (like
           | in the previous section on "whiteboarding"). It is much more
           | capable at that.
           | 
           | If you have some examples of an LLM doing better, by all
           | means please share.
        
             | mulmboy wrote:
             | "properly" is the key word here. You've got to communicate
             | what you want. Or at least communicate something like what
             | your goal is from the diagram, why you want it, so the LLM
             | knows the audience it's targeting.
             | 
             | Like imagine giving the same prompt (instruction,
             | directive, task) to a human - you would in all likelihood
             | get out a similar high level diagram because you've not
             | provided even the slightest whiff of what you want to use
             | the diagram for.
             | 
             | The blog's takeaway is essentially "LLM didn't read my mind
             | so no good". They're tools to be used and you get out what
             | you put in.
        
       | chris_5f wrote:
       | This is just amazing. I tried out with my startup's repo and it
       | was a blast. Shared with my community
        
         | ahmedkhaleel wrote:
         | thank you so much!!
        
       | antonpirker wrote:
       | This looks really nice!
       | 
       | I tried it with mine:
       | https://gitdiagram.com/getsentry/sentry-python
       | 
       | A few things:
       | 
       | There are way more integrations in the integration layer, so
       | maybe they should be either shown or a "..." somewhere should
       | tell people that there is more.
       | 
       | The "Hub" is deprecated, so it would be cool if this fact were
       | shown somewhere.
       | 
       | Otherwise really cool!
        
       | nhatcher wrote:
       | I tried with mine, of course. It worked quite well I would say:
       | 
       | https://gitdiagram.com/ironcalc/IronCalc
       | 
       | I think the color coding for the legend is incorrect though.
       | 
       | Overall looks great, congratulations and thanks!
        
       | fsndz wrote:
       | I have been attempting to do this and failed. Good
       | implementation, but still fails in a lot of cases like my
       | implementation. For example fails for this:
       | https://github.com/stanfordnlp/dspy
        
       | WillAdams wrote:
       | Failed with:
       | 
       | https://github.com/WillAdams/gcodepreview
       | 
       | which is probably the weirdest structure one could imagine
       | (Literate Program as a .tex file containing Python and OpenSCAD
       | code for https://pythonscad.org/ ) where the Python file is the
       | core, there is an intermediate OpenSCAD file which wraps it, and
       | then a top-level OpenSCAD file which the user interacts with.
        
       | nulld3v wrote:
       | Hashicorp Vault (had to use my own API key as the repo is fairly
       | large): https://gitdiagram.com/hashicorp/vault
       | 
       | Diagram could maybe have a bit more detail but what is there
       | looks accurate! Really cool stuff OP!
        
       | Minervaskell wrote:
       | Final boss: https://github.com/torvalds/linux
       | 
       | Error message: Repository is too large (>200k tokens) for
       | analysis. Claude 3.5 Sonnet's max context length is 200k tokens.
       | Current size: 1334798 tokens.
       | 
       | Cool project though! Kudos!
        
       | initramfs wrote:
       | Wow, this is awesome:
       | 
       | https://gitdiagram.com/EI2030/Low-power-E-Paper-OS
       | 
       | https://gitdiagram.com/hatonthecat/Solar-Kernel
       | 
       | https://gitdiagram.com/hatonthecat/OpenSourceCondo
       | 
       | https://gitdiagram.com/hatonthecat/Open-Source-Car
        
       | Animats wrote:
       | Nice.
       | 
       | I put in a repository of mine that implements a UI in Rust, and
       | it gave me a reasonable diagram. It's just a top-level structure
       | of the program, though. No detail. Not much info about
       | connections between components. The layout was kind of weird.[1]
       | 
       | Another one, from a fork I have of a rendering library.[2] It
       | found the big parts, but provides little insight.
       | 
       | Here's a JPEG 2000 decoder. Even less insight.[3]
       | 
       | The progress messages are bogus. They have no relationship to
       | what's going on. Progress messages indicating progress appear for
       | a bad URL.
       | 
       | [1] https://gitdiagram.com/John-Nagle/ui-mock
       | 
       | [2] https://gitdiagram.com/John-Nagle/rend3-hp
       | 
       | [3] https://gitdiagram.com/John-Nagle/jpeg2000-decoder
        
       | thih9 wrote:
       | Note: the messages that are displayed during diagram generation
       | seem to describe the current progress - but actually they are
       | generic/comedic messages that run in an endless loop.
       | 
       | Also, I tried this with https://github.com/rails/rails and it
       | never finished.
        
         | dylan604 wrote:
         | Are any of them kicking the llama's ass while reticulating
         | splines?
        
           | thih9 wrote:
           | It's mostly re-reticulating splines that I find annoying. By
           | that I mean: running it all in a loop, endlessly - showing
           | neither a result nor an error message.
           | 
           | But you're right, reticulating splines and similar
           | generic/comedic messages do have a long tradition.
        
       | layer8 wrote:
       | I tried this with a larger project, and after looping through the
       | obviously-fake progress messages for a couple of minutes, it
       | resulted in "Failed to generate diagram. Please try again later."
        
       | btown wrote:
       | I really, really like this approach! Rather than trying to build
       | a full graph of related low-level components from function calls
       | etc., which is almost always overwhelming, this just finds the
       | "vibes" of how modules are named in the filesystem, and how they
       | relate to common design patterns - which in many cases is exactly
       | what you want for exploring a codebase or understanding the scope
       | of its offering.
       | 
       | https://github.com/ahmedkhaleel2004/gitdiagram/blob/main/bac... -
       | the prompts in question. Don't sell yourself short as per the
       | comments, they're very well designed prompts!
       | 
       | Using an inexpensive LLM to summarize each file might be an
       | interesting next step, putting few-word summaries alongside the
       | filenames in much the same setup you currently have! But,
       | honestly, it may not be particularly necessary for large existing
       | open-source projects that have already bikeshedded their file
       | naming over many iterations, and/or have highly intentional
       | structures for maintainability.
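
The per-file summary idea above could look roughly like the sketch below: each path in the file tree gets a few-word note appended before the tree is handed to the diagramming prompt. Everything here is hypothetical. `annotate_tree` is a made-up name, and `summarize` stands in for whatever inexpensive LLM call would produce the summary; the example passes a stub so that it runs without any API.

```python
from typing import Callable, Iterable


def annotate_tree(paths: Iterable[str], summarize: Callable[[str], str]) -> str:
    """Render one line per file path with a short summary next to it."""
    lines = []
    for path in sorted(paths):
        # summarize() is a placeholder for a cheap per-file LLM call.
        lines.append(f"{path}  # {summarize(path)}")
    return "\n".join(lines)


def stub_summary(path: str) -> str:
    """Stand-in summarizer: reports only the file extension."""
    return "source file" if path.endswith((".c", ".h")) else "other file"


print(annotate_tree(["src/main.c", "README.md"], stub_summary))
```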
        
       ___________________________________________________________________
       (page generated 2024-12-27 23:00 UTC)