[HN Gopher] Ask HN: Why aren't code diagram generating tools mor...
___________________________________________________________________
Ask HN: Why aren't code diagram generating tools more common?
When I'm trying to get familiar with a new codebase it often takes
me a long time to build a proper mental model of the whole system.
Even with my own projects, it's easy to lose track of all the
components and their interactions since they're constantly
changing, and making hand-drawn diagrams is time consuming. So my
questions are: - Why isn't diagram generation automated as part of
the build process (UML or otherwise)? - Why aren't code
visualization tools more popular? The options out there seem
outdated - Would you want to use these tools? What would be your
ideal tool? Edit: looks like this is a duplicate question
https://news.ycombinator.com/item?id=31569646 I can't delete it so
feel free to discuss more
Author : lurker137
Score : 101 points
Date : 2022-06-04 13:20 UTC (9 hours ago)
| ooedemis wrote:
| the hype is producing working solutions over docs
| mtoddsmith wrote:
| Something like this?
|
| NDepend Dependency Graph
| https://www.youtube.com/watch?v=23fBxM2v22k
| lurker137 wrote:
| Thanks for this, I'll have to try it first but this looks like
| exactly what I hoped for (it's a shame that it is only for .NET
| and Visual Studio though)
| groffee wrote:
| I find those diagrams a lot more confusing than just reading and
| having a mental model of the code.
| lurker137 wrote:
| Seems like someone asked the same question just 3 days ago.
| Here is the discussion if you're interested:
| https://news.ycombinator.com/item?id=31569646
| altgeek wrote:
| It's a very valid criticism of diagram "systems" like UML. But,
| "read the code" doesn't scale. It makes it brutal when
| onboarding new team members who will not understand the "tao"
| of the system without a fireside chat with the original
| developers. And, usually, the OG devs are long gone...
| morelisp wrote:
| The tao that can be sequence diagrammed is not the true tao,
| either.
| necovek wrote:
| I always had the same feeling, and I found out about aphantasia
| in the last few years: I do wonder if these are related?
| charlieflowers wrote:
| I think it's kind of like the AI Winter -- there was a period of
| time when the software industry really went down a stupid path in
| regards to diagram-based code generation. A lot of kool aid was
| drunk over promises of making it so that everyone would be able
| to program.
|
| But, of course, it turns out, someone still needs to understand
| and be able to debug all the nuances that make complex logic
| systems complex, especially when they're cobbled together from
| many underlying systems.
|
| The real goal should be to take good programmers and magnify what
| they can do. But since the industry bought so hard into the naive
| vision, the industry is behind where it should be on a smarter
| vision.
| rjsw wrote:
| The OP is asking for diagram generation not code generation.
| tehbeard wrote:
| Which are two sides of the same coin, transformation of code
| <-> Diagram.
|
| The industry couldn't get the simple one of those (diagram ->
| code) to work well enough, how would they do the much more
| complex reverse?
|
| Architectural and even library nuances can't be easily
| quantified to a particular UML symbol unless you go through
| the effort of classifying it for every library/design
| pattern, and keeping that up to date as well.
| mattm wrote:
| Not really the same thing. With diagram generation, the
| code is still the source of truth. There would just be a
| way to automatically visualize it.
|
| With code generation, you would program by diagramming up
| front and then the code would be generated from that.
| rsstack wrote:
| We have scripts that generate textual diagrams as part of our CI
| process (mermaid or PlantUML, depending on the case), so our
| Markdown documentation files (e.g. README.md) always have up-to-
| date diagrams and people don't have to complain "why did no one
| update this diagram in 6 months".
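A minimal sketch of the kind of CI script described above, assuming the diagram of interest is a module import graph rendered as Mermaid. The commenter's actual scripts are not shown in the thread, so `import_edges`, `to_mermaid`, and the three-module `EXAMPLE_SOURCES` project are all hypothetical. Only simple, undotted module names are handled.

```python
import ast


def import_edges(sources):
    """Return sorted (module, imported_module) pairs, keeping only
    imports that point at another module in `sources`."""
    edges = set()
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                if target in sources and target != name:
                    edges.add((name, target))
    return sorted(edges)


def to_mermaid(edges):
    """Render the edges as a Mermaid top-down flowchart."""
    lines = ["graph TD"]
    lines += [f"    {src} --> {dst}" for src, dst in edges]
    return "\n".join(lines)


# Hypothetical project, inlined so the sketch is self-contained.
# A real script would read the files from the repository instead.
EXAMPLE_SOURCES = {
    "app": "import db\nfrom api import handler\n",
    "api": "import db\nimport json\n",
    "db": "import sqlite3\n",
}

if __name__ == "__main__":
    print(to_mermaid(import_edges(EXAMPLE_SOURCES)))
```

A CI job could write this output into a fenced mermaid block in README.md, so the diagram is regenerated on every change rather than edited by hand.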
| prakashqwerty wrote:
| The GitHub repo visualizer was especially useful for me
| https://githubnext.com/projects/repo-visualization
| justsomeuser wrote:
| I think the main reason is that it is slower than having a mental
| model.
|
| One reason it is slower is that it is difficult to create a
| map like diagram where you zoom in to get greater detail.
| chrismorgan wrote:
| A decade or so ago, I was tasked with auto-generating
| documentation, including diagrams (super- and subclasses, to
| begin with), of a fairly large system in a domain-specific
| language developed over the past couple of decades. The language
| had a kind of multiple inheritance (a traitlike system), and at
| the time around 1500 types/classes/traits/mixins/whatevers, with
| the entire system all queryable at runtime (and indeed that's how
| I generated it--someone else provided Python bindings to the
| system, then I traversed it all in Python). Just to amuse myself,
| I generated one class diagram of the entire system. It was around
| 30 metres wide and I think 30cm high when I zeroed all the
| margins and padding I could in GraphViz. Flipping its
| orientation, I got it to be 15 metres tall and almost 1 metre
| wide. I figured it could be fun physical wallpaper for the
| office, but in the end settled for just a labelless rendering
| with random line colours as a desktop background. It was pretty.
|
| But more seriously, it depends on how complex the system is and
| how it's _modelled_. The case I was working with then transferred
| _excellently_ to such diagrams (shallow and deep inheritance, and
| other forms of composition and linkage, with every box a link)
| and key-value property sheets about the types and the likes, but
| I don't think I've encountered another system where anything even
| vaguely like that would work particularly well.
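The approach described above (query the type system at runtime, then hand the graph to GraphViz) can be sketched roughly as follows. This assumes plain Python classes rather than the commenter's domain-specific language; `subclass_edges`, `to_dot`, and the `Shape` classes are hypothetical stand-ins.

```python
def subclass_edges(root):
    """Walk the subclass tree below `root` and return sorted
    (parent_name, child_name) pairs."""
    edges = set()
    seen = set()
    stack = [root]
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        for sub in cls.__subclasses__():
            edges.add((cls.__name__, sub.__name__))
            stack.append(sub)
    return sorted(edges)


def to_dot(edges):
    """Render the edges as GraphViz DOT. rankdir=LR lays the graph out
    left to right; changing it flips the orientation."""
    lines = ["digraph classes {", "    rankdir=LR;"]
    lines += [f'    "{parent}" -> "{child}";' for parent, child in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical hierarchy with one case of multiple inheritance.
class Shape: pass
class Polygon(Shape): pass
class Round(Shape): pass
class Square(Polygon): pass
class RoundedSquare(Square, Round): pass


if __name__ == "__main__":
    print(to_dot(subclass_edges(Shape)))
```

Piping the output through `dot -Tsvg` gives a rendering. With around 1500 types, the result would be the very wide image described above.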
| lurker137 wrote:
| That's a great point, I doubt there can be one single modeling
| solution for all systems. I imagine a modern tool would be more
| like a suite with many options for common use cases. And it
| would be essential to be able to narrow down on specific
| subsystems maybe using something like a gitignore file
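One possible reading of the gitignore idea, as a sketch: keep a small ignore file of glob patterns and drop any graph edge that touches a matching module. This is much simpler than real gitignore semantics (no negation, no directory anchoring), and the file format, `load_patterns`, and `filter_edges` are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def load_patterns(text):
    """Parse ignore-file text: one glob per line, '#' starts a comment."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(name, patterns):
    """True if the module name matches any ignore pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def filter_edges(edges, patterns):
    """Keep only edges whose two endpoints are both not ignored."""
    return [
        (src, dst)
        for src, dst in edges
        if not is_ignored(src, patterns) and not is_ignored(dst, patterns)
    ]


# Hypothetical ignore file and dependency edges.
IGNORE_TEXT = """
# hide test helpers and generated code
tests.*
*.generated
"""

EDGES = [
    ("app.api", "app.db"),
    ("app.api", "tests.fixtures"),
    ("app.db", "app.models.generated"),
]

if __name__ == "__main__":
    print(filter_edges(EDGES, load_patterns(IGNORE_TEXT)))
```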
| giaour wrote:
| I have used visualization generation tools but have found them of
| limited utility to me. When the generator is built into an IDE,
| the artifact it generates is less useful than the IDE itself,
| which typically provides a structural (symbol tree) view of the
| code as well as symbolic navigation ("Go to
| definition/references").
|
| A visualization can be helpful as an artifact for non-technical
| colleagues, but I always end up hand-rolling those diagrams to
| highlight a specific aspect of the system and hide irrelevant
| features.
| DantesKite wrote:
| It would be funny if there was a single explanation, like, "The
| tool for creating simple visualizations that helps you understand
| codebases doesn't exist yet."
| rmah wrote:
| Because most generated diagrams of _code_ have very little actual
| value. Sometimes it fools you into thinking you understand how a
| system works. Which I would argue has negative value.
| lifeisstillgood wrote:
| I am trying to answer a lot of these questions based on the idea
| of software literacy - so do we understand books using diagrams
| connecting pages and paragraphs? No, maybe concepts, maybe
| characters, but in the main any diagram involving time _plus more
| than one other dimension_ has never been successfully written in
| a flat piece of paper.
|
| Edit: another way of thinking about time is _mutability_ so
| perhaps functional languages are more amenable to graphing.
| dangoor wrote:
| As of February 2022, GitHub supports Mermaid diagrams directly:
| https://github.blog/2022-02-14-include-diagrams-markdown-fil...
|
| This may help with adoption.
| yashap wrote:
| Did not know about this, but thanks, seems quite useful!
| everythingabili wrote:
| I'd say the question was "solved" by Prograph CPX in the 1990s,
| except the diagram WAS the code.
|
| https://www.google.com/search?q=Prograph+CPX&rlz=1C5CHFA_enG...
|
| You'd have high level classes, and low-level nitty gritty. You
| could edit your code as it was running (and then continue).
|
| People prefer text (weirdly).
| irrational wrote:
| At one time I had a diagram of our database schema. Every table
| with its columns and data types was in its own box with arrows
| indicating foreign key relationships. Each table was color coded
| and then grouped into related areas of concern (user tables
| over here, retailer tables over there, etc.) We had our print
| shop print it out at the largest size they could (about 3x5 feet)
| and hung it on the wall. We referred to it all the time, but it
| was a pain to maintain as the schema changed and over time it
| became too much of a hassle so we stopped updating it. I think
| the same thing would happen with a system diagram. It sounds good
| and is helpful, but over time it won't be maintained and will be
| abandoned.
| SOLAR_FIELDS wrote:
| Theoretically a proper tool would be able to generate this on
| the fly - lots of tools to do this on the database side. I'm
| not aware of many good tools that can do this on the
| application side - perhaps someone can chime in with some
| platform specific examples?
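For the database side, generating the diagram on the fly can be sketched with SQLite's schema PRAGMAs. This is an illustration under assumptions, not one of the tools alluded to above: `schema_to_mermaid` and the two-table example database are hypothetical, and the output is Mermaid's erDiagram syntax.

```python
import sqlite3


def schema_to_mermaid(conn):
    """Emit a Mermaid erDiagram with every table, its columns, and one
    relationship line per foreign key."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    lines = ["erDiagram"]
    relations = []
    for table in tables:
        lines.append(f"    {table} {{")
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            col_name, col_type = row[1], row[2] or "ANY"
            lines.append(f"        {col_type} {col_name}")
        lines.append("    }")
        # foreign_key_list rows: id, seq, parent table, from column, ...
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            parent, column = fk[2], fk[3]
            relations.append(f'    {parent} ||--o{{ {table} : "{column}"')
    return "\n".join(lines + relations)


def build_example_db():
    """Hypothetical schema: users and the orders that reference them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id)
        );
        """
    )
    return conn


if __name__ == "__main__":
    print(schema_to_mermaid(build_example_db()))
```

Because the diagram is derived from the live schema, there is nothing to keep in sync by hand. The color coding and grouping by area of concern described earlier would still need human input.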
| snarkypixel wrote:
| Yeah.. in a world with infinite time and no compromise,
| documentations + diagrams are nice. In practice, working on
| this means not working on something else, potentially more
| important. Trade-offs are hard and not fun :(
| photochemsyn wrote:
| One issue is that a general visualization tool might have a
| lot of problems jumping from a codebase in language X to a
| codebase in language Y (let alone a mixed codebase). MS seems to
| have this Code Map tool but it looks like it's for C# / VB
| mainly, with some C++ support.
|
| https://docs.microsoft.com/en-us/visualstudio/modeling/map-d...
|
| In many cases it might really be faster and easier to just
| diagram things out with a pad of paper and a pencil compared to
| setting up a tool like this and getting all the parts working
| correctly without any bugs.
|
| That said, a virtual reality 3D tool for visualizing code base
| dependencies, internal structure, what parts call what other
| parts, internal exception handling etc. would be pretty cool.
| Maybe it's an area where AI machine learning could do something.
| lurker137 wrote:
| That tool from Microsoft comes closest to what I think would be
| ideal, generated diagrams that aren't static and where you can
| include/exclude and move around components. It would definitely
| have to be language specific, but once things catch on IDEs
| implement language specific plugins soon after.
| waynesonfire wrote:
| because a code diagram is no different than reading the actual
| code, it'll be just as confusing. the learning occurs when you
| create the diagram yourself.
| nealabq wrote:
| The diagram can also be a view into the thoughts and
| motivations behind the implementation. Which can help the next
| programmer know the design and why certain design decisions
| were made.
|
| But not everyone thinks about designs as 2-D pictures. I'd
| argue there's selection pressure that favors programmers being
| good spellers and symbol-manipulators. Some people find it
| intuitive to think of concepts as boxes and the relationships
| between those concepts as lines. Or maybe to think of a 2-D
| grid of actions/dependencies with deliverables on the vertical
| axis and work-steps on the horizontal.
|
| But not everyone likes these kinds of visualizations. Some
| prefer a text-based description. And some designs don't fit any
| obvious pictorial representation.
| renox wrote:
| When UML was 'cool' I remember that at work some poor soul was
| tasked with producing UML diagrams using a $$$ tool, of course
| the tool failed and the poor guy had to manually produce most of
| the diagrams, which were obsolete quickly..
|
| I was so happy not being the one doing this useless task..
| billconan wrote:
| they are not useful at all.
|
| I want to see the big picture, what they can generate are direct
| translations of the code down to the line level.
| ben30 wrote:
| I've felt the same problem. IntelliJ markdown files have support
| for Mermaid and PlantUML diagrams. You have to Google/tick a
| box to enable it though.
|
| I find creation of a sequence diagram with class instances as
| columns and method names as arrows can help visualise things.
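The layout described above (instances as columns, method names as arrows) maps directly onto Mermaid's sequenceDiagram syntax. A small sketch, assuming the calls have already been collected as (caller, callee, method) tuples; `to_sequence_diagram` and the checkout trace are hypothetical.

```python
def to_sequence_diagram(calls):
    """Render (caller, callee, method) tuples as a Mermaid sequence
    diagram, declaring participants in order of first appearance."""
    participants = []
    for caller, callee, _method in calls:
        for name in (caller, callee):
            if name not in participants:
                participants.append(name)
    lines = ["sequenceDiagram"]
    lines += [f"    participant {name}" for name in participants]
    lines += [
        f"    {caller}->>{callee}: {method}()"
        for caller, callee, method in calls
    ]
    return "\n".join(lines)


# Hypothetical trace of a checkout request.
CALLS = [
    ("Controller", "CartService", "total"),
    ("CartService", "PriceRepo", "lookup"),
    ("Controller", "PaymentGateway", "charge"),
]

if __name__ == "__main__":
    print(to_sequence_diagram(CALLS))
```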
| abathur wrote:
| > - Why isn't diagram generation automated as part of the build
| process (UML or otherwise)?
|
| I've had a related thought/desire percolating... roughly: I
| wonder what interesting levers we could build if it was
| normalized (for both toolchains and projects) to create and
| publish the plaintext relationship graphs in a common easily-
| reused format.
|
| I'll self-reply to elaborate a bit.
| abathur wrote:
| For a concrete example, I've been developing a tool
| (https://github.com/abathur/resholve) that can ~build/link
| Bash/Shell scripts--i.e., rewrite them with external
| executables converted to absolute paths. (This helps ensure
| dependencies are known, declared, present, and don't have to be
| on the global PATH for the script to execute cleanly.)
|
| There's a devilish sub-problem, which is that any given
| executable can potentially exec arbitrary arguments. For now I
| handle this with a very crude automated binary/executable
| analysis that needs to be augmented by human source analysis.
| Deep multi-language source analysis wouldn't be very scalable,
| but I suspect fairly-standardized structural annotations could
| improve the results in a scalable way.
|
| I have to imagine there are other applications of the same
| information.
| mtkd wrote:
| Any high level diagram usually needs one of the architects to
| produce it manually in the context of the audience and what
| aspect they specifically need to know about the structure/logic
|
| In most complex systems the part where the magic happens is
| likely impossible for a tool to identify so would get lost in the
| noise of the cruft around it -- even for monoliths using
| frameworks and especially for anything distributed across
| microservices, and it's usually that aspect that is of most
| interest
| trixie_ wrote:
| Because no one has created a good enough one yet. Maybe you can.
| I am in desperate need for something like that and often just
| write diagrams on a piece of paper for code paths I am trying to
| understand or debug. Many things were just ideas for decades
| until someone figured out the right way to do it. This is one of
| them. You'll know you're close when coding with the diagram is
| faster/better/easier than without.
| mattm wrote:
| When I took on a tech lead role, I began to understand the
| importance better of having diagrams available for key parts of
| the system. It just becomes so much easier to explain to people
| and get them up to speed. I often find myself referring back to
| the diagrams as well to refresh my memory quickly.
|
| After trying various diagramming tools and dragging around boxes
| and lines, I settled on PlantUML which makes diagrams much much
| easier to create and modify. It cuts out a lot of the pain of
| diagramming with the mouse which means there is less resistance
| to creating diagrams and I do it more.
|
| To your question, "Why isn't diagram generation automated as part
| of the build process" - one thing I've found that would be
| difficult to solve is the level of detail you need in the
| diagram. For instance, in a very complex system with many
| decision branches, a diagram with every branch would not be
| helpful. There are cases where I want a high-level component
| overview but don't want to clutter up the diagram with lots of
| details. And yet there may be cases where I do want some more
| detail but may be only in a certain section of the code. I think
| this judgement of detail tradeoffs is what would be the hardest
| problem to solve for diagram generation tools. You want enough
| detail to be useful but not too much to be overkill.
| pshc wrote:
| Often these side artifacts turn into a make-work time-sink. Stuck
| on a hard problem? Distract yourself with a new ToDo app. Fix the
| auto diagram generator.
|
| Honestly, freehand diagrams are best. You'll exercise your own
| understanding of the code base as you draw.
| dahart wrote:
| Speculating based on both writing large systems from scratch, and
| joining a group where a large confusing system was being used...
|
| Diagrams are sometimes unnecessary overhead early in a project.
| Sometimes I've used them and seen other people use them for
| initial design planning, especially if management needs to be
| involved or approve the plans & schedule. But by a year later,
| the design has grown and changed, and everyone on board is so
| familiar with the code, but also so pressed for time and feature
| delivery, that making diagrams doesn't make sense: nobody
| involved at this point needs them. Two years later, when the code
| is getting complicated and slowing down, and you're onboarding
| some new people, that's when it might help to sketch the flow of
| code.
|
| FWIW, sometimes a good profiling tool will show you and let you
| explore call stacks, call graphs, execution charts, etc. I often
| reach for a profiler when I'm new to a codebase. Flame charts are
| a fave of mine. You can find flame charts in Chrome's debug
| tools, or in compiled language profilers like vtune or valgrind.
| Here's a decent article on how to use them
| https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
|
| Another issue is that well designed code bases diagram themselves
| by their module structure, while diagrams for poorly designed
| code bases may not help understand them at all. When code has too
| many side effects, or things are poorly or misleadingly named,
| when class boundaries aren't well defined or the code has a lot
| of spaghetti, diagrams might not really help.
|
| IMO two things worth doing are: 1) get a mentor in any new codebase
| any time you can, and 2) start building your own arsenal of code
| diagramming tools, rather than wondering why or waiting for
| others to do it. Demonstrate the value of diagramming code to
| people around you and see if you can get it to catch on.
| lurker137 wrote:
| The thing with profiling tools is that they're more focused on
| the details than the big picture at the system level. I'll
| definitely be using flame graphs more though, thanks for the
| tip. Also you are absolutely right about waiting for others to
| make the tools, but sometimes the tools don't exist for a good
| reason that I wouldn't have realized otherwise.
| dahart wrote:
| Yep, that's totally true. Profiling & debugging tools are
| designed for sampling the behavior and performance, and not
| primarily for understanding architecture. Still, they're
| great for inspecting call stacks, which can be pretty damn
| helpful for certain parts of understanding the code.
|
| IMO there's no real substitute for getting the design
| explained by the people who designed it. (Of course this
| isn't always possible, but when it is, take advantage.)
| Automated tools can never prioritize the explanation nor
| summarize what parts are critical or tricky vs what parts are
| incidental or trivial, and they can't tell you which parts
| should be redesigned because they were slapdash vs which
| parts look ugly to a newbie but have a long list of hard to
| see requirements, and touching things should be done with
| extreme care. Good diagrams are very helpful, but best used
| as a supplement to in-person stories, in my opinion.
| AndyPatterson wrote:
| The problem facing these tools is a catch-22 really - diagrams
| are useful when understanding big messy codebases but big messy
| codebases are hard to visualise.
|
| For instance, I frequently build small paper diagrams of
| different code paths through a component and nearly always find
| leaky abstractions, mixed layers of abstractions, weird cyclical
| dependencies, etc. etc. and there really is no clear way to
| diagram this. Instead, you sort of have to make judgements and
| assumptions to make the diagram concise and understandable; the
| sort of decisions that machines just aren't that good at.
|
| On the other side, when code is simple and easy to follow then
| the pay off of building a diagram just isn't there.
| lurker137 wrote:
| I'm starting to see all the problems with static diagram
| creation. I still think there can be more work in this area,
| maybe a full dynamic solution separate from the IDE that just
| helps you navigate the maze and make your own diagrams. With
| all the work that's been going into making developers' lives
| easier surely someone could focus on the exploring codebase
| part.
| altgeek wrote:
| "... make the diagram concise and understandable; the sort of
| decisions that machines just aren't that good at. ..."
|
| Exactly. I've found that there is no perfect tool and I'm
| closing in on 40 years of writing software. I use a blend of
| UML, IE (James Martin's Information Engineering), etc. Anything
| that makes it "easy-to-grok". Draw.io generally does what I
| want it to do; there is no feedback loop back to the codebase.
| It's really just to help onboard new helping hands.
| etimberg wrote:
| Paper and pencil is definitely the way to go. If i make
| something that turns out to be useful I'll make a digital
| version after.
| CipherThrowaway wrote:
| Boxes and arrows are a bad representation for complex systems
| with detailed relationships. Legible diagrams are limited to high
| level representations of a system where many details are left
| out. Constructing useful high level views of complex systems
| requires human judgement.
|
| Generation of legible diagrams could be accomplished on a domain
| or framework basis where code is subject to local patterns and
| can be structured "for" generation. We see this with things like
| OpenAPI schema generation.
|
| Ultimately I think diagramming isn't prioritized because diagrams
| themselves aren't that valuable. They're just a medium for the
| actually valuable thing: high level representations.
| vidanay wrote:
| Because most code diagrams would look like a solid black square.
| lurker137 wrote:
| It's more the layers of wrappers around the black box I have
| trouble with. Maybe it's just a Java problem
| Kapura wrote:
| - Why isn't diagram generation automated as part of the build
| process (UML or otherwise)?
|
| It's another thing that can break, another element that needs to
| be maintained. In my experience there are very few pieces of code
| that will be able to run indefinitely without ever being updated,
| fixed, or re-examined at some point. The cost of adding more
| processes is not one-time, and it can be difficult to figure out
| what the time bounds are.
|
| - Why aren't code visualization tools more popular? The options
| out there seem outdated
|
| People who are interested in the structure of code are typically
| engineers, capable of writing and reading the codebase of
| interest. A UML diagram may be a way to understand an element of
| the system, but things such as in-line comments in the codebase
| itself are often more instructive on structure and function.
|
| - Would you want to use these tools? What would be your ideal
| tool?
|
| When I was in high school, if I didn't want to read, say Crime &
| Punishment, I could buy the Cliff's Notes version, and get a
| chapter-by-chapter summary of major characters, events, and
| literary techniques. In many ways, it contained all of the
| information of the book without the substance.
|
| But importantly, it took significantly less time to read and
| fully process than the book, while being written in the same
| language. In code, it is already extremely easy to look thru
| header files, or collapse every function in your IDE to get a
| high-level overview of what data and methods exist. You can then
| dive in immediately to anything you would like to understand
| better ("what does 'UpdateSignificanceValue' really mean") and
| there's no mental overhead in translating from an encoded diagram
| into whatever your mental model is. This is why I do not
| personally see value in code visualization -- outside of notes I
| take that are relevant to any specific problem I am working on.
| la64710 wrote:
| I have found ctags extremely useful and effective.
|
| http://logan.tw/posts/2015/03/10/trace-source-code-with-vim-...
| forinti wrote:
| Even a few very basic UML diagrams (Use Cases, Class, Sequence)
| can form a very effective introduction to a system.
|
| I feel that people have just embraced Agile blindly and simply
| forgot about basic modelling.
| hotcrossbunny wrote:
| Absolutely this! Communication of design is an enormous gap
| that has emerged in the last couple of decades.
| rramadass wrote:
| >I feel that people have just embraced Agile blindly and simply
| forgot about basic modelling.
|
| Very Good Point!
|
| If you only look at things piecemeal and never holistically,
| the need for modeling and corresponding tools decreases.
| Frost1x wrote:
| Embraced or been forced into it? If you want to do long term
| planning and design, good luck. Everyone wants to continuously
| change their mind on what they want/need and have the software
| react yet they also somehow assume this approach creates well
| defined systems when it does quite the opposite. Adaptation can
| still create reliable and well defined systems but the rate of
| adaptation needs to be reasonable. In agile that simply isn't
| the case, it's just a way to pass consumer demand and
| responsibility for meeting that demand right down to developers
| while arbitrarily placing budgetary and time constraints around
| that process. Development teams are often acting as small
| businesses anymore with similar risks but fewer rewards, with
| middlemen sitting between them and the consumer, unless you
| work at a large tech company where that's still a little bit
| insulated although not entirely when product lines are killed
| off.
|
| Ultimately, you just create an endless amount of complex work
| that keeps developers continuously busy. On the bright side
| there's a never ending amount of tedious work wrestling systems
| back into some manageable form, on the downside that work is
| miserable, in my opinion because much of it can be removed with
| proper planning. At some point, expectations eventually meet
| reality no matter how many developers management burns through,
| at some point it's clearly not an issue with technology, it's
| an issue with approach and project management. By that time the
| organization has had enough turnover in those above and below
| those pushing agile that those issues too can be hand waved
| away and the cycle repeats.
| rramadass wrote:
| CASE Tools with round-trip engineering need to make a comeback.
|
| To answer your question, people do use various tools to extract
| Class Hierarchies, Call Graphs, Cross-Reference listing etc. The
| other HN thread that you have linked to contains some details.
| Lots of people do use them. You can easily add Doxygen/CFlow etc.
| to your make files to generate the diagrams during every build.
| The key thing for usage is to not try to comprehend the
| entire system as a whole (all but impossible for large systems)
| but localize your study to a module at a time. Once you have the
| different pieces mapped out, you can combine them by hand.
| davidy123 wrote:
| Fully agree. A diagram is ok to describe a flow, but for
| complex code or systems, round tripping is a necessity,
| otherwise the diagram is or quickly becomes inaccurate and
| worse than useless. In the 90s I tried software called
| TogetherJ, it seemed to support round tripping really well, so
| well maintaining code was the same as maintaining diagrams, and
| led to better quality in both, along with documentation and
| other benefits such as relaying higher order concepts. I just
| did a search for togetherj and the only reference I could find
| was a 20 year old forum mention. Weird. I suppose there must be
| high end enterprise CASE software that still supports this
| approach, though I'm guessing most people abuse it so it's no
| longer respected. I think these things happen in cycles, as we
| can 'orchestrate' larger systems with meta descriptions of
| components, they become more valuable.
| rramadass wrote:
| I remember both Rational Rose and TogetherJ CASE tools.
| Round-trip engineering was supposed to be practiced by both
| Domain Experts and Programmers each modifying their models
| and still having them all consistent with each other. But
| what happened in practice was that only Programmers used them
| who did not see the utility of updating a UML class hierarchy
| instead of directly updating the Class itself. That and the
| exorbitant pricing is why they fell out of favour. It is
| really a shame because with the explosion of distributed apps
| using a variety of languages/tools a single uniform interface
| modeling all aspects of the software is sorely needed.
|
| PS: Came across the book _Software Visualization -
| Visualizing the Structure, Behaviour and Evolution of
| Software by Stephan Diehl_ which seems to provide a good
| survey of the field.
| icedchai wrote:
| Diagrams are traditionally used during the design process, if
| they are used at all. I've seen sequence diagrams on some recent
| projects. Class diagrams and others? Not so much. Often such
| diagrams represent an idealistic view, lacking detail, and often
| deliberately disconnected from reality (otherwise, the diagrams
| would be a real mess.)
|
| Also I have rarely seen diagrams generated from code, the main
| exception being database ERDs ("reverse engineering.") Usually,
| those diagrams are also a mess.
|
| Also, I almost forgot to mention: with "Agile", there usually is
| no design process. We'll just "fix it in the next sprint."
| rgoulter wrote:
| Maybe it'd be neat.. but, I think sometimes "the map is not the
| territory" goes both ways. - I probably want a diagram to be
| simpler than the actual system.
|
| With a manually constructed diagram, I have leverage to handwave
| irrelevant details away.
|
| Perhaps to compare with documentation: it's easy to automatically
| describe things like types, and maybe callgraphs, but there's
| value in having prose which explains details about the interface
| which the program's type doesn't reveal. - With diagrams to
| visualise a system, the significance (or incidental nature) of
| the relationships may be hard to pick automatically.
| nelgaard wrote:
| Yes exactly. There are no tools that can build proper mental
| models.
|
| Most of a system is either uninteresting or trivial. You need
| someone to tell you where the interesting part is.
| phailhaus wrote:
| > - Why isn't diagram generation automated as part of the build
| process (UML or otherwise)?
|
| This is very hard. And since it's hard, it's not automated. And
| since it's not automated, it goes out of date very quickly. I
| think that's the fundamental issue: keeping around evergreen
| documentation is a lot of overhead. There is no connection
| between the code and the diagrams, so it's too easy to change the
| code and not realize that the diagram needs to be updated too.
|
| Another thing is that it's really the most useful for new
| members. If you've been working on the infra for a while, you
| already know the structure and you don't need the diagram. So
| teams tend to just avoid the diagrams altogether.
| sidlls wrote:
| Diagram generation is plagued by the same problems as the
| "Rational Rose" fantasy of automatic code generation from
| diagrams: trivial applications are trivial to diagram, and non-
| trivial ones defy it, as the complexity (dependencies tend to
| form dense, multiply connected graphs in these applications)
| quickly outstrips any straightforward mapping to a visual
| representation.
|
| I wouldn't use these tools anyway, to be honest. They have some
| limited utility when constrained to small components/parts of an
| application (e.g., self-contained libraries), but for
| understanding systems as a whole there is too much to have
| effective reverse-engineering into a visualization (in my
| opinion).
| bullen wrote:
| I made this node/tree editor 20 years ago:
| http://move.rupy.se/file/logic.html
|
| It has been used for database schemas, game story creation,
| cutting up sprites among other things...
|
| Lately I made my own node database so I don't need this tool any
| longer, but I'm sure it will prove useful eventually again!
| zamalek wrote:
| > Why aren't code visualization tools more popular? The options
| out there seem outdated
|
| In my experience these tools _generally_ exist to facilitate
| bikeshedding. The academic nature of UML makes it pretty useless
| in the real-world.
|
| Something that could be useful is having a tool that uses
| knowledge about code to help you build a mindmap (but does not
| just puke the whole thing out). Huge bonus points for allowing
| the user to create late-bound relations and conceptual
| boundaries. Finally, one of these tools should be able to compare
| its output with the source, and indicate what has changed
| (through deletion/addition, or via VCS diff).
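|
| A minimal sketch of that last comparison step, assuming a diagram
| is stored as (source, target) edge pairs. The name diff_edges and
| the edge format are hypothetical, not taken from any existing
| tool:

```python
def diff_edges(old, new):
    """Compare two diagrams given as iterables of (source, target) edges.

    Returns the edges that appeared and the edges that disappeared,
    each as a sorted list so the report is stable between runs.
    """
    old_set, new_set = set(old), set(new)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


if __name__ == "__main__":
    saved_diagram = [("api", "db"), ("api", "cache")]
    current_code = [("api", "db"), ("api", "queue")]
    print(diff_edges(saved_diagram, current_code))
```

| Running the same comparison against two VCS revisions instead of
| a saved diagram would give the "via VCS diff" variant.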
| high_byte wrote:
| for reverse engineers this is the norm. IDA, BinaryNinja, Ghidra,
| etc... wish to see more of it for higher level languages.
|
| dot graphs are popular with many tools but often barely or not
| interactive at all.
| cheunste wrote:
| > Why isn't diagram generation automated as part of the build
| process (UML or otherwise)?
|
| I vaguely recall Visual Studio has this option where you can
| generate some sort of class diagram. It looked like shit the last
| time I used it (~2019) especially as your classes get more and
| more functions built into it. I also can't imagine how shitty it
| looks for codebases that have a significant coupling problem.
|
| Furthermore, creating a UML diagram is a documentation process
| rather than something that should be automatically built in. I
| put it on the same level as writing a document in a word doc or
| something that's done as the project gets closer to being
| finished. Some places can live with it, a lot of places (actual
| software companies) probably do not as they move unreasonably
| fast (Agile) which does not even allow time for documentation or
| they just purposely neglect documentation.
|
| > Why aren't code visualization tools more popular? The options
| out there seem outdated
|
| Because they look like shit. I tried mermaid with markdown, I was
| not happy with the results, I tried plantUML back in 2019, I
| hated how it ended up looking, I hated how I have to install java
| for it, and I gave up on it pretty quickly.
|
| The only code visualization tool I ever use is either draw.io or
| MS Visio. At least there's a plugin for that for VS Code.
|
| > Would you want to use these tools? What would be your ideal
| tool?
|
| Markdown with vim option. It also must have an option to force a
| top-down flow approach and not freaking forcing it to be a left-
| right layout
| kevan wrote:
| >It looked like shit the last time I used it (~2019) especially
| as your classes get more and more functions built into it. I
| also can't imagine how shitty it looks for codebases that have
| a significant coupling problem.
|
| That's the point, right? Visually representing the complexity
| of the system. I've used IntelliJ to do this before to show why
| modifying certain behavior was so slow and error-prone. In that
| case there were 3-4 classes with heavily overlapping
| functionality because, surprise, in the past there were
| multiple teams contributing to the same codebase that all did
| their own thing.
| Weidenwalker wrote:
| I've already mentioned this on the other thread
| (https://news.ycombinator.com/item?id=31569646), but my friend
| and I have been working on https://www.codeatlas.dev as a
| sideproject - it's a tool for creating pretty (2D!)
| visualisations of codebases, while providing additional insights
| via overlays (e.g. commit density, programming language or other
| results from static analysis like dead code/test coverage/etc.).
| For example here's the Kubernetes codebase visualised using
| codeatlas: https://www.codeatlas.dev/repo/kubernetes/kubernetes
|
| At the moment, codeatlas is just the static gallery, but we're
| only a few weekends away from releasing a Github action that
| deploys this diagram on github pages for your own repos - if
| you're interested, feel free to watch this repo:
| https://github.com/codeatlasHQ/codebase-visualizer-action
|
| OP, how close is this to what you had in mind in your question?
|
| EDIT: fixed broken links :o
| [deleted]
| lurker137 wrote:
| I've since been convinced that what I had in mind initially
| (generating a bunch of static diagrams with each build) is not
| very useful. Your site comes closer to what I think would be
| the better solution, an interactive diagram, but at the level
| of classes/functions and their interactions instead of
| files/folders. Your project looks great for exploring a Github
| repository though.
| [deleted]
| mthoms wrote:
| Just a heads up: Your links are broken. I think it's because
| you are using Reddit's syntax which HN doesn't support.
| Weidenwalker wrote:
| Ah thanks!
| mariojv wrote:
| This isn't a tool for generating diagrams from actual code, but I
| have really enjoyed using PlantUML lately while putting together
| design or architecture proposals: https://plantuml.com
|
| As someone who is not a very visual person at all, I found it
| really nice to use to make my design docs more comprehensible to
| visual learners. I've gotten good feedback about designs every
| time I've used the tool.
| flohofwoe wrote:
| I think UML (etc...) was one of those things that look great on
| the surface but once you start diving deeper all the problems
| hidden under the surface become overwhelming. If a thing has been
| tried many times in the past and even with a lot of money thrown
| at it, yet it _still_ disappeared into obscurity, then it's a
| pretty good sign that the idea wasn't great to begin with.
|
| In practice it's the same problem as "noodle graph" visual
| programming. It works well in some niches (e.g. creating shaders
| in graphics programming, or sometimes describing AI tasks in game
| programming), but it completely breaks down outside those niches.
| la3lma wrote:
| It's hard because programming is hard :-). I still believe UML
| is great, but the difficulty is to make the diagrams so precise
| that they convey crucial understanding, yet so abstract that
| they hide as much detail as possible.
|
| That is nontrivial, and it is very hard to do well. But it is
| also the essential job necessary when designing software and
| then communicating the essence of that design.
|
| My favourite tool btw is plantuml. It lets you describe
| diagrams (class, sequence, deployment) with text/algebra.
| Plantuml works well up to a point where the diagrams become too
| complex for the layout algorithm to do well.
|
| I used to think of this as an annoyance, but now I think of it
| as a feature: It is a way for the universe to tell me that the
| model is becoming too complex. The layout algorithm serves as a
| proxy for everyone else that should parse the diagram, and if I
| can make the diagram better by simplifying, so be it.
|
| Now, a human can do diagram layout better than plantuml, so a
| human can easily concoct diagrams that are both more complex
| and better looking than plantuml, but it is my firm belief that
| this is usually not a good thing: It more often than not means
| that the message is lost in the complexity of the diagram.
|
| Keep it simple!
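|
| To illustrate the "diagrams as text" idea, here is a small
| sketch that builds PlantUML sequence-diagram text from a list of
| calls. The helper name sequence_diagram and the tuple format are
| assumptions made for this example, and participant names are
| assumed to be single words (names with spaces would need
| quoting):

```python
def sequence_diagram(calls):
    """Build PlantUML sequence-diagram text.

    `calls` is an iterable of (caller, callee, message) tuples,
    listed in the order the messages are sent.
    """
    lines = ["@startuml"]
    for caller, callee, message in calls:
        # "A -> B: text" is PlantUML's arrow syntax for one message.
        lines.append(f"{caller} -> {callee}: {message}")
    lines.append("@enduml")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sequence_diagram([
        ("Client", "Server", "GET /items"),
        ("Server", "Database", "SELECT"),
        ("Database", "Server", "rows"),
        ("Server", "Client", "200 OK"),
    ]))
```

| The printed text can be fed to PlantUML to render the picture;
| the layout stays entirely up to the tool.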
| NonNefarious wrote:
| Plus, if it's as much work to create a diagram as it is to do
| much of the programming (not to mention maintaining it), you're
| just not going to do it.
|
| One type of diagram I have found to be truly useful, though, is
| the sequence diagram. I needed to integrate someone else's
| library into my application, and having this was a huge help.
|
| If anyone has a pointer to a good sequence-diagram generator
| (that runs on Mac, preferably), I'd be happy to hear about it!
| HighlandSpring wrote:
| One of the problems is: what is the language to describe these
| diagrams? We do have UML and its various variants: PlantUML,
| Mermaid but these are too low level to prescribe conventions over
| how to use these to describe complex architectures. A sequence
| diagram could describe anything through customer journeys, rest
| api call patterns to call stacks within a VM. Granularity/level
| of abstraction needs to be captured or else you end up with
| metres squared of boxes that cannot be parsed at a glance unless
| you're Rainman.
|
| The closest I found that solves this problem is
| https://c4model.com/ but you still need the code to turn your
| code into these markups. Can this be well inferred from code
| alone without framework specific interpreters? I doubt it.
|
| And then you still need a frontend to zoom and navigate the
| ridiculous amount of hierarchy found within any modern software
| architecture, e.g. microservices.
|
| It also doesn't help that microservices patterns prescribe that
| you don't share repositories or code. So now you also need to
| pattern match untyped references across these codebases.
|
| This is a lot of convention and tooling that I'm not sure exists.
|
| Edit: and this is before even getting into version control and
| reconciling the target->as-is iterative loop.
| pjot wrote:
| I really enjoy(ed) using c4. Still need to figure out how to
| protect against screenshots though!
| HighlandSpring wrote:
| Protect against screenshots? Could you elaborate please
| Dangeranger wrote:
| I believe the parent commentator is referring to how
| screenshots of a diagram generated at build or runtime are
| almost always out of date.
|
| It's better to generate a diagram as needed than to archive
| an outdated artifact that could lead to confusion.
| isbvhodnvemrwvn wrote:
| We just treat them as any other artifact, a plantuml job
| builds it and it gets published under a certain URL.
| humbleMouse wrote:
| AtlasBarfed wrote:
| They can't even really diagram databases effectively, and that's
| "just" data.
|
| Maybe you'd need a three dimensional model (really it's likely
| n-dimensional/hyperdimensional), 2D might not be enough.
|
| Programming models get so convoluted with regards to state and
| interactions, both in-memory/in-process state and the stored
| state in databases/files.
|
| Jurassic Park's 3D filesystem was a pie in the sky idea, what, 30
| years ago? Holy crap it was 29 years ago or so. We've had
| REVOLUTIONS in 3D processing and games, and never even scratched
| the surface of basic 3D visualizations of code or data or
| filesystems or machine networks or the like.
|
| And then even if you represent a diagram, it's useless without
| time visualization/traces, as kind of referred to by the RR
| debugger post. So for active code, you'd need simulation or
| actual run data to show what it does visually to be effective.
|
| Really what's being dealt with here is probably related to theory
| of computation, and various results like the undecidability of
| the halting problem. The halting problem shows that even for very
| basic languages that are minimally Turing complete, the
| complexity shoots VERY QUICKLY to massive degrees of
| infinity/uncomputability.
|
| So some catch-all visualizer for even general classes of Turing
| complete languages is probably impossible.
|
| Maybe something like "this is a java spring app with well
| regimented separation of data/domain classes and service
| classes"...
|
| Even then once you get to database persistence ... wow.
|
| And the amount of data you'd need to store for test runs.
|
| Spring + TDD enforces a certain simplicity to a codebase, so
| perhaps you could make effective classes of visualization and
| tracing/replay visualization for that.
|
| But it is telling these tools don't really exist, and attempts
| like UML were largely abandoned.
| ataylor284_ wrote:
| A while back I did a demo and wanted a diagram to show what was
| going on. I stumbled on http://www.plantuml.com which did exactly
| what I wanted: it took mark up and turned it into an image that
| could be embedded in a github markdown document.
|
| That said, diagrams can either be rare, focused, and useful; or
| common, unfocused, and distracting. Automated processes tend to
| generate the latter.
| mynegation wrote:
| At the beginning of my career I worked for a company that
| started with two reverse-engineering tools: one to produce low-
| level, single method/function flowcharts and another for
| automatic extraction of high level components and connections
| between them. It retired the former and, later, the latter, and
| pivoted to static analysis tools: finding logical errors,
| security vulnerabilities, enforcing coding standards. So I have
| first-hand knowledge of what worked and what did not work.
|
| The main problem with low-level code visualization was that it
| did not add much to the well-formatted code representation in
| most cases. As for the high-level architecture extraction tool,
| which is closer to the question in the article, many links on
| the diagram do not just involve header inclusion, module import,
| method calls etc that are relatively easy to extract (not without
| its own challenges with virtual and indirect calls though). Users
| wanted to see inter-process communication (sockets, queues,
| pipes, http connections) and extracting those is an uphill battle
| though we introduced some of it (lots of custom, platform
| specific code). Between this and knowing which connections are
| important and which are less so, automatically extracted diagrams
| were of limited value.
| rramadass wrote:
| What were the two tools used?
| mynegation wrote:
| Not sure I understand the question but the two tools the company
| started with were (1) visualizing control flow graph of a
| single method/function (2) extracting hierarchical components
| from directory or package structure and relationships between
| them (imports, includes, calls etc). The ones that ended up
| being used were byproducts of the "compilers" that we had to
| implement: finding logical errors and security
| vulnerabilities based on control flow and data flow analysis
| (pretty much what conventional compilers like clang and gcc
| do for code generation and optimization).
| porcoda wrote:
| I've had this need a few times. Just a couple weeks ago I needed
| to quickly understand the set of package dependencies within a
| codebase and wrote some scripts that extracted a report as well
| as a graphviz file. I've done that a few times over the years.
| The biggest obstacle to a general purpose tool usually is the
| compiler front end that is needed to correctly parse the code to
| get the entities and relations you need to visualize. Without
| that it's hard to write a reliable tool for extracting the
| information, and if you care about multiple languages you need
| multiple front ends.
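|
| A minimal sketch of that kind of script, assuming a Python
| codebase so that the standard-library ast module can play the
| role of the compiler front end. The function names and the
| {module_name: source_text} input format are made up for this
| example, and only top-level absolute imports are handled:

```python
from __future__ import annotations

import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by one source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative ones are skipped.
            found.add(node.module.split(".")[0])
    return found


def to_dot(sources: dict[str, str]) -> str:
    """Render {module_name: source_text} as a Graphviz DOT digraph.

    Only edges between modules present in `sources` are kept, so
    standard-library and third-party imports are dropped.
    """
    known = set(sources)
    lines = ["digraph deps {"]
    for name in sorted(sources):
        for dep in sorted(imported_modules(sources[name]) & known):
            if dep != name:
                lines.append(f'  "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {
        "app": "import db\nfrom util import helper\nimport os\n",
        "db": "import util\n",
        "util": "import json\n",
    }
    print(to_dot(demo))
```

| Every other language would need its own parser in place of ast,
| which is exactly the multiple-front-ends cost.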
|
| People do want it (contrary to the common HN refrain of "well,
| _I_ don't want it so clearly nobody wants it"). We've had
| customers where I work specifically ask for these kinds of tools.
| They're just harder than they seem to write, not only for the
| parsing reason I mention above. For many codebases you see a
| giant ball of spaghetti if you look at the full graph, or the
| layout algorithm gives you something gigantic and hard to browse.
| That's a deficiency in graph visualization tools: again, a hard
| problem with little good tooling out there.
|
| I'd love to see more work in this area since there do exist
| people who see value in it, contrary to the skeptics.
| enos_feedler wrote:
| I would say the strongest use case in my experience is the
| reverse engineer who is trying to understand an executable. In
| this case tools like IDA Pro have diagram generation built in and
| all kinds of plug ins. However, it's because staring at machine
| code listings makes your eyeballs bleed. I have a feeling we
| don't have this more generally because high level source is
| decent to read and navigate and developers are the only ones who
| need to do so.
| ooedemis wrote:
| i think the hype is producing working solutions over
| documentation
| tonnydourado wrote:
| I think the problem is much harder than we think it is. In my
| experience, when you "hand build" a model of a codebase, either
| in your head, in your head plus notes, or in an actual diagram,
| you make *a lot* of executive decisions: what relationships you
| focus on, at what level, whether to include implicit or dynamic
| relationships, etc. None of this is easily automatable, and some
| might be virtually impossible to automate.
|
| Compare that to something like a call graph, or a module
| dependency diagram. The latter will be more complete, but will
| convey *much less* information than the former.
|
| This varies with technology, some will be more friendly than
| others to this kind of tool, I think that the more dynamic, the
| worse, but even in a very static and consistent language, I would
| not bet on any tool being better than the brain's parser for a
| long time.
___________________________________________________________________
(page generated 2022-06-04 23:01 UTC)