[HN Gopher] On navigating a large codebase
       ___________________________________________________________________
        
       On navigating a large codebase
        
       Author : mooreds
       Score  : 248 points
       Date   : 2021-02-14 02:41 UTC (20 hours ago)
        
 (HTM) web link (blog.royalsloth.eu)
 (TXT) w3m dump (blog.royalsloth.eu)
        
       | jordanbeiber wrote:
       | Tangential, but relevant for complex systems and organization of
       | them and their code:
       | 
       | I've started to look at BPMN, a thing I used to shun (bloated
       | java enterprise junk that just slows coding down), as a way to
       | actually help organize code.
       | 
       | If you have a process layer on top that describes exactly what is
       | supposed to happen and you organize code accordingly it makes
       | changes to complex systems easier to reason about.
       | 
       | I know there's a lot more to it than this, but in my mind
       | concepts from domain driven design, minimal service scopes and a
       | de-coupled process layer really can help.
       | 
       | I guess this is why "tech" versions of bpmn-ish like systems have
       | appeared with for example netflix conductor, uber
       | temporal/cadence.
       | 
       | Split stuff up in relevant domains and describe their
       | relationships and processes. Try to stay away from massive code
       | bases spanning domain boundaries if possible.
       | 
       | No silver bullets anywhere ofc, but this is currently a topic at
       | my employment this very moment. :)
        
       | davidhyde wrote:
       | This is a great article, I can relate. I specialise in replacing
       | large parts of codebases with code that does the same thing from
       | a business point of view but that makes future changes cheaper to
       | make. One thing I think is worth mentioning is the political
       | aspect of this sort of work. The people in power need to be
       | comfortable with the fact that you will be introducing risk
       | without immediate reward. That is a tough sell to someone who is
       | used to putting out fires and writing root cause analysis reports
       | to management. Sometimes this can't be done and you have to hide
       | your refactor in real business change work. This is not fun
       | because it usually makes it look like you're a slow dev.
       | 
       | In addition, most developers are fiercely defensive of their code
       | and you need to be aware of that when you choose to replace it. A
       | trick I find useful is publicly declaring, in your team meeting,
       | how useful you found member X's tests in covering your refactor.
       | Or their comments or documentation or domain knowledge. When you
       | are picking their brains for implementation specifics try to
       | sympathise with them when you see a bit of hacky or confusing
       | code. Say "I've had to do something like that before because of
       | xyz". It will save face and you will get more out of the
       | developer. Never criticise, they will know what they have done
       | wrong without you telling them. Just, be nice.
       | 
       | If you are looking for devs that can do this sort of work
       | effectively then get them to read code in an interview and
       | explain what it does to you. They can even offer suggestions and
       | you get to see how they deliver criticism. Not complex
       | algorithmic code but simple vast swaths of business junk.
        
         | sgtnoodle wrote:
         | That's basically been my job the last five years. As soon as I
         | joined the company (a scrappy startup of 30 people), I started
         | refactoring large portions of mission critical code prioritized
         | mostly by how terrified other coworkers were of touching it. In
         | the beginning, folk were rather skeptical, but now that the
         | company has grown by an order of magnitude, my earlier work has
         | apparently become a topic of folklore in other parts of the
         | company. As it usually goes, I spend most of my time these days
         | mentoring and in meetings, but I still try to find time to
         | refactor more fragile bits of code before they fall over
         | completely. I encourage my team to tackle technical debt head
         | on rather than work around it whenever possible. Far too often,
         | folk spend more time avoiding solving a problem by patching
         | around it, and it's usually because they're too afraid to dive
         | in and change code that's hard to understand and therefore
         | scary. For whatever reason, I've always had a can-do attitude
         | when it comes to that type of work. You're paying me to get the
         | job done, so give me an impact driver and the biggest hammer
         | you've got. If I break something, it usually means it wasn't
         | built strong enough to begin with. I've broken a lot of stuff
         | over the years...
        
           | zealsham wrote:
           | Is this Jeff Dean?
        
             | sgtnoodle wrote:
             | Lol, maybe the crappier Arduino version. My name is Jeff
             | though.
        
           | plif wrote:
           | > If I break something, it usually means it wasn't built
           | strong enough to begin with. I've broken a lot of stuff over
           | the years...
           | 
           | Based on my experience, this statement scares me :)
           | 
           | Not to pass any judgment on your impact or abilities, however
           | the types of devs that have been the most challenging for me
           | to work with are those with this attitude that aren't quite
           | as good as they think they are. It can be incredibly toxic to
           | the rest of the dev team and generally bad for business.
           | 
           | You need to have a very strong handle of both the business
           | side and tech side to do this type of work effectively.
           | Meaning: no matter how much technical debt there may be, some
           | stuff cannot afford to be broken. Judging risk there is quite
           | challenging as you need a holistic view. I would strongly
           | caution people from diving in and making sweeping changes if
           | they don't have this.
           | 
           | The other internal flag that went off is refactors that
           | improve parts of the codebase in isolation while leaving a
           | less cohesive / congruent codebase as a whole. This is often
           | worse in the long run than just patching it and actually can
           | make changes harder.
           | 
           | Disclaimer: I am in mostly a management role now so you can
           | take the above with an appropriately sized grain of salt.
        
             | sgtnoodle wrote:
             | Certainly you want a refactoring effort to improve
             | reliability and maintainability rather than harm them. I am
             | a strong proponent of writing an "architecture document"
             | before touching code to do anything but patch a
             | straightforward bug, and soliciting feedback on it long before code
             | review. This is precisely what develops that holistic view
             | you mention. One of the first things I tackled in this
             | codebase was to introduce abstractions to enable unit
             | testing of code that was previously considered not unit
             | testable. As the team has grown, we've developed processes
             | to ensure that everyone explicitly considers risks and how
             | to mitigate them whenever they make a code change.
             | 
             | I also agree with you that it's best to be pragmatic when
             | it comes to developing software for a business. Code that's
             | ugly but works is perfectly fine. When it no longer works
             | one day, patching it to keep the lights on is the right
             | course of action. When the same ugly code breaks over and
             | over, though, it's time to solve the root of the problem.
             | Sometimes there's inherent risk in doing that, and things
             | break; it's necessary to do it for the long term good,
             | though.
             | 
             | I try to write code that doesn't need to be touched again,
             | but is pleasant enough to dive back into should you
             | inevitably need to extend or debug it. I also try to reuse
             | existing code and improve it as needed rather than create
             | what I call "parallel codebases". I try to mentor my
             | coworkers to do the same. If achieved, then it's a huge
             | productivity multiplier.
             | 
             | I think I'm pretty easy to work with. I am confident in my
             | abilities as a software engineer, but I'm also relatively
             | modest. I try to respect work that was done before me and
             | carry the good parts forward if it ends up needing
             | refactoring. I prefer to let less experienced coworkers
             | tackle problems similar to problems I've solved in the past
             | while providing mentorship, so that they can learn similar
             | lessons. I've avoided management because I know I'm bad at
             | it, but I try to support management however best I can. I
             | also throw the occasional team homemade pizza party when
             | there aren't pandemics. Notably, I also tend to be able to
             | work with the stereotypical difficult-to-work-with devs
             | that you mention. My coworkers generally seem to say nice
             | things about me to my face and behind my back, and upper
             | management seems to reflect their appreciation financially.
             | Honestly my biggest interpersonal problem at work right now
             | is that newer employees seem to hesitate reaching out to me
             | for fear of wasting my time. Therefore I try to make it
             | known that I spend as much time staring at the wall as
             | possible during work hours.
        
         | hackeredje wrote:
         | Given time, every developer will end up in some company, doing
         | projects, having to go through other vast requirements,
         | documentation, databases, interfaces and codebases to
         | understand these.
         | 
         | So I think you refer to people who do not do projects but who
         | stay in one company for a long time versus people who do
         | projects (and are sometimes called consultants) and in general
         | how they communicate with each other.
        
         | cornel_io wrote:
         | Yes, absolutely never criticize - as a manager, the #1 thing
         | that makes me start to hate a report is when they complain
         | about other people's work. Most of the time they don't
         | understand why code was written the way it was (Chesterton's
         | fence), and even when they do and are making valid complaints
         | it's just a dick move that doesn't help.
         | 
         | Trust me, the lead and manager both know when someone sucks,
         | they don't need to hear more about it. And if you're wrong with
         | your criticisms, you just demonstrate that _you_ suck.
         | Literally lose lose.
        
           | Erlich_Bachman wrote:
           | > Trust me, the lead and manager both know when someone sucks
           | 
           | Through which mechanism do you think they know it? If nobody
           | ever tells them that some piece of code is bad, this
           | mechanism doesn't do its job.
        
       | sillysaurusx wrote:
       | I navigate codebases by cat'ing all the files together (prefixed
       | with filename) and piping it into vim.
       | 
       | I was shocked how much I learn by this seemingly-horrible
       | technique. For example, the python files that actually get
       | deployed are often quite different from the ones in source
       | control. For tensorflow, at least.
       | 
       | I regularly read 1M+ lines of code this way. Not an exaggeration;
       | vim scales, nothing else does.
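       | 
       | The `merge` step is not shown in this thread, so as a minimal
       | sketch (an assumption, not the commenter's actual script) the
       | "cat everything, prefixed with filename" idea could look like
       | this in Python. `merge_files` and the sample files are
       | hypothetical names for illustration.

```python
import tempfile
from pathlib import Path


def merge_files(root, suffix=".py"):
    """Yield every line under `root`, prefixed with its file path and line number."""
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        # errors="replace" keeps one badly encoded file from aborting the dump
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                yield f"{path}:{number}: {line.rstrip()}"


if __name__ == "__main__":
    # Build a tiny throwaway tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.py").write_text("class Saver:\n    pass\n")
        Path(tmp, "b.py").write_text("s = Saver()\n")
        for merged in merge_files(tmp):
            print(merged)
```

       | Piping that output into an editor gives one searchable buffer
       | where every hit already carries its file and line.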
        
         | Gehinnn wrote:
         | If you read/scan 10 lines a second, you still need over 24
         | hours non-stop to read a 1M+ code base. I doubt a random file
         | ordering is helpful! Especially if you lose code navigation
         | features like "go to definition".
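         | 
         | Checking that estimate, assuming exactly 1M lines and a
         | steady 10 lines per second with no breaks:

```python
LINES = 1_000_000
LINES_PER_SECOND = 10

seconds = LINES / LINES_PER_SECOND
hours = seconds / 3600
print(f"{seconds:,.0f} s = {hours:.1f} h")  # 100,000 s = 27.8 h
```

         | So "over 24 hours" holds, with a few hours to spare.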
        
           | sillysaurusx wrote:
           | You'd be surprised.
           | 
           | Suppose I want to "go to definition" for a class named Saver.
           | 
           |     /^class Saver\>
           | 
           | 19 times out of 20, this works. It's also instant; my vim
           | will likely get me there faster than your IDE's go to
           | definition functionality. (Looking at you, pycharm!)
           | 
           | Here's my flow.
           | 
           |     >>> import tensorflow as tf
           |     >>> tf.train.Saver
           |     <class 'tensorflow.python.training.saver.Saver'>
           |     >>> from tensorflow.python.training import saver
           |     >>> saver
           |     <module 'tensorflow.python.training.saver' from
           |     '/usr/local/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py'>
           | 
           | Then I open
           | /usr/local/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py.
           | 
           | Suppose I want to know: Where are all the places that Saver
           | is used in all of tensorflow?
           | 
           |     time find . -type f -name '*.py' | xargs merge | ft py
           |     Vim: Reading from stdin...
           | 
           |     real 0m0.930s
           |     user 0m0.243s
           |     sys 0m0.414s
           | 
           |     /\<Saver(
           |     :%v//d
           | 
           | Boop: https://i.imgur.com/EJ5bZSW.png
           | 
           | Literally every usage of Saver in all of Tensorflow.
           | 
           | So let's say you're interested in a specific line. This one,
           | for example:
           | 
           |     # being added to the GLOBAL_VARIABLES collection, so that Saver()
           | 
           | Boop: https://i.imgur.com/YSAzhCJ.png
           | 
           | I did that by highlighting "being added to the
           | GLOBAL_VARIABLES", ctrl-c, then pressing "u" to undo the
           | :%v//d, then / followed by ctrl-v.
           | 
           | That might sound hard, but with muscle memory I don't even
           | think about it -- it's like explaining how you open a can of
           | food. Do you really think about where you place your fingers,
           | or the pressure of your nail on the flap of the can? No, you
           | just open it up. Same thing here; it's automatic.
           | 
           | Way faster than IDEs, and I get just as much (or more) info.
           | 
           | I'd love to use pycharm, but the slowness keeps pushing me
           | back to this technique.
        
             | fraculus wrote:
             | I agree that things like "Go To Definition" can be pretty
             | bad (especially when a codebase has code that is auto-
             | generated, but you haven't figured out how yet).
             | 
             | But I'm curious, what benefits do you see in your approach
             | over just doing a grep in the folder (or e.g. Ctrl+Shift+F
             | "Find in Files" in Pycharm)?
        
               | disgruntledphd2 wrote:
               | grep type approaches are really good when part of your
               | application is generated as SQL strings to make tables
               | (which is a pathology sadly common in most data science
               | codebases).
        
             | mejutoco wrote:
             | Nothing wrong with this (big fan of vim myself). I can see
             | the appeal and simplicity.
             | 
             | I just want to add that in Intellij you can do Ctrl + Shift
             | + R "Saver" and it will search in all files (as dumb text
             | matches, not usages), plus optional checks on extensions,
             | etc. This is pretty fast too, and quite convenient, since
             | it has a preview for each file. Not saying it is better,
             | but it is an alternative.
        
               | sillysaurusx wrote:
               | I'm happy you mentioned that, because this highlights a
               | very important difference: Your IDE would show you all
               | "Saver" files in some checkout of Tensorflow, right? But
               | you usually don't want to see the latest version of
               | Tensorflow. You want to see the _current version_ that's
               | installed, which is somewhere under /usr/lib.
               | 
               | I haven't found any IDE that can easily and effortlessly
               | do that.
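               | 
               | As a small sketch of "ask the running interpreter
               | where the installed code lives", here is the same idea
               | with the standard library. `json` stands in for
               | tensorflow so it runs anywhere, and `installed_source`
               | is a hypothetical helper name.

```python
import inspect
import json


def installed_source(obj):
    """Return (path, first_line) of the code the running interpreter imported."""
    path = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return path, first_line


# The path printed here is whatever this interpreter actually loaded,
# not whatever happens to be in a source checkout.
path, line = installed_source(json.JSONDecoder)
print(f"{path}:{line}")
```

               | The printed `path:line` can be handed straight to an
               | editor, whichever environment is active.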
        
             | amw-zero wrote:
             | How do you handle finding occurrences of a specific
             | variable with the same name as another one? Simple text
             | search has the power equivalence of a dull butter knife.
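             | 
             | A minimal sketch of the gap, using Python's own parser:
             | plain text search also hits the comment quoted upthread,
             | while a syntax-aware search only reports real calls. The
             | sample source and both helper names are made up for
             | illustration, and this does not do scope analysis, so two
             | different variables with the same name would still look
             | alike to it.

```python
import ast

SOURCE = '''
# being added to the GLOBAL_VARIABLES collection, so that Saver()
class Saver:
    pass

def restore():
    saver = Saver()
    return saver
'''


def text_hits(source, needle):
    """Line numbers where `needle` appears anywhere, comments included."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if needle in line]


def call_hits(source, name):
    """Line numbers where `name` is actually called, according to the parser."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )


print(text_hits(SOURCE, "Saver("))  # [2, 7]: the comment and the call
print(call_hits(SOURCE, "Saver"))   # [7]: only the call
```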
        
             | minusf wrote:
             | i do something similar but with tags and without the
             | merging. vim's tag navigation is very powerful. split a
             | window with the target tag and see side by side the class
             | and its descendant for example.
             | 
             | i use `venv`s for every project inside the project folder
             | so uctags also generates tags for all libraries installed
             | and i can "drill all the way up" to classes and
             | definitions.
             | 
             | it's also possible to have the system libraries show up in
             | the tag database, it's just a matter of telling uctags
             | which paths and files to include/exclude or alternatively use
             | another tag file for that, vim can use multiple tag files.
             | 
             | the real pain point is keeping the tag file updated.
             | gutentags makes this a bit easier for me.
        
               | sillysaurusx wrote:
               | _i use `venv`s for every project inside the project
               | folder so uctags also generates tags for all libraries
               | installed and i can "drill all the way up" to classes and
               | definitions.
               | 
               | it's also possible to have the system libraries show up
               | in the tag database, it's just a matter of telling uctags
               | which paths and files to include/exclude or alternatively
               | use another tag file for that, vim can use multiple tag
               | files._
               | 
               | Oh?
               | 
               | Yours is the first system I've found that has this very
               | important feature -- the whole reason I do it my way is
               | because I can drill down into the actual installed
               | libraries, whereas IDEs almost always fail. (It's hit or
               | miss. Yeah, theoretically you can configure the IDE
               | properly if you spend your life becoming an IDE master
               | and have a 96-core workstation, but it never seems to
               | "just work.")
               | 
               | If you ever do a writeup of how precisely you've set up
               | your environment, do ping me! I'm
               | https://twitter.com/theshawwn. I'd be very interested and
               | would happily retweet it.
        
               | 00N8 wrote:
               | I use pycharm and as long as I set the virtualenv I'm
               | using as the Project Interpreter in Preferences it lets
               | me "drill down" to the library code as well
        
           | disgruntledphd2 wrote:
           | With the language server protocol, this works reasonably well
           | in Python code bases for me in Emacs.
        
         | coderdd wrote:
         | I have great success using Zoekt, a Google codesearch
         | reincarnation, for instant grepping.
        
       | penguin_booze wrote:
       | I'm of the opinion that use of IDE and non-essential tooling must
       | stay in the private realm, much like one's idea of the almighty,
       | and politics.
       | 
       | "This project doesn't build/can't run from command line! What
       | the...".
       | 
       | "Oh well, I use my [favorite IDE]. It works for me".
       | 
       | "The call paths aren't intuitive!"
       | 
       | "My [favorite IDE] shows cool/advanced visualization. Maybe you
       | should try it, too".
        
       | varajelle wrote:
       | In my opinion, apart from very tiny codebases that can fit in
       | someone's head, there is not much difference between medium
       | and large codebases. You just have to use grep and the IDE to get
       | around the parts of interest for the task at hand.
        
       | wwwigham wrote:
       | The article mentions the importance of comments and documentation
       | inline in code. I tend to agree - well-written code is great and
       | all, but a good comment can bring in context external to the code
       | and make _why_ code is what it is more clear to future readers.
       | And reviewers. Comments explaining _what_ code does largely
       | aren't needed - that's evident from usage. But the _why_? Some
       | people would say code which can't be explained by one liner
       | comments is "too clever". Well, I'm inclined to disagree - it's
       | hard to fit a full historical justification for an awkward
       | handling of an edge case into a single line. I once wrote a 17
       | line long comment above an 8 line diff; that much context felt
       | justified to explain the odd code. A reviewer, hilariously, had
       | this to say:
       | 
       |     This right here is "here be dragons" commenting level
       |     Double Dragon.
       | 
       | When I come back to code I've written long in the future, I think
       | I'll be happier to have written the detailed "commenting level
       | Double Dragon" long comments, over the more ambiguous yet still
       | traditional `// HERE BE DRAGONS`. Mostly because that comment
       | will give me the context I need to know if any of that _why_ has
       | changed and thus in what way it's likely safe to change the
       | commented code.
        
         | gigatexal wrote:
         | Please, please comment code. Junior devs like myself will thank
         | you senior devs for throwing us a bone.
        
         | tetha wrote:
         | To me, there is also the question of change frequency. Of
         | course, it's a guesstimate all the time, but things that look
         | to be rarely-changing (and usually end up being high-impact)
         | deserve over-documentation. Something like our AWS VPC, DNS and
         | DHCP setups and their expected change modes and their effects
         | are very well documented. It rarely changes and if it breaks,
         | everything breaks. The fundamentals of our disk setups and disk
         | encryption in ansible are carefully documented because if that
         | breaks, things will go hairy.
         | 
         | In those cases, having a lot of documentation speeds up
         | changes because you can store months and years of deliberation
         | and decisions in these comments.
        
       | zoomablemind wrote:
       | Tons of complex code and no one to share the local knowledge?
       | Isn't it a setup for a failure... yet the org is still in
       | business, probably generating revenue.
       | 
       | So, I'd say the leverage is as always in understanding the
       | dynamic and politics that often brews in places with the
       | "monstrous" codebases.
       | 
       | Debug the people functions, so to speak, and it may eventually
       | help you navigate that codebase. No one needs to be another hero,
       | no one needs to burn-out while single-handedly fighting the
       | beast!
       | 
       | Well, politics often are messy and more unpleasant than the code
       | at hand, so we dig and rant...
       | 
       | In such case, I would try to limit the scope, instead of trying
       | to learn the secret language of gods that stitched the whole
       | system so it would make money.
       | 
       | My rule #1: WTF?!...But they must have had a reason for that.
       | 
       | Rule #2: Keep your code changes in-style; bug-free, of course,
       | but similar idiomatically.
       | 
       | Rule #3: Try to do something, then try to do it together with
       | others.
       | 
       | Maintaining a large codebase is not so much about tools as it is
       | about finding ways not to do it alone.
        
       | hackeredje wrote:
       | This was a nice read and is recognizable, probably a large part
       | of Dilbert comics could fit in here...
       | 
       | There are projects that last for multiple years with larger TEAMS
       | whose only job is to untangle existing complex landscapes. Most
       | of them fail.
       | 
       | Since these teams do consist of pretty smart people... I think
       | one of the funny things you could do is list the things these
       | people say when they start this adventure on day 1 "ah yes lets
       | just grep stuff" or "i will start examining tests" and "i will
       | make a spreadsheet of all interfaces" and "i will do interviews
       | with older developers". About 6 months later the spreadsheet has
       | become a separate application that is so complex that it is a
       | complexity project on its own. The amount of documentation found
       | is now about a couple of million separate documents, and the
       | realization sinks in that the end of the universe is probably
       | the nearer end date. The data models found for the gazillion
       | databases now fill a library of their own. The end date of the
       | universe is closer than the end date of the project trying to
       | understand what the environment is. And no, it does not help
       | that any developer or business person ever involved has long
       | since left the company.
       | 
       | Comments: yes i agree. 80% is logical and does not need a comment.
       | 20% are the pieces of code coming out of meetings that lasted
       | hours and which ended with strange outcomes that no-one will ever
       | understand without understanding why things were setup in the way
       | they were setup. And then there is the 20% added by junior
       | developers who had no clue but just changed stuff here and there.
       | It is hard to make that distinction because from the outside they
       | look alike. Anyone trying to change the code to make it "logical"
       | will remove the 20% illogical code and produce something maybe
       | even working but no longer in line with desired results, also a
       | junior mistake.
        
       | TameAntelope wrote:
       | Just rewrite the dang thing! Planned obsolescence is so important
       | for all of these reasons listed in this article.
       | 
       | Know when it's time to kill your services, and have a plan well in
       | advance for how it'll go down, and what will take its place.
        
         | tluyben2 wrote:
         | For very large codebases, this is often not an option. I know
         | of very large 'let's write Cobol mainframe to Java' projects,
         | burning 10s of millions of euros, that were just thrown away
         | because they could not actually get it working in the end.
         | 
         | And this is not limited to mainframe projects; it happens with
         | (large) more recent projects (Java/C# mostly) as well.
        
           | TameAntelope wrote:
           | For sure, it's not an option for these existing systems.
           | 
           | But when building a new system, design it and plan for it to
           | be retired when certain criteria are met (e.g. when it hits
           | 100 reqs/sec, 3 years from first release, when LOC hits
           | 100k).
        
             | tluyben2 wrote:
             | Absolutely: I was only commenting on a blind 'just rewrite
             | it' as that will always be the first reaction of tech
             | people and quite often it is simply not feasible. But
             | agreed, by design it can work.
        
           | barbarbar wrote:
           | That raises an interesting question. Are there any such
           | rewrites that have succeeded? I have mainly heard that it is
           | either failures or not done.
        
             | TameAntelope wrote:
             | If you are asking me if I've experienced a successful
             | rewrite? Yes, numerous, and the more successful ones have
             | happened as a result of planning.
             | 
             | I've also been part of rewrites that have gone poorly, due
             | to a lack of planning, where the legacy software fails in
             | unexpected and unanticipated ways, which requires a rushed
             | attempt to fix the issue (which fails) and a subsequent
             | rushed attempt to replace the core functionality when the
             | fix doesn't work (and also fails because "core" tends to be
             | larger than you initially think).
             | 
             | Knowing ahead of time when software isn't going to be
             | useful any more isn't really an option, it's just an
             | acceptance of what is already going to happen.
        
         | astura wrote:
         | This is the worst idea I ever heard.
         | 
         | This only "works" for companies that have unlimited VC funds to
         | light on fire. For companies that have to actually make money,
         | this is in no way something you can do.
         | 
         | This is the equivalent of bulldozing your house and building
         | another because your hot water heater broke.
        
           | TameAntelope wrote:
           | It works for every company who has software they maintain,
           | and while I'm sorry you don't think it's a good idea, I think
           | the issue is more with your lack of understanding than the
           | idea itself.
           | 
           | Specifically, your analogy to construction is a bad one -
           | software is not construction, and one critical difference is
           | the cost of rebuilding is many orders of magnitude cheaper.
           | 
           | When you include the reality of obsolescence into your
           | design, you are actively anticipating and accounting for
           | problems as they're outlined in this article, which is always
           | a good thing, and will always improve your planning and its
           | outcomes.
           | 
           | Burying your head in the sand by expecting to never have to
           | rebuild something is very poor project management, and not
           | how competent software shops operate, period.
        
       | tucif wrote:
       | I found cscope essential when working with a 10+M loc C codebase.
       | I wish there were more cscope like tools for other languages,
       | easy to setup and editor agnostic.
        
       | Quickshooter wrote:
       | There is a powerful (and somewhat overlooked) tool,
       | https://www.sokrates.dev/, written by eBay engineer Zeljko
       | Obrenovic.
       | 
       | It lets you look at the codebase (and its history) from
       | different perspectives: complexity, volume, developer
       | contributions, etc.
        
       | chromatin wrote:
       | OpenGrok [1], while an older tool, is web based and a very nice
       | way to explore large codebases; it is multi-language as well [2].
       | 
       | To get you started, here are public instances with source code
       | for Illumos [3] and multiple BSDs [4]. There used to be a Linux
       | one, but I cannot find it atm.
       | 
       | [1] https://oracle.github.io/opengrok/
       | [2] https://github.com/oracle/opengrok/wiki/Supported-Languages-...
       | [3] http://src.illumos.org/source/
       | [4] http://bxr.su/
        
       | magicalhippo wrote:
       | This is one of the big reasons I prefer static typing.
       | 
       | When looking at some unfamiliar code in an unfamiliar codebase, I
       | can reason about the code much faster when I can see what
       | functions return, and quickly go check their types out if the
       | type is unknown. This makes me much more productive.
       | 
       | I helped maintain a 250kLOC Python program. I came in when it
       | was already at over 200kLOC. I spent _so much time_, every time,
       | just trying to figure out what's going on because I never knew
       | what something returned.
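       | 
       | A minimal sketch of the point about return types, using Python
       | type hints. The Invoice class, the in-memory store and both
       | functions are hypothetical names made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    number: str
    total_cents: int


# Hypothetical in-memory store standing in for a real database.
_INVOICES = {"A-1": Invoice("A-1", 1250)}


def find_invoice(number: str) -> Optional[Invoice]:
    """Return the invoice, or None when it does not exist."""
    return _INVOICES.get(number)


def total_or_zero(number: str) -> int:
    invoice = find_invoice(number)
    # The annotation tells the reader (and a checker such as mypy) that
    # None is possible here, without opening find_invoice's body.
    return invoice.total_cents if invoice is not None else 0
```

       | With the annotation in place, the signature alone answers "what
       | does this return?", which is the question that was costing time.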
        
         | baby wrote:
         | That's really painful indeed when you review python/erlang/etc.
         | code or Golang with interface{}
        
         | rajacombinator wrote:
         | 250kLOC Python sounds scary. But that would easily be 1mLOC+
         | lines in Java ...
        
           | magicalhippo wrote:
           | I'd much rather have a million lines of Java. And I'm not a
           | huge fan of Java.
           | 
           | It can be a bit tedious going down the
           | AbstractWidgetInterfaceFactoryFactory rabbit holes, but at
           | least I have a fighting chance.
        
             | joelbluminator wrote:
             | To each his own...
        
         | gameswithgo wrote:
         | What little quality research there is on programming
         | productivity does support the idea that dynamic typing is a
         | productivity hindrance as code bases get larger, for exactly
         | this reason. Some studies actually have video data of the
         | programmers at work, and they can see them having to hop
         | around to function definitions more to figure out what they
         | are supposed to pass in, etc.
        
       | baby wrote:
       | > Resist the temptation of fixing the parts that you find
       | horrifying, because first you can't fix it all and second you
       | will get crushed by the complexity of the system. Mark those
       | places down as a horrifying place to be and keep them in mind
       | when it's time to refactor.
       | 
       | Or you will run into the territory of someone else.
        
         | mariusmg wrote:
         | Turf Wars : Code Edition
        
       | paulodeon wrote:
       | "Towel of Babel" made me laugh. Not something you want to dry
       | your face with!
        
       | lordnacho wrote:
       | Great article, my two cents:
       | 
       | - Areas vs perimeters: perimeters are linear, areas quadratic.
       | This is why you really, really want tests. Tests will black box a
       | component and test it from the perimeters, basically the external
       | API. Only once something needs to be changed do you need someone
       | who understands the insides of that component. But the testing is
       | kept small, and the error domain is kept small, so that you might
       | have different people fixing different components.
       | 
       | - AvP, part 2: people's brains can index a lot, eg you know where
       | the tests are, you know what the components are called, but they
       | can't map that much. Your engineers will know what line to change
       | for the parts they've mapped, but they'll have to spend time if
       | they only have an index to where it might be.
       | 
       | - AvP, part 3: documentation can mean a map or an index.
       | Rewriting the implementation in prose is bound to go wrong. The
       | version control method makes a lot of sense here, it connects
       | locations to technical decisions.
       | 
       | - Visualising is to ensure you have held down the complexity. If
       | the 2D box-and-line chart of your project is just a huge blob,
       | you've done it wrong.
       | 
       | - You need to comment code, but try to keep it to one-liners. If
       | you can't explain in one line what some snippet does, it's
       | probably too clever. Also don't think that everyone will
       | understand it just because you gave everything sensible names.
       | Your code might be read by someone used to reading a different
       | language. Or more importantly there's some domain specific reason
       | why something needs to be done a certain way, and you don't want
       | the next person to forget that.
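       | 
       | The first bullet (testing a component from its perimeter) can be
       | sketched as below. RateLimiter is a hypothetical component
       | invented for illustration; the tests only call its public
       | allow() method and never look at its internal state:

```python
import unittest


class RateLimiter:
    """Hypothetical component; only allow() is its public perimeter."""

    def __init__(self, max_calls: int) -> None:
        self._max_calls = max_calls
        self._seen = {}  # internal detail the tests never touch

    def allow(self, client: str) -> bool:
        count = self._seen.get(client, 0)
        if count >= self._max_calls:
            return False
        self._seen[client] = count + 1
        return True


class RateLimiterPerimeterTest(unittest.TestCase):
    def test_blocks_after_limit(self) -> None:
        limiter = RateLimiter(max_calls=2)
        self.assertTrue(limiter.allow("alice"))
        self.assertTrue(limiter.allow("alice"))
        self.assertFalse(limiter.allow("alice"))

    def test_clients_are_independent(self) -> None:
        limiter = RateLimiter(max_calls=1)
        self.assertTrue(limiter.allow("alice"))
        self.assertTrue(limiter.allow("bob"))
```

       | Because nothing in the tests depends on _seen, the inside of the
       | component can be rewritten without touching them.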
        
       | code-scope wrote:
       | I had a similar issue and wrote code-scope to tackle it:
       | 
       | Check the demo screencast:
       | https://www.code-scope.com/cs-demos.html
       | 
       | A document cross-referencing the bpfcc tools with the Linux
       | kernel source:
       | https://www.code-scope.com/s/s/u#c=sd&uh=0f2c2fa280a2&h=a2f7c69d&di=35&i=60
       | 
       | A web document that analyzes and documents the USB driver source
       | code in the kernel source tree:
       | https://www.code-scope.com/s/s/u#c=sd&uh=cf979192e856&h=ede5...
       | 
       | It has a fast source code search engine and can search for any
       | symbol or function in GBs of source in milliseconds.
        
       | mtzet wrote:
       | The advice about using both grep /and/ the IDE is very good.
       | Often they are framed as in opposition to each other, but in
       | reality they're just tools. IDEs are great when they work, but
       | it's entirely possible to confuse them.
       | 
       | I keep hearing that I should get a better IDE, especially from
       | Java developers who seem to have nicer IDEs than us C++ schmucks,
       | but even the best IDE will not save you when your program is
       | really an interpreter for some ad-hoc, unspecified dynamic
       | language implemented on top of YAML or XML.
       | 
       | I highly recommend having shortcuts for ripgrep, fd and clangd
       | in your editor. Also remember you can use the .rgignore file.
        
         | cmckn wrote:
         | [deleted]
        
           | Donckele wrote:
           | I can assure you netbeans will change and then not be your
           | No. 1 java ide.
        
             | [deleted]
        
         | gambiting wrote:
         | Speaking of IDEs - I work in video games development, huge
         | codebase that's over a decade old, heavily templated C++ code -
         | I switched off the IDE "suggestions" a long time ago; Visual
         | Studio is just wrong about incorrect/missing code like 90% of
         | the time. Just hit compile and read the errors. I have files
         | that VS shows as nearly entirely wrong, squiggly lines
         | everywhere, and yet they compile and link fine. And the
         | opposite, where VS doesn't see any issue at all but they don't
         | build. Or they build fine using MSVC but not in Clang, or vice
         | versa, and VS has no idea.
        
           | astura wrote:
           | Yeah, visual studio is absolute and complete garbage with C++
           | code, it always identifies correct code as having errors, and
           | it's not just a "big project" thing, it happens in very small
           | projects, even "projects" that have a single file. I really
           | don't get it... I also don't understand why intellisense
           | doesn't update itself with the results from the compiler.
        
             | phillipcarter wrote:
             | If you get the chance, when you encounter something like
             | this that is reproducible (or at least seems obvious what's
             | going on), you can use the Report a Problem tool and
             | capture as many relevant diagnostics as possible. I don't
             | work on the C++ tools team, but generally the folks working
             | on VS are highly interested in getting detailed bug
             | reports.
        
         | spinny wrote:
         | Visual Studio Code is nice in this aspect. When you use the
         | terminal window, the output is parsed and you can ctrl+click on
         | a filename or (filename:line) to jump to it
        
       | digdugdirk wrote:
       | Are there any visual "code flow" interpreters? Something that
       | would separate the 1000s of interactions between functions and
       | show flow lines between them?
        
         | dsbyrne wrote:
         | If you use VS Code with Ruby or Java, check out the AppMap
         | extension. Its core function is similar to what you've
         | described: diagramming execution flow and component
         | relationships. It's dynamic analysis, so it captures data
         | snapshots as well.
         | https://marketplace.visualstudio.com/items?itemName=appland....
        
         | mapme wrote:
         | IntelliJ data flow analysis does this to a certain extent (paid
         | feature though)
         | 
         | https://www.jetbrains.com/help/idea/analyzing-data-flow.html
        
         | gdoptimizer wrote:
         | Sourcetrail?
        
           | lstamour wrote:
           | Yep, Sourcetrail can do that, for the languages it supports.
           | (It has an SDK so additional languages can be added, with
           | effort.) Give it a method, another method, and it will draw a
           | line from point A to B (with all the functions in between)
           | using static analysis plus you can explore before and after
           | to see what calls what. You can even see field usage though
           | there it can be confused sometimes, understanding varies by
           | language. But it's still really useful. Doesn't yet support
           | cross-language integrations but it has a lot of potential now
           | being open source. It works great for individual developer
           | use, for team use I'd want to try my hand porting it to React
           | or the web in order to more easily share views with others,
           | and perhaps use a central database. For now you can make
           | Sourcetrail projects as part of a CI system to share them
           | with other team members.
           | 
           | In addition to Sourcetrail, I also recommend adding
           | OpenTelemetry for distributed projects or flame graphs for
           | less distributed ones. Some of the videos Honeycomb.io put
           | together really highlight the value of distributed tracing,
           | such as this one: https://youtu.be/GuIWQ-EF7YE and the
           | OpenTelemetry Collector makes it simple to filter telemetry,
           | route it to services or drop a majority of traces which don't
           | have exceptions, for example.
           | 
           | One day I hope OpenTelemetry tracing can be baked into any
           | language the way flame graphs tend to enjoy first-class
           | support in Java, and that tools like Sourcetrail can be baked
           | into IDEs such that runtime metadata is available just by
           | hovering your mouse over modules and functions. Kind of like
           | CodeLens shown here, but for understanding the code:
           | https://docs.microsoft.com/en-us/azure/azure-monitor/app/asp...
           | 
           | Something like
           | https://www.codestream.com/use-cases/code-documentation works
           | as a social network and documentation hub
           | but doesn't necessarily bring in production telemetry or
           | models/ontology from code (such as Lattix, but that's
           | specialized to code organization in a way...) Maybe Project
           | Cortex but for source code?
           | https://techcommunity.microsoft.com/t5/microsoft-365-blog/in...
           | 
           | JetBrains Space or GitHub doesn't yet analyze code beyond
           | dependencies/security issues/CI but might in the future.
           | 
           | Finally, there are tools like https://backstage.io/ which
           | hint at a future where developers build their own infra tools
           | for the rest of the company to use... but that hasn't
           | extended much into the realm of modelling, documentation or
           | telemetry yet. Folks might be lucky if they have a hosted
           | copy of SourceGraph right now... the future, I think, builds
           | on all of these ideas.
        
             | aste-risk wrote:
             | Can you use Sourcetrail on proprietary codebases as well? I
             | see it's GPL and according to my understanding, it's okay
             | to use it on proprietary software as long as you don't make
             | any modifications to the Sourcetrail software itself. Are
             | there any hidden commercial licenses before I try it out on
             | my company's codebase?
        
               | lstamour wrote:
               | I'm not a lawyer but if you're not embedding GPL code
               | output into your code, you're fine. Using GPL code to
               | write or reason about code under a different license is
               | not the same thing as having GPL software output a copy
               | of its own GPL-licensed code, for example:
               | https://softwareengineering.stackexchange.com/questions/5221...
               | 
               | The only other risk is letting your company's proprietary
               | code be visible to third parties, but Sourcetrail runs
               | locally on your computer and can run completely offline.
               | 
               | As to Sourcetrail's licensing-- it previously had a
               | closed license and was supported by a startup with a
               | number of employees. It recently went open source and can
               | be supported financially through Patreon:
               | https://www.sourcetrail.com/blog/open_source/
        
         | nyellin wrote:
         | Came here to ask the same question. I've tried to find
         | something like this for golang and the tools out there can't
         | handle large codebases.
        
         | m463 wrote:
         | I think this is a fantasy (that many of us have).
         | 
         | "I run this magical tool, and voila! a satellite view of the
         | code!
         | 
         | It clearly shows this river runs into this bay, and there's a
         | dam over there! Now I'm miles ahead of everyone!
         | 
         | Lookie, this river runs in circles..."
         | 
         | Ha.
         | 
         | What really happens is you run some tool and what comes out
         | looks and smells like hair you pulled out of your clogged
         | drain.
         | 
         | I've tried this. Anything graphing shows ... well...
         | 
         | It will look like this:
         | 
         | https://upload.wikimedia.org/wikipedia/commons/9/9b/Social_N...
         | 
         | I think the thing is - the code that gets things done will
         | confound automated visualization tools.
         | 
         | It's basically like expecting decompiler output to be super
         | helpful. It may help your understanding a _little_ bit, but
         | much will be lost. Yes there are heroic decompiler stories, but
         | time _is_ involved.
         | 
         | Also, most code has macros or helper functions or automatic
         | code generation or _something_ that obfuscates what you're
         | really looking for. You will have to develop a system to
         | unblock this organically.
         | 
         | What will help:
         | 
         | Peruse the source code. Try to follow the flow. If you have
         | tools to jump back and forth between a function call and
         | definition, use them.
         | 
         | Fix some bugs. Follow the stack traces up and down.
         | 
         | Ask people how stuff works. Put in the time. And do the other
         | stuff mentioned in this article. The osmosis method is really
         | how you'll get it.
        
           | antpls wrote:
           | Maybe one could limit the graph to the happy path to begin
           | with.
           | 
           | The output of a run of profile-guided optimisation could be
           | used to discover that graph, and only the touched functions
           | would be drawn. This graph could be useful to start with the
           | codebase, without being too overwhelming with all the edge
           | cases.
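           | 
           | A minimal sketch of that idea. It swaps in Python's
           | sys.setprofile hook instead of profile-guided optimisation
           | output, and records only the caller/callee pairs actually
           | executed on one run. All function names are hypothetical:

```python
import sys
from typing import Callable, Set, Tuple

Edge = Tuple[str, str]


def record_call_edges(entry: Callable[[], object]) -> Set[Edge]:
    """Run entry() and return the (caller, callee) pairs that were executed."""
    edges: Set[Edge] = set()

    def profiler(frame, event, arg):
        # "call" fires for Python-level function calls only; calls into
        # C builtins arrive as "c_call" and are ignored here.
        if event == "call" and frame.f_back is not None:
            edges.add((frame.f_back.f_code.co_name, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        entry()
    finally:
        sys.setprofile(None)
    return edges


# Hypothetical application code with an error branch the happy path skips.
def parse(text: str) -> int:
    return int(text)


def report_error(text: str) -> int:
    return -1


def handle(text: str) -> int:
    return parse(text) if text.isdigit() else report_error(text)


def happy_path() -> int:
    return handle("42")


edges = record_call_edges(happy_path)
```

           | Here edges contains happy_path -> handle and handle -> parse,
           | while report_error never shows up because the run did not
           | touch it. Feeding the pairs to a graph drawing tool is left
           | out of the sketch.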
        
             | m463 wrote:
             | I think this depends on what you're graphing. If it's an
             | application, I think it has a better chance of working.
             | 
             | If it's something like a driver or something with a hal or
             | tables of functions or callbacks it might be harder. If
             | your codebase is large... hmm.
             | 
             | Last time I tried something like this was a decade ago so
             | things might have gotten better.
        
           | kaba0 wrote:
           | There is no use viewing the whole graph at large. But
           | "zooming" into and seeing the connections going to and from a
           | node is really useful and not at all "magical". As it is
           | mentioned in the sibling post, sourcetrail can do it really
           | well.
        
         | Phil-bitplex wrote:
         | I wrote one for Tcl years ago, as I started working on a really
         | complicated product with around 50k lines of code (or more,
         | distant memory).
         | 
         | Maybe I should give it another crack for modern languages -
         | there is an even greater need for it these days with dependency
         | injection and microservices being common.
         | 
         | The way I see it the need stems from needing to understand what
         | is REALLY going on, as opposed to what the code is saying
         | should be going on.
        
       | stinos wrote:
       | _It's worth spending some time learning the grep or similar CLI
       | tools that can quickly find the files containing the relevant
       | keywords you are looking for._
       | 
       | Even though it is great advice, there is some sadness in the fact
       | that it should even be advice. Apart from knowing how to enter
       | code, the other most basic thing should be knowing how to look up
       | code?
       | 
       | Nitpick: it doesn't have to be a CLI tool; a proper text editor
       | or even an IDE allows you to do the same thing as well. Many text
       | editors also have pretty good indexing which will show matches
       | on hovering the mouse. Never good enough to blindly trust, but
       | usually faster, and it does a good job as a 'quick win' for the
       | first attempt.
        
         | peterkos wrote:
         | I love keeping Sublime Text installed even for iOS work for
         | just this reason -- no (smart) autocomplete and debugger make
         | it hard to use day-to-day, but the raw speed at which it can
         | navigate code is just awesome.
        
         | petepete wrote:
         | Or better still, you should be able to integrate a decent
         | search tool into your editor.
         | 
         | I'd take ag or rg over the built-in ones I've seen and being
         | able to use them alone or in vim (or in a script) is a huge
         | benefit.
        
         | fphilipe wrote:
         | This is a skill I've noticed that many developers don't have,
         | or don't have sufficiently. This lack manifests itself e.g.
         | when I review a PR that removes feature XYZ. I do `rg xyz` and
         | `fd xyz` to see if there's anything that was forgotten to be
         | removed related to that feature. Very often there is.
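         | 
         | A rough Python stand-in for the `rg xyz` plus `fd xyz` check,
         | for illustration only. find_leftovers is a hypothetical helper;
         | unlike the real tools it does not honour ignore files or skip
         | binary files, and it matches a plain substring, not a regex:

```python
from pathlib import Path
from typing import List, Tuple


def find_leftovers(root: Path, needle: str) -> Tuple[List[Path], List[Path]]:
    """Return (files whose name mentions needle, files whose text mentions it)."""
    needle = needle.lower()
    by_name: List[Path] = []
    by_content: List[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Roughly what `fd xyz` answers: does the file name mention it?
        if needle in path.name.lower():
            by_name.append(path)
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        # Roughly what `rg -i xyz` answers: does the content mention it?
        if needle in text.lower():
            by_content.append(path)
    return by_name, by_content
```

         | Running it on the checked-out PR branch with the removed
         | feature's name gives the two lists to eyeball before approving.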
        
           | stinos wrote:
           | _This is a skill I've noticed that many developers don't
           | have, or don't have sufficiently_
           | 
           | Yes, and I have trouble understanding how that is possible.
           | Ok if you've never programmed and are just a beginner, but
           | otherwise? Or does it depend on the kind of code? I assume
           | this gets taught in programming / CS courses, no? Or maybe
           | not, and that is the problem?
        
             | Aeolun wrote:
             | It's much easier to leave in some extra code because the
             | compiler won't complain.
        
             | xh-dude wrote:
             | https://missing.csail.mit.edu/ is on point here.
             | 
             | I think there's an argument that instructors' time is
             | better spent on other things, but, yeah, students should be
             | exposed to this stuff somehow or other.
        
       | bluefirebrand wrote:
       | This is a great article. I felt like it was describing a job I
       | recently left, especially this piece:
       | 
       | > It's fine to have less experienced people working on a large
       | system as long as they have the elders overseeing their work. In
       | the world where senior titles are handed left and right, that is
       | often not the case and it's how you end up with a very fragile
       | system that is suitable for a replacement as soon as it was built
       | 
       | Then I got to the advice part of the article and had to laugh.
       | 
       | Read the documentation? What documentation. Not a single scrap
       | existed.
       | 
       | Look at the tests? I'd love to, but they never wrote any.
       | 
       | Code comments? Nah.
       | 
       | Use the IDE for intellisense? Great idea except the database
       | models are in a different project so the furthest you can get is
       | the compiled definitions that were copied into this project.
       | 
       | The method that eventually kind of worked was "use the debugger
       | for absolutely everything."
       | 
       | It was honestly one of the most miserable experiences I have ever
       | had.
        
         | knuthsat wrote:
         | I'm currently in exactly that situation. No comments on tiny
         | workarounds, no high level docs on any feature, any kind of
         | code criticism interpreted as personal attacks. I've worked at
         | teams where I can easily add 1000+ lines of well tested and
         | incremental code a week, but here I can barely reach 200.
         | 
         | Although, it's not just the programmers' fault; the product
         | team is just pushing for changes and never allowing any time
         | for code simplification.
        
         | hyeomans wrote:
         | Gaming company in San Mateo? Lol
        
         | sydd wrote:
         | > Code comments? Nah.
         | 
         | This is one of my biggest gripes. Someone (I think uncle bob)
         | said that good code is self-documenting, which is bs in 95% of
         | the cases. Yeah, you don't need to document the
         | convertMinsToSecs() method, but most real life codebases are
         | full of edge cases, shortcuts, temporary solutions,
         | half-complete reorganizations. So people use this as an excuse
         | for writing no comments at all, whereas a few words of comments
         | would save hours of investigative work for future developers
         | working on the codebase.
        
           | atomashpolskiy wrote:
           | While I agree to some extent, the problem with comments is
           | that they need to be maintained in order to be helpful: code
           | comments - updated when the code changes, general comments -
           | when the context changes, etc. This is a work in itself:
           | developer has to remember to do it, reviewer has to remember
           | to look for it. In my experience, people tend to forget to do
           | it or just don't bother, which means that someone else finds
           | himself with a contradictory, outdated, confusing comment
           | further down the road.
        
           | Aeolun wrote:
           | > good code is self-documenting
           | 
           | I still believe this to be more or less true.
           | 
           | The important part there is that you have to write _good_
           | code though.
        
             | disgruntledphd2 wrote:
             | So, the advice is bad, as most people (including myself, no
             | doubt), will write mediocre code, just by the shape of the
             | distribution (assuming it's normally distributed, which is
             | a strong assumption, but without data it's probably
             | reasonable).
             | 
             | Advice that relies on people caring about their
             | craft/having the skills to do the work well doesn't scale,
             | so it's bad advice where those things aren't true.
        
               | magicalhippo wrote:
               | You come a _long way_ with avoiding direct calls in "if"
               | clauses and similar, using descriptive variable names,
               | and not trying to be clever for the sake of being clever.
               | 
               | For example, instead of
               | 
               |     if (order.version > 1) ...
               | 
               | assign it to a descriptive variable
               | 
               |     bool orderHasChanged = (order.version > 1);
               |     if (orderHasChanged) ...
               | 
               | IMO this makes it much faster to read and understand,
               | because it says something about the intent. It can also
               | be easier to spot bugs.
               | 
               | It's a bit more to write, but I find it makes a big
               | difference when coming back to the code later on, and
               | typing is usually not the limiting factor when writing
               | code.
        
               | sydd wrote:
               | Also in lots of cases you are in a hurry to meet that
               | deadline that compromises code quality. Better to leave a
               | comment in this case than nothing
        
               | disgruntledphd2 wrote:
               | Yeah, especially if you do something weird. People have
               | often done weird strange stuff that made sense when I
               | finally figured out the reason.
               | 
               | Time pressure (and particularly with contractors) can
               | lead to some horrific long-term burdens of maintenance.
        
           | spinningslate wrote:
           | my priority for comments is that they should answer "why?"
           | and "why not?" questions. Why does the method/function do it
           | this way? Why didn't it choose that other, perhaps more
           | obvious route?
           | 
           | That's not necessary in every case. But it's true in a good
           | number of them. The code alone can never tell you that - but
           | it's often invaluable during evolution/refactoring.
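           | 
           | A small sketch of what such "why" and "why not" comments can
           | look like. The function and the reasons given in the comments
           | are hypothetical, made up purely for illustration:

```python
from typing import List


def retry_delays(attempts: int) -> List[int]:
    """Return the wait in seconds before each retry attempt."""
    # Why exponential: a fixed delay made all clients retry at the same
    # moment after an outage (hypothetical incident).
    # Why not uncapped: without the 30 s ceiling the later waits grow
    # past what a user would tolerate (hypothetical requirement).
    return [min(2 ** i, 30) for i in range(attempts)]
```

           | The code already shows what happens (doubling, capped at 30);
           | only the comments record why the obvious alternatives were
           | rejected.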
        
             | wiredfool wrote:
             | Why does this code exist?
             | 
             | What are you working around?
             | 
             | What are the assumptions? What limitations?
             | 
             | If it's complicated enough that I'm only understanding it
             | because of the context of the last week, we need the
             | comments. Anything that can speed the reverse engineering
             | in 6 months when it breaks is helpful, because then you can
             | quickly decide that we got different input or if we missed
             | an edge case or whatever.
        
             | Pokepokalypse wrote:
             | Yah, and what's gross is when your team-mates criticize you
             | for leaving comments at all. Let alone a lengthy discussion
             | on why.
        
           | scollet wrote:
           | I'm really grateful for my current team because of this.
           | 
           | It takes 5 minutes to have thoughtful naming and "this is why
           | because..."
           | 
           | They have done a stellar job at that.
        
           | KajMagnus wrote:
           | > _Someone (I think uncle bob) said that good code is self-
           | documenting, which is bs in 95% of the cases_
           | 
           | Agreed. For example, comments about Why-do-this, and Why-Not-
           | do-that can be necessary, even if the code shows _what_
           | happens.
           | 
           | Imagine you're in a taxi, and it suddenly takes the wrong
           | turn, now instead heading towards _Surprise-City_. Then --
           | you know _what_ is happening. You're going to
           | _Surprise-City_.
           | 
           | But wouldn't you also want to know _Why_?
           | 
           | So then it's nice if the taxi driver explains Why: "I buy
           | milk to kitten."
           | 
           | I think the 'Linux kernel coding style' explains comments
           | pretty well:
           | 
           | https://www.kernel.org/doc/html/v4.10/process/coding-style.h...
        
           | sverhagen wrote:
           | I think Uncle Bob got cancelled /s, but politics aside, I
           | don't think he was necessarily wrong about good code
           | documenting itself, but it came with a lot of direction about
           | what then exactly constitutes good code. If people don't
           | bother to hone the skills of good types and methods, well
           | named and with clear responsibilities, of course they're not
           | clearing the bar to drop the comments. Having grown more
           | senior, I believe I have gotten a little better at expressing
           | meaning and intent through the code itself, and I'm surely
           | writing a lot less comments because of it, which seems a win
           | overall.
        
         | mattmanser wrote:
         | _Use the IDE for intellisense? Great idea except the database
         | models are in a different project..._
         | 
         | If for whatever reason you can't link the project properly, as
         | a tip for the future, I've ended up just making a separate copy
         | of the project which included a reference to the uncompiled
         | project.
         | 
         | And in the worst instance I actually used a reverse generated
         | project from the binary. I actually gradually refactored that
         | auto-generated code into more readable code too!
        
         | [deleted]
        
       | OCHackr wrote:
       | The only thing I know of is Visustin.
        
       | nexthash wrote:
       | An interesting way to approach the documentation issue discussed
       | in this article is 'wiki bankruptcy': when a wiki goes stale,
       | simply tell all devs to save what they think is important before
       | deleting the whole thing outright. Then, they can recreate those
       | pages into a new wiki. Read more about it here:
       | 
       | https://critter.blog/2020/08/10/wiki-bankruptcy/
       | 
       | He also talks about using this 'bankruptcy' philosophy in other
       | aspects of life, which I thought was intriguing:
       | 
       | https://critter.blog/2020/09/24/declare-bankruptcy-and-dont-...
        
         | wdfx wrote:
         | I think this approach should not be feared.
         | 
         | Over the years I have 'bankrupted' several supporting systems,
         | some more than once. I've deleted shit like
         | 
         | - old tickets
         | - documentation / wikis
         | - old infrastructure
         | - old backlogs
         | 
         | I'm actually going through this process now with my current
         | team. There's so much stuff written by our predecessors that
         | is just no longer relevant. So, I've set up or renamed our
         | Jira/Confluence spaces and then moved/copied back in only the
         | content that is still relevant to us. Everything else will be
         | archived. In this way, everything which comes out the end of
         | this process:
         | 
         | - is ours
         | - has recently been seen/reviewed by at least one pair of eyes
         | - is still relevant to the business and the product
        
       | RMPR wrote:
       | > In 1980's Tim Berners-Lee realized that the documents are hard
       | to find at CERN, so he started imagining a system of
       | interconnected documents that would supposedly solve this thorny
       | problem for good. Nowadays we know this invention as the
       | internet.
       | 
       | > Despite 40 years of improvements and the internet becoming a
       | part of our daily life, we still face the same problems. You can
       | talk to another person half way across the world while watching
       | a funny cat videos, but somehow we still struggle with finding
       | the important project documents
       | 
       | TIL. But I think the author meant the Web[0]. IIRC the internet
       | originates from ARPA.
       | 
       | 0: https://home.cern/science/computing/birth-web
        
         | petepete wrote:
         | To be honest, most _normal_ people call the web the internet.
         | It's the only bit most people see.
        
       | 37ef_ced3 wrote:
       | GNU Global:
       | 
       | https://www.gnu.org/software/global/
       | 
       | For example, here is the Linux kernel in Global:
       | 
       | http://www.tamacom.com/tour/kernel/linux/
        
       | RoyalSloth wrote:
       | Well, that's a surprise for sure. I wrote this article a few
       | months ago and it gained no traction. Today I woke up and boom,
       | front page.
        
         | cpb wrote:
         | I really connected with the writing. Thank you for taking that
         | time.
         | 
         | There is a lot in my current context the writing resonates
         | with. Nice to find that others have been on this path too, and
         | that we benefit from a lot of the same techniques.
         | 
         | Thanks for putting this out there as an invitation to draw
         | people together.
        
           | RoyalSloth wrote:
           | I am glad you liked it.
        
       | abhishekjha wrote:
       | And then you try navigating an Akka Framework based source with
       | their ask pattern. No way to navigate where the control flow
       | would go with inheritance in the mix.
        
         | valenterry wrote:
         | Yes! Maybe it changed a bit with akka typed, but I stick with:
         | use as few actors as you need to get your job done.
        
       | hyko wrote:
       | This is a great article, full of practical advice. Bookmarked to
       | show anyone coming into a reasonably large codebase.
        
       | ramino wrote:
       | The article describes pretty much what I'm facing at my job. We
       | have a monolithic Ruby on Rails application with lines of code in
       | the millions.
       | 
       | Still, the general mantra is that comments are not allowed, and
       | I can only agree with the author that this makes non-standard
       | parts of the code extremely hard to comprehend. I would
       | definitely love to work once on an application that size with a
       | few comments here and there.
       | 
       | In my day to day work I depend a lot on our test suite. If I
       | can't find the tests that cover a part of the code, I just
       | break the code on my branch and let the CI tell me which tests
       | fail. There are probably better ways to do this with test
       | coverage tools, but this method seems fairly straightforward to
       | me.
       | 
       | Documentation is something we started to do recently. I feel
       | that as long as the documentation is not directly connected with
       | the code, it is hard to keep it in sync. We even have PR
       | templates that remind us to update the documentation, but the
       | shape of the documentation is just too different from the code
       | to give a straightforward intuition of when it needs to be
       | updated. What mostly happens for us is that the feature owner at
       | some point realizes that the documentation pages are not
       | accurate at all anymore and rewrites them.
       | 
       | Sadly, our commit messages are useless about 50% of the time, so
       | they serve more to tell you who to talk to than to explain why
       | the change was made. PRs and commit messages are great
       | documentation; I wish we would use them more. In my company the
       | idea is more that the change should be so small that no
       | explanation is needed, but I feel this idea misses the point
       | that code can't explain *why* something was done.
       | 
       | This is definitely an area for further improvements. Are there
       | best practices someone could point me to?
        
         | RoyalSloth wrote:
         | I think that the only thing that really works is code reviews.
         | The best engineers on the team should have enough time to
         | review what is being committed and provide suggestions for
         | improvement. Things will start gradually improving, but it
         | usually takes a very long time before you see any progress.
         | 
         | As someone already mentioned in this thread, an important part
         | of this transformation is to not forget about the political
         | aspects of such a cleanup process. People don't like to hear
         | criticism, so you will probably encounter a lot of pushback in
         | the beginning.
        
         | joelbluminator wrote:
         | How do you write your commit messages? For us, in most projects
         | there's a git hook that forces you to put the Jira ticket
         | number in the commit message (and the branch). So if you have
         | to know why a change was made, you at least have the context of
         | what the task was, which helps.
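
A hedged sketch of what such a hook could look like, written as a git `commit-msg` hook in Python. Git calls this hook with the path of the commit message file as its only argument and aborts the commit on a non-zero exit status. The ticket pattern (`PROJ-123` style keys) and the error text are assumptions; the commenter's actual hook is not shown in the thread, and the branch-name check it mentions is not covered here.

```python
import re
import sys

# Assumed ticket key shape, e.g. "PROJ-123". Adjust to your tracker's format.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_ticket(message):
    """Return True if any non-comment line mentions a ticket key."""
    # Lines starting with "#" are skipped, mirroring git's default cleanup
    # of comment lines in the message template.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    return any(TICKET_RE.search(line) for line in lines)


def main(argv):
    """Entry point: argv[1] is the commit message file supplied by git."""
    with open(argv[1], encoding="utf-8") as handle:
        message = handle.read()
    if has_ticket(message):
        return 0
    print("commit rejected: no ticket key (e.g. PROJ-123) in message",
          file=sys.stderr)
    return 1


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

To try it, save the script as `.git/hooks/commit-msg` with a Python shebang line and make it executable.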
        
       | bob1029 wrote:
       | We manage a codebase that is well over a million lines of code,
       | and has a history dating back >5 years.
       | 
       | One of our answers to this problem is extreme amounts of
       | standardization. We might have 1M LOC in platform services
       | alone, but it is spread across 50+ types and each looks almost
       | identical. Everything uses the same persistence mechanism,
       | migration technique, error handling, configuration provider, etc.
       | Dependency injection + reflection + standardization
       | (interfaces/abstract types) is where you can get into some really
       | powerful leverage regarding keeping things organized and sane.
       | Ultimately we have ~8 "flavors" of thing that developers usually
       | need to worry about.
       | 
       | Our end game answer is to get away from the code altogether. We
       | are starting to view code as glue between what would ideally be
       | configuration-based implementations and the nasty real world
       | which must be mutated in icky ways. So, instead of writing code
       | for a module every time you need to implement it, make it once
       | and in a generic way, have it take a configuration object, and
       | then expose a web UI around configuring that thing. Then, all
       | that code is reduced to JSON being passed around. When you are
       | dealing with pure data, you can get away with the most ridiculous
       | things. Cloning objects, versioning, validations, relational
       | queries, etc. become trivial. If you have 1 stable domain
       | model throughout that is 3NF or better, you can use SQL to do
       | basically everything.
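
To make the "write the module once, drive it with a configuration object" idea concrete, here is a small Python sketch. Everything in it is hypothetical: the discount rule, the field names (`customer`, `threshold`, `percent`), and the `clone_for` helper are invented for illustration and say nothing about how the commenter's system is actually built.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscountConfig:
    """Configuration contract for one generic, reusable module."""
    customer: str
    threshold: float
    percent: float

    @classmethod
    def from_json(cls, text):
        """Parse and validate a JSON configuration document."""
        data = json.loads(text)
        config = cls(
            customer=str(data["customer"]),
            threshold=float(data["threshold"]),
            percent=float(data["percent"]),
        )
        if not 0 <= config.percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return config


def apply_discount(config, order_total):
    """The generic module: behavior depends only on the config passed in."""
    if order_total >= config.threshold:
        return round(order_total * (1 - config.percent / 100), 2)
    return order_total


def clone_for(base_json, **overrides):
    """Bootstrap a new customer by copying an existing config as data."""
    data = json.loads(base_json)
    data.update(overrides)
    return json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    acme_json = '{"customer": "acme", "threshold": 100, "percent": 10}'
    globex_json = clone_for(acme_json, customer="globex", percent=15)
    for text in (acme_json, globex_json):
        config = DiscountConfig.from_json(text)
        print(config.customer, apply_discount(config, 200))
```

Because each customer is plain JSON, onboarding a new one is a data operation rather than a code change, which is the property the comment is after.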
        
         | bob1029 wrote:
         | Edit: One more thing I would note is that a big part of why we
         | are able to support this codebase is because we have adopted a
         | sort of "hive mind" developer mindset, where everyone tries to
         | role play this ideal of a developer who would best be suited
         | for the task. We acknowledge that our codebase is not a place
         | for much "fun", and the best analogy I could come up with is
         | that it's like doing something in a nuclear power plant control
         | room. You just gotta do it by the book every time, and then you
         | get to go home to a safe and happy community. It's not like we
         | employ volunteers.
        
         | AdamCraven wrote:
         | How was the culture of the "hive mind" developed and maintained
         | in the organisation? I can imagine there are challenges you've
         | faced to keep it working.
        
           | bob1029 wrote:
           | Start small and grow carefully. Not every developer is a good
           | fit for this type of approach and the amount of discipline we
           | require.
           | 
           | We actually started looking at an approach where new hires
           | would come in on a 6-12 month contract basis. The whole idea
           | would be that there would be no hard feelings either way at
           | the end if it didn't work out. If both sides felt like this
           | was a good fit, we explore longer-term options with more
           | benefits.
           | 
           | The way we do software is unconventional. We are in a very
           | constrained environment from a security perspective. No
           | containers, nothing can be in the cloud, all data must live
           | on the same physical host, software delivery is tricky, etc.
           | These constraints make the work we do somewhat unappealing to
           | a certain crowd of developers who seek to maximize their
           | exposure to shiny new things.
           | 
           | Put differently, we use boring old technologies (with a few
           | exceptions) and set expectations that we are going to
           | continue to use those indefinitely. Any hopes of "mixing
           | things up" should be reserved for future endeavors on our
           | roadmap and personal side projects (which we encourage). I
           | don't think any of this is unreasonable or unrealistic. We
           | are in the business of selling software to other businesses
           | in a sensitive market. We are not making DLC for AAA
           | videogames.
        
         | trias wrote:
         | sounds like an ideology lock-in. Let's hope you never get a
         | problem which does not fit your current architecture well, or
         | else you'll end up spending weeks or even months solving an
         | otherwise trivial problem.
         | 
         | I've worked with "configuration-based implementations" and in
         | my experience they are hard to work with (no debugging,
         | incomplete documentation and implementation, little
         | flexibility), require a staggering amount of infrastructure,
         | are hard to test and will approach a programming language over
         | time.
        
           | bob1029 wrote:
           | I agree with the concern, but we have had a very long time to
           | refine our architecture. Some would call it an ideology lock-
           | in, I would say we solved our problem domain in a deep and
           | meaningful way and would prefer to stick with these proven
           | approaches. Our entire codebase was rewritten approximately 4
           | times before we got to the point of being confident enough to
           | push forward with a data-driven/configuration approach.
           | 
           | When you are writing the same business logic hundreds of
           | times and only 10-20 discrete things are different between
           | each implementation, it starts to make a hell of a lot of
           | sense to expose those things as parameters to be configured.
           | It's simple economies of scale at this point for us. Despite
           | our small size, we are trying to get out of a "move fast &
           | break things" startup mindset into a more stable "let's take
           | this to 1k customers now" mindset (we provide a B2B
           | application in a small market, so 1k is a huge target).
           | 
           | For us, our company doesn't become profitable until we can
           | scale our operations by 5-10x without any more headcount. The
           | only thing we could come up with that would allow for this is
           | configuration-driven techniques in which entire customer
           | implementations can be cloned as simple JSON contracts for
           | purposes of bootstrapping the next customer. Developers are
           | removed from most of the product implementation process, and
           | can focus more on core product value, which is now leveraged
           | hundreds of times over because it is exposed as a
           | configuration contract.
           | 
           | I am NOT arguing that one should set out to build a
           | configuration-driven system from day one. That would probably
           | be the biggest mistake you could make. You have to already
           | have a mostly-functional product that people already want to
           | buy/use before you can even consider this approach. Even
           | then, you should probably expand your target market and
           | inject a few more use cases & rewrites before you jump over
           | that chasm. Having a squeaky-clean domain model that
           | addresses all potential use cases is the bare minimum
           | prerequisite, IMO.
        
             | trias wrote:
             | sounds like we are working at the same company ;)
             | 
             | Good to see that this approach seems to be working for you,
             | I wish you the best.
        
       ___________________________________________________________________
       (page generated 2021-02-14 23:02 UTC)