[HN Gopher] Measuring Software Complexity: What Metrics to Use?
___________________________________________________________________
Measuring Software Complexity: What Metrics to Use?
Author : bsmth
Score : 90 points
Date : 2021-11-28 10:05 UTC (12 hours ago)
(HTM) web link (thevaluable.dev)
(TXT) w3m dump (thevaluable.dev)
| quantified wrote:
| I propose a challenge, similar to the Obfuscated C challenge, to
| devise code that is impenetrable to the human mind and yet
| fantastically clean to all metrics.
| flaratt_ljos wrote:
| There are a lot of things you can do to lower these sorts of
| metrics without addressing actual complexity, sweeping the
| problem under the rug. Perhaps a better metric for software
| complexity would be the amount of work the computer has to do, or
| the number of instructions it has to execute.
| nmehner wrote:
| Isn't this true for all metrics? They are useful for
| pointing out problematic areas, but as soon as you start
| optimizing only based on the metric things will get worse.
|
| Number of instructions is probably not a good metric. Without
| any loops/jumps you might have a lot of instructions, but a
| very low complexity.
|
| An endless loop executes a lot of instructions, but does not
| have to be complex.
| drewcoo wrote:
| The problem is not computer processing speed. The problem is
| human cognitive load.
| preseinger wrote:
| > Perhaps a better metric for software complexity would be the
| amount of work the computer has to do, or the number of
| instructions it has to execute.
|
| Complexity is a measure of human difficulty in comprehension,
| not mechanical difficulty in execution...
| hutzlibu wrote:
| "Perhaps a better metric for software complexity would be the
| amount of work the computer has to do"
|
| By that definition a loop doing a simple calculation 1 billion
| times would count as very complex, even though it is very easy
| to understand.
|
| LOC would be better, even though code can be dense and
| complicated, or very verbose and simple.
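|
| The billion-iteration loop argument can be made concrete with a
| small sketch. It assumes CPython's `sys.settrace` hook as a rough
| stand-in for "work the computer has to do"; the helper name
| `count_line_events` and the sample function `sum_of_squares` are
| made up for illustration. The static size of the function stays
| fixed while the executed work grows with `n`.
|
```python
import sys


def sum_of_squares(n):
    """A trivially simple loop whose running time scales with n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def count_line_events(func, *args):
    """Count 'line' trace events fired inside func (a rough proxy for work)."""
    events = 0

    def tracer(frame, event, arg):
        nonlocal events
        # Only count lines executed in the function under test.
        if event == "line" and frame.f_code is func.__code__:
            events += 1
        return tracer

    previous = sys.gettrace()  # restore any tracer that was already active
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return events


# Static size: bytes of compiled bytecode, independent of the input.
static_size = len(sum_of_squares.__code__.co_code)
small = count_line_events(sum_of_squares, 10)
large = count_line_events(sum_of_squares, 10_000)
print(static_size, small, large)
```
|
| The code a human has to read is identical in both runs; only the
| executed-line count changes, which is why it says little about
| how hard the code is to understand.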
| magicalhippo wrote:
| Complexity isn't a single thing though. The problem can be
| complex, a specific implementation can be complex, the codebase
| can be complex.
|
| I recall implementing some linear algebra numerical code. The
| problem was a bit complex, resulting in a bit of code
| complexity.
|
| However I realized I had some extra information I hadn't used,
| and I spent half a day going over the math again. After a
| couple of pages of derivations I could narrow down the result
| to a couple of dot products.
|
| So, I ended up with a commit where I had 100 or so lines of
| comments including equations to justify my two lines of code.
|
| The implementation became super-simple, but why it worked was
| suddenly not so simple. I had effectively moved complexity from
| code-space to problem-space.
| lngnmn2 wrote:
| Verbosity.
|
| Number of unnecessary abstractions.
| exabrial wrote:
| The number one metric: Consistency. If an app is similar to
| itself in all places, it's very easy to understand. Better yet,
| if it's similar and consistent with how other things have been
| built, we can call it "clean code".
|
| Back in the day we called these things architecture, but I'm old
| and salty.
| oblak wrote:
| What if the code is consistently crappy and complex all around?
| Would that make it simple? Trick question, old man. It would
| not. I am not exactly young but damn I am sweet
| jackblemming wrote:
| You're not wrong. "Consistency" is a poorly defined principle
| and basically used to justify "what I'm doing is good
| (consistent). What you're doing is bad (inconsistent)"
|
| And it was called Uniformity back in the day, not
| "architecture".
|
| A better name for "consistency" is "following project
| conventions" which IS well defined.
| tharkun__ wrote:
| I agree that consistency is good in general. I don't think
| "project conventions" is well defined.
|
| In most places project conventions are either not actually
| defined anywhere or if they are defined in writing, they're
| usually either very very old and outdated vs. the actual
| conventions that everyone is currently using or it's just
| one guy updating the text and hitting everyone else over
| the head with the document to push his opinion through.
|
| I personally like to be 'locally consistent'. I don't care
| how old and crusty the code base is. If the file that I
| have to change or add to calls everything a "giraffe", I
| will call my stuff "giraffe" as well, even if it really is
| a "gorilla". If I start calling it a gorilla, nobody will
| understand that the gorilla is the same as the giraffe if
| they don't have the same background knowledge I have.
| Unless I do a refactoring and I am changing the giraffes to
| gorillas. Which might either be a first PR to "clean up" or
| a follow up PR.
|
| Unfortunately I see so many people not doing that and it
| wreaks havoc with the code base. Especially if we're now
| outside of the place that defines the giraffes and
| gorillas. It's really hard for the caller to figure out
| that they're one and the same thing.
| jackblemming wrote:
| >In most places project conventions are either not
| actually defined anywhere or if they are defined in
| writing, they're usually either very very old and
| outdated vs. the actual conventions that everyone is
| currently using or it's just one guy updating the text
| and hitting everyone else over the head with the document
| to push his opinion through.
|
| So either admit your project doesn't have conventions, in
| which case, don't nit people who don't follow whatever
| convention exists in your head but isn't documented, or
| document the project conventions. You cannot have your
| cake and eat it too.
|
| That "one guy updating the text" is at least explicitly
| documenting expectations.
| hacoo wrote:
| Ugh, no. I've worked in a codebase where CI would reject changes
| that had too much 'code complexity'. You'd constantly have to
| find clever ways to split up your code, when doing so did not
| make sense, to appease the complexity checker. Oh yeah, and if
| you ever make a one-liner change, you might end up being forced
| to do a full refactor because that one line pushed the complexity
| threshold over the edge. The results: PITA for developers and
| worse code. What a crock of shit.
| okl wrote:
| Use the tool, don't become its slave.
| bradgessler wrote:
| I've found most CI checks like this are a crock. Forgot empty
| parentheses for that function definition with an arity of 0?
| Bzzzzzzzt! Sorry! Build failed.
|
| It's just foolish.
|
| The only reason a CI should ever fail is if it catches a defect
| from making it into production.
| preseinger wrote:
| > The only reason a CI should ever fail is if it catches a
| defect from making it into production.
|
| Mechanical checks for things like formatting rules, linting
| errors, and reasonable tools to verify code complexity, as
| long as they don't produce false positives, are all important
| to run as part of CI.
| juancb wrote:
| Did you also have the same feedback from your IDE?
|
| In other words would it have been less painful if you didn't
| have to suffer the long iteration times required to get
| feedback from some remote CI job?
| nikita2206 wrote:
| The thing is that cyclomatic complexity, for example (the most
| popular complexity measure that linters use), doesn't make
| sense. Most of the time high cyclomatic complexity of a
| method is indicative of high business logic complexity...
| which is fine. And dogmatically saying that methods shouldn't
| have that many lines and branches and that you should just
| come up with better abstractions doesn't help anyone, whereas
| having closely related functionality concentrated in a single
| place, rather than synthetically exploded into N different
| files, well this does help.
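|
| For readers who haven't met the measure being argued about, here
| is a rough sketch of a cyclomatic-complexity count. It assumes
| the common "one plus the number of decision points"
| approximation; the name `approx_cyclomatic_complexity` and the
| choice of which AST nodes count as branches are assumptions for
| illustration, and real linters differ in the details.
|
```python
import ast

# Node types treated as one extra decision point each (an assumption).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source):
    """Return 1 + the number of decision points found in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths, not one.
            complexity += len(node.values) - 1
    return complexity


BRANCHY = """
def f(x):
    if x > 0 and x < 10:
        return 1
    for i in range(x):
        pass
    return 0
"""

STRAIGHT = """
a = 1
b = 2
"""

print(approx_cyclomatic_complexity(BRANCHY))   # if + and + for + 1
print(approx_cyclomatic_complexity(STRAIGHT))
```
|
| Note that the count knows nothing about whether the branches
| reflect real business rules or accidental mess, which is the
| objection raised above.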
| rhn_mk1 wrote:
| I'm sorry about your experience, but from what it seems, your
| problem was in your team. Any measure can be turned into a bad
| policy, code complexity is a red herring here.
| Forge36 wrote:
| I was removing unused functionality, and the work was held up
| because the code complexity was too high (never mind I'd just
| reduced it).
| I don't know what tool they were using, and they didn't share
| the output. I asked them to document what they found, got a
| mouthful on this not being their responsibility.
| szundi wrote:
| Just run.
| hutzlibu wrote:
| Sounds like a place to leave as soon as possible.
|
| Unless the tool just measured the amount of change and
| flagged it for review, which might make sense, as you can
| also mess up by removing things you _think_ are unused.
| pydry wrote:
| Couldn't you just raise the threshold by tweaking the config
| with your PR instead?
|
| I do this all the time with other automated checkers (linters,
| etc.). I don't see why this should be different. If another
| human agrees it shouldn't be a problem.
| sobkas wrote:
| Because if you change that config file, people responsible
| for it will be added to review, and will beat you with a
| stick for touching it without consulting them about the change?
| throwaway81523 wrote:
| Some blog linked from here (maybe jvns.ca?) made the case that
| depth of your project's software dependency tree is an important
| metric. The more crap you have to pull in, the more things can go
| wrong. You're better off with a large program with no
| dependencies, than a somewhat smaller program with a ton of
| dependencies.
|
| Language features on the other hand can let you develop complex
| programs quickly and reliably, by catching errors before the code
| gets deployed and so on.
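|
| The dependency-depth idea from the first paragraph can be
| sketched as the longest chain in a dependency graph. The graph
| below and all package names in it are hypothetical, and
| `dependency_depth` is an illustrative helper, not something a
| package manager provides.
|
```python
def dependency_depth(graph, root):
    """Length of the longest dependency chain starting at root.

    graph maps a package name to the packages it depends on directly.
    Packages missing from the mapping are treated as leaves.
    """

    def depth(node, path):
        if node in path:
            raise ValueError(f"dependency cycle through {node!r}")
        children = graph.get(node, ())
        if not children:
            return 0
        return 1 + max(depth(child, path | {node}) for child in children)

    return depth(root, frozenset())


# Hypothetical project: names are made up for illustration.
EXAMPLE = {
    "app": ["web", "db"],
    "web": ["http"],
    "http": ["sockets"],
    "db": [],
}

print(dependency_depth(EXAMPLE, "app"))  # app -> web -> http -> sockets
```
|
| A large program with no dependencies scores 0 here regardless of
| its size, which matches the trade-off described above.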
| juancb wrote:
| What about the depth of dependencies of any part of the
| standard libraries for a given language?
| throwaway81523 wrote:
| I think that doesn't count, as long as the install is in one
| piece. That's how Python got its popularity. Its "batteries
| included" approach meant you got lots of stuff in the stdlib
| instead of having to chase it all over the interweb.
| Unfortunately they seem to have abandoned that approach in
| more recent times.
| preseinger wrote:
| Isn't the standard library definitionally depth=1?
| cjfd wrote:
| This talks about code complexity a lot. This, however, is not the
| chief source of complexity in many code bases. The number of
| tools needed to build things is the bane of the modern software
| developer. Also, the use of microservices where none are needed
| results in an enormous increase in complexity. Code complexity
| is relatively easy to fix compared to all of this.
| brodo wrote:
| Exactly. Every dependency is a liability.
| nine_zeros wrote:
| Yup. Long gone are the days of concise packaging in single
| executables.
|
| Quite literally, software engineers today spend most of their
| time fighting dependencies and poorly built delivery machines.
| [deleted]
| kello wrote:
| second this. toolchain complexity is way more of a PITA than
| code complexity for me these days.
| kitd wrote:
| Toolchain simplicity is a key reason why I like Go. One
| binary to do it all.
| defanor wrote:
| Indeed; I tried to compose yet another list of factors
| contributing to (or approaches to estimating) software project
| complexity in the past, with code complexity being just one out
| of 18, and not seeing it as particularly outstanding. Perhaps
| just a bad title.
| epylar wrote:
| I have seen a team be forced to use microservices for things
| still running on the same machine years later, with absolutely
| no benefit other than the powerpoint architecture slides
| looking fancier.
| jillesvangurp wrote:
| I always liked the notions of coupling and cohesion because they
| are simple to understand and you can see at a glance if a
| particular bit of code would have good metrics for those,
| without actually bothering with the metrics. E.g. long list of
| parameters or imports == high coupling, large number of functions
| in a module == low cohesion. Specifying exactly how much isn't
| that useful. It's easier to think in terms of "relative to the
| rest of the code". Debating what is too high or too low even less
| so. But if you are struggling with working with a particular bit
| of code, being able to identify why it is hard to deal with is
| useful; especially if you know how to fix it.
|
| But mostly metrics should not be telling you things you can't
| already know just looking at the code; if it looks complicated,
| it probably is. Metrics only become useful when you need to tell
| without looking. Sometimes that's useful.
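|
| For the "tell without looking" case, the eyeball heuristics
| above (imports, parameter lists, functions per module) can be
| counted mechanically. This is a minimal sketch; `module_shape`
| and the keys it returns are invented for illustration, and the
| numbers only mean something relative to the rest of a codebase.
|
```python
import ast


def module_shape(source):
    """Count crude coupling/cohesion proxies for one module's source."""
    tree = ast.parse(source)

    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as `from . import x` have no module name.
            imported.add(node.module or ".")

    functions = [
        node for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    max_params = max(
        (
            len(f.args.posonlyargs) + len(f.args.args) + len(f.args.kwonlyargs)
            for f in functions
        ),
        default=0,
    )
    return {
        "imports": len(imported),        # proxy for coupling
        "functions": len(functions),     # proxy for (lack of) cohesion
        "max_params": max_params,        # proxy for coupling
    }


SAMPLE = """
import os
from json import dumps, loads

def a(x, y, z):
    return x

def b():
    return 1
"""

print(module_shape(SAMPLE))
```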
| mikro2nd wrote:
| I didn't see anything on Function Point counts in the article.
| Not advocating it _per se_ but it was, at one time, considered
| one of the more useful ways to evaluate codebase complexity.
| okl wrote:
| FP analysis is a measure for the "amount of functionality" not
| complexity. Maybe in relation to, e.g., the number of
| statements (statements/FP) it could make sense.
| kqr wrote:
| Number of function points is, given implementation language,
| strongly correlated with lines of code. Lines of code is, in
| turn, strongly correlated with all other complexity measures.
|
| In other words, function point count is a measure of
| complexity.
| throwawaybutwhy wrote:
| No mention of function points.
| yetanother-1 wrote:
| I agree. Certain functions are way more complicated and hard
| to show value for than others, which is really hard to
| comprehend for some managers.
|
| Functionalities like undo/redo take a lot more planning,
| coordination and integration effort than others like a simple
| export function, but good luck selling that to any marketing or
| product owner for xx man-days.
|
| I still think that this subject is way too specific to be
| generalized like this, but general rules of thumb still apply,
| like good estimation and technical planning.
| marcosdumay wrote:
| Function points measure the complexity of the problem you are
| solving, not of the software you are writing.
| exdsq wrote:
| My first job as a dev was for a consultancy and I had to review a
| code base for a client to support their argument that they
| require a rewrite. I had no idea about any of this so googled
| metrics, found cyclomatic complexity, and wrote a load of
| bullshit about the results of analyzing the code base showing it
| was complex. It served its purpose - they got corporate to accept
| a rewrite - but I've never used those metrics again.
| streamofdigits wrote:
| The Einstein quote is that "something should be as simple as
| needed but no simpler", but how do you pin down "needed" in an
| objective / quantifiable way?
|
| Somehow it is more important to measure "gratuitous" complexity,
| redundant complexity that is not justified by present or
| plausible future requirements...
|
| The problem is that the code itself does not capture
| requirements, so code analysis can give you absolute indicators
| but never an "efficiency" measure (how efficient and justified
| the measured complexity is).
| okl wrote:
| Isn't the general goal rather to avoid/eliminate overly complex
| cruft (that every programmer should be able to recognize) than
| to find the maximally simple solution to a problem?
| streamofdigits wrote:
| That seems like a more tractable goal if confined to local
| analysis (something like lines of code in a function or file
| unit), but people generally also try to come up with an overall
| complexity measure, which seems harder.
| abacadaba wrote:
| tldr, blood pressure
| pfdietz wrote:
| There were attempts to predict bugs by looking at complexity
| metrics. As I recall, the research found that when you adjusted
| for code size, none of the metrics mattered. In other words, just
| use LOCs as your metric.
| marcosdumay wrote:
| AFAIK the one unambiguously relevant metric for how many bugs
| you will find in a codebase is how many its "clients" define as
| acceptable.
|
| Any kind of internal code quality affects the productivity of
| the debugging procedure, not the final number of bugs.
| kqr wrote:
| This is correct.
| https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=code...
| [deleted]
| sam_bristow wrote:
| The last section on coupling reminded me of the concept of
| connascence[1] which I've found really helpful when talking about
| code.
|
| [1] https://connascence.io
| vinodkd wrote:
| Thanks for this link. TIL. Also, the videos linked from the
| site were useful
| [deleted]
| SKILNER wrote:
| The real answers about complexity come from thinking about why we
| even care. It's because our feeble minds have to build internal
| models of the code so we can work with it. The cognitive aspects
| of building those models are why complexity matters.
|
| What things make it more difficult to build those models? A
| partial list, mostly as others have mentioned:
|
| - tool and library dependencies
| - nested conditions
| - loops and especially nested loops
| - asynchronous processing, callbacks, etc.
| - non-descriptively named variables and functions
| - using non-standard code patterns for standard functionality
| - delocalized code, as in, you have to navigate somewhere else to
|   see it (throws off your working memory)
|
| By one study, developers using Eclipse for Java spent 27% of
| their time just doing code navigation.
|
| The starting point for code complexity is about how our minds
| work.
|
| As many people have said, "it's easier to write a program than
| read it."
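|
| Two items from the list above, nested conditions and nested
| loops, are easy to measure mechanically. A minimal sketch,
| assuming that "nesting" means how deeply `if`/`for`/`while`/
| `with`/`try` blocks sit inside one another; the helper name
| `max_nesting_depth` is made up for illustration.
|
```python
import ast

# Block statements that push the reader one level deeper (an assumption).
_NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting_depth(source):
    """Return the deepest level of nested control-flow blocks in source.

    Caveat: Python's AST represents `elif` as an `if` inside the `else`
    branch, so long elif chains are counted as nesting by this sketch.
    """

    def walk(node, depth):
        if isinstance(node, _NESTING_NODES):
            depth += 1
        child_depths = [walk(child, depth) for child in ast.iter_child_nodes(node)]
        return max([depth] + child_depths)

    return walk(ast.parse(source), 0)


NESTED = """
for row in rows:
    for cell in row:
        if cell:
            print(cell)
"""

FLAT = "x = 1\n"

print(max_nesting_depth(NESTED))
print(max_nesting_depth(FLAT))
```
|
| The other items on the list (naming, delocalized code,
| non-standard patterns) are much harder to reduce to a number.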
| [deleted]
| ip26 wrote:
| I would argue complexity is also the wellspring of corner
| cases. The more corner cases, the more mercurial, and that
| isn't just due to the limitations of our minds.
| kristov wrote:
| Not just build internal models of the code, but also an
| internal model of the execution of the code. For example, a
| mental model of the scoping rules. As a developer you have to
| read a function and build a model of _when_ variables are
| assigned values - you have to execute the code in your head to
| understand what the state will be at a particular point in
| time. I think a big part of complexity is having to imagine
| what the current in-flight state is - the larger the current
| in-flight state, the harder to reason about how the code will
| interact with it.
| lincpa wrote:
| `The Math-based Grand Unified Programming Theory: The Pure
| Function Pipeline Data Flow with Principle-based
| Warehouse/Workshop Model` is a simple, systematic, math-based
| method. It makes development a simple task of serial and parallel
| functional pipelined "CRUD".
|
| ### Mathematical prototype
|
| - Its mathematical prototype is the simple, classic, vivid
| elementary school mathematics problem of "water input/output of
| the pool", which is widely used in social production practice.
|
| ### Basic quality control
|
| - The code must meet the following three basic quality
| requirements before you can talk about other things. These simple
| and reliable evaluation criteria are enough to eliminate most
| unqualified code.
|   - Function evaluation: Just look at the shape of the code
|     (pipeline structure weight), and whether the function is a
|     pure function.
|   - Functional pipelined dataflow evaluation: A data flow has at
|     most two functions with side effects, and only at the
|     beginning and the end.
|   - System evaluation: Just look at the circuit diagram; you can
|     treat the function as a black box like an electronic
|     component.
| - Code Quality Visualization:
|   - For Lisp languages, an S-expression is a contour graph, and
|     can be very simply transformed into a contour map or a 3D
|     mountain map.
|   - If the height of the mountains is not high, and the altitude
|     values are similar, it means that the quality of the code is
|     good.
|   - For non-Lisp languages, you can convert the source code into
|     an abstract syntax tree (AST), and then into a contour map or
|     a 3D mountain map.
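|
| The rule that a data flow has at most two functions with side
| effects, only at the beginning and the end, might look like the
| sketch below. This is an illustration under assumptions, not
| code from the linked project: `pipeline`, `core`, `run` and the
| three step functions are all hypothetical names.
|
```python
from functools import reduce


def pipeline(*steps):
    """Compose steps left to right into a single function."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Pure steps: same input always gives the same output, no I/O.
def parse_numbers(text):
    return [int(token) for token in text.split()]


def keep_even(numbers):
    return [n for n in numbers if n % 2 == 0]


def total(numbers):
    return sum(numbers)


core = pipeline(parse_numbers, keep_even, total)


def run(read=input, write=print):
    """Impure shell: one side effect to read, one side effect to write."""
    write(core(read()))


print(core("1 2 3 4"))  # pure middle section, testable without any I/O
```
|
| Because the side effects are injected at the two ends, the
| middle of the flow can be tested with plain values.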
|
| ### Programming Aesthetics
|
| Simplicity, Unity, order, symmetry and definiteness.
| ---- Lin Pengcheng, Programming aesthetics
|
| The chief forms of beauty are order and symmetry and
| definiteness, which the mathematical sciences demonstrate in a
| special degree.
| ---- Aristotle, "Metaphysica"
|
| My programming aesthetic standards are derived from the basic
| principles of science. Newton, Einstein, Heisenberg, Aristotle
| and other major scientists hold this view.
|
| The aesthetics of non-art subjects are often complicated and
| mysterious, making it difficult to understand and learn.
|
| The pure function pipeline data flow provides a simple, clear,
| scientific and operable demonstration.
|
| Simplicity and Unity are the two guiding principles of scientific
| research and industrial production.
|
| - Unification of theories is the long-standing goal of the
| natural sciences; and modern physics offers a spectacular
| paradigm of its achievement. It can be found from the knowledge
| of various disciplines: the more universally applicable a unified
| theory, the simpler it is, and the more basic it is, the greater
| it is.
|
| - The more simple and unified things, the more suitable for
| large-scale industrial production.
|
| - Only the simple can be unified; only the unified can be truly
| simple.
|
| In the IT field, only two systems fully comply with these 5
| programming aesthetics:
|
| - Binary system
|
|   The biggest advantage is that it makes the calculations reach
|   the ultimate simplicity and unity, so digital logic circuits
|   are produced, and then the large-scale industrial production
|   methods of computer hardware are produced.
|
| - The Math-based Grand Unified Programming Theory: The Pure
| Function Pipeline Data Flow with Principle-based
| Warehouse/Workshop Model
|
| ### Others
|
| - Software and hardware are factories that manufacture data, so
| they have the same "warehouse/workshop model" and management
| methods as the manufacturing industry.
|
| - From the perspective of system architecture, it is a
| warehouse/workshop model fractal system. It abstracts every
| system architecture into a warehouse/workshop model.
|
| - From the perspective of component, it is a pure function
| pipeline fractal system. It abstracts everything into a pipeline.
|
| - It adheres strictly to 10 principles and 5 aesthetics, and it
| consists of 5 basic components.
|
| - It uses the "operational research" method to schedule the
| workshop to complete tasks in optimal order and maximum
| efficiency.
|
| ### Reference
|
| The Math-based Grand Unified Programming Theory: The Pure
| Function Pipeline Data Flow with Principle-based
| Warehouse/Workshop Model
|
| https://github.com/linpengcheng/PurefunctionPipelineDataflow
| unbanned wrote:
| If it looks ugly, or you have to navigate too much (long files or
| a lot of external dependencies/dependency chains)... it's
| complex.
|
| That's all you need to know really
| ok_dad wrote:
| One thing I hate is having like 20 different Python repos for one
| small company. At most places I've worked at, you have basically
| one thing you do business-wise, but it's split into what I
| believe are arbitrary repo delineations. This causes trouble with
| dependency resolution in your build systems and IDEs and
| increases cognitive load and merge complexity for changes across
| repos.
|
| Just put the whole folder structure into one repo! This would
| reduce build complexity, and you can set up the configs for your
| tools and CI one time rather than 20. You can still have several
| services out of one repo, if you want to, but it's easier to
| reason about and easier to change those service delineations
| later, where in 20 repos you're having to "clone and cut code" to
| separate things. In one repo, you just move code around as you
| split or merge services.
|
| I routinely create a "super repo" for myself at these companies
| using submodules, so that I can actually work with the code more
| easily, but that still requires me to check in maybe 5 or more
| PRs for one feature, so it's not ideal. This only solves the
| developer's problems with local tools and still requires more
| complex debugging since the services are not actually in one repo
| under one config for deployment.
| jpswade wrote:
| What if I told you that most software complexity doesn't come
| from the code but from the software requirements?
| cryptica wrote:
| Number of lines of code is a good metric for complexity. The hard
| part is estimating how many lines of code is reasonable for a
| specific feature. Some complexity is necessary. The problem is
| unnecessary complexity.
| okl wrote:
| > Number of lines of code is a good metric for complexity.
|
| I don't agree with that premise. LOC are a metric for code
| size, not for complexity. I've found that in practice the
| number of statements is a more reliable indicator for code size
| than lines of code. (For typical imperative languages anyways.)
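|
| The difference between counting lines and counting statements
| is easy to see in a small sketch. The helper names
| `count_code_lines` and `count_statements` are invented for
| illustration, and "statement" here means whatever Python's `ast`
| module classifies as one.
|
```python
import ast


def count_code_lines(source):
    """Count non-blank lines that are not pure comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def count_statements(source):
    """Count statement nodes in the parsed source."""
    return sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))


# The same two statements, formatted two ways.
SPREAD_OUT = """\
values = [
    1,
    2,
    3,
]
total = sum(values)
"""

DENSE = "values = [1, 2, 3]; total = sum(values)\n"

for label, src in (("spread out", SPREAD_OUT), ("dense", DENSE)):
    print(label, count_code_lines(src), count_statements(src))
```
|
| Formatting changes the line count sixfold here while the
| statement count stays the same.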
| kqr wrote:
| That's not a premise -- it's a data-driven conclusion. Lines
| of code correlates strongly with all other complexity
| metrics, and is way easier to compute and explain to someone
| else.
| https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=code...
| cryptica wrote:
| Lines of code is a very good approximation so long as each
| line is compiled into a similar number of bits across
| different systems and languages. The total size of the
| compiled source code in bits is literally the entropy of the
| system so it's the definition of complexity.
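|
| One way to probe the "size in bits" claim is to use compressed
| size as a crude stand-in for information content. This is a
| sketch under assumptions: zlib only gives an upper bound, not
| true entropy, and it is applied to source text rather than
| compiled output. `compressed_size` is a made-up helper name.
|
```python
import zlib


def compressed_size(source):
    """Bytes needed to store the source after zlib compression (level 9)."""
    return len(zlib.compress(source.encode("utf-8"), 9))


# 200 identical lines versus 200 lines that all differ.
repetitive = "x = 1\n" * 200
varied = "".join(f"v{i} = {i * i}\n" for i in range(200))

print(len(repetitive.splitlines()), compressed_size(repetitive))
print(len(varied.splitlines()), compressed_size(varied))
```
|
| Both inputs have the same number of lines, yet they compress to
| very different sizes, so line count and information content can
| diverge when code is repetitive.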
| okl wrote:
| I think applying that definition to the issue of source code
| complexity the way you do is outlandish.
| cryptica wrote:
| Why is it outlandish? You're confusing the reliability of
| using source lines of code as a metric for measuring the
| productivity of developers with measuring the complexity
| of a system. It's a bad metric for measuring productivity
| but a good metric for measuring complexity.
|
| The problem with using lines of code as a metric for
| developer productivity is precisely that it leads to
| developers introducing unnecessary complexity into the
| system since they try to add as many lines of code as
| possible for implementing any feature.
|
| There is no drawback in keeping the source lines of code
| to the minimum amount necessary to get the job done.
| cryptica wrote:
| I find this to be true even in cases when you have large
| projects. Sometimes the scale of the project will
| necessitate adding 'extra complexity' to a specific set
| of core modules but this is only in order to reduce the
| complexity of other peripheral modules. IMO, you should
| never add extra lines of code to a module beyond what is
| immediately necessary unless it allows you to reduce even
| more lines of code in other parts of the code.
| okl wrote:
| I've never proposed using LOC as a measure of
| productivity. I won't entertain your dishonesty any
| further.
| cryptica wrote:
| Well in that case I really don't understand your
| reasoning. I can't think of any other reason why you
| cannot see that lines of code is the closest and most
| measurable representation of the information content of
| the system's logic. Information content/entropy is the
| most rigorous way to measure complexity.
|
| IMO, lines of code is even better at measuring complexity
| than compiled bytecode because it accounts for complexity
| from the developer's point of view (which is what the
| question is asking).
|
| While some lines of code require more effort from a
| typical developer to understand than other lines, it
| doesn't matter so much once they're averaged out over
| thousands of lines and thousands of different developers
| (each with their own slightly different perception of
| complexity). It's reasonable to factor out individual
| perception of complexity.
| tromp wrote:
| It may be outlandish because minimizing the number of
| bits, when taken to the extreme as in Algorithmic
| Information Theory, leads to very obfuscated code [1].
|
| [1] https://www.ioccc.org/2012/tromp/hint.html
___________________________________________________________________
(page generated 2021-11-28 23:01 UTC)