[HN Gopher] Why Security Defects Go Unnoticed During Code Review...
___________________________________________________________________
Why Security Defects Go Unnoticed During Code Reviews? [pdf]
Author : chx
Score : 41 points
Date : 2021-02-04 19:33 UTC (3 hours ago)
(HTM) web link (amiangshu.com)
(TXT) w3m dump (amiangshu.com)
| jdlyga wrote:
| Is it a bikeshed problem? It's easy to pick out the obvious
| problems, but not enough people know how to spot security
| vulnerabilities.
| tptacek wrote:
| I'd be a little cautious about this, as it seems to be plug-and-
| chug from CWE, followed by a breakdown of the "CWEs least and
| most likely to be detected in code review". I don't think CWE is
| a particularly good taxonomy, and, in any case, apart from
| obvious cases (yes, use of banned functions is easy to spot),
| it's pretty much orthogonal to the real factors that make
| vulnerabilities hard to spot in code review.
|
| I haven't, like, done a study, but I have done _a lot_ of
| security code review, and my instinct is that the basic issues
| here are straightforward: bugs are hard to spot when they're
| subtle (ie, they involve a lot of interactions with other systems
| outside the one under review) and when they involve a large
| amount of context (like object lifecycle bugs).
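|
| For instance, a contrived lifecycle sketch (hypothetical names,
| TypeScript purely for illustration): the diff that adds the
| flush() call looks fine on its own; the defect only shows up if
| you know another path may already have closed the writer.
|
|     class Writer {
|       private closed = false;
|       close(): void { this.closed = true; }
|       flush(): void {
|         if (this.closed) throw new Error("use after close");
|       }
|     }
|
|     function onShutdown(w: Writer): void {
|       w.close();
|     }
|
|     function onTimer(w: Writer): void {
|       // The reviewer needs context the diff doesn't show: can
|       // onShutdown() run before this timer fires?
|       w.flush();
|     }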
|
| Which, another issue with this study: I get why Chromium OS is an
| interesting data set. But I don't think it's a particularly
| representative case study. Another factor in effectiveness of
| security review: how much of the whole system you can fit in your
| head at once.
| carlmr wrote:
| >Another factor in effectiveness of security review: how much
| of the whole system you can fit in your head at once.
|
| This is why I think using pure functions as much as possible,
| i.e. part of the functional programming mindset, is so important
| for making code reviewable. Yes, you can make nice abstractions
| in OOP, but at least in my experience OOP with its stateful
| objects interacting makes you need to know a lot more about the
| system than pure functions do.
|
| And yes, it's not a panacea: large allocations may take too long
| to copy, which is why the next best thing is "mostly functional".
| No single technique works in every case.
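|
| As a made-up illustration (TypeScript only as an example
| language), compare how much context a reviewer needs for a
| stateful method versus a pure function:
|
|     // Stateful style: to review total() you must know how items
|     // and discount were mutated elsewhere before this call.
|     class Cart {
|       items: { price: number }[] = [];
|       discount = 0;
|       total(): number {
|         return this.items.reduce((s, i) => s + i.price, 0)
|           * (1 - this.discount);
|       }
|     }
|
|     // Pure style: everything the function depends on is in its
|     // parameters, so it can be reviewed in isolation.
|     function total(items: readonly { price: number }[],
|                    discount: number): number {
|       return items.reduce((s, i) => s + i.price, 0)
|         * (1 - discount);
|     }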
| WrtCdEvrydy wrote:
| > how much of the whole system you can fit in your head at
| once.
|
| It's a consistent issue that's brought up... the majority of
| bugs are found in large releases. The smaller the release, the
| less likely you'll create a bug (this is where you get some
| methodologies moving away from waterfall).
|
| I wonder if the future of security practices will just be
| "deliver small pieces of functionality / changes"
| tomlagier wrote:
| Doesn't that seem like a tautology? I think the metric needs
| to be "bugs per feature" rather than "bugs per release" - it
| seems obvious that more changes will be buggier than fewer
| changes.
| hsbauauvhabzb wrote:
| +1. I got asked to focus on CWE top 25 in a node app. Problem
| is, half the top 25 are not applicable to node (buffer
| overflows etc, which matter in memory unsafe languages).
|
| Uncontextualised security is bad security. Frequency of bugs is
| not a good indicator of severity.
| Spearchucker wrote:
| Psychologically, the enterprise software industry is actively
| discouraged from creating secure code. The most-heard mantra is
| to not even try because it's difficult, which makes it a
| pointless exercise. Another is that security testing is too
| disruptive to production systems. The software industry is good
| at recommending specific ciphers and algorithms, while ignoring
| symmetric and public-key protocols (Alice and Bob).
|
| Also, many attacks today target security protocols rather than,
| for example, cracking passwords.
|
| Another huge impediment to creating secure systems is agile,
| which discourages up-front design. The average sprint duration of
| two weeks is too short for meaningful security testing. Testing
| is not feasible with nightly builds or pre-release sanity checks,
| either.
|
| Product owners or customer representatives are too often non-
| technical stakeholders that have the authority to make technical
| design decisions. While never a good idea, this has bad
| consequences for architecture and particularly for security.
| hinkley wrote:
| > According to the model, the likelihood of a security defect
| being identified during code review declines with the increase in
| the number of directories/files involved in that change.
|
| I think most of us who understand code reviews know about this
| problem and push back against it.
|
| > Surprisingly, the likelihood of missing a vulnerability during
| code review increased with a developer's reviewing experience.
|
| Hold up. I need more information about 'experience'.
|
| > reviewer's coding experience and reviewer's reviewing
| experience,
|
| still no criteria...
|
| > Another threat is the measure we take to calculate the
| developer's experience. We can interpret the term "experience" in
| many ways. And in many ways, measuring of experience will be
| complex. For example, we cannot calculate the amount of
| contribution of a developer to other projects. Although a
| different experience measure may produce different results, we
| believe our interpretation of experience is reasonable as that
| reflects the amount of familiarity with current project.
|
| Wrong "experience" and still no criteria. If I may proceed from a
| notion that they mean the same class of experience even though
| they didn't _define_ either of them until almost the conclusions,
| experience is here measured as time on Google projects.
|
| New people are trying to establish themselves. Finding a bug is
| 'counting coup' and raises their status. They also bring in
| outside notions of good code reviews that may be more diverse or
| rich than those inside the group.
|
| An 'experienced' reviewer may be suffering from any or all of the
| following:
|
| * under-training (outside experience/echo chamber)
|
| * over-training (confirmation bias)
|
| * over-extension (too many reviews, competing priorities)
|
| * lack of familiarity
|
| * lack of motivation
|
| * burnout
|
| * hubris
| lmilcin wrote:
| The cause is usually the same: looking for things you can spot
| vs proving the code is correct.
|
| Looking for things you can spot means you read the code as if it
| were a novel, and you discuss only the things that stand out or
| that seem to be important.
|
| Many people interpret "doing the code review better" as reading
| the novel more slowly and spotting more things that stand out.
|
| Imagine bridge engineering plans were reviewed this way. Somebody
| would look at the numbers and react only if for some reason the
| numbers looked incorrect.
|
| Asked to do the review more thoroughly -- they would just stare
| at the plans for longer, but never actually perform the
| calculations, look up the tables, consult standards, etc.
| leetcrew wrote:
| the diff-based code review emphasizes very local changes to
| code. it inherently directs one's attention toward what has
| changed, rather than how the changes affect the larger system.
| I notice this a lot with the (browser-based) tool we use at
| work. if I expand the view to see the whole file, rather than
| just the diff, it tends to make the page lag.
| hinkley wrote:
| Something that really cheeses me off about Atlassian
| (bitbucket) code reviews is that by default they only show
| you the changed lines.
|
| You can't see that the code they changed also lives in
| another function that they missed, or that in fact the whole
| thing should be factored out into a function and fixed once.
| lmilcin wrote:
| That's why I never accept code based on review in web tool
| unless the code is evidently correct. I usually check out the
| branch and try to figure out how the code affects or is
| affected by other, sometimes distant code.
|
| Ideally, of course, everything should be hidden behind leak-
| tight abstractions.
|
| Unfortunately, that is rarely the case.
|
| I recently had a developer reuse a function that generates
| IDs for some other type of ID.
|
| That function that generates IDs happened to use the OS entropy
| pool directly. Previously it was called once per startup of the
| application, so it was deemed OK.
|
| But now with high demand for IDs that is no longer
| acceptable.
|
| I noticed it because I followed the code to see how the
| function _actually_ generates IDs. That would not happen in a
| web-based tool because there simply isn't a convenient way to
| follow the references from the new code.
|
| That is just a simple example, I get things like that every
| other day.
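|
| To sketch the kind of thing I mean (names and code entirely
| hypothetical, TypeScript for illustration):
|
|     import { openSync, readSync, closeSync } from "fs";
|
|     // Originally called once at startup: pulls bytes straight
|     // from the kernel's blocking entropy device.
|     function generateId(): string {
|       const fd = openSync("/dev/random", "r"); // may block
|       const buf = Buffer.alloc(16);
|       readSync(fd, buf, 0, 16, null);
|       closeSync(fd);
|       return buf.toString("hex");
|     }
|
|     // The reviewed diff only adds a call like this; nothing in
|     // it hints that every request now hits /dev/random.
|     function handleRequest(): string {
|       return generateId();
|     }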
| bluGill wrote:
| I've never been a fan of any review tool I've used because
| they all assume the code changed stands in isolation. Just
| today I was in a review where a change on line 234 was
| correct, except that it violated a bad assumption on line
| 123 which needed to be fixed. Line 123 is where I wanted to
| comment because that is where the problem should be fixed,
| but the tool saw that as okay code and didn't even let me
| see it, much less comment on it. It gets worse when the
| bad assumption is in a different file.
|
| And I often get frustrated because I see you call foo(),
| but I need to see how foo works with the given arguments.
|
| Though to be honest, most of the time I catch myself playing
| grammar checker: if the style isn't violated and you didn't
| break any of the dozens of checklist items, you pass. (I work
| in C++, so I often catch people writing C++98 code when there
| is a C++14 construct that is better.)
| philipkglass wrote:
| Github's PR review tool has some irritating limitations this
| way. If a changed function could cause problems for an
| unchanged caller, there's no way to add a comment on the
| caller unless it's within a few lines of the changed code.
| You cannot add comments on specific file lines that are "too
| far" from a changed line.
|
| My review flow now is to check out the branch under review,
| read it in an IDE with git annotate on, and do all my actual
| review work there so I can see the full context. Then add
| line-by-line comments on Github where I can. Comments
| relating to older parts of the system that should change in
| tandem with the current PR have to go in as "general"
| comments, with file/line number included manually.
| dkarl wrote:
| Approaching from the mentality of doing a proof is valuable for
| establishing anything about code, even when it isn't security-
| sensitive. I think many programmers do this instinctively, and
| some don't do it at all. You can spot the difference by asking
| a question like,
|
| "Why do no records get dropped in this step?"
|
| Some programmers will have an argument resembling a logical
| proof: "No records get dropped because for every record, we
| call a method to write the record into the outgoing artifact,
| and if that call fails, the entire stage aborts. The flow of
| control never reaches the point where the outgoing artifact is
| added to the database for delivery. The lack of a successful
| delivery record for the batch triggers an alert after an hour
| and ensures the system will keep reprocessing the batch until
| it succeeds, or until the batch gets manually flagged as bad."
|
| Other programmers will say something like: "Well, all the
| records get passed in, and we write them all to the artifact.
| We had some bugs in this code, but they haven't been hard to
| fix, and anyway, if something goes wrong, the process is
| retried later."
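|
| As a minimal sketch of the property the first answer argues for
| (interfaces and names invented for illustration):
|
|     // write() is assumed to throw on failure.
|     interface Artifact { write(record: unknown): void }
|     interface Db { markDelivered(batchId: string): void }
|
|     // A batch is marked delivered only after every record has
|     // been written; any failure aborts the stage before
|     // markDelivered is reached, so the batch stays eligible for
|     // reprocessing.
|     function processBatch(batchId: string, records: unknown[],
|                           out: Artifact, db: Db): void {
|       for (const record of records) {
|         out.write(record); // a throw here aborts the whole stage
|       }
|       db.markDelivered(batchId);
|     }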
|
| This has been a big thing on my mind recently because I've been
| asked to do a design review for a system created by some high-
| SLOC-productivity people who built a complex system with a
| sophisticated-looking architecture but who keep giving me
| answers like the second one. There's a bunch of cool tech but
| no coherent explanation of why they believe the system
| will work reliably. It's just, well, there have been some bugs,
| and of course it doesn't work when there are bugs, but we're
| fixing them.
|
| Until you have an argument for why a system works, there's no
| coherent way to understand why it doesn't work, and what the
| effort will be to achieve different aspects of reliability.
| It's just "bugs." With such an argument, you can target
| specific aspects of reliability instead of playing the "it'll
| work when the bugs are gone" eternal game of whack-a-mole.
|
| Granted, this isn't very rigorous, but 99% of the time, it's
| the closest you'll get to "proof" in application code.
| hinkley wrote:
| Code reviews have become a social thing, and bagging on someone
| with a litany of issues in front of a group is never acceptable
| behavior. If they've done ten things wrong, you have to pick 3 of
| them and let the rest go, or decide to kick them out of the group
| entirely.
|
| There was a period for me when I 'solved' this code review
| problem the way many of us solved class participation in grade
| school - it's not a competition, it's a learning environment. It
| doesn't matter to the class if you know the answer, but it's
| really awkward if nobody knows the answer. So you say the answer
| in your head, wait to see if anybody else volunteers that
| information, and then put your hand up if they don't. Or you
| count to five. (I suspect based on hands and grades, at least a
| couple of my classmates knew all the answers but were too shy to
| say anything, and this also appears to be true in code reviews.)
|
| Code reviews are typically but not always asynchronous, so you
| can choose to wait for someone else to point out the obvious
| stuff and then just mention the things nobody else saw. The
| danger there is if an argument starts about one of these issues
| and distracts from others. Like a real conversation, you can run
| out of time for it before the issue you cared about gets
| sufficient airtime.
| Sodman wrote:
| Style nits aside, I don't think it's a good idea to selectively
| pick the worst N things in a PR and let the rest go. If you're
| consistently finding 10+ things to request changes on in
| somebody's pull requests, you probably need to look at your
| engineering process. Was the high level approach discussed
| before it was fully coded and "ready for review"? This is
| something easily fixed by process changes.
|
| Are you finding a lot of little things in the PR itself? Many
| of the usual suspects can and should be automated away
| (linting, codegen, test coverage reports, static analysis,
| etc). Others can easily be added to a checklist before PRs are
| submitted for review, eg "Have you updated the relevant
| documentation".
|
| Usually newer employees will get more comments on their PRs, as
| they're not as familiar with the system and might fall into
| more traps that others know about from experience. This can be
| mitigated somewhat by pairing/mentoring.
|
| Sure some people just aren't going to cut it on your team, but
| most engineers when corrected on an individual coding mistake
| won't repeatedly make the same mistake, in my experience. If
| you just never mention the mistake, they'll keep making it in
| every PR and be completely unaware. Sometimes it helps to
| communicate out of band (ie, not in "official" comments on the
| PR itself) if there are some low hanging gotchas you want to
| make the engineer aware of.
| leetcrew wrote:
| > Code reviews have become a social thing, and bagging on
| someone with a litany of issues in front of a group is never
| acceptable behavior. If they've done ten things wrong, you have
| to pick 3 of them and let the rest go, or decide to kick them
| out of the group entirely.
|
| I wonder if we overestimate how sensitive people are in this
| case. if someone raises ten objections on my review and makes
| it clear what I have to do to get their approval, I'm happy.
| I'll get right to work on fixing it. if this happens in front
| of a group (esp involving skip-levels / managers of other
| teams), I take it as a signal that I should have run the change
| by a teammate or my direct manager before involving so many
| people.
|
| the only thing I actually dislike is when people comment about
| vague concerns that leave it unclear whether I actually have to
| address them in that change to get their approval. sometimes
| this causes a circular waiting deadlock with the other
| reviewers, which is especially frustrating.
| wccrawford wrote:
| It depends on both the person and the environment.
|
| In an environment that places stress on those who
| underperform, anything that calls attention to performance
| problems will be _hated_. Anyone that points out those
| problems will be disliked. Even the best people will fall to
| this stress.
|
| In a more relaxed environment that isn't constantly counting
| points towards raises and promotions, people can relax and
| not worry about the problems that are raised. However, not
| everyone can manage this. Some people have just had too many
| jobs of the previous type, or other experiences like that.
| They just can't adjust.
|
| IMO, it also means they have a much harder time fixing their
| own problems because they can't tolerate other people
| pointing out those problems.
|
| As for code reviews, I always call it like I see it. It's my
| _job_ to make sure the code will work and be secure. I'm not
| going to let things slip by that violate that, no matter how
| much I want that person to be happy. And I'll still point out
| matters of style as well, though I usually won't insist they
| be corrected, unless the person has repeatedly had the same
| problem. (My people are generally great, though, and will fix
| even my suggestions. So it's generally not a thing.)
|
| You'd probably hate that last bit of my code reviews. If we
| were on a team, my advice would be to simply discuss that
| part with me in real time. We could decide together if it was
| worth the effort to make those changes. Async communications
| like PRs and emails simply aren't fit for that kind of
| discussion.
| hinkley wrote:
| It also depends if everyone else pointed out 2 things and
| you pointed out 10. Letting other people go first means you
| only have to point out 5, which is still lopsided but not
| 'holy shit, man' lopsided.
| nicoburns wrote:
| > In an environment that places stress on those who
| underperform, anything that calls attention to performance
| problems will be hated. Anyone that points out those
| problems will be disliked. Even the best people will fall
| to this stress.
|
| Seems to me that this might be the problem rather than the
| code review itself!
|
| > We could decide together if it was worth the effort to
| make those changes. Async communications like PRs and
| emails simply aren't fit for that kind of discussion.
|
| Indeed. I've definitely found myself falling into a pattern
| of failing the review for serious issues with comments on
| the PR, but just approving the review and slacking the
| comments for small issues.
___________________________________________________________________
(page generated 2021-02-04 23:00 UTC)