[HN Gopher] How I Made a Heap Overflow in Curl
___________________________________________________________________
How I Made a Heap Overflow in Curl
Author : ghuntley
Score : 267 points
Date : 2023-10-11 06:12 UTC (9 hours ago)
(HTM) web link (daniel.haxx.se)
(TXT) w3m dump (daniel.haxx.se)
| taspeotis wrote:
| https://hackerone.com/reports/2187833
| ynik wrote:
| Yesterday's Windows 10 security updates did not include a new
| curl version; the curl included with Windows is still 8.0.1.
| dogben wrote:
| if(!socks5_resolve_local && hostname_len > 255) {
| socks5_resolve_local = TRUE; }
|
| This is really a bad idea. For people who use anti-censorship
| tool to protect privacy, this can leak their identity through
| DNS.
| [deleted]
| vsnf wrote:
| > I think it was downright wrong to switch mode like this,
| since the user asked for remote resolve curl should stick to
| that or fail. It is not even likely to work to just switch,
| even in "good" situations.
|
| Yes, the author shares your opinion.
| account-5 wrote:
| I have to say this is the best CVE writeups I've ever read. And I
| can only commend the author who is humble throughout despite
| authoring one of the cornerstone software products of the age.
| Much respect.
| notRobot wrote:
| I have always been a huge fan of Daniel's writing. Such clear and
| honest writing explaining reasonings and perspectives.
| gkhartman wrote:
| It still blows my mind that so many devices depend on a library
| largely written by a single person. The level of pressure must be
| insane and it shows in this bit:
|
| "Reading the code now it is impossible not to see the bug. Yes,
| it truly aches having to accept the fact that I did this mistake
| without noticing and that the flaw then remained undiscovered in
| code for 1315 days. I apologize. I am but a human."
|
| If Daniel happens to read this, thanks for your hard work, and I
| really don't think any apology is necessary. After all, the
| source was right there for any of us to read and review.
| nine_k wrote:
| https://xkcd.com/2347/ had been true for many years before, and
| has been true for many years since it was published.
|
| Sometimes the "Don't fix it if it ain't broken" principle is
| interpreted as "don't support it until breaks", and leads to
| nasty surprises.
| avg_dev wrote:
| i really feel that the first sentence in that quote is an apt
| way to sum up a part of how learning and attention work for us
| as humans.
|
| personally i've spent some time thinking about how in my life
| it is very easy for me to pay attention as hard as i can manage
| and still miss so many things. i feel like things like code
| review are a great way to work on this sort of thing (attention
| training, global and contextual awareness, etc.). for example,
| there is this place that i've been walking by regularly for a
| couple of years and someone maintains a nice flowerbed there
| and i pause to take a glance at it and have been doing so for
| about a year since i first noticed it. but a couple months ago
| i noticed that there is a companion flowerbed a few feet away
| that i just had never noticed because i was so focused on the
| first one. was that companion flowerbed there all along? i
| suspect that it was! but i have no idea because i didn't even
| know it existed.
|
| i have tried to extrapolate this to knowledge across humanity
| in general. i suspect sometimes it takes just one person to
| notice something, to make an observation that informs their
| behavior, and it begins to disseminate to us all. sometimes it
| takes several tries and a long time to happen. and i wonder how
| much low-hanging fruit there is out there that just none of us
| have noticed yet. and i think about how integrating a simple,
| fundamental set of best practices already established by others
| who have been paying attention can improve the low-bar for us
| all.
|
| i think that julia evans touched on this in
| https://jvns.ca/blog/2023/10/06/new-talk--making-hard-things...
| which was discussed here recently and so did dan luu in his
| https://danluu.com/p95-skill/ which has also been discussed
| here somewhere.
|
| and, to conclude, i too have a lot of gratitude for stenberg
| and all the folks who've worked on curl, and many other parts
| of our internet infrastructure that i usually take for granted
| every day.
| protomolecule wrote:
| Daniel himself has a strange way of saying "thank you" to Scott
| Adams for his classic strip.
|
| "On Scott Adams
|
| I use his "I'm going to write myself a minivan"-strip above
| because it's a classic. Adams himself has turned out to be a
| questionable person with questionable opinions and I do not
| condone or agree with what he says."
| BTinfinity wrote:
| Reminds me of how for a while OpenSSL was being developed and
| maintained by just two guys called Steve [0]. I think they've
| upgraded to a team of seven now at least
|
| [0] https://www.buzzfeed.com/chrisstokelwalker/the-internet-
| is-b...
| slowwriter wrote:
| Presumably all called Steve.
| estebank wrote:
| They spiced it up, now there's a Stephan and an Esteban in
| the mix.
| w23j wrote:
| So true. Also becomes apparent from this sentence in the
| Hackerone report:
|
| "This report seems entirely correct and it hurts in my soul."
|
| and the blog:
|
| "[...] shipping a heap overflow in code installed in over
| twenty billion instances is not an experience I would
| recommend."
|
| Harsh to have to bear that responsibility for so little reward.
| avgcorrection wrote:
| I think he'll be fine.
| deng wrote:
| This is a great writeup. However, even after also reading the
| CVE, I'm still unsure about under which circumstances you would
| be affected. As far as I understand it, you are affected if you
|
| * use a SOCKS5 proxy, AND
|
| * you use the SOCKS5 proxy to resolve hostnames (which AFAICS is
| NOT the default), AND
|
| * the size of the buffer was changed from the default (100kb) to
| something below 65541 bytes, AND (EDIT: Not correct: 'libcurl'
| re-uses the download buffer for this, which by default is 16kB,
| however it is said that 'curl' itself sets it manually to 100kb
| UNLESS you use --limit-rate)
|
| * the SOCKS5 proxy is too slow to handle the request immediately
| (which however, as the CVE states, can usually be provoked if the
| attacker has control over the request rate). (EDIT: This is
| wrong, the CVE actually says "Typical server latency is likely
| "slow" enough to trigger this bug without an attacker needing to
| influence it by DoS or SOCKS server control.")
|
| So to me, the attack vector seems very small. Am I missing
| something?
| foul wrote:
| To me, this looks like recipe for disaster navigating in
| darknets like Tor.
| nunez wrote:
| I wrote a series of shell scripts called docker-proxy that
| creates SOCKS5 tunnels to Docker containers running openconnect
| VPNs, useful when you're working with multiple customers who
| have different VPNs that want all traffic on your machine
| forwarded through. https://github.com/carlosonunez/docker-proxy
|
| Since DNS resolution is meant to occur remotely, this CVE would
| be directly applicable here.
| kiviuq wrote:
| I'm trying to under the last comment
|
| > the SOCKS5 proxy is too slow to handle the request
| immediately
|
| the request seg faults immediately, no delays, no redirects
|
| root@1aac5e228e16:/build/curl-7.74.0# curl -vvv -x
| socks5h://host.docker.internal:9050 $(python3 -c "print(('A'
| _10000), end= '')")
|
| _ Trying 192.168.65.254:9050...
|
| * SOCKS5: server resolving disabled for hostnames of length >
| 255 [actual len=10000]
|
| * SOCKS5 connect to AAAAA...
|
| * Send failure: Bad file descriptor
|
| * Failed to send SOCKS5 connect request.
|
| Segmentation fault
|
| https://gist.github.com/xen0bit/0dccb11605abbeb6021963e2b1a8...
| deng wrote:
| I'm confused why plain 'curl' would crash here: the CVE says
| that
|
| "The target buffer is the heap-based download buffer in
| libcurl that is reused for SOCKS negotiation before the
| transfer has started. The size of the buffer is 16kB by
| default, but can be set to different sizes by the
| application. The curl tool sets it to 102400 bytes by default
| - but it sets the buffer size to a smaller size if --limit-
| rate is set lower than 102400 bytes per second."
|
| I thought with a buffer of 100kB, this bug wouldn't trigger?
| deng wrote:
| Yes sorry, I was wrong. The CVE says the opposite, I put an
| EDIT in the OP.
| davidshepherd7 wrote:
| Yeah I think this is correct with the edit.
|
| I guess this is mostly relevant for software that runs on
| shared infra, sends requests to a url provided by an attacker
| (e.g. webhooks), and uses a SOCKS5 proxy?
| dig1 wrote:
| > Yes, this family of flaws would have been impossible if curl
| had been written in a memory-safe language instead of C
|
| It could have been impossible, but you never know if there is a
| way to escape language VM barriers. However, the author clearly
| ignored the DNS hostname limit stated in RFC1123 [1], which is
| hardcoded even in Java libraries.
|
| [1] https://www.rfc-editor.org/rfc/rfc1123
| js2 wrote:
| It's covered in the section titled "host name length":
|
| > A host name in a URL has no real size limit, but libcurl's
| URL parser refuses to accept names longer than 65535 bytes. DNS
| only accepts host names up 253 bytes. So, a legitimate name
| that is longer than 253 bytes is unusual. A real name that is
| longer than 1024 is virtually unheard of.
|
| DNS is not the only mechanism for resolving host name to
| address, even if it's what's used 99.9% of the time today.
| jandrese wrote:
| Legitimate requirements for hostnames longer than that have
| to be vanishingly small. I dare anybody to come up with a
| single anecdote from real world use.
| matja wrote:
| By the wording of the RFC, _accepting_ a hostname longer than
| 255 bytes is permissible, creating one is not.
| eesmith wrote:
| Could you be more specific? All I saw was:
| Host software MUST handle host names of up to 63 characters and
| SHOULD handle host names of up to 255 characters.
|
| I don't see where the RFC sets a upper limit on host name size.
| dig1 wrote:
| 6.1.3.5 Extensibility: The DNS defines
| domain name syntax very generally -- a string of
| labels each containing up to 63 8-bit octets,
| separated by dots, and with a maximum total of 255
| octets.
|
| RFC1035 makes this more explicit [1].
|
| [1] https://www.freesoft.org/CIE/RFC/1035/9.htm
| eesmith wrote:
| Thanks!
| datenwolf wrote:
| memory-safe language does not imply the use of a VM.
|
| It can just as well mean, that the compiler will attempt to
| execute a proof, that memory is never accessed out of bounds,
| without well defined ownership and within the lifetime of the
| underlying object. Which is what Rust does, for example.
|
| Of course it can also mean, that the compiler will then
| additionally add internal failure checks and safeguards at
| critical places (Rust does not do this, but it would be nice to
| have in systems where one might worry about in-register bit-
| flips (high radiation environments, like X-ray scanners), i.e.
| stuff not caught by - say - e.g. ECC memory).
| pstuart wrote:
| Maybe I'm weird, but having a ternary that evaluates to ? TRUE :
| FALSE bugs me.
| pizza234 wrote:
| Very sensible conclusions about memory safety/memory-safe
| languages; excerpts follow.
|
| ---
|
| Yes, this family of flaws would have been impossible if curl had
| been written in a memory-safe language instead of C [...]
|
| The only approach in that direction I consider viable and
| sensible is to:
|
| - allow, use and support more dependencies written in memory-safe
| languages and
|
| - potentially and gradually replace parts of curl piecemeal, like
| with the introduction of hyper.
|
| Such development is however currently happening in a near glacial
| speed and shows with painful clarity the challenges involved.
| curl will remain written in C for the foreseeable future.
|
| Everyone not happy about this are of course welcome to roll up
| their sleeves and get working.
|
| Including the latest two CVEs reported for curl 8.4.0, the
| accumulated total says that 41% of the security vulnerabilities
| ever found in curl would likely not have happened should we have
| used a memory-safe language. But also: the rust language was not
| even a possibility for practical use for this purpose during the
| time in which we introduced maybe the first 80% of the C related
| problems.
|
| [...]
|
| We repeatedly run several static code analyzers on the code and
| none of them have spotted any problems in this function.
| galangalalgol wrote:
| In the last week new fuzzers and tests also appear to have been
| added. At this point, I think any c project that isn't safety
| critical would do well to mimic this pipeline.
| [deleted]
| 4death4 wrote:
| Replacing parts of curl with Rust will not be possible. Most
| places want to build from source and don't want to introduce a
| new compiler to the tool chain. And Rust supports a tiny subset
| of possible C targets.
| moltonel3x wrote:
| > Replacing parts of curl with Rust will not be possible.
|
| It's not just possible, it's been done. You can compile curl
| with rustls, you could for a time compile it with quiche, and
| work is ongoing to compile it with hyper. Curl is remarkably
| modular, none of those are mandatory.
|
| > And Rust supports a tiny subset of possible C targets.
|
| Gross overstatement. Rust supports the vast majority of
| devices that people buy today. Even if you ignore platform
| popularity and just count platforms supported by gcc vs
| rustc/llvm, there's only a handful missing from the later.
| And if you're talking about vendor-specific compilers, a lot
| of them don't support modern C or C++ either.
| pizza234 wrote:
| The maintainer wrote a post about this topic:
| https://daniel.haxx.se/blog/2020/10/09/rust-in-curl-with-
| hyp....
|
| "Most places want to build from source" is not something he
| considered; there is a brief "for users against the C
| implementation on the platforms you care about".
| steveklabnik wrote:
| Just to be clear, since this hypothetical library exposes a C
| ABI, pre-built binaries wouldn't be an issue if somehow
| magically curl was reimplemented in Rust. Someone would have
| to compile it, but that's the same as if they didn't want to
| build something written in C from source.
| agwa wrote:
| Should the majority of us continue to bear the risks of
| memory unsafe code because of a small minority that uses
| exotic architectures or doesn't want to install a new
| compiler?
| rhdunn wrote:
| You need to have someone, or a group of people, willing to
| write the code to implement a replacement that is fully
| compatible, and maintain it. Until then, curl written in C
| is what we have.
|
| With that, curl can be used on the exotic architectures and
| the rust/other language version can be used on other
| platforms.
| moltonel3x wrote:
| You need to have someone, or a group of people, willing
| to write the code to implement support for modern tooling
| on their platform. Until then, old harder-to-secure
| codebases is what these platforms have.
|
| I know reversing the burden of implementation seems
| flippant, but it's pragmatic. At some stage, it's less
| community-wide work to support Rust on a new platform
| than to spend extra time maintaining/securing dozens of C
| codebases. Curl may not be making its Rust components
| mandatory anytime soon, but other projects like python
| crypto already have, to say nothing of projects written
| in Rust to begin with.
|
| rustc_codegen_gcc is pretty close to ready, let's focus
| on getting it out of the door, and more target triples
| supported by rustc and llvm.
| dezgeg wrote:
| "Most" is a very heavy claim. Any concrete evidence of this?
|
| Also in embedded Linux I guess we are technically "building
| from source", when using a build system like Buildroot or
| Yocto. But Rust is already supported there and in fact
| already in use in our project (due to python3-cryptography).
| pmontra wrote:
| What are those "most places" that "want to build from
| source"? It's a honest question and I intentionally left out
| the "don't want to introduce a new compiler to the tool
| chain" statement to make the set larger. I understand "most"
| as larger than 50% on some metric.
|
| I know about Gentoo and other Linux distributions that do
| build from source but it has been very uncommon for me to see
| a company building things from source instead of using apt,
| rpm, docker in this century. It's at least two orders of
| magnitude faster. I remember how in the 90s I had to download
| the source, scan the README for the dependencies, recursively
| so, then building the libraries, then eventually building the
| program I downloaded first.
|
| Security wise, we were not reading the code, except for
| Nethack.
| jdblair wrote:
| The places that want to build from source are the creators
| of embedded devices that use curl. These devices include
| (but are not limited to) wifi access points, routers, home
| automation devices, data loggers, IP video cameras, smart
| TVs, set top boxes, and streaming devices. There are a lot
| of these devices, and there are more all the time.
|
| Even if the engineering teams do not build every dependency
| every time they build firmware (like busybox), they will at
| some point build all of the components of their system
| themselves (as supported by yocto).
| dezgeg wrote:
| Anybody using the standard embedded build solutions like
| Buildroot or Yocto will get Rust support without doing
| anything. It's already there and implemented.
| pizza234 wrote:
| > These devices include (but are not limited to) wifi
| access points, routers, home automation devices, data
| loggers, IP video cameras, smart TVs, set top boxes, and
| streaming devices
|
| Access points/routers/smart TVs/set top boxes are
| typically be based on ARM/Linux/Android, and should be
| supported.
| yjftsjthsd-h wrote:
| https://doc.rust-lang.org/nightly/rustc/platform-
| support.htm... seems to say that only aarch64 Linux and
| assorted x86 variants are tier 1; everything else is tier
| 2 (it compiles but isn't tested to actually work) at best
| pmontra wrote:
| That makes sense. It was not clear in the context of the
| post I replied to.
| jdblair wrote:
| Yeah, I admit that maybe this isn't "most places," since
| there are almost certainly more shops running various
| versions of Linux servers in production than there are
| embedded Linux development teams.
|
| (edited to fix typo)
| galangalalgol wrote:
| Not wanting to introduce a new compiler is like wanting to
| introduce a new language. If you don't like the choice the
| package maintainer makes, roll up your sleeves.
|
| That said, we can mitigiate it using mrustc to generate c.
| Does anyone keep a list of all the targets libcurl is getting
| built for?
| ChoHag wrote:
| [flagged]
| pgeorgi wrote:
| Come on, it starts out with a link to the CVE and a remark
| "While the advisory contains all the necessary details. I
| figured I would use a few additional words..."
|
| If you just want the gist of it, the article does everything it
| can to send you there.
| Cloudef wrote:
| Quite a bit of hype and drama for something that requires very
| specific condition to be exploited.
| palata wrote:
| ... and that is installed on billions on devices. I can imagine
| you quickly get very specific conditions here and there when
| something is deployed at that scale.
| faster_harder wrote:
| Curl has had their share of issues with CVEs, I assume they
| want to show that they are taking _actual_ security issues very
| seriously. See e.g.
| https://daniel.haxx.se/blog/2023/08/26/cve-2020-19909-is-eve...
| junon wrote:
| [flagged]
| knallfrosch wrote:
| > Everyone not happy about this are of course welcome to roll
| up their sleeves and get working.
| junon wrote:
| I didn't post it as an angry response, just knew what kind of
| comments this would receive. Not sure why it got flagged, I
| wasn't saying this was some sort of excuse or that it should
| be rewritten, just found it interesting he included it. I'm
| sure he gets that comment a lot.
| ahoka wrote:
| It doesn't help that the whole function is a huge computed goto
| mutating things, spanning multiple pages.
| messe wrote:
| ...you mean the most common pattern for writing a state
| machine?
| anon-3988 wrote:
| There's obviously no better way to write state machines...
| parasti wrote:
| Can you elaborate?
| msla wrote:
| If they could, they wouldn't have used sarcasm.
| gdprrrr wrote:
| Not OP, but in functional languages, state machines are
| often simply a function (State, Input) -> State
| diogenes4 wrote:
| This is a well-discussed issue with c memes, not a
| widespread issue. Even in c, jump tables are a safer
| alternative.
| jeffrallen wrote:
| This is a long and interesting note which could be reduced to
| "this is why system libraries need to be either written in a safe
| language or proved correct".
|
| When one of the best, friendliest, and most transparent C
| programmers of our time is writing these posts, we need to pay
| attention.
| yjftsjthsd-h wrote:
| > When one of the best, friendliest, and most transparent C
| programmers of our time is writing these posts, we need to pay
| attention.
|
| Given that said programmer wrote,
|
| > Such development is however currently happening in a near
| glacial speed and shows with painful clarity the challenges
| involved. curl will remain written in C for the foreseeable
| future.
|
| I'm curious what conclusion you'd like us to draw.
___________________________________________________________________
(page generated 2023-10-11 16:01 UTC)