[HN Gopher] How I Made a Heap Overflow in Curl
       ___________________________________________________________________
        
       How I Made a Heap Overflow in Curl
        
       Author : ghuntley
       Score  : 267 points
       Date   : 2023-10-11 06:12 UTC (9 hours ago)
        
 (HTM) web link (daniel.haxx.se)
 (TXT) w3m dump (daniel.haxx.se)
        
       | taspeotis wrote:
       | https://hackerone.com/reports/2187833
        
       | ynik wrote:
       | Yesterday's Windows 10 security updates did not include a new
       | curl version; the curl included with Windows is still 8.0.1.
        
       | dogben wrote:
       | if(!socks5_resolve_local && hostname_len > 255) {
       | socks5_resolve_local = TRUE;       }
       | 
       | This is really a bad idea. For people who use anti-censorship
       | tool to protect privacy, this can leak their identity through
       | DNS.
        
         | [deleted]
        
         | vsnf wrote:
         | > I think it was downright wrong to switch mode like this,
         | since the user asked for remote resolve curl should stick to
         | that or fail. It is not even likely to work to just switch,
         | even in "good" situations.
         | 
         | Yes, the author shares your opinion.
        
       | account-5 wrote:
       | I have to say this is the best CVE writeups I've ever read. And I
       | can only commend the author who is humble throughout despite
       | authoring one of the cornerstone software products of the age.
       | Much respect.
        
       | notRobot wrote:
       | I have always been a huge fan of Daniel's writing. Such clear and
       | honest writing explaining reasonings and perspectives.
        
       | gkhartman wrote:
       | It still blows my mind that so many devices depend on a library
       | largely written by a single person. The level of pressure must be
       | insane and it shows in this bit:
       | 
       | "Reading the code now it is impossible not to see the bug. Yes,
       | it truly aches having to accept the fact that I did this mistake
       | without noticing and that the flaw then remained undiscovered in
       | code for 1315 days. I apologize. I am but a human."
       | 
       | If Daniel happens to read this, thanks for your hard work, and I
       | really don't think any apology is necessary. After all, the
       | source was right there for any of us to read and review.
        
         | nine_k wrote:
         | https://xkcd.com/2347/ had been true for many years before, and
         | has been true for many years since it was published.
         | 
         | Sometimes the "Don't fix it if it ain't broken" principle is
         | interpreted as "don't support it until breaks", and leads to
         | nasty surprises.
        
         | avg_dev wrote:
         | i really feel that the first sentence in that quote is an apt
         | way to sum up a part of how learning and attention work for us
         | as humans.
         | 
         | personally i've spent some time thinking about how in my life
         | it is very easy for me to pay attention as hard as i can manage
         | and still miss so many things. i feel like things like code
         | review are a great way to work on this sort of thing (attention
         | training, global and contextual awareness, etc.). for example,
         | there is this place that i've been walking by regularly for a
         | couple of years and someone maintains a nice flowerbed there
         | and i pause to take a glance at it and have been doing so for
         | about a year since i first noticed it. but a couple months ago
         | i noticed that there is a companion flowerbed a few feet away
         | that i just had never noticed because i was so focused on the
         | first one. was that companion flowerbed there all along? i
         | suspect that it was! but i have no idea because i didn't even
         | know it existed.
         | 
         | i have tried to extrapolate this to knowledge across humanity
         | in general. i suspect sometimes it takes just one person to
         | notice something, to make an observation that informs their
         | behavior, and it begins to disseminate to us all. sometimes it
         | takes several tries and a long time to happen. and i wonder how
         | much low-hanging fruit there is out there that just none of us
         | have noticed yet. and i think about how integrating a simple,
         | fundamental set of best practices already established by others
         | who have been paying attention can improve the low-bar for us
         | all.
         | 
         | i think that julia evans touched on this in
         | https://jvns.ca/blog/2023/10/06/new-talk--making-hard-things...
         | which was discussed here recently and so did dan luu in his
         | https://danluu.com/p95-skill/ which has also been discussed
         | here somewhere.
         | 
         | and, to conclude, i too have a lot of gratitude for stenberg
         | and all the folks who've worked on curl, and many other parts
         | of our internet infrastructure that i usually take for granted
         | every day.
        
         | protomolecule wrote:
         | Daniel himself has a strange way of saying "thank you" to Scott
         | Adams for his classic strip.
         | 
         | "On Scott Adams
         | 
         | I use his "I'm going to write myself a minivan"-strip above
         | because it's a classic. Adams himself has turned out to be a
         | questionable person with questionable opinions and I do not
         | condone or agree with what he says."
        
         | BTinfinity wrote:
         | Reminds me of how for a while OpenSSL was being developed and
         | maintained by just two guys called Steve [0]. I think they've
         | upgraded to a team of seven now at least
         | 
         | [0] https://www.buzzfeed.com/chrisstokelwalker/the-internet-
         | is-b...
        
           | slowwriter wrote:
           | Presumably all called Steve.
        
             | estebank wrote:
             | They spiced it up, now there's a Stephan and an Esteban in
             | the mix.
        
         | w23j wrote:
         | So true. Also becomes apparent from this sentence in the
         | Hackerone report:
         | 
         | "This report seems entirely correct and it hurts in my soul."
         | 
         | and the blog:
         | 
         | "[...] shipping a heap overflow in code installed in over
         | twenty billion instances is not an experience I would
         | recommend."
         | 
         | Harsh to have to bear that responsibility for so little reward.
        
         | avgcorrection wrote:
         | I think he'll be fine.
        
       | deng wrote:
       | This is a great writeup. However, even after also reading the
       | CVE, I'm still unsure about under which circumstances you would
       | be affected. As far as I understand it, you are affected if you
       | 
       | * use a SOCKS5 proxy, AND
       | 
       | * you use the SOCKS5 proxy to resolve hostnames (which AFAICS is
       | NOT the default), AND
       | 
       | * the size of the buffer was changed from the default (100kb) to
       | something below 65541 bytes, AND (EDIT: Not correct: 'libcurl'
       | re-uses the download buffer for this, which by default is 16kB,
       | however it is said that 'curl' itself sets it manually to 100kb
       | UNLESS you use --limit-rate)
       | 
       | * the SOCKS5 proxy is too slow to handle the request immediately
       | (which however, as the CVE states, can usually be provoked if the
       | attacker has control over the request rate). (EDIT: This is
       | wrong, the CVE actually says "Typical server latency is likely
       | "slow" enough to trigger this bug without an attacker needing to
       | influence it by DoS or SOCKS server control.")
       | 
       | So to me, the attack vector seems very small. Am I missing
       | something?
        
         | foul wrote:
         | To me, this looks like recipe for disaster navigating in
         | darknets like Tor.
        
         | nunez wrote:
         | I wrote a series of shell scripts called docker-proxy that
         | creates SOCKS5 tunnels to Docker containers running openconnect
         | VPNs, useful when you're working with multiple customers who
         | have different VPNs that want all traffic on your machine
         | forwarded through. https://github.com/carlosonunez/docker-proxy
         | 
         | Since DNS resolution is meant to occur remotely, this CVE would
         | be directly applicable here.
        
         | kiviuq wrote:
         | I'm trying to under the last comment
         | 
         | > the SOCKS5 proxy is too slow to handle the request
         | immediately
         | 
         | the request seg faults immediately, no delays, no redirects
         | 
         | root@1aac5e228e16:/build/curl-7.74.0# curl -vvv -x
         | socks5h://host.docker.internal:9050 $(python3 -c "print(('A'
         | _10000), end= '')")
         | 
         | _ Trying 192.168.65.254:9050...
         | 
         | * SOCKS5: server resolving disabled for hostnames of length >
         | 255 [actual len=10000]
         | 
         | * SOCKS5 connect to AAAAA...
         | 
         | * Send failure: Bad file descriptor
         | 
         | * Failed to send SOCKS5 connect request.
         | 
         | Segmentation fault
         | 
         | https://gist.github.com/xen0bit/0dccb11605abbeb6021963e2b1a8...
        
           | deng wrote:
           | I'm confused why plain 'curl' would crash here: the CVE says
           | that
           | 
           | "The target buffer is the heap-based download buffer in
           | libcurl that is reused for SOCKS negotiation before the
           | transfer has started. The size of the buffer is 16kB by
           | default, but can be set to different sizes by the
           | application. The curl tool sets it to 102400 bytes by default
           | - but it sets the buffer size to a smaller size if --limit-
           | rate is set lower than 102400 bytes per second."
           | 
           | I thought with a buffer of 100kB, this bug wouldn't trigger?
        
           | deng wrote:
           | Yes sorry, I was wrong. The CVE says the opposite, I put an
           | EDIT in the OP.
        
         | davidshepherd7 wrote:
         | Yeah I think this is correct with the edit.
         | 
         | I guess this is mostly relevant for software that runs on
         | shared infra, sends requests to a url provided by an attacker
         | (e.g. webhooks), and uses a SOCKS5 proxy?
        
       | dig1 wrote:
       | > Yes, this family of flaws would have been impossible if curl
       | had been written in a memory-safe language instead of C
       | 
       | It could have been impossible, but you never know if there is a
       | way to escape language VM barriers. However, the author clearly
       | ignored the DNS hostname limit stated in RFC1123 [1], which is
       | hardcoded even in Java libraries.
       | 
       | [1] https://www.rfc-editor.org/rfc/rfc1123
        
         | js2 wrote:
         | It's covered in the section titled "host name length":
         | 
         | > A host name in a URL has no real size limit, but libcurl's
         | URL parser refuses to accept names longer than 65535 bytes. DNS
         | only accepts host names up 253 bytes. So, a legitimate name
         | that is longer than 253 bytes is unusual. A real name that is
         | longer than 1024 is virtually unheard of.
         | 
         | DNS is not the only mechanism for resolving host name to
         | address, even if it's what's used 99.9% of the time today.
        
           | jandrese wrote:
           | Legitimate requirements for hostnames longer than that have
           | to be vanishingly small. I dare anybody to come up with a
           | single anecdote from real world use.
        
         | matja wrote:
         | By the wording of the RFC, _accepting_ a hostname longer than
         | 255 bytes is permissible, creating one is not.
        
         | eesmith wrote:
         | Could you be more specific? All I saw was:
         | Host software MUST handle host names of up to 63 characters and
         | SHOULD handle host names of up to 255 characters.
         | 
         | I don't see where the RFC sets a upper limit on host name size.
        
           | dig1 wrote:
           | 6.1.3.5 Extensibility:                   The DNS defines
           | domain name syntax very generally -- a         string of
           | labels each containing up to 63 8-bit octets,
           | separated by dots, and with a maximum total of 255
           | octets.
           | 
           | RFC1035 makes this more explicit [1].
           | 
           | [1] https://www.freesoft.org/CIE/RFC/1035/9.htm
        
             | eesmith wrote:
             | Thanks!
        
         | datenwolf wrote:
         | memory-safe language does not imply the use of a VM.
         | 
         | It can just as well mean, that the compiler will attempt to
         | execute a proof, that memory is never accessed out of bounds,
         | without well defined ownership and within the lifetime of the
         | underlying object. Which is what Rust does, for example.
         | 
         | Of course it can also mean, that the compiler will then
         | additionally add internal failure checks and safeguards at
         | critical places (Rust does not do this, but it would be nice to
         | have in systems where one might worry about in-register bit-
         | flips (high radiation environments, like X-ray scanners), i.e.
         | stuff not caught by - say - e.g. ECC memory).
        
       | pstuart wrote:
       | Maybe I'm weird, but having a ternary that evaluates to ? TRUE :
       | FALSE bugs me.
        
       | pizza234 wrote:
       | Very sensible conclusions about memory safety/memory-safe
       | languages; excerpts follow.
       | 
       | ---
       | 
       | Yes, this family of flaws would have been impossible if curl had
       | been written in a memory-safe language instead of C [...]
       | 
       | The only approach in that direction I consider viable and
       | sensible is to:
       | 
       | - allow, use and support more dependencies written in memory-safe
       | languages and
       | 
       | - potentially and gradually replace parts of curl piecemeal, like
       | with the introduction of hyper.
       | 
       | Such development is however currently happening in a near glacial
       | speed and shows with painful clarity the challenges involved.
       | curl will remain written in C for the foreseeable future.
       | 
       | Everyone not happy about this are of course welcome to roll up
       | their sleeves and get working.
       | 
       | Including the latest two CVEs reported for curl 8.4.0, the
       | accumulated total says that 41% of the security vulnerabilities
       | ever found in curl would likely not have happened should we have
       | used a memory-safe language. But also: the rust language was not
       | even a possibility for practical use for this purpose during the
       | time in which we introduced maybe the first 80% of the C related
       | problems.
       | 
       | [...]
       | 
       | We repeatedly run several static code analyzers on the code and
       | none of them have spotted any problems in this function.
        
         | galangalalgol wrote:
         | In the last week new fuzzers and tests also appear to have been
         | added. At this point, I think any c project that isn't safety
         | critical would do well to mimic this pipeline.
        
         | [deleted]
        
         | 4death4 wrote:
         | Replacing parts of curl with Rust will not be possible. Most
         | places want to build from source and don't want to introduce a
         | new compiler to the tool chain. And Rust supports a tiny subset
         | of possible C targets.
        
           | moltonel3x wrote:
           | > Replacing parts of curl with Rust will not be possible.
           | 
           | It's not just possible, it's been done. You can compile curl
           | with rustls, you could for a time compile it with quiche, and
           | work is ongoing to compile it with hyper. Curl is remarkably
           | modular, none of those are mandatory.
           | 
           | > And Rust supports a tiny subset of possible C targets.
           | 
           | Gross overstatement. Rust supports the vast majority of
           | devices that people buy today. Even if you ignore platform
           | popularity and just count platforms supported by gcc vs
           | rustc/llvm, there's only a handful missing from the later.
           | And if you're talking about vendor-specific compilers, a lot
           | of them don't support modern C or C++ either.
        
           | pizza234 wrote:
           | The maintainer wrote a post about this topic:
           | https://daniel.haxx.se/blog/2020/10/09/rust-in-curl-with-
           | hyp....
           | 
           | "Most places want to build from source" is not something he
           | considered; there is a brief "for users against the C
           | implementation on the platforms you care about".
        
           | steveklabnik wrote:
           | Just to be clear, since this hypothetical library exposes a C
           | ABI, pre-built binaries wouldn't be an issue if somehow
           | magically curl was reimplemented in Rust. Someone would have
           | to compile it, but that's the same as if they didn't want to
           | build something written in C from source.
        
           | agwa wrote:
           | Should the majority of us continue to bear the risks of
           | memory unsafe code because of a small minority that uses
           | exotic architectures or doesn't want to install a new
           | compiler?
        
             | rhdunn wrote:
             | You need to have someone, or a group of people, willing to
             | write the code to implement a replacement that is fully
             | compatible, and maintain it. Until then, curl written in C
             | is what we have.
             | 
             | With that, curl can be used on the exotic architectures and
             | the rust/other language version can be used on other
             | platforms.
        
               | moltonel3x wrote:
               | You need to have someone, or a group of people, willing
               | to write the code to implement support for modern tooling
               | on their platform. Until then, old harder-to-secure
               | codebases is what these platforms have.
               | 
               | I know reversing the burden of implementation seems
               | flippant, but it's pragmatic. At some stage, it's less
               | community-wide work to support Rust on a new platform
               | than to spend extra time maintaining/securing dozens of C
               | codebases. Curl may not be making its Rust components
               | mandatory anytime soon, but other projects like python
               | crypto already have, to say nothing of projects written
               | in Rust to begin with.
               | 
               | rustc_codegen_gcc is pretty close to ready, let's focus
               | on getting it out of the door, and more target triples
               | supported by rustc and llvm.
        
           | dezgeg wrote:
           | "Most" is a very heavy claim. Any concrete evidence of this?
           | 
           | Also in embedded Linux I guess we are technically "building
           | from source", when using a build system like Buildroot or
           | Yocto. But Rust is already supported there and in fact
           | already in use in our project (due to python3-cryptography).
        
           | pmontra wrote:
           | What are those "most places" that "want to build from
           | source"? It's a honest question and I intentionally left out
           | the "don't want to introduce a new compiler to the tool
           | chain" statement to make the set larger. I understand "most"
           | as larger than 50% on some metric.
           | 
           | I know about Gentoo and other Linux distributions that do
           | build from source but it has been very uncommon for me to see
           | a company building things from source instead of using apt,
           | rpm, docker in this century. It's at least two orders of
           | magnitude faster. I remember how in the 90s I had to download
           | the source, scan the README for the dependencies, recursively
           | so, then building the libraries, then eventually building the
           | program I downloaded first.
           | 
           | Security wise, we were not reading the code, except for
           | Nethack.
        
             | jdblair wrote:
             | The places that want to build from source are the creators
             | of embedded devices that use curl. These devices include
             | (but are not limited to) wifi access points, routers, home
             | automation devices, data loggers, IP video cameras, smart
             | TVs, set top boxes, and streaming devices. There are a lot
             | of these devices, and there are more all the time.
             | 
             | Even if the engineering teams do not build every dependency
             | every time they build firmware (like busybox), they will at
             | some point build all of the components of their system
             | themselves (as supported by yocto).
        
               | dezgeg wrote:
               | Anybody using the standard embedded build solutions like
               | Buildroot or Yocto will get Rust support without doing
               | anything. It's already there and implemented.
        
               | pizza234 wrote:
               | > These devices include (but are not limited to) wifi
               | access points, routers, home automation devices, data
               | loggers, IP video cameras, smart TVs, set top boxes, and
               | streaming devices
               | 
               | Access points/routers/smart TVs/set top boxes are
               | typically be based on ARM/Linux/Android, and should be
               | supported.
        
               | yjftsjthsd-h wrote:
               | https://doc.rust-lang.org/nightly/rustc/platform-
               | support.htm... seems to say that only aarch64 Linux and
               | assorted x86 variants are tier 1; everything else is tier
               | 2 (it compiles but isn't tested to actually work) at best
        
               | pmontra wrote:
               | That makes sense. It was not clear in the context of the
               | post I replied to.
        
               | jdblair wrote:
               | Yeah, I admit that maybe this isn't "most places," since
               | there are almost certainly more shops running various
               | versions of Linux servers in production than there are
               | embedded Linux development teams.
               | 
               | (edited to fix typo)
        
           | galangalalgol wrote:
           | Not wanting to introduce a new compiler is like wanting to
           | introduce a new language. If you don't like the choice the
           | package maintainer makes, roll up your sleeves.
           | 
           | That said, we can mitigiate it using mrustc to generate c.
           | Does anyone keep a list of all the targets libcurl is getting
           | built for?
        
       | ChoHag wrote:
       | [flagged]
        
         | pgeorgi wrote:
         | Come on, it starts out with a link to the CVE and a remark
         | "While the advisory contains all the necessary details. I
         | figured I would use a few additional words..."
         | 
         | If you just want the gist of it, the article does everything it
         | can to send you there.
        
       | Cloudef wrote:
       | Quite a bit of hype and drama for something that requires very
       | specific condition to be exploited.
        
         | palata wrote:
         | ... and that is installed on billions on devices. I can imagine
         | you quickly get very specific conditions here and there when
         | something is deployed at that scale.
        
         | faster_harder wrote:
         | Curl has had their share of issues with CVEs, I assume they
         | want to show that they are taking _actual_ security issues very
         | seriously. See e.g.
         | https://daniel.haxx.se/blog/2023/08/26/cve-2020-19909-is-eve...
        
       | junon wrote:
       | [flagged]
        
         | knallfrosch wrote:
         | > Everyone not happy about this are of course welcome to roll
         | up their sleeves and get working.
        
           | junon wrote:
           | I didn't post it as an angry response, just knew what kind of
           | comments this would receive. Not sure why it got flagged, I
           | wasn't saying this was some sort of excuse or that it should
           | be rewritten, just found it interesting he included it. I'm
           | sure he gets that comment a lot.
        
         | ahoka wrote:
         | It doesn't help that the whole function is a huge computed goto
         | mutating things, spanning multiple pages.
        
           | messe wrote:
           | ...you mean the most common pattern for writing a state
           | machine?
        
             | anon-3988 wrote:
             | There's obviously no better way to write state machines...
        
               | parasti wrote:
               | Can you elaborate?
        
               | msla wrote:
               | If they could, they wouldn't have used sarcasm.
        
               | gdprrrr wrote:
               | Not OP, but in functional languages, state machines are
               | often simply a function (State, Input) -> State
        
             | diogenes4 wrote:
             | This is a well-discussed issue with c memes, not a
             | widespread issue. Even in c, jump tables are a safer
             | alternative.
        
       | jeffrallen wrote:
       | This is a long and interesting note which could be reduced to
       | "this is why system libraries need to be either written in a safe
       | language or proved correct".
       | 
       | When one of the best, friendliest, and most transparent C
       | programmers of our time is writing these posts, we need to pay
       | attention.
        
         | yjftsjthsd-h wrote:
         | > When one of the best, friendliest, and most transparent C
         | programmers of our time is writing these posts, we need to pay
         | attention.
         | 
         | Given that said programmer wrote,
         | 
         | > Such development is however currently happening in a near
         | glacial speed and shows with painful clarity the challenges
         | involved. curl will remain written in C for the foreseeable
         | future.
         | 
         | I'm curious what conclusion you'd like us to draw.
        
       ___________________________________________________________________
       (page generated 2023-10-11 16:01 UTC)