[HN Gopher] Show HN: Micro HTTP server in 22 lines of C
___________________________________________________________________
Show HN: Micro HTTP server in 22 lines of C
Author : jpegqs
Score : 182 points
Date : 2021-07-31 11:07 UTC (11 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| milansuk wrote:
| That's the beauty of original HTTP: simplicity. Same as parsing
| HTML (in the 90s). With HTTPS (S as in Secure) it's a whole different
| story, and most programmers use some library.
| Arnavion wrote:
| But HTTPS just adds TLS. You can use "some library" to do the
| TLS handshake and subsequent encryption, and end up with a
| readable-writable stream that you can then parse HTTP from
| yourself. Your code is the same as when it was dealing with a
| TCP stream directly.
| secondcoming wrote:
| There's nothing simple about HTTP. It looks like it should be
| simple, but it isn't.
| darnir wrote:
| HTTP/1.x is anything but simple. The specs were underdefined and
| overly complex in many ways. The original RFC was so complex
| that when reworked, they split it into 6 documents.
|
| I've worked heavily on some HTTP implementations and it's
| ridiculously hard to get them right.
|
| Not to mention, this "server" only responds to a simple,
| well-formed GET request, without handling about 90% of what the
| HTTP specifications talk about. It's a nice project, but it
| doesn't speak to the simplicity of HTTP.
| kevinoid wrote:
| I agree. As with many things, it's only simple as long as you
| ignore the complexities. As they say, the devil's in the
| details.
|
| > this "server" only responds to a simple well formed GET
| request.
|
| And not even that. The Request-URI in a Simple-Request line
| (inherited from HTTP/0.9) may contain escape characters.
| (e.g. `GET /my%20file.txt` to get `my file.txt`) HTTP/1.0
| states "The origin server must decode the Request-URI in
| order to properly interpret the request."[1] This server does
| not.
|
| Which is not to say that this server isn't interesting. Just
| that it's not a demonstration of how easy HTTP/1 is to parse.
|
| [1]: https://www.w3.org/Protocols/HTTP/1.0/spec.html#Request-
| URI
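For readers wondering what the decoding step kevinoid describes involves, here is a minimal sketch in C. The function name `percent_decode` and its error convention are assumptions made for illustration; nothing here is taken from the posted server.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: decodes %XX escapes from src into dst.
   dst needs room for strlen(src) + 1 bytes.
   Returns the decoded length, or -1 on a malformed escape or a
   decoded NUL byte (which would silently truncate a C string). */
long percent_decode(const char *src, char *dst) {
  size_t n = 0;
  while (*src) {
    if (*src == '%') {
      int hi = hex_value((unsigned char)src[1]);
      /* Only look at src[2] if src[1] was a valid digit. */
      int lo = hi < 0 ? -1 : hex_value((unsigned char)src[2]);
      if (lo < 0) return -1;
      int byte = hi << 4 | lo;
      if (byte == 0) return -1;
      dst[n++] = (char)byte;
      src += 3;
    } else {
      dst[n++] = *src++;
    }
  }
  dst[n] = '\0';
  return (long)n;
}
```

With this sketch, decoding `/my%20file.txt` would yield `/my file.txt`. Any path filtering presumably has to run on the decoded result, not on the raw request line.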
| nly wrote:
| No, parsing HTTP/1.x is a nightmare and definitely not simple.
| It wasn't even particularly well defined until 2014 when the
| original RFCs were modernized, and even now there are bugs
| reported in HTTP parsers all the time.
|
| Node.js came out in 2009, a full ten years after HTTP/1.1 (RFC
| 2616) and its original http-parser is rather hard to follow,
| doesn't conform to the RFCs for performance reasons, and is
| considered unmaintainable by the author of its replacement[0].
|
| As for parsing HTML, well, go look at how Cloudflare have
| stumbled[1].
|
| [0] https://github.com/nodejs/llhttp
|
| [1] https://blog.cloudflare.com/incident-report-on-memory-
| leak-c...
| ibraheemdev wrote:
| > Node.js came out in 2009, a full ten years after HTTP/1.1
| (RFC 2616) and its original http-parser is full-on spaghetti
| code, doesn't conform to the RFCs for performance reasons,
| and is considered unmaintainable by the author of its
| replacement
|
| That's because of the way the parser is written. There are
| other simpler parsers that are much more readable.
| na85 wrote:
| Seems like it's yet another example of the node ecosystem
| being amateur hour, rather than a problem with HTTP.
| jart wrote:
| I'm the author of the fastest open source HTTP server.
| Parsing HTTP 0.9, 1.0, and 1.1 is trivial. It's a walk in the
| park. It only takes about a hundred lines of code to create a
| proper O(n) parser. https://github.com/jart/cosmopolitan/blob
| /0b317523a0875d83d6...
|
| The Joyent HTTP parser used by Node is very good but it's
| implemented in a way that makes the problem much more
| complicated than it needs to be. The biggest obstacle with
| high-performance HTTP message parsing is the case-insensitive
| string comparison of header field names. Some servers like
| thttpd do the naive thing and just use a long sequence of
| strcasecmp() statements. Joyent goes "fast" because it uses
| callbacks, which effectively punts the problem to the caller,
| and, for a few select headers which it handles itself, like
| Content-Length, it uses this really complicated internal
| "h_matching" thing for doing painstakingly written out
| hardcoded character compares. Redbean solves the problem by
| using better computer science: perfect hash tables, thanks to
| the gperf command. That makes the API itself much more elegant,
| since the parser can not only go faster but also return a
| hash-table-like structure where individual headers can be
| indexed without performing string comparisons.
| ysleepy wrote:
| I consider Header value parsing and URL parsing part of
| HTTP, those are certainly not trivial.
|
| The charset problems alone are a nightmare.
|
| Parsing the wire format is pretty breezy, (Don't forget
| trailers!)
| jart wrote:
| Trailers can be parsed by invoking the function using
| something along the lines of ParseHttpMessage((struct
| HttpMessage){.t = kHttpStateName}, p, n) where you just
| tell the parser to skip the first-line states. Charset
| isn't a nightmare either. Headers are ISO-8859-1 so you
| just say {0300 | c >> 6, 0200 | c & 077} to turn them
| into UTF-8. It's not difficult. It might be if you want
| to support MIME. But this is HTTP we're talking about. It
| was made to be simple! We're talking Internet engineering
| on the lowest difficulty setting. Implement a TCP or SIP
| stack if you want hard.
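The one-liner jart quotes can be expanded into a small function. This is only a sketch: the name `latin1_to_utf8` and the buffer-size contract are assumptions, and bytes below 0x80 are passed through unchanged because they are already valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: converts n ISO-8859-1 bytes to UTF-8.
   dst needs room for 2 * n + 1 bytes.
   Returns the number of bytes written, not counting the NUL. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, char *dst) {
  size_t j = 0;
  for (size_t i = 0; i < n; i++) {
    unsigned c = src[i];
    if (c < 0x80) {
      dst[j++] = (char)c; /* ASCII maps to itself */
    } else {
      /* Two-byte sequence: 110000xx 10xxxxxx */
      dst[j++] = (char)(0300 | c >> 6);
      dst[j++] = (char)(0200 | (c & 077));
    }
  }
  dst[j] = '\0';
  return j;
}
```

For example, the Latin-1 byte 0xE9 (é) becomes the two bytes 0xC3 0xA9.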
| mariusor wrote:
| I think that implementing a proper state machine for the
| header parsing with ragel would give a more comprehensive
| result than using gperf or even the handmade one from your
| code.
|
| I think there are already some versions of the ragel code
| online, but they might be for other target programming
| languages.
| jart wrote:
| I'm one of the authors of Ragel and I disagree with you.
| HTTP is trivial enough that you'd be better served
| writing the state machine yourself using a switch
| statement. See my GitHub link above for an example. The
| code easily ports to other languages, like Java. Lastly
| when it comes to Ragel and gperf, they both do two
| completely different things. Ragel would generate a
| prefix trie search in generated code which would have
| enormous code size compared to what gperf is doing, which
| is much faster. With gperf, you only need to consider
| exactly O(3) octets total to tell which header it is.
| After that, it does a single quick string compare to
| confirm it's one of the predetermined headers rather than
| some unknowable value.
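To make the "look at a few octets, then do one compare" idea concrete, here is a toy sketch. It is not gperf output and not Redbean's code: the hash function, the table size of 16, and the five header names are all hand-picked assumptions that merely happen to be collision-free for this set.

```c
#include <ctype.h>
#include <stddef.h>
#include <strings.h>

enum header {
  HDR_UNKNOWN,
  HDR_HOST,
  HDR_ACCEPT,
  HDR_CONNECTION,
  HDR_CONTENT_TYPE,
  HDR_CONTENT_LENGTH
};

struct slot {
  const char *name;
  size_t len;
  enum header id;
};

/* Toy perfect hash: (length + lowercased first byte) mod 16.
   Slot indices below were computed by hand for these five names. */
static const struct slot kSlots[16] = {
    [1] = {"Content-Length", 14, HDR_CONTENT_LENGTH},
    [7] = {"Accept", 6, HDR_ACCEPT},
    [12] = {"Host", 4, HDR_HOST},
    [13] = {"Connection", 10, HDR_CONNECTION},
    [15] = {"Content-Type", 12, HDR_CONTENT_TYPE},
};

/* Hypothetical lookup: one hash, then one case-insensitive compare
   to confirm the candidate really is a known header. */
enum header lookup_header(const char *name, size_t len) {
  if (len == 0) return HDR_UNKNOWN;
  size_t h = (len + (size_t)tolower((unsigned char)name[0])) % 16;
  const struct slot *s = &kSlots[h];
  if (s->name && s->len == len && strncasecmp(name, s->name, len) == 0) {
    return s->id;
  }
  return HDR_UNKNOWN;
}
```

A real generator such as gperf searches for the hash parameters automatically; the point here is only that an unknown header costs one table probe and at most one comparison.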
| mariusor wrote:
| I apparently never paid enough attention to how in the
| spec there's a clearly defined list, and I always assumed
| that a "parser" should handle all valid header-ish
| looking pairs.
|
| Based on this consideration I was thinking that the ragel
| state machine would generate faster code for the non
| happy path (invalid non-ascii, or other types of error)
| at least in the GOTO version.
|
| When working on the full list it makes perfect sense to
| check the minimum amount of bytes for identifying
| headers, so thank you for the clarifications, very
| informative. :)
| giancarlostoro wrote:
| Somewhat related but in the Python space of things: I love
| that Python has a standard for web frameworks so much so that
| you can build your own web framework that targets said
| standard and it can be deployed anywhere without getting lost
| in the weeds of parsing HTTP. For example, FastAPI is directly
| an ASGI-compliant framework, and it is known as one of the
| fastest Python web frameworks out there. Bottle, I think, is
| also a raw WSGI framework and it's all in one file. (ASGI is
| what became the natural progression for WSGI; think of it
| like the http package Rust wants to standardize.)
| strictfp wrote:
| The whole idea behind Node.js was to write a super-efficient
| completely nonblocking http server in C, while keeping all
| the business logic in a simple scripting language.
|
| You should not expect the Node.js parser to be simple.
| hdjjhhvvhga wrote:
| The fact that someone wrote a parser that's hard to follow
| doesn't mean that parsing HTTP/1.x is extremely difficult.
| What is really hard is to construct a parser that is at the
| same time (1) fast, (2) complete, (3) secure. It is much
| easier to choose just two, compare e.g. the one based on
| Nginx[0] vs picohttpparser [1].
|
| [0] https://github.com/Samsung/http-
| parser/blob/master/http_pars...
|
| [1] https://github.com/h2o/picohttpparser/blob/master/picohtt
| ppa...
| a-dub wrote:
| for the basics, however http/1.x is pretty simple. you can
| test webserver health by literally typing in the request.
|
| i suspect the complexity you speak of is similar to MIME.
| where SMTP/POP/IMAP are pretty simple, things got pretty
| hairy with the introduction of MIME, SASL and friends.
|
| i think, though, that most of the complicated stuff in http
| is optional, is it not? like if you don't send a header
| that compression is supported, the server won't compress...
| or am I misremembering?
|
| either way, simpler to understand from a packet capture
| than a grpc stream or spdy/http2 stream.
| jart wrote:
| Pretty much everything is optional if you stick to
| http/1.0. If you implement http/1.1 then you're required
| to do a lot of non-essential stuff like chunk encoding,
| pipelining, and provisionals which themselves are
| reasonably trivial too but they make the server code less
| elegant. If you want a protocol that's actually hard,
| implement SIP.
| jsjohnst wrote:
| I once heard that it's impossible to build a "spec
| compliant" IMAP4 library as the spec itself is
| contradictory. Don't have a reference to prove it, so I
| could be wrong.
| gberger wrote:
| I know this is just for fun and not intended for production use.
| But what could be potential exploits and vulnerabilities in this
| server?
| sneak wrote:
| GET ../../../etc/passwd
| asah wrote:
| sandbox it? e.g. docker, OpenBSD chroot?
| junon wrote:
| That's an environmental thing, the program itself can't
| protect against the class of attacks those sorts of
| environmental setups protect against.
| jcelerier wrote:
| that's not cross-platform tho. it should still be secure
| even if it was running on MS-DOS 5.0
| NieDzejkob wrote:
| Looks like you didn't even bother to test your claim.
| jpegqs wrote:
| I have provided protection against this.
| Someone wrote:
| I don't think it compiles on windows (netdb.h doesn't exist
| there, I think), so you're fine there, too, from a security
| viewpoint.
|
| However, if somebody did a quick and dirty "make it
| compile" port (include winsock2.h instead and, possibly,
| replace some functions/argument types), I think that would
| create security vulnerabilities because the _fopen_ on
| Windows might support using backslashes as path separators.
|
| Even if it doesn't, there's UNC paths (https://en.wikipedia
| .org/wiki/Path_(computing)#Universal_Nam...) to worry
| about.
|
| That made me wonder whether other OSes might have similar
| features. Reading https://pubs.opengroup.org/onlinepubs/007
| 904975/basedefs/xbd..., I'm not sure that forbids Unix from
| doing something similar. It says
|
| _"A pathname that begins with two successive slashes may
| be interpreted in an implementation-defined manner,
| although more than two leading slashes shall be treated as
| a single slash."_
|
| That opens the door for doing special things for paths that
| start with //, for example by supporting
| "//machine:foo/bar/baz" on clusters.
| jart wrote:
| GET %C0%AE%C0%AE/%C0%AE%C0%AE/%C0%AE%C0%AE/etc/passwd
| Matthias247 wrote:
| Regarding availability: It only handles a single connection,
| and has no timeouts. If someone just connects, and does nothing
| else, the server will be unavailable.
| SahAssar wrote:
| For the interested: this is called (or at least similar to) a
| slow loris attack:
| https://en.wikipedia.org/wiki/Slowloris_(computer_security)
| jpegqs wrote:
| I tried to make it secure and protect from such things. If
| someone finds vulnerabilities, please let me know.
| jijji wrote:
| use strncpy() instead of strcpy()
| jpegqs wrote:
| It's calculated that strcpy() should never cause a buffer
| overflow here.
| throwaway984393 wrote:
| You should still never use functions which have well
| known security flaws if there is a widely available
| alternative which avoids the flaws. Secure programming
| isn't just about calculating whether your current code
| has a bug, it's also about writing code that avoids bugs.
| astrobe_ wrote:
| Thank you Mr Weekend Secure Programming Expert.
| _strncpy()_ has equally dangerous semantics, though.
| jart wrote:
| strncpy() isn't dangerous. People have their heads so
| twisted around muh security that they don't even know
| what the function was intended to do. The purpose of
| strncpy() is to prepare a static search buffer so you can
| do things like perform binary search:
|     static const struct People {
|       char name[8];
|       int age;
|     } kPeople[] = {
|         {"alice", 29},  //
|         {"bob", 42},    //
|     };
|
|     /* ARRAYLEN() and READ64BE() are helper macros that are not
|        defined in this snippet. */
|     int GetAge(const char *name) {
|       char k[8];
|       int m, l, r;
|       l = 0;
|       r = ARRAYLEN(kPeople) - 1;
|       /* zero-pads or truncates name to exactly 8 bytes */
|       strncpy(k, name, 8);
|       while (l <= r) {
|         m = (l + r) >> 1;
|         if (READ64BE(kPeople[m].name) < READ64BE(k)) {
|           l = m + 1;
|         } else if (READ64BE(kPeople[m].name) > READ64BE(k)) {
|           r = m - 1;
|         } else {
|           return kPeople[m].age;
|         }
|       }
|       return -1;
|     }
|
| It was a really common practice back in the 70's and 80's
| when the function was designed for databases to use
| string fields of a specific fixed length.
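The fixed-width-field behaviour both sides are arguing about can be shown in a few lines. This is a sketch with made-up names (`FIELD`, `make_key`, `key_equal`); it only demonstrates the standard semantics of `strncpy()`: short input is zero-padded, and input of eight or more bytes fills the field with no terminating NUL.

```c
#include <string.h>

#define FIELD 8 /* assumed field width, matching the example above */

/* Fills an 8-byte field from src. If src is shorter than 8 bytes
   the rest is zero-filled; if it is 8 bytes or longer the field
   holds the first 8 bytes and is NOT NUL-terminated. */
void make_key(char key[FIELD], const char *src) {
  strncpy(key, src, FIELD);
}

/* Fields are compared as raw fixed-width blocks, never as C strings. */
int key_equal(const char a[FIELD], const char b[FIELD]) {
  return memcmp(a, b, FIELD) == 0;
}
```

The danger astrobe_ alludes to arises when such a field is later passed to something expecting a C string, such as `printf("%s")` or `strlen()`, since a full field has no terminator.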
| jancsika wrote:
| > strncpy() isn't dangerous
|
| Suppose the C specification said that string constants
| are automatically null terminated _unless_ they are a
| certain size that is platform-dependent. At that given
| size the null is not added. (And let's say above that
| size there's a compiler error. Let's also say there's a
| pragma for telling the compiler you want a bigger limit
| on the maximum string constant size.)
|
| Would that behavior be dangerous in your opinion?
| arp242 wrote:
| If I look at the code as posted then "it uses strcpy
| instead of strncpy" is very low on the list of
| "problems".
|
| "Problems" in quotes because, you know, this is an IOCCC
| entry. You're taking a joke way too seriously.
| throwaway984393 wrote:
| The author literally _asked for security advice_, and
| then ignored it. I'm trying to explain why one should not
| just ignore it. There's a lot of novice programmers who
| read these threads and might think it's perfectly fine to
| use strcpy (outside of IOCCC submissions). And by the
| way, who the hell cares about security vulns in IOCCC
| submissions anyway? It's not supposed to be secure, it's
| supposed to be obfuscated.
| jart wrote:
| I don't think anyone asked for free advice from a foul-
| mouthed anonymous throwaway on how to secure their
| computer. If I was building a website I'd want to secure
| it _from_ you not with you.
| [deleted]
| LinAGKar wrote:
| It's only that short because they've shoved a bunch of statements
| onto the same line.
| phoe-krk wrote:
| That's the whole point of IOCCC. The way code is formatted is
| as important as the way it functions.
| lmilcin wrote:
| It says "22 lines of C", not "22 statements of C".
|
| For this type of exercise it is assumed that some readability
| is going to be lost... just look at Perl golf competitions.
| These tend to be written in a single line, and it is not always
| a given that you will even be able to tell where statements
| start.
| 34qlgkaer wrote:
| Man I wish I could just read some software articles without
|
| covid covid covid covid covid covid covid covid covid
| lifthrasiir wrote:
| Reminds me of 2001/cheong [1].
|
|     #include <stdio.h>
|     int l;int main(int o,char **O,
|     int I){char c,*D=O[1];if(o>0){
|     for(l=0;D[l ];D[l
|     ++]-=10){D [l++]-=120;D[l]-=
|     110;while (!main(0,O,l))D[l]
|     += 20; putchar((D[l]+1032)
|     /20 ) ;}putchar(10);}else{
|     c=o+ (D[I]+82)%10-(I>l/2)*
|     (D[I-l+I]+72)/10-9;D[I]+=I<0?0
|     :!(o=main(c/10,O,I-1))*((c+999
|     )%10-(D[I]+92)%10);}return o;}
|
| [1] https://www.ioccc.org/2001/cheong.hint
| ducktective wrote:
| `curl -s https://www.ioccc.org/2001/cheong.hint | nc
| termbin.com 9999`
|
| https://termbin.com/5yaq
|
| In short: for a 2n-digit input, returns the integer part of its
| square root (n-digits)
| codetrotter wrote:
| The ASCII-art formatted version is pretty nice looking.
|
| I was going to say that I don't get why the "almost readable
| version" is weirdly formatted. But then I ran it through
| clang-format and it still looked the same, and I saw that it's
| because it's made to do lots of things on the same line, so it
| is not for lack of whitespace that it looks so messy.
|
| In conclusion, the "almost readable version" is exactly what it
| should be in this case.
| snet0 wrote:
| I was surprised to read that this is actually a totally valid
| HTTP/1.1 application, according to the RFC. The only thing you
| _need_ is the status line (http version, status code, status
| message, CRLF) and then the message body.
|
| Things sure have come a long way.
| kevinoid wrote:
| It's neat, but I don't believe it is a compliant implementation
| of HTTP/1.1 (or 1.0). For example, it does not handle percent-
| encoded characters in the request URI.[1][2]
|
| [1]:
| https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.1
|
| [2]: https://www.w3.org/Protocols/HTTP/1.0/spec.html#Request-
| URI
| deathanatos wrote:
| _Two_ CRLF pairs (one to terminate the status line, one to
| terminate the (empty) headers), which this is one CR short of.
| Trivially fixable, though it'd mess up the P slightly...
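The framing deathanatos describes (status line, CRLF, empty header section, CRLF, body) can be sketched as follows. The function name `build_response` is an assumption for illustration, and the `HTTP/1.1` version string simply follows the discussion above. With no Content-Length header, the end of the body is signalled by closing the connection.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "HTTP/1.1 <status>" CRLF, an empty
   header section terminated by a second CRLF, then the body.
   Returns the response length, or -1 if it does not fit in cap. */
int build_response(char *buf, size_t cap, const char *status,
                   const char *body) {
  int n = snprintf(buf, cap, "HTTP/1.1 %s\r\n\r\n%s", status, body);
  if (n < 0 || (size_t)n >= cap) return -1;
  return n;
}
```

Because the body is passed as a C string, this sketch only suits text; a file server would write the header and then stream the file bytes separately.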
| coderzx wrote:
| Great
| Galanwe wrote:
| Not sure what's so amazing here that it deserves to be on HN
| front-page.
|
| So basically it's a C program that reads "GET /<something>" from
| a socket and replies with the content of file <something> (with
| some random error handling) . Is it really that amazing that it
| fits in 22 lines of funky formatted C code...?
| bruce343434 wrote:
| Sure, IOCCC exercises are exercises in futility, in the same way
| that breaking a speed running record achieves nothing real-
| world useful. But that doesn't mean it isn't spectacular and
| damn impressive.
| shric wrote:
| Aside from using small variable names and odd whitespace, it
| isn't particularly obfuscated.
| snet0 wrote:
| That's what I was going to say. If nondescript variable
| names and poor use of whitespace is obfuscation, a few of
| my friends could submit code they write every day.
| jpegqs wrote:
| This is what I am arguing about with another IOCCC winner.
| What can be called obfuscation, and where are its
| boundaries.
| Galanwe wrote:
| Come on there is no obfuscation here, you can literally read
| the code without issue. The only attempt seems to be 80*101
| for 8080.
| jpegqs wrote:
| It's cool if you can read code like this without issue. I'm
| chasing Kolmogorov complexity, rather than obfuscation. I
| add things like this to fill gaps in a specific shape.
| cpach wrote:
| Beauty is in the eye of the beholder
| Tempest1981 wrote:
| Or beautifully formatted:
|
| https://github.com/ilyakurdyukov/ioccc/blob/main/practice/20...
| 0des wrote:
| Come on man, it's Saturday. It's fine.
| exDM69 wrote:
| And it's a "Show HN" post.
|
| Pretty nice obfuscated C too. It's art, not serious.
| nuclearnice1 wrote:
| Everything is just dirt.
| cpach wrote:
| And anyone who ever played a part
|
| Oh, they wouldn't turn around and hate it
| [deleted]
| SV_BubbleTime wrote:
| On the line after the printf where it looks like they're getting
| status strings for returns... it looks like there are three
| ternary options. Is that right? How does that work?
|
| https://pbs.twimg.com/media/E7mllyLXoAQmjbT?format=png&name=...
| jpegqs wrote:
| I can explain it:
|
|     m = n ? /* if (n != 0) */
|       /* adds index.html if path ends with "/" (means the
|          filename is omitted), otherwise copies zero */
|       strcpy(b+i-1,b[i-2]-'/'?"":"index.html"),
|       /* log the requested filename to stdout */
|       printf("%s\n",b+5),
|       /* if "/." is in the path or an error occurred while
|          opening the file */
|       strstr(b,"/.")||!(f=fopen(b+5,"rb"))
|         ? "404 Not Found" : "200 OK"
|       : "501 Not Implemented"; /* if (n == 0) */
|
| By filtering filenames with "/." I prevent exploits with ".."
| and also disallow reading files whose names start with a dot,
| which are hidden files on Unix-like OSes.
| SV_BubbleTime wrote:
| Ah. I see, figured it might be that but it was tough to read.
|
| Ok, so sometimes I think I know C pretty well, then I'll see
| lunatic code like this and realize I Do Not! Thanks for the
| answer and reformat.
| formerly_proven wrote:
| What about "GET //etc/passwd"?
| NieDzejkob wrote:
| Just fired up the server and that does indeed break it. I
| suppose openat2 with RESOLVE_BENEATH and AT_FDCWD would be
| a bullet-proof fix, but that's not very codegolf.
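Short of the openat2 approach, the leading-slash case can also be rejected with a string check on the filename before it reaches fopen(). The sketch below is an assumption-laden illustration, not the author's fix: `path_is_safe` is a made-up name, it expects the already percent-decoded name with the request's first "/" stripped, and it does nothing about symlinks.

```c
#include <string.h>

/* Hypothetical check on the filename that would be handed to
   fopen(). Returns 1 if it looks like a plain relative path,
   0 otherwise. */
int path_is_safe(const char *path) {
  if (path[0] == '\0') return 0; /* empty name */
  if (path[0] == '/') return 0;  /* absolute, e.g. "GET //etc/passwd" */
  if (path[0] == '.') return 0;  /* "../x" or a leading dotfile */
  if (strstr(path, "/.")) return 0; /* "a/../b" or "a/.hidden" */
  if (strchr(path, '\\')) return 0; /* backslash separators */
  return 1;
}
```

String checks like this are easy to get subtly wrong, which is presumably why a kernel-enforced option such as RESOLVE_BENEATH was suggested as the robust fix.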
| NieDzejkob wrote:
| Huh, any reason to use printf("%s\n",...) instead of puts?
| mianos wrote:
| Also, if you want to include #include <microhttpd.h> you can
| write a useful, safe (well tested), http server in a similar
| number of lines.
| secondcoming wrote:
| That would defeat the whole point of the post!
|
| But microhttpd is fine if you want a minimal server; its way of
| handling POST bodies is weird though.
| rijoja wrote:
| Why not #include<stdlib.h> and just run system("apache2")
| Koshkin wrote:
| Tried it, didn't work. (Now I want to try system(argv[0]) for
| some reason...)
| [deleted]
| [deleted]
___________________________________________________________________
(page generated 2021-07-31 23:00 UTC)