[HN Gopher] What they don't teach you about sockets
       ___________________________________________________________________
        
       What they don't teach you about sockets
        
       Author : zdw
       Score  : 231 points
       Date   : 2022-07-25 15:15 UTC (1 days ago)
        
 (HTM) web link (macoy.me)
 (TXT) w3m dump (macoy.me)
        
       | weeksie wrote:
       | Ah man, sockets. Real sockets. Yesterday's kiosk thread jogged my
       | memory about all that but almost all of the network communication
       | on my old kiosk installations was socket based. It was a pleasant
       | way to work.
        
       | Nezghul wrote:
        | I very often see "reconnect loops" in various codebases, and I
        | wonder whether they are necessary. Wouldn't the same effect be
        | achieved by, for example, increasing timeouts or some other
        | connection parameter?
        
         | pravus wrote:
         | For TCP the state required to maintain the socket in the kernel
         | is invalidated on error and needs to be reset. The only way to
         | do this is to explicitly perform the connection setup again. An
         | extended timeout only delays this process since the remote side
         | will have invalidated its state as well.
         | 
         | UDP packets require no connection but you still might see some
         | sort of re-synchronization code to reset application state
         | which could be called "reconnect".
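A minimal sketch of such a reconnect loop in Python, with exponential backoff; the `do_connect` callable and the delay values are illustrative assumptions, not something from the thread:

```python
import time

def connect_with_retry(do_connect, max_attempts=5, base_delay=0.01):
    """Retry `do_connect` until it succeeds or attempts run out.

    `do_connect` is any callable that returns a connected object or
    raises OSError (socket.create_connection behaves this way).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return do_connect()
        except OSError:
            if attempt == max_attempts:
                raise  # give up; the caller decides what happens next
            time.sleep(delay)
            delay *= 2  # back off so a dead peer isn't hammered
```

In real code `do_connect` might be `lambda: socket.create_connection((host, port))`; the backoff keeps a crashed server from being flooded with connection attempts the moment it restarts.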
        
         | pgorczak wrote:
         | They're a bit of a feature of the connection-oriented nature of
         | TCP as the other reply mentions. If the server process crashes
         | and restarts for example, the client will be told that its
         | previous connection is not valid anymore. Basically TCP lets
         | client and server assume that all bytes put into the socket
         | after connect()/accept() will end up at the other side in that
         | same order. Each time there is an error that violates that
         | assumption, the connection needs to be explicitly "reset".
        
       | dehrmann wrote:
       | One of the big issues with TCP is a lot of communication isn't
       | well-suited to the stream model and is message-oriented, so yes,
       | like the author says, you have to go implement ACKs. And then you
       | want multiplexing, which streams also fail at. Before you know
       | it, you built a worse version of HTTP2.
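The message-on-a-stream problem dehrmann describes is usually solved with framing; a common sketch is a length prefix (the 4-byte big-endian header is an arbitrary choice for the example):

```python
import struct

def send_msg(stream, payload: bytes):
    # Prefix each message with a 4-byte big-endian length so the
    # receiver can recover message boundaries from the byte stream.
    stream.write(struct.pack("!I", len(payload)))
    stream.write(payload)

def recv_msg(stream):
    header = stream.read(4)
    if len(header) < 4:
        return None  # peer closed before a full header arrived
    (length,) = struct.unpack("!I", header)
    return stream.read(length)
```

Here `stream` is any file-like byte stream, e.g. the object returned by `sock.makefile('rwb')`; production code would also handle short reads on the body and enforce a maximum message size.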
        
         | jdthedisciple wrote:
          | Naive question, but doesn't QUIC solve those problems?
        
           | kevin_thibedeau wrote:
           | SCTP solved them. We're just rarely allowed to use it because
           | the internet is broken.
        
             | lxgr wrote:
             | Yes, it's a bummer that we have to encapsulate it in UDP,
             | but when using it like that, it works quite well in my
             | experience.
        
           | cogman10 wrote:
            | HTTP2 solves these problems. HTTP3 (QUIC) solves the
            | head-of-line blocking problem of TCP. That is, a packet
            | getting corrupted or lost causes TCP to withhold delivery
            | of the packets behind it until the missing packet can be
            | successfully retransmitted. So you may be multiplexed in
            | your messages, but you end up with a slowdown and a backlog
            | of data that could have been successfully received and
            | interpreted but for the lost packet.
        
           | dcsommer wrote:
            | QUIC is more or less the transport that HTTP/3 runs on, and
            | it solves the same problems, yes.
        
         | iTokio wrote:
          | And then you realize that sometimes it has become even worse
          | due to TCP head-of-line blocking, and you move to UDP and
          | build a worse version of HTTP/3.
        
           | hinkley wrote:
           | I think Head of Line blocking is a more concrete problem but
           | it seems that it's often encountered when running away from a
           | different one:
           | 
           | If you have different conversations going on between a single
           | client and the server, you can make several connections to do
           | it. You'll pay a little bit more at the start of the
           | conversation of course, so be frugal with this design, and
           | think about ways to 'hide' the delay during bootstrapping.
           | But know that with funky enough network conditions, it's
           | possible for one connection to error out but the other to
           | continue to work.
           | 
           | The problem for me always comes back to information
           | architecture, and the pervasive lack of it, or at least lack
           | of investment in it. If you really have two separate
           | conversations with the server, losing one channel shouldn't
           | make the whole application freak out. But we all know people
           | take the path of least resistance, and soon you have 2.5
           | conversations on one channel and 1.5 on the other.
           | 
           | The advice here is some of the same advice in more
           | sophisticated treatises on RESTful APIs (it's all distributed
           | computing, it's all the same problem set arranged in
           | different orders). REST calls are generally supposed to be
           | stateless. The client making what looks like a non sequitur
           | call to the backend should Just Work. If you manage that,
           | then having one channel inform the client of the existence of
           | a resource and fetching that resource out of band isn't
           | really coupling of the two channels. That subtlety is lost on
           | some people who defend their actual coupling by pointing out
           | that other people have done it too, when no, they really
           | haven't. And if anything, multiplexing lets them get away
           | with this bad behavior for much longer, allowing the bad
           | patterns to become idiomatic.
        
         | danesparza wrote:
         | If you insist on passing messages, use a well-designed message
         | queue service and don't rebuild the wheel (but just a little
         | shittier).
        
           | pclmulqdq wrote:
           | A message queue service, or an RPC stack, adds a tremendous
           | amount of overhead to a system. This is part of why computers
           | are 200x faster than they were 20 years ago, but the
           | performance feels the same.
           | 
           | HTTP works reasonably well on TCP, but a lot of what we want
           | to do is better suited to a reliable UDP protocol.
           | Unfortunately, routers often balk at UDP packets, so TCP it
           | is.
        
           | gopalv wrote:
           | > use a well-designed message queue service
           | 
           | There is a faq about ZeroMQ written in the Monty Pythonesque
           | fashion of "other than that, what does ZeroMQ do for us?"
           | 
           | I can't quite find it in my bookmarks, but it went by the "so
           | it gives you sockets", "but also message batching" ... "but
           | other than batching, what does it give us?".
           | 
           | Also, the whole problem with using just TCP is that often it
           | needs kernel level tuning - like you need to fix INET MSL on
           | some BSD boxes to avoid port exhaustion or tweak DSACK when
           | you have hanging connections in the network stack (like S3
           | connections hanging when SACK/DSACK is on).
           | 
            | A standard library is likely to have bugs too, but hopefully
            | someone else has found them before you run into them.
        
       | jchw wrote:
       | > I will also ask: how often are you writing applications that
       | want to accept either files or sockets?
       | 
       | In Go where there is an io.Reader/io.Writer abstraction where
       | blocking interactions are OK and you're absolutely intended to
       | handle all of the errors, it's really no problem at all to use a
       | socket where you'd use a file.
       | 
       | Unfortunately, this only works because the abstraction handles it
       | well. You can't really do custom things with file descriptors, so
       | the amount of useful things you can do by treating sockets as
       | files is quite limited (though it certainly exists.)
       | 
       | (I was going to say "TLS for example" though come to think of it
       | this isn't even strictly true under Linux;
       | https://docs.kernel.org/networking/tls.html )
       | 
       | Still, having TCP sockets and files at the same abstraction layer
       | in general isn't all that bad; you should consider that writes to
       | the filesystem could fail and do take time when blocking. When
       | done right, this makes it much easier for apps to become
       | transparent to the network and other media when it should be
       | possible.
        
         | jcranmer wrote:
         | > (I was going to say "TLS for example" though come to think of
         | it this isn't even strictly true under Linux;
         | https://docs.kernel.org/networking/tls.html )
         | 
         | TLS isn't that much harder to handle than regular TCP sockets--
         | you're going to need a socket interface that lets you get
         | reader and writer streams, and the extra roundtrips you need
         | for negotiation are handled in the constructor for the TLS
         | socket.
         | 
         | It is more difficult if you want to support more advanced
         | features of TLS, or especially if you want to support something
         | like STARTTLS (negotiate TLS on an already-open socket). But
         | this is already kind of true for sockets in general: the
         | reader/writer abstraction breaks down relatively quickly if you
         | need to do anything smarter than occasionally-flushed streams.
        
           | jchw wrote:
            | I think my original post was fairly unclear about what I meant.
           | 
           | See, what I meant was like this. File descriptors are an OS
           | abstraction; the "backends" you get are defined in the
            | kernel. You _can't really_ do custom behavior with FDs; for
           | example, in Go, the Go TLS library can open a connection and
           | return an io.Writer, and when you write, it will be
           | symmetrically encrypted, transparent to you, as if the code
           | spoke TLS. But when you're dealing with raw file descriptors,
           | and the read and write syscalls, there's no way to make
           | 'custom' read and write handlers, like you can with
           | programming language abstractions.
           | 
           | (I do acknowledge that you could in fact do some of this with
           | pipes, though I have seldom seen it used this way outside of
           | shell programming. It's kinda missing the point, anyway,
           | since pipes are just another type of fd. You can do pipes on
           | top of Go's abstraction too, but it would be very cumbersome
           | in comparison.)
           | 
           | But as a kind of quirk, Linux actually _does_ support TLS
           | sockets. It will not do the handshake, so you still have to
           | do that in userspace. But if you use the Linux kernel's TLS
           | socket support, it will in fact give you an FD that you can
           | read from and write to directly with transparency, as if it
           | was any other file or socket; you don't have to handle the
           | symmetric cryptography bits or use a separate interface. I
           | think this is rather neat, although I'm not sure how
           | practically useful it is. (Presumably it would be useful for
           | hardware offloading, at least.)
        
         | jerf wrote:
         | "Unfortunately, this only works because the abstraction handles
         | it well. You can't really do custom things with file
         | descriptors, so the amount of useful things you can do by
         | treating sockets as files is quite limited"
         | 
         | In general, most code has a clear initialization step where it
         | sets up everything it wants to set up, then you can pass it to
         | something that only expects a io.Reader/io.Writer and it can
         | operate. I have a couple of places where one way of getting
         | something does an HTTP request, but another way opens an S3
         | file for writing, and yet another way opens a local disk file.
         | Each of them has their own completely peculiar associated
         | configuration and errors to deal with, but once I've got the
          | stream cleanly I pass off an io.WriteCloser to the target
         | "payload" function.
         | 
         | If you're doing _super_ intense socket stuff you may need to
         | grow the interface, or even just plain code against sockets
         | directly. But most application-level stuff, even complicated
         | stuff like  "Maybe I'm submitting a form and maybe I'm writing
         | to S3 and maybe I'm writing to disk and maybe I'm writing to a
         | multiplexed socket and maybe I'm doing more than one of these
         | things at once" can be cleanly broken into a "initialize the
         | io.Reader/io.Writer, however complicated it may be" phase and a
         | "use the io.Reader/io.Writer in another function that doesn't
         | have to worry about the other details" phase. It is also highly
         | advantageous to be able to pass a memory buffer to the latter
         | function to test it without having to also try to figure out
         | how to fake up a socket or a file or whatever.
         | 
         | People don't write applications that accept either files or
         | sockets because in most languages there is one impediment or
         | another to the process; a system that _almost_ makes file-like
         | objects share an interface but in practice not really,
          | libraries that force you to pass them strings rather than
          | file-like objects, etc. While it isn't attributable to Go
          | _qua_ Go,
         | by getting it right in the standard library early Go
         | legitimately is _really_ good at this sort of thing, more
         | because the standard library set the tone for the rest of the
         | ecosystem than because of any unique language features. I hear
         | Rust is good too, which I can easily believe. Every time I try
          | to use Python to do this, I'm just saddened; it _ought_ to
         | work, it _ought_ to be easy, but something always goes wrong.
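jerf's two-phase split translates to any language with a stream abstraction; here is a Python sketch, with `write_report` standing in for the "payload" function (the names and record format are invented for illustration):

```python
import io

def write_report(out, records):
    """Phase two: needs only a writable binary stream. It cannot tell
    whether `out` is a disk file, a socket's makefile(), or a buffer."""
    for name, value in records:
        out.write(f"{name}={value}\n".encode())

# Phase one happens elsewhere: open(path, "wb"), connect a socket and
# call makefile("wb"), or -- in tests -- just hand over a BytesIO.
```

Testing against `io.BytesIO()` then avoids faking a socket or touching the filesystem at all.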
        
           | jchw wrote:
           | The way I see it, Go did two things that made it work well:
           | 
           | - Have a very simple, fairly well-defined interface for
           | arbitrary read/write streams. This interface needs to have
           | decent characteristics for performance, some kind of error
           | handling, and a way to deal with blocking. Go's interface
           | satisfies all three.
           | 
           | - Have a good story for async in general. It's not really
           | helpful to have an answer for how to deal with I/O blocking
           | if the answer is really crappy and nobody actually wants to
           | use it. A lot of older async I/O solutions felt very much in
           | this camp.
           | 
           | I think that Rust does a pretty decent job, though I'm a
           | little bearish on their approach to async. (Not that I have
           | any better ideas; to the contrary, I'm pretty convinced that
           | Rust async is necessarily a mess and there's not much that
           | can be done to make it significantly better.)
           | 
           | But I think you can actually do a decent job of this even in
           | traditional C++, in the right conditions. In fact, I used
           | QIODevice much the way one would use io.Reader/io.Writer in
           | Go. It was a little more cumbersome, but the theory is much
           | the same. So I think it's not necessarily that Go did
           | something especially well, I think the truth is actually
           | sadder: I think most programming languages and their standard
           | libraries just have a terrible story around I/O and
           | asynchronicity. I do think that the state of the art has
            | gotten better here and that future programming languages
            | are likely to attempt to solve this problem. So at least
           | there's that.
           | 
           | The truth is that input and output is unreliable, limited and
           | latent by the nature of it. You can ignore it for disk
           | because it's _relatively_ fast. But at the end of the day,
            | the bytes you're writing to disk need to go through the
           | user/kernel boundary, possibly a couple times, to the
           | filesystem, most likely asynchronously out of the CPU to the
           | I/O controller, to the disk which likely buffers it, and then
           | finally from the disk's buffers to its platters or cells or
           | what have you. That's a lot of stuff going on.
           | 
           | I think it's fair to say that "input and output" in this
           | context means "anything that goes out of the processor." For
           | example, storing data in a register would certainly not be
           | I/O. Memory/RAM is generally excluded from I/O, because it's
           | treated as a critical extension of the CPU and sometimes
           | packaged with it anyway; it's fair for your application (and
           | operating system) to crash violently if someone unplugs a
           | stick of RAM.
           | 
           | But that reality is not extended almost anywhere else. USB
           | flash drives can be yanked out of the port at any time, and
           | that's just how it goes; all buffers are going to get dropped
           | and the state of the flash drive is just whatever it was when
           | it was yanked, roughly. USB flash drives are not a special
           | case. Hell, you can obviously hotplug ordinary SSDs and HDDs,
           | too, even if you wouldn't typically do so in a home computer.
           | 
           | So is disk I/O seriously that different from network I/O?
           | It's "unreliable" (relative to registers or RAM). It's "slow"
           | (relative to registers or RAM). It has "latency" (relative to
           | registers or RAM). The difference seems to be the degree of
           | unreliability and the degree of latency, but still. Should
           | you treat a `write` call to disk differently than a `write`
           | call to the network? I argue not very much.
           | 
           | I don't really know 100% why the situation is bad with
           | Python, but I can only say that I don't really think it
            | should've been. Of course, hindsight is 20/20. It's probably
           | a lot more complicated than I think.
        
       | deathanatos wrote:
       | > _How you handle a file no longer existing vs. a socket
        | disconnection are not likely to be very similar. I'm sure I'll
       | get counter arguments to this,_
       | 
       | That's my cue! I think the "everything is a file" is somewhat
       | misunderstood. I might even rephrase it as "everything is a file
        | descriptor" first, but then if you need to give a _name_ to it,
        | or ACL that thing, that's what the file-system is for: that all
       | the objects share a common hierarchy, and different things that
       | need names can be put in say, the same directory. I.e., that
       | there is _one_ hierarchy, for pipes, devices, normal files, etc.
       | 
       | I'd argue that the stuff that "is" a file (named or anonymous) or
       | a file descriptor is actually rather small, and most operations
       | are going to require knowing what kind of file you have.
       | 
        | E.g., on Linux, read()/write() behave differently depending on
        | what they operate on: on an eventfd, for example, read()
        | requires a buffer of at least 8 bytes.
       | 
       | Heck, "close" might really be the only thing that makes sense,
       | generically. (I'd also argue that a (fallible1) "link" call
       | should be in that category, but alas, it isn't on Linux, and
       | potentially unlink for similar reasons -- i.e., give and remove
       | names from a file. I think this is a design mistake personally,
       | but Linux is what it is. What I think is a bigger mistake is
       | having separate/isolated namespaces for separate types of
       | objects. POSIX, and hence Linux, makes this error in a few
       | spots.)
       | 
       | But if you're just writing a TCP server, yeah, that probably
       | doesn't matter.
       | 
       | > _and that you should write your applications to treat these the
       | same._
       | 
        | But I wouldn't argue that. A socket likely is a socket for some
        | very specific purpose (whatever the app _is_, most likely), and
        | that specific purpose is going to mean it will be treated as
        | that.
        | 
        | In OOP terms, just because something might _have_ a base class
        | doesn't mean we always treat it as only an instance of the base
        | class. Sometimes the stuff on the derived class is _very_
        | important to the function of the program, and that's fine.
       | 
       | 1e.g., in the case of an anonymous normal file (an unlinked temp
       | file, e.g., O_TMPFILE), attempting to link it on a different
       | file-system would have to fail.
        
       | quickthrower2 wrote:
        | Regarding his original expectations of TCP: even if they were
        | true, is there much difference between dropped data and data
        | being delivered hours late? I imagine at the app level you
        | would be suspicious of any message that was sent 12 hours ago
        | but kept sitting in a queue.
        | 
        | I imagine if that scenario is OK, you would explicitly use a
        | queue system.
        
       | neilobremski wrote:
        | This article may as well have been about how writing to file
        | streams does not block until a read occurs. Maybe the
        | documentation (on sockets?!) could be clearer, but at some
        | point more words don't help with conceptual understanding.
        
       | hinkley wrote:
       | The first bit of real code I ever wrote was an SO_LINGER bug fix
       | for a game that couldn't restart if users had disconnected due to
       | loss of network.
       | 
       | Then I had to explain it to several other people who had the same
       | problems. Seems a lot of copy and paste went on among that
       | community.
        
       | andai wrote:
       | I played around with making a little websocket layer for browser
       | games a while back. I used a Windows tool called "clumsy" to
       | simulate crappy network conditions and was surprised at how poor
       | the results were with just WebSockets, despite all the overhead
       | of being a protocol on top of a protocol. The result is that you
       | need to build a protocol on top of the protocol on top of the
       | protocol if you actually want your messages delivered...
        
         | swagasaurus-rex wrote:
         | I built a javascript data synchronization library specifically
         | for games
         | 
         | https://github.com/siriusastrebe/jsynchronous
         | 
         | A core part of the library is the ability to re-send data
         | that's been lost due to connection interruption. Absolutely
         | crucial for ensuring data is properly synchronized.
        
         | tuukkah wrote:
         | That's because WebSockets are more or less just sockets for web
          | apps. You'll want to use a protocol that deals with messages
         | and their at-least-once delivery, such as MQTT (that can run on
         | top of WebSocket if you need it).
        
       | gwbas1c wrote:
       | > If I send() a message, I have no guarantees that the other
       | machine will recv() it if it is suddenly disconnected from the
       | network. Again, this may be obvious to an experienced network
       | programmer, but to an absolute beginner like me it was not.
       | 
       | Uhh, that's 100% obvious. That's why it's not taught.
       | 
       | Of course, I've also goofed on things that are obvious to other
        | people. But, c'mon, TCP isn't magic.
        
         | dragon96 wrote:
         | It could be obvious, but no need to be condescending about it.
         | 
         | There are many statements that fit into the category of
         | "obvious once stated, but not obvious if you didn't consider
         | the distinction to begin with".
        
         | tptacek wrote:
         | It's not 100% obvious. The mental model where send() blocks
         | until recv() on the other side confirms it is coherent: the
         | receiver sends ACKs with bumped ack numbers to acknowledge the
         | bytes it's received, and could delay those ACKs until the
         | application has taken the bytes out of the socket buffer. It
         | doesn't work that way, of course, and shouldn't, but it could.
        
         | hinkley wrote:
         | If it were so obvious we wouldn't have so many concurrency bugs
         | that appear time and again in new programs. If it's not network
         | flush it's file system flush.
        
         | jsmith45 wrote:
         | > Uhh, that's 100% obvious. That's why it's not taught.
         | 
         | Is it? You probably feel that way from knowing how TCP works.
         | But it would be quite straightforward to make it true with a
         | slightly modified version of TCP (that acknowledges all
         | packets, rather than every second one) by having send() block
         | until it receives back the ACK from the receiver. (Yeah this
         | would kill transmission rates, but it would function!) And
         | furthermore, while it would be a terrible idea, the ACK could
         | even be delayed until the end application makes the recv() call
         | for the packet.
         | 
         | To somebody not familiar with the details (and why this would
         | be a terrible tradeoff), something along those lines would be
         | entirely plausible.
        
         | [deleted]
        
       | 10000truths wrote:
       | Perhaps there should exist a flag for send() that would make it
       | so that it doesn't return until all data in the send() call has
       | been ACKed by the receiving side (with a user configurable
       | timeout).
       | 
       | Of course, it's still not bulletproof. The other side can receive
       | the packets, stuff them in its receive buffer, send an ACK for
       | those packets, and then fail before draining the receive buffer
       | due to an OS crash or hardware failure. But computers and
       | operating systems tend to be much more reliable than networks, so
       | it would still provide a much stronger guarantee of delivery or
       | failure.
        
         | tenebrisalietum wrote:
         | There's a difference between ACK of the remote network stack
         | (yeah, we got your packets, they're waiting in line) and ACK of
          | the application (yeah, app X processed your requests composed
          | of 1 or more packets).
         | 
         | Compare with the classic OS optimization for spinning rust hard
         | drives - write system calls will return immediately, but actual
         | write requests to the hardware will occur sometime later. It's
         | assumed most of the time your computer doesn't lose power, but
         | that does happen sometimes, hence journaling.
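The disk analogy can be made concrete: in Python, getting past the layers of buffering takes an explicit flush() plus fsync(). This is only a sketch; even fsync's guarantee ultimately depends on the drive honoring cache flushes:

```python
import os

def durable_write(path, data: bytes):
    with open(path, "wb") as f:
        f.write(data)         # bytes land in Python's userspace buffer
        f.flush()             # ...then in the OS page cache
        os.fsync(f.fileno())  # ...and only now head toward stable storage
```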
        
         | noselasd wrote:
         | Much stronger but of limited use still, for the reasons you
         | listed.
         | 
         | You very rarely care that the remote tcp/ip stack has acked the
         | message, you care that the messages has been received by the
         | program and processed - You're better off implementing your own
         | acks in those cases, allowing you to report back any errors in
         | that ack as well. Or you don't really care, and can just fire
         | and forget those messages.
         | 
         | And that also allows you to implement a system where you can
         | pipeline messages - waiting for remote acks when allowing just
         | 1 message in flight limits your throughput severely.
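An application-level ack of the kind noselasd describes can be sketched like this; the one-byte ACK convention and the helper name are made up for the example:

```python
import socket

def send_with_app_ack(conn, payload: bytes, timeout=5.0):
    """Send one length-prefixed message, then block until the peer's
    application sends back a one-byte acknowledgement."""
    conn.sendall(len(payload).to_bytes(4, "big") + payload)
    conn.settimeout(timeout)
    if conn.recv(1) != b"\x06":  # ASCII ACK, an arbitrary convention
        raise ConnectionError("peer did not acknowledge the message")
```

Unlike a TCP-level ACK, this only succeeds after the peer's application code has actually read (and presumably processed) the message; a pipelined variant would tag messages with sequence numbers instead of waiting one at a time.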
        
       | Uptrenda wrote:
       | It seems like the OP is mostly talking about 'blocking' sockets.
        | Such sockets return when they're ready or there's an error. So
        | send returns when it's passed off its data to the network
        | buffer (or, if the buffer is full, it will wait until it can
        | pass off SOME data). You might think that sounds excellent --
        | but from memory, send may not send all of the bytes you pass to
        | it. So if you want to send out all of a given buffer with
        | blocking sockets, you really need to write a loop that
        | implements send_all with a count of the number of bytes sent,
        | or quit on error.
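The send_all loop described above looks roughly like this in Python (the standard library already ships it as socket.sendall, so this is only a sketch of what that call does for you):

```python
def send_all(sock, data: bytes):
    """Keep calling send() until the kernel has accepted every byte,
    since a single send() may take only part of the buffer."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total
```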
       | 
       | Blocking sockets are kind of shitty, not gonna lie. The
        | counterpart to send is recv. Say you send an HTTP request to a
        | web server and you want to get a response. With a blocking
        | socket it's quite possible that your application will simply
        | wait forever. I am pretty sure the default 'timeout' for
        | blocking sockets is 'None', so it just waits for success or
        | failure. So a shitty web server can make your entire client
        | hang.
       | 
       | So how to solve this?
       | 
       | Well, you might try setting a 'timeout' for blocking operations
       | but this would also screw you. Any thread that calls that
        | blocking operation is going to hang that entire time. Maybe
        | that is fine for you -- should you design your program to be
        | multi-threaded and pass off sockets so they can wait like that
        | -- and that is one such solution.
       | 
       | Another solution -- and this is the one the OP uses -- is to use
       | the 'select' call to check to see if a socket operation will
       | 'block.' I believe it works on 'reading' and 'writing.' But wait
        | a minute. Now you've got to implement some kind of performant
        | loop that periodically checks your sockets. This may sound
        | simple, but it's actually the subject of whole research
        | projects to try to build the most performant loops possible. Now
       | we're really talking about event loops here and how to build
       | them.
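The readiness check at the heart of such a loop is a single select() call; a minimal Python sketch:

```python
import select

def readable_now(socks, timeout=0.0):
    """Return the subset of `socks` that can be read without blocking,
    waiting at most `timeout` seconds for one to become ready."""
    readable, _writable, _errored = select.select(socks, [], [], timeout)
    return readable
```

A real event loop would also track the write and error sets, and on Linux would likely use epoll instead of select for large descriptor counts.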
       | 
       | So how to solve this... today... for real-world uses?
       | 
       | Most people are just going to want to use asynchronous I/O. If
       | you've never worked with async code before: its a way of doing
       | event-based programming where you can suspend execution of a
       | function if an event isn't ready. This allows other functions to
       | 'run.' Note that this is a way to do 'concurrency' -- or switch
       | between multiple tasks. A good async library may or may not also
       | be 'parallel' -- the ability to execute functions simultaneously
       | (like on multiple cores.)
       | 
       | If we go back to the idea of the loop and using 'select' on our
        | socket descriptors. This is really like a poor person's async
       | event loop. It can easily be implemented in a single thread, in a
       | single core. But again -- for modern applications -- you're going
       | to want to stay away from using the socket functions and go for
       | async I/O instead.
       | 
       | One last caveat to mention:
       | 
       | Network code needs to be FAST. Not all software that we write
       | needs to run as fast as possible -- that's just a fact, and
       | indeed many warn against 'premature optimization.' I would say
       | this advice doesn't apply well to network code. It's simply not
       | acceptable to write sloppy algorithms that add tens of
       | milliseconds -- or even nanoseconds -- to packet delivery time if
       | you can avoid it. That can add up to a lot of money and make
       | certain applications impossible.
       | 
       | The thing is, though -- profiling async code can be hard, and
       | profiling network code even harder. A network is unreliable, and
       | when measuring the run-time of code you only care about how the
       | code performs when it's successful. So you're going to want to
       | find tools that let you throw away erroneous results and measure
       | how long 'coroutines' actually run for.
       | 
       | Async network code may use non-blocking sockets, select, and
       | poll under the hood. But async libraries are designed to be as
       | efficient as possible, so if you have access to one, it's
       | probably what you want to use!
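A minimal Python sketch of the select-based check the comment describes, using a local socket pair to stand in for a real connection (`read_with_select` is a hypothetical helper name):

```python
import select
import socket

def read_with_select(sock, timeout=1.0):
    """Return pending bytes, or None if nothing was readable in time."""
    # select() reports which sockets are readable without blocking on
    # any one of them; the timeout bounds each loop iteration.
    readable, _, _ = select.select([sock], [], [], timeout)
    if readable:
        return sock.recv(4096)         # won't block: select said readable
    return None

# A connected pair stands in for a real client/server connection.
left, right = socket.socketpair()
left.setblocking(False)                # calls fail fast instead of hanging
right.sendall(b"hello")
print(read_with_select(left))          # b'hello'
left.close()
right.close()
```

In a real event loop this check would run over many descriptors at once, which is exactly where the performance research the comment mentions comes in.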
        
         | PeterWhittaker wrote:
         | select(...)
         | 
         | ?
         | 
         | Why not                 EPOLLRDHUP          ?
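For what it's worth, a Linux-only Python sketch of the EPOLLRDHUP idea -- epoll can report the peer hanging up directly, where select() can only infer it from a zero-length read (`peer_hung_up` is a hypothetical helper name):

```python
import select
import socket

def peer_hung_up(sock, timeout=1.0):
    """Linux-only: report whether the peer closed its end of the socket."""
    ep = select.epoll()
    try:
        ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLRDHUP)
        for _fd, mask in ep.poll(timeout):
            if mask & select.EPOLLRDHUP:   # peer closed its write side
                return True
        return False
    finally:
        ep.close()

left, right = socket.socketpair()
right.close()                      # simulate the remote end hanging up
print(peer_hung_up(left))          # True on Linux
left.close()
```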
        
       | larsonnn wrote:
       | The title is confusing. I did learn it that way...
        
       | TYPE_FASTER wrote:
       | What really helped me understand and troubleshoot network
       | communications at the socket level was the book by W. Richard
       | Stevens[0]. I think this is because he starts with an
       | introduction to TCP, and builds from there. Knowing TCP states
       | and transitions is important to reading packet dumps, and in
       | general debugging networking.
       | 
       | Also, no matter what platform you're using, there is probably a
       | socket implementation somewhere at the core. You're best off
       | understanding how sockets work, then understanding how the
       | platform you're working with uses sockets[1].
       | 
       | Once I read the W. Richard Stevens book, I was able to read and
       | understand RFCs for protocols like HTTP to know how things should
       | work. Then you're better prepared to figure out if a behavior is
       | due to your code, or an implementation of the protocol in your
       | device, or the device on the other end of the network connection,
       | or some intermediary device.
       | 
       | [0] - http://www.kohala.com/start/unpv12e.html
       | 
       | [1] - https://docs.microsoft.com/en-
       | us/windows/win32/winsock/porti...
        
         | waynesonfire wrote:
         | curious whether the book help w/ understanding routing and
         | firewalls?
        
           | TYPE_FASTER wrote:
           | Yes. To really understand routing and firewalls, I'd suggest
           | implementing a firewall for yourself using Linux or OpenBSD.
           | I call out OpenBSD because when I went through that process
           | years ago, OpenBSD was lightweight from resource and
           | complexity perspectives, so I could focus on network
           | configuration. There might be better options with Linux these
           | days, it's been a long time since I looked.
        
           | tptacek wrote:
           | Very yes on both, especially with firewalls.
        
         | lanstin wrote:
         | We had to debug an issue with ARP during a deploy of docker
         | containers a year or two ago. A nearly complete understanding
         | of the lower layers is not that much knowledge and quite useful
         | at times.
        
           | 5e92cb50239222b wrote:
           | "Nearly complete"? No matter how deep your knowledge is,
           | you're only scratching the surface. To pick a small example,
           | there's something like 20 TCP congestion avoidance algorithms
           | alone (many are available in the mainline Linux kernel, and
           | they can be picked depending on the task and network at
           | hand), and I believe it took something like a decade of
           | research, trial and error to solve the bufferbloat problem.
           | 
           | https://en.wikipedia.org/wiki/TCP_congestion_control
           | 
           | https://lwn.net/Articles/616241/
        
             | hn_go_brrrrr wrote:
             | Only CUBIC and BBR seem to have any meaningful usage,
             | however: https://www.comp.nus.edu.sg/~ayush/images/sigmetri
             | cs2020-gor...
        
               | icedchai wrote:
               | I tried out BBR on a system that had a ton of packet
               | loss. It made a tremendous difference!
        
             | n_kr wrote:
             | > No matter how deep your knowledge is, you're only
             | scratching the surface.
             | 
             | I understand this is just emphasis, but no, it's not
             | magic, it's not innate ability, it's just software, man!
             | If you have dug deep enough and understood it, that's it.
             | The key phrase is IMO 'understood', but that's universal.
        
           | sophacles wrote:
           | I agree with your main point - a decent working knowledge
           | down-stack really helps with network stuffs.
           | 
           | "nearly complete" strikes me as a whole lot of hubris though
           | - I've spent almost 20 years at a career that can be summed
           | up as "weird stuff with packets on linux", and the only thing
           | that I'm certain of is: every time I think I'm starting to
           | get a handle on this networking thing, the universe shows me
           | that my assessment of how much there is to know was off by at
           | least an order of magnitude.
        
             | macintux wrote:
             | I once interviewed someone with "Internet guru" on his
             | resume. I advised him there were only a handful of people
             | I'd consider for that title.
        
         | gwright wrote:
         | I had the privilege of working for Rich as a junior developer
         | and then later with him as a co-author. He was my mentor and
         | friend. Seeing messages like this over 30 years later really
         | makes my day and reminds me how much I miss him.
        
         | Dowwie wrote:
         | for those unfamiliar with C, if anyone can recommend sufficient
         | reference material to understand the C examples presented in
         | the book, that would be helpful
        
           | wruza wrote:
           | The most confusing part should be "inheritance" of sockaddr
           | structs, afair (the rest should read as if it was
           | javascript). If this is it, then be sure that no magic
           | happens there, structure quirks matter much less than the
           | values they are filled with.
        
           | TYPE_FASTER wrote:
           | Some languages, like Python[0], have a low-level socket
           | library that maps pretty closely to the C API used in the
           | book.
           | 
           | [0] https://docs.python.org/3/library/socket.html
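For example, the Python calls mirror the C sequence almost name-for-name -- socket(), bind(), listen(), accept() on one side and connect() on the other. A loopback sketch, not production code:

```python
import socket
import threading

# Server side: socket() -> bind() -> listen() -> accept(), as in C.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def serve_once():
    conn, _addr = server.accept()
    conn.sendall(b"hi")
    conn.close()

t = threading.Thread(target=serve_once)
t.start()

# Client side: socket() + connect(), wrapped by create_connection().
client = socket.create_connection(("127.0.0.1", port))
data = client.recv(1024)
t.join()
print(data)                            # b'hi'
client.close()
server.close()
```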
        
           | throwaway1777 wrote:
           | You're opening a can of worms, but Kernighan and Ritchie
           | wrote the canonical book on C and you can't go wrong with
           | that.
        
             | quietbritishjim wrote:
             | It's worth stressing how different the K&R book is from
             | most books that introduce a programming language. Usually
             | those sorts of books are fairly long and you end up with a
             | decent but incomplete understanding.
             | 
             | With K&R, it's quite short, it's an easy going read, and
             | yet when you get to the end you've covered the _whole_ of C
             | (even the standard library!).
        
         | hinkley wrote:
         | There were many authors in that era who had practically
         | compulsory books. Stevens is one of the few to have two. His
         | Unix Programming book was referred to as the Bible.
        
           | sophacles wrote:
           | I still refer back to APUE often and push my junior folks to
           | read it.
        
       | dimman wrote:
       | Interesting read. I'm quite curious where all the initial
       | misperceptions about sockets come from.
       | 
       | I can highly recommend Beej's guide to network programming:
       | https://beej.us/guide/bgnet/
       | 
       | That together with Linux/BSD man pages should be everything
       | needed, some great documentation there.
        
         | pgorczak wrote:
         | I definitely used to think TCP was more "high-level" than it
         | actually is. Yes it does much more than UDP but still, its job
         | is to get a sequence of bytes from A to B. You can tune it for
         | higher throughput or more sensitive flow control but anything
         | concerning message passing, request/response, ... is beyond the
         | scope of TCP.
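A common sketch of what "message passing on top of TCP" means in practice is length-prefix framing -- entirely the application's job, as the comment says (`pack_message`/`unpack_messages` are hypothetical names):

```python
import struct

# TCP delivers a byte stream, not messages; a common fix is to prefix
# each message with its length (here, a 4-byte big-endian unsigned int).
def pack_message(payload: bytes) -> bytes:
    return struct.pack("!I", len(payload)) + payload

def unpack_messages(stream: bytes):
    """Split a received byte stream back into the messages packed into it."""
    messages, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        if offset + 4 + length > len(stream):
            break                      # partial message: wait for more bytes
        messages.append(stream[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return messages

wire = pack_message(b"hello") + pack_message(b"world")
print(unpack_messages(wire))           # [b'hello', b'world']
```

The "partial message" branch is the part people forget: recv() can return any prefix of the stream, so the framing layer has to buffer until a full message arrives.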
        
       | opportune wrote:
       | If I could make a meta point, nothing about software is magic
       | (except the fact it works at all). Computers do exactly as they
       | are told and nothing more.
       | 
       | If you're using some underlying technology you should really know
       | how it works at some level so you can understand what assumptions
       | you can make and which you can't. TCP doesn't mean there is never
       | data loss.
        
         | dylan604 wrote:
         | Except when literal bugs short out your system, or actual
         | cosmic rays bit flip your data, or any plethora of other things
         | that can cause your "correctly written" code to not function as
         | expected.
         | 
         | That's when the magicians come out and do their thing
        
       | gabereiser wrote:
       | This was a good write up but was kind of short of details. It
       | would have been really awesome with some code examples. I love
       | socket programming. Love, like I love going to the dentist.
       | 
       | Having a state machine to handle basic I/O between client/server
       | kind of blows when it comes to Plinko-machine switch statements
       | over arbitrary state enums. Languages like Go or Python allow
       | you to communicate with the client from the server in a more
       | direct client->server oriented way. Write/read input, do thing,
       | write, read, do, repeat until you finish ops.
       | 
       | Go is my favorite for this as I can spawn a goroutine to
       | read/write to a socket. Rust is my second favorite for this but
       | it's a bit trickier. Python's twisted framework has something
       | like this with Protocols. I wish C++ had a standard socket
       | implementation (std::network?)
       | 
       | Anyway, this gave me a smile today so thanks.
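A rough Python approximation of that per-connection style -- a thread standing in for a goroutine, with the direct read/do/write loop the comment describes (`echo_handler`/`echo_roundtrip` are hypothetical names):

```python
import socket
import threading

def echo_handler(conn):
    """Serve one connection directly: read, do thing, write, repeat."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:               # peer closed: end of the session
                break
            conn.sendall(data)

def echo_roundtrip(payload: bytes) -> bytes:
    server, client = socket.socketpair()
    # One thread per connection, where Go would spawn a goroutine.
    t = threading.Thread(target=echo_handler, args=(server,))
    t.start()
    client.sendall(payload)
    echoed = client.recv(1024)
    client.close()                     # handler sees EOF and exits
    t.join()
    return echoed

print(echo_roundtrip(b"ping"))         # b'ping'
```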
        
       | zwieback wrote:
       | In the early 90s I was working on AppleTalk-PC interoperability
       | SW, which included "ATP", the AppleTalk Transaction Protocol,
       | which is kind of a guaranteed delivery protocol. It even had "at
       | least once" and "exactly once" options to make sure state
       | wouldn't get messed up in client-server applications:
       | https://developer.apple.com/library/archive/documentation/ma...
       | 
       | I was kind of sad to see that things like this fell by the
       | wayside in favor of streaming protocols but of course it's way
       | more efficient to not handle each transaction individually.
       | 
       | The problem of users implementing TCP++ on top of TCP is a real
       | problem, I think.
        
         | NegativeLatency wrote:
         | Was there a driver or library for ATP or were you implementing
         | that on PCs?
        
           | zwieback wrote:
           | No, we were the ones writing the PC drivers, for DOS, Windows
           | and OS/2. Back then the lower layers were more or less
           | standardized but there was no AppleTalk stack so that's what
           | we developed as a product.
        
         | bombela wrote:
         | I just want to point out that "exactly-once" is not physically
         | possible. The documentation claims "exactly-once", then
         | proceeds to explain that after a while it will time out --
         | which makes it "at-most-once", as expected.
         | 
         | As far as I know, in communication you can only have at-least-
         | once (1..N) or at-most-once (0..1).
        
       | neonroku wrote:
       | The following sentence in the article jumped out at me: "The
       | difference is that you are not dealing with the unreliability of
       | UDP like TCP is." This reads to me like TCP is built on top of
       | UDP, which at one time I thought to be the case, but it's not.
       | UDP and TCP are both transport layer, built on the internet
       | layer, which is unreliable.
        
       | enriquto wrote:
       | > How you handle a file no longer existing vs. a socket
       | disconnection are not likely to be very similar.
       | 
       | Why not? It seems that fopen(3) and fread(3) provide the perfect
       | abstraction for that. The semantics to remove(3) a file that is
       | open are very clear, and they represent exactly what you want to
       | happen when a connection is lost.
       | 
       | I never understood the need for "sockets" as a separate type of
       | file. Why can't they be just be exposed as regular files?
        
       | [deleted]
        
       | psyc wrote:
       | I'm pleased to hear that anyone is still teaching anyone anything
       | about sockets.
       | 
       | The bit about Windows not treating sockets as files made me
       | pause, since Windows does treat so many things as files. After
       | thinking about it some, I suppose it's kernel32 that treats
       | kernel32 things as files. Winsock has a separate history.
        
         | amluto wrote:
         | Windows has a long and unfortunate history of encouraging
         | programs to extend system behavior by injecting libraries into
         | other programs' address spaces. You can look up Winsock LSPs
         | and Win32 hooks [0] for examples. This means that programs
         | cannot rely on the public APIs actually interacting with the
         | kernel in the way one would naively imagine -- the
         | implementation may be partially replaced with a call into user
         | code from another vendor in the same process. Eww!
         | 
         | So, as I recall, a _normal_ socket is a kernel object, but a
         | third party LSP-provided socket might not be. This also means
         | that any API, e.g. select, that may interact with more than one
         | socket at once has problems. [1]
         | 
         | [0] https://docs.microsoft.com/en-
         | us/windows/win32/api/winuser/n... -- see the remarks section.
         | 
         | [1] https://docs.microsoft.com/en-
         | us/windows/win32/api/Winsock2/... -- again, see the remarks.
        
         | xeeeeeeeeeeenu wrote:
         | It's complicated. Sockets are _mostly_ interchangeable with
         | filehandles but there are many exceptions. For example,
         | ReadFile() works with sockets, whereas DuplicateHandle()
         | silently corrupts them.
         | 
         | However, there's another problem: overlapped vs non-overlapped
         | handles. socket() always creates overlapped sockets, while
         | WSASocket() can create both types. Overlapped handles can't be
         | read synchronously with standard APIs, which in turn means you
         | can't read() a fd created from an overlapped handle.
         | 
         | Naturally, in their infinite wisdom, the Windows designers
         | decided there's no need to provide an official API to
         | introspect handles, so there's no documented way to tell them
         | apart (there are unofficial ways, though). BTW, it's proof of
         | poor communication between teams at Microsoft, because their C
         | runtime (especially its fd emulation) would greatly benefit
         | from such an API.
         | 
         | It's frustrating. I'm sure that if Windows was an open-source
         | project, that mess would be fixed immediately.
        
       ___________________________________________________________________
       (page generated 2022-07-26 23:00 UTC)