[HN Gopher] Unix Domain Sockets vs Loopback TCP Sockets (2014)
___________________________________________________________________
Unix Domain Sockets vs Loopback TCP Sockets (2014)
Author : e12e
Score : 112 points
Date : 2023-09-11 12:51 UTC (9 hours ago)
(HTM) web link (nicisdigital.wordpress.com)
(TXT) w3m dump (nicisdigital.wordpress.com)
| c7DJTLrn wrote:
| A lot of modern software disregards the existence of unix
| sockets, probably because TCP sockets are an OS-agnostic concept
| and perform well enough. You'd need to write Windows-specific
| code to handle named pipes if you didn't want to use TCP sockets.
| giovannibonetti wrote:
| I imagine there should be some OS-agnostic libraries somewhere
| that handle it and provide the developer a unified interface.
| eptcyka wrote:
| Yes, but there are named pipes, and they can be used the same
| way on Windows. And Windows supports UDS as well today. It's no
| excuse.
| nsteel wrote:
| Going forward, hopefully modern software will use the modern
| approach of AF_UNIX sockets in Windows 10 and above:
| https://devblogs.microsoft.com/commandline/af_unix-comes-to-...
|
| EDIT: And it would be interesting for someone to reproduce a
| benchmark like this on Windows to compare TCP loopback and the
| new(ish) unix socket support.
| [deleted]
| zamadatix wrote:
| A couple of years after this article came out Windows added
| support for SOCK_STREAM Unix sockets.
| rnmmrnm wrote:
| Windows is exactly the reason they didn't prevail, imo. Windows
| named pipes have weird security caveats and are not really
| supported in high-level languages. I think this led everyone
| to just using loopback TCP as the portable IPC API instead of
| going with unix sockets.
| duped wrote:
| IME a lot of developers have never even heard of address
| families and treat "socket" as synonymous with TCP (or
| possibly, but rarely, UDP).
| [deleted]
| jjice wrote:
| Windows actually added Unix sockets about six years ago, and
| with how aggressively Microsoft EOLs older versions of their OS
| (relative to something like enterprise Linux at least), it's
| probably a pretty safe bet to use at this point.
|
| https://devblogs.microsoft.com/commandline/af_unix-comes-to-...
| c7DJTLrn wrote:
| Interesting, thanks.
| Aachen wrote:
| With how aggressively Microsoft EOLs older versions of their
| OS, we're still finding decades-old server and client systems
| at clients.
|
| While Server 2003 is getting rarer and the last sighting of
| Windows 98/2000 was a while ago, they all keep running for at
| the very least a few months after the last free security
| support is gone. But whether that's something you want to
| support as a developer is your choice to make.
| marcosdumay wrote:
| That's not very relevant.
|
| If you start developing new software today, it won't need to
| run on those computers. And if it's old enough that it needs
| to, you can bet all of those architectural decisions were
| already made and set in stone all over the place.
| johnmaguire wrote:
| > If you start developing a new software today, it won't
| need to run on those computers.
|
| This is a weird argument to make.
|
| For context, I work on mesh overlay VPNs at Defined.net.
| We initially used Unix domain sockets for our daemon-
| client control model. This supported Windows 10 / Server
| 2019+.
|
| We very quickly found our users needed support for Server
| 2016. Some are even still running 2012.
|
| Ultimately, as a software vendor, we can't just force
| customers to upgrade their datacenters.
| foobiekr wrote:
| On the server side it's actually the opposite of Microsoft
| quickly EOLing things. Server 2012 was EVERYWHERE as late as
| 2018-2019. They were still issuing service packs in 2018.
| rollcat wrote:
| I'd be more interested in the security and usability aspect.
| Loopback sockets (assuming you don't accidentally bind to
| 0.0.0.0, which would make it even worse) are effectively rwx to
| any process on the same machine that has permission to open
| network connections, unless you bother with setting up a local
| firewall (which requires admin privileges). On top of that you
| need to figure out which port is free to bind to, and have a
| backup plan in case the port isn't free.
|
| Domain sockets are simpler in both aspects: you can create one in
| any suitable directory, give it an arbitrary name, chmod it to
| control access, etc.
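|
| For illustration, a minimal C sketch of that last point
| (hypothetical path, error handling omitted): bind the socket
| inside a directory you control and chmod it so only the
| intended owner and group can connect.
|
|     #include <string.h>
|     #include <sys/socket.h>
|     #include <sys/stat.h>
|     #include <sys/un.h>
|     #include <unistd.h>
|
|     int main(void) {
|         int fd = socket(AF_UNIX, SOCK_STREAM, 0);
|         struct sockaddr_un addr = { .sun_family = AF_UNIX };
|         strncpy(addr.sun_path, "/run/myapp/control.sock",
|                 sizeof(addr.sun_path) - 1);
|         unlink(addr.sun_path);        /* remove a stale socket */
|         bind(fd, (struct sockaddr *)&addr, sizeof(addr));
|         chmod(addr.sun_path, 0660);   /* owner + group only */
|         listen(fd, 16);
|         /* ... accept() loop ... */
|         return 0;
|     }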
| spacechild1 wrote:
| > Two communicating processes on a single machine have a few
| options
|
| Curiously, the article does not even mention pipes, which I would
| assume to be the most obvious solution for this task (but not
| necessarily the best, of course!)
|
| In particular, I am wondering how Unix domain sockets compare to
| (a pair of) pipes. At first glance, they appear to be very
| similar. What are the trade-offs?
| tptacek wrote:
| The pipe vs. socket perf debate is a very old one. Sockets are
| more flexible and tunable, which may net you better performance
| (for instance, by tweaking buffer sizes), but my guess is that
| the high-order bit of how a UDS and a pipe perform is the
| same.
|
| Using pipes instead of a UDS:
|
| * Requires managing an extra set of file descriptors to get
| bidirectionality
|
| * Requires processes to be related
|
| * Surrenders socket features like file descriptor passing
|
| * Is more fiddly than the socket code, which can often be
| interchangeable with TCP sockets (see, for instance, the Go
| standard library)
|
| If you're sticking with Linux, I can't personally see a reason
| ever to prefer pipes. A UDS is probably the best default answer
| for generic IPC on Linux.
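|
| As a rough C-level sketch of that interchangeability
| (hypothetical socket path and port, error handling omitted):
| the only part of a client that differs is the address setup,
| and everything after connect() is identical.
|
|     #include <arpa/inet.h>
|     #include <netinet/in.h>
|     #include <string.h>
|     #include <sys/socket.h>
|     #include <sys/un.h>
|
|     int uds_connect(const char *path) {
|         int fd = socket(AF_UNIX, SOCK_STREAM, 0);
|         struct sockaddr_un a = { .sun_family = AF_UNIX };
|         strncpy(a.sun_path, path, sizeof(a.sun_path) - 1);
|         connect(fd, (struct sockaddr *)&a, sizeof(a));
|         return fd;   /* read()/write() is the same from here */
|     }
|
|     int tcp_connect(unsigned short port) {
|         int fd = socket(AF_INET, SOCK_STREAM, 0);
|         struct sockaddr_in a = { .sin_family = AF_INET,
|                                  .sin_port = htons(port) };
|         inet_pton(AF_INET, "127.0.0.1", &a.sin_addr);
|         connect(fd, (struct sockaddr *)&a, sizeof(a));
|         return fd;
|     }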
| xuhu wrote:
| With pipes, the sender has to add a SIGPIPE handler, which is
| not trivial to do if it's a library doing the send/recv. With
| sockets it can use send(fd, buf, len, MSG_NOSIGNAL) instead.
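|
| A small sketch of the difference (Linux; the flag is per call,
| so a library doesn't have to touch process-wide signal state):
|
|     #include <signal.h>
|     #include <sys/socket.h>
|     #include <unistd.h>
|
|     /* Pipe writer: has to deal with SIGPIPE globally. */
|     void pipe_write(int pipe_fd, const void *buf, size_t len) {
|         signal(SIGPIPE, SIG_IGN);  /* awkward inside a library */
|         write(pipe_fd, buf, len);  /* closed reader -> EPIPE   */
|     }
|
|     /* Socket writer: suppresses the signal per call. */
|     void sock_write(int sock_fd, const void *buf, size_t len) {
|         send(sock_fd, buf, len, MSG_NOSIGNAL);  /* -> EPIPE */
|     }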
| badrabbit wrote:
| Why not UDP? Less overhead, and you can use multicast to expand
| messaging to machines on a LAN. TCP on localhost makes little
| sense, especially when simple ACKs can be implemented in UDP.
|
| But even then, I wonder how the segmentation in TCP is affecting
| performance in addition to windowing.
|
| Another thing I always wanted to try was using raw IP packets,
| why not? Just sequence requests and let the sender close a send
| transaction only when it gets an ack packet with the sequence #
| for each send. Even better, a raw AF_PACKET socket on the
| loopback interface! That might beat UDS!
| sophacles wrote:
| Give it a try and find out! I'd give that blog post a read.
|
| I suspect you'd run into all sorts of interesting issues...
| particularly if the server is one process but there are N>1
| clients and you're using AF_PACKET.
| svanwaa wrote:
| Would TCP_NODELAY make any difference (good or bad)?
| inv2004 wrote:
| Would be better to retest.
|
| If I remember correctly, we saw the same results as described
| in the article back in 2014, but I also remember that Linux
| loopback was optimized after that, and the difference was much
| smaller, if visible at all.
| duped wrote:
| What's in the way of TCP hitting the same performance as unix
| sockets? Is it just netfilter?
| woodruffw wrote:
| I believe the conventional wisdom here is that UDS performs
| better because of fewer context switches and copies between
| userspace and kernelspace.
| foobiekr wrote:
| No. This is exactly the same. Think about the life of a
| datagram or of stream bytes at the syscall edge for each.
| woodruffw wrote:
| I'm not sure I understand. This isn't something I've thought
| about in a while, but it's pretty intuitive to me that a
| loopback TCP connection would pretty much always be slower:
| each transmission unit goes through the entire TCP stack,
| feeds into the TCP state machine, etc. That's more time spent
| in the kernel.
| foobiekr wrote:
| The IP stack.
| noselasd wrote:
| TCP has a lot of rules nailed down in numerous RFCs -
| everything from how to handle sequence numbers, the 3-way
| handshake, congestion control, and much more.
|
| That translates into a whole lot of code that needs to run,
| while unix sockets are not much more than a kernel buffer plus
| the code to copy data in and out of that buffer - which doesn't
| take much to make happen.
| majke wrote:
| Always use Unix Domain sockets if you can. There are at least
| three concerns with TCP.
|
| First, local port numbers are a limited resource.
|
| https://blog.cloudflare.com/how-to-stop-running-out-of-ephem...
| https://blog.cloudflare.com/the-quantum-state-of-a-tcp-port/
| https://blog.cloudflare.com/this-is-strictly-a-violation-of-...
|
| Then the TCP buffer autotune can go berserk:
| https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-fo...
| https://blog.cloudflare.com/when-the-window-is-not-fully-ope...
|
| Finally, conntrack. https://blog.cloudflare.com/conntrack-tales-
| one-thousand-and... https://blog.cloudflare.com/conntrack-turns-
| a-blind-eye-to-d...
|
| These issues don't exist in Unix Sockets land.
| alexvitkov wrote:
| If different components of your system are talking over a
| pretend network you've already architected yourself face
| first into a pile of shit. There's no argument for quality
| either way so I'll just use TCP sockets and save myself 2 hours
| when I inevitably have to get it running on Windows.
| mhuffman wrote:
| >If different components of your system are talking over a
| pretend network you've already architected yourself face
| first into a pile of shit.
|
| How do you have your file delivery, database, and business
| logic "talk" to each other? Everything on the same computer
| is a "pretend network" to some extent, right? Do you always
| architect your own database right into your business logic
| along with a web-server as a single monolith? One off SPAs
| must take 2-3 months!
| johnmaguire wrote:
| FYI, Windows supports Unix domain sockets since Windows 10 /
| Server 2019.
| alexvitkov wrote:
| Good thing to mention, thanks.
|
| That's mostly why I said 2 hours and not a day, as you
| still have to deal with paths (there's no /run) and you may
| have to fiddle with UAC or, God save us, NTFS permissions.
| adzm wrote:
| I had not heard of this! Long story short, AF_UNIX now
| exists for Windows development.
|
| https://devblogs.microsoft.com/commandline/af_unix-comes-
| to-... https://visualrecode.com/blog/unix-
| sockets/#:~:text=Unix%20d....
| Karrot_Kream wrote:
| These matter if you need to bind to multiple ports, but if
| you're only running a handful of services that need to bind a
| socket, then port number allocation isn't a big issue. TCP
| Buffer autotune having problems also matters at certain scale,
| but in my experience requires a tipping point. TCP sockets also
| have configurable buffer sizes while Unix sockets have a fixed
| buffer size, so TCP socket buffers can get much deeper.
|
| At my last role we benchmarked TCP sockets vs Unix sockets in a
| variety of scenarios. In our benchmarks, only certain cases
| benefited from Unix sockets and generally the complexity of
| using them in containerized environments made them less
| attractive than TCP unless we needed to talk to a high
| throughput cache or we were doing things like farming requests
| out to a FastCGI process manager. Generally speaking, using
| less chatty protocols than REST (involving a lot less serde
| overhead and making it easier to allocate ingest structures)
| made a much bigger difference.
|
| I was actually a huge believer in deferring to Unix sockets
| where possible, due to blog posts like these and my
| understanding of the implementation details (I've implemented
| toy IPC in a toy kernel before), but a coworker challenged me
| to benchmark my belief. Sure enough on benchmark it turned out
| that in most cases TCP sockets were fine and simplified a
| containerized architecture enough that Unix sockets just
| weren't worth it.
| kelnos wrote:
| > _the complexity of using [UNIX sockets] in containerized
| environments made them less attractive than TCP_
|
| Huh, I would think UNIX sockets would be easier; since
| sharing the socket between the host and a container (or
| between containers) is as simple as mounting a volume in the
| container and setting permissions on the socket
| appropriately.
|
| Using TCP means dealing with iptables and seems... less fun.
| I easily run into cases where the host's iptables firewall
| interferes with what Docker wants to do with iptables such
| that it takes hours just to get simple things working
| properly.
| lokar wrote:
| Also, UDS have more features; for example, you can get the
| remote peer UID and pass FDs.
| the8472 wrote:
| And SOCK_SEQPACKET, which greatly simplifies fd-passing.
| chadaustin wrote:
| How does SOCK_SEQPACKET simplify fd-passing? Writing a
| streaming IPC crate as we speak and wondering if there are
| land mines beyond https://gist.github.com/kentonv/bc7592af9
| 8c68ba2738f44369208...
| the8472 wrote:
| Well, the kernel does create an implicit packetization
| boundary when you attach FDs to a byte stream... but this
| is underdocumented and there's an impedance mismatch
| between byte streams and discrete application-level
| messages. With SEQPACKET you can also send zero-sized
| messages to pass an FD; with byte streams you must send at
| least one byte. That means you can send the FDs separately,
| after sending the bytes, which makes it easier to notify
| the application that it should expect FDs (in case it's not
| always calling recvmsg with a cmsg allocation prepared).
| SEQPACKET just makes it more straightforward because one
| message (+ancillary data) is always one sendmsg/recvmsg
| pair.
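|
| For reference, a minimal sketch of the stream-socket case:
| one real payload byte carrying an fd as SCM_RIGHTS ancillary
| data (error handling omitted; with SOCK_SEQPACKET the payload
| could be an application message, or empty, per the above).
|
|     #include <string.h>
|     #include <sys/socket.h>
|     #include <sys/uio.h>
|
|     ssize_t send_fd(int sock, int fd_to_pass) {
|         char byte = 0;
|         struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
|         union {
|             struct cmsghdr hdr;
|             char buf[CMSG_SPACE(sizeof(int))];
|         } u;
|         struct msghdr msg = {
|             .msg_iov = &iov, .msg_iovlen = 1,
|             .msg_control = u.buf,
|             .msg_controllen = sizeof(u.buf),
|         };
|         struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
|         c->cmsg_level = SOL_SOCKET;
|         c->cmsg_type = SCM_RIGHTS;
|         c->cmsg_len = CMSG_LEN(sizeof(int));
|         memcpy(CMSG_DATA(c), &fd_to_pass, sizeof(int));
|         return sendmsg(sock, &msg, 0);
|     }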
| chadaustin wrote:
| I appreciate your reply!
|
| My approach has been to send a header with the number of
| fds and bytes the next packet will contain, and the
| number of payload bytes is naturally never 0 in my case.
| etaham wrote:
| +1
| booleanbetrayal wrote:
| We've seen observable performance increases in migrating to
| unix domain sockets wherever possible, as some TCP stack
| overhead is bypassed.
| LinAGKar wrote:
| One problem I've run into when trying to use Unix sockets,
| though, is that they can only buffer fairly few messages at
| once, so if you have a lot of messages in flight you can
| easily end up with sends failing. TCP sockets can handle a lot
| more messages.
| count wrote:
| Can't you tune this with sysctl?
| kevincox wrote:
| The biggest reason for me is that you can use filesystem
| permissions to control access. Often I want to run a service
| locally and do auth at the reverse proxy, but if the service
| binds to localhost then all local processes can access without
| auth. If I only grant the reverse proxy permissions on the
| filesystem socket, then nothing can access the service without
| going through the auth.
| piperswe wrote:
| And with `SO_PEERCRED`, you can even implement more complex
| transparent authorization & logging based on the uid of the
| connecting process.
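|
| A sketch of what that looks like on Linux (struct ucred needs
| _GNU_SOURCE; call this on the fd returned by accept() on an
| AF_UNIX listener):
|
|     #define _GNU_SOURCE
|     #include <stdio.h>
|     #include <sys/socket.h>
|
|     void log_peer(int client_fd) {
|         struct ucred cred;
|         socklen_t len = sizeof(cred);
|         if (getsockopt(client_fd, SOL_SOCKET, SO_PEERCRED,
|                        &cred, &len) == 0)
|             printf("peer pid=%d uid=%u gid=%u\n",
|                    (int)cred.pid, (unsigned)cred.uid,
|                    (unsigned)cred.gid);
|     }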
| kevincox wrote:
| This is true, but to me it mostly negates the benefit for this
| use case. The goal is to offload the auth work to the reverse
| proxy, not to add more rules.
|
| Although I guess you could have the reverse proxy listen
| both on IP and UNIX sockets. It can then do different auth
| depending on how the connection came in. So you could auth
| with a TLS cert or password over IP, or with your PID/UNIX
| account over the UNIX socket.
| o11c wrote:
| Adjacently, remember that with TCP sockets you _can_ vary the
| address anywhere within 127.0.0.0/8.
| majke wrote:
| However, this is not the case for IPv6. Technically you can
| use only ::1, unless you do IPv6 FREEBIND.
| nine_k wrote:
| You usually have a whole bunch of link-local IPv6
| addresses. Can't you use them?
| cout wrote:
| I agree. Always choose unix domain sockets over local TCP if it
| is an option. There are some valid reasons though to choose
| TCP.
|
| In the past, I've chosen local TCP sockets because I can
| configure the receive buffer size to avoid burdening the sender
| (ideally both TCP and unix domain sockets should correctly
| handle EAGAIN, but I haven't always had control over the code
| that does the write). IIRC the max buffer size for unix domain
| sockets is lower than for TCP.
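|
| For reference, that tuning is just a setsockopt before the
| traffic starts; a small sketch (on Linux the kernel doubles
| the value you pass and caps it at net.core.rmem_max):
|
|     #include <sys/socket.h>
|
|     /* Shrink or grow the kernel receive buffer for one fd. */
|     int set_rcvbuf(int fd, int bytes) {
|         return setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
|                           &bytes, sizeof(bytes));
|     }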
|
| Another limitation of unix domain sockets is that the size of
| the path string must be less than PATH_MAX. I've run into this
| when the only directory I had write access to was already close
| to the limit. Local TCP sockets obviously do not have this
| limitation.
|
| Local TCP sockets can also bypass the kernel if you have a
| user-space TCP stack. I don't know if you can do this with unix
| domain sockets (I've never tried).
|
| I can also use local tcp for websockets. I have no idea if
| that's possible with unix domain sockets.
|
| In general, I choose a shared memory queue for local-only
| inter-process communication.
| duped wrote:
| > Local TCP sockets can also bypass the kernel if you have a
| user-space TCP stack. I don't know if you can do this with
| unix domain sockets (I've never tried).
|
| Kernel bypass exists because hardware can handle more packets
| than the kernel can read or write, and all the tricks
| employed are clever workarounds (read: kinda hacks) to get
| the packets managed in user space.
|
| This is kind of an orthogonal problem to IPC, and there's
| already a well defined interface for multiple processes to
| communicate without buffering through the kernel - and that's
| shared memory. You could employ some of the tricks (like
| LD_PRELOAD to hijack socket/accept/bind/send/recv) and
| implement it in terms of shared memory, but at that point why
| not just use it directly?
|
| If speed is your concern, shared memory is always the fastest
| IPC. The tradeoff is that you now have to manage the
| messaging across that channel.
| bheadmaster wrote:
| In my experience, for small unbatchable messages, UNIX
| sockets are fast enough not to warrant the complexity of
| dealing with shared memory.
|
| However, for bigger and/or batchable messages, shared
| memory ringbuffer + UNIX socket for synchronization is the
| most convenient yet fast IPC I've used.
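|
| A bare-bones sketch of the shared-memory half of that setup
| (POSIX shm, hypothetical name; the ring-buffer indices and the
| UDS-based wakeups are left out, and older glibc needs -lrt):
|
|     #include <fcntl.h>
|     #include <sys/mman.h>
|     #include <unistd.h>
|
|     #define RING_NAME "/myapp-ring"   /* hypothetical */
|     #define RING_SIZE (1 << 20)
|
|     void *map_ring(void) {
|         int fd = shm_open(RING_NAME, O_CREAT | O_RDWR, 0600);
|         ftruncate(fd, RING_SIZE);
|         void *p = mmap(NULL, RING_SIZE, PROT_READ | PROT_WRITE,
|                        MAP_SHARED, fd, 0);
|         close(fd);   /* the mapping stays valid */
|         return p;    /* both processes map the same name */
|     }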
| Agingcoder wrote:
| On Linux you can use abstract names, prefixed with a null
| byte. They disappear automatically when your process dies,
| and afaik don't require rw access to a directory.
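|
| A sketch of binding to an abstract name on Linux (hypothetical
| name; note the address length has to be computed, since the
| name is not NUL-terminated in the abstract namespace). One
| caveat: abstract sockets bypass filesystem permissions, so any
| process in the same network namespace can connect.
|
|     #include <stddef.h>
|     #include <string.h>
|     #include <sys/socket.h>
|     #include <sys/un.h>
|
|     int bind_abstract(const char *name) {   /* e.g. "myapp" */
|         int fd = socket(AF_UNIX, SOCK_STREAM, 0);
|         struct sockaddr_un a = { .sun_family = AF_UNIX };
|         a.sun_path[0] = '\0';                /* abstract marker */
|         strncpy(a.sun_path + 1, name, sizeof(a.sun_path) - 2);
|         socklen_t len = offsetof(struct sockaddr_un, sun_path)
|                         + 1 + strlen(name);
|         bind(fd, (struct sockaddr *)&a, len);
|         return fd;
|     }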
| throwway120385 wrote:
| > I can also use local tcp for websockets. I have no idea if
| that's possible with unix domain sockets.
|
| The thing that makes this possible or impossible is how your
| library implements the protocol, at least in C/C++. The
| really bad protocol libraries I've seen like for MQTT, AMQP,
| et al. all insist on controlling both the connection stream
| and the protocol state machine and commingle all of the code
| for both. They often also insist on owning your main loop
| which is a bad practice for library authors.
|
| A much better approach is to implement the protocol as a
| separate "chunk" of code with well-defined interfaces for
| receiving inputs and generating outputs on a stream, and with
| hooks for protocol configuration as-needed. This allows me to
| do three things that are good:
|
| * Choose how I want to do I/O with the remote end of the
| connection.
|
| * Write my own main loop or integrate with any third-party
| main loop that I want.
|
| * Test the protocol code without standing up an entire TLS
| connection.
|
| I've seen a LOT of libraries that don't allow these things.
| Apache's QPID Proton is a big offender for me, although they
| were refactoring in this direction. libmosquitto provides
| some facilities to access the filedescriptor but otherwise
| tries to own the entire connection. So on and so forth.
|
| Edit: I get how you end up there because it's the easiest way
| to figure out the libraries. Also, if I had spare time on my
| hands I would go through and work with maintainers to fix
| these libraries because having generic open-source protocol
| implementations would be really useful and would probably
| solve a lot of problems in the embedded space with ad-hoc
| messaging implementations.
|
| If the protocol library allows you to control the connection
| and provides a connection-agnostic protocol implementation
| then you could replace a TLS connection over TCP local
| sockets from OpenSSL with SPI transfers or CAN transfers to
| another device if you really wanted to. Or Unix Domain
| Sockets, because you own the file descriptor and you manage
| the transfers yourself.
| chrsig wrote:
| > Another limitation of unix domain sockets is that the size
| of the path string must be less than PATH_MAX. I've run into
| this when the only directory I had write access to was
| already close to the limit. Local TCP sockets obviously do
| not have this limitation.
|
| This drove me nuts for a _long_ time, trying to hunt down why
| the socket couldn't be created. It's a really subtle
| limitation, and there's not a good error message or anything.
|
| In my use case, it was for testing the server creating the
| socket, and each test would create its own temp dir to house
| the socket file and various other resources.
|
| > In general, I choose a shared memory queue for local-only
| inter-process communication.
|
| Do you mean the sysv message queues, or some user space
| system? I've never actually seen sysv queues in the wild, so
| I'm curious to hear more.
| pixl97 wrote:
| Isn't PATH_MAX 4k characters these days? Have to have some
| pretty intense directory structures to hit that.
| rascul wrote:
| For unix domain sockets on Linux the max is 108 including a
| null terminator.
|
| https://www.man7.org/linux/man-pages/man7/unix.7.html
|
| https://unix.stackexchange.com/questions/367008/why-is-
| socke...
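|
| Since the limit is sizeof(sun_path) rather than PATH_MAX, it's
| cheap to check up front instead of chasing a confusing bind()
| failure later, e.g.:
|
|     #include <errno.h>
|     #include <string.h>
|     #include <sys/un.h>
|
|     /* 0 if path fits in sun_path (with its NUL), else -1. */
|     int check_socket_path(const char *path) {
|         struct sockaddr_un a;
|         if (strlen(path) >= sizeof(a.sun_path)) {  /* 108 */
|             errno = ENAMETOOLONG;
|             return -1;
|         }
|         return 0;
|     }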
| rwmj wrote:
| AF_VSOCK is another one to consider these days. It's a kind of
| hybrid of loopback and Unix. Although they are designed for
| communicating between virtual machines, vsock sockets work just
| as well between regular processes. Also supported on Windows.
|
| https://www.man7.org/linux/man-pages/man7/vsock.7.html
| https://wiki.qemu.org/Features/VirtioVsock
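|
| For comparison with the AF_UNIX examples elsewhere in the
| thread, a guest-side connect is a sketch like this (constants
| from <linux/vm_sockets.h>; the port number is arbitrary):
|
|     #include <string.h>
|     #include <sys/socket.h>
|     #include <linux/vm_sockets.h>
|
|     /* Connect from a guest to a service on the host. */
|     int vsock_connect_host(unsigned int port) {
|         int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
|         struct sockaddr_vm addr;
|         memset(&addr, 0, sizeof(addr));
|         addr.svm_family = AF_VSOCK;
|         addr.svm_cid = VMADDR_CID_HOST;   /* CID 2 = the host */
|         addr.svm_port = port;
|         connect(fd, (struct sockaddr *)&addr, sizeof(addr));
|         return fd;
|     }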
| touisteur wrote:
| With some luck and love in the future hopefully we'll also be
| able to use them in containers
| https://patchwork.kernel.org/project/kvm/cover/2020011617242...
| which would simplify a lot of little things.
| tptacek wrote:
| What's the advantage to vsocks over Unix domain sockets? UDS's
| are very fast, and much easier to use.
| rwmj wrote:
| I didn't mean to imply any advantage, just that they are
| another socket-based method for two processes to communicate.
| Since vsocks use a distinct implementation they should
| probably be benchmarked alongside Unix domain sockets and
| loopback sockets in any comparisons. My expectation is they
| would be somewhere in the middle - not as well optimized as
| Unix domain sockets, but with less general overhead than TCP
| loopback.
|
| If you are using vsocks between two VMs as intended then they
| have the advantage that they allow communication without
| involving the network stack. This is used by VMs to implement
| guest agent communications (screen resizing, copy and paste
| and so on) where the comms don't require the network to have
| been set up at all or be routable to the host.
| cout wrote:
| I did not know about this. Thanks for the tip!
| coppsilgold wrote:
| VMMs such as firecracker and cloud-hypervisor translate
| between vsock and UDS. [1]
|
| In recent kernel versions, sockmap also has vsock translation:
| <https://github.com/torvalds/linux/commit/5a8c8b72f65f6b80b52..
| .>
|
| This allows for a sort of UDS "transparency" between guest and
| host. When the host is connecting to a guest, the use of a
| multiplexer UDS is required. [1]
|
| [1] <https://github.com/firecracker-
| microvm/firecracker/blob/main...>
___________________________________________________________________
(page generated 2023-09-11 22:00 UTC)