[HN Gopher] TCP, the workhorse of the internet
___________________________________________________________________
TCP, the workhorse of the internet
Author : signa11
Score : 261 points
Date : 2025-11-15 06:37 UTC (16 hours ago)
(HTM) web link (cefboud.com)
(TXT) w3m dump (cefboud.com)
| zkmon wrote:
| I hate to think of the future of these nice blog posts, which
| need to struggle to convince readers of the _organic_ level of
| their content.
| stavros wrote:
| Wait, _can_ you actually just use IP? Can I just make up a packet
| and send it to a host across the Internet? I'd think that all
| the intermediate routers would want to have an opinion about my
| packet, caring, at the very least, that it's either TCP or UDP.
| gsliepen wrote:
| They shouldn't; the whole point is that the IP header is enough
| to route packets between endpoints, and only the endpoints
| should care about any higher layer protocols. But unfortunately
| some routers do, and if you have NAT then the NAT device needs
| to examine the TCP or UDP header to know how to forward those
| packets.
| jadamson wrote:
| Notably, QUIC (and thus HTTP/3) uses UDP instead of a new
| protocol number for this reason.
| stavros wrote:
| Yeah, this is basically what I was wondering, why QUIC used
| UDP instead of their own protocol if it's so
| straightforward. It seems like the answer may be "it's not
| as interference-free as they'd like it".
| Twisol wrote:
| UDP pretty much just tacks a source/destination port pair
| onto every IP datagram, so its primary function is to
| allow multiple independent UDP peers to coexist on the
| same IP host. (That is, UDP just multiplexes an IP link.)
| UDP as a protocol doesn't add any additional network
| guarantees or services on top of IP.
|
| QUIC is still "their own protocol", just implemented as
| another protocol nested inside a UDP envelope, the same
| way that HTTP is another protocol typically nested inside
| a TCP connection. It makes some sense that they'd
| piggyback on UDP, since (1) it doesn't require an
| additional IP protocol header code to be assigned by
| IANA, (2) QUIC definitely wants to coexist with other
| services on any given node, and (3) it allows whatever
| middleware analyses that exist for UDP to apply naturally
| to QUIC applications.
|
| (Regarding (3) specifically, I imagine NAT in particular
| requires cooperation from residential gateways, including
| awareness of both the IP and the TCP/UDP port. Allowing a
| well-known outer UDP header to surface port information,
| instead of re-implementing ports somewhere in the QUIC
| header, means all existing NAT implementations should
| work unchanged for QUIC.)
| conradludgate wrote:
| When it comes to QUIC: QUIC works best with unstable end-user
| internet connections (it was designed for HTTP/3 in the mobile
| age).
| Most end-user internet access is behind various layers of
| CGNAT. The way that NAT works is by using your port
| numbers to increase the address space. If you have 2^32
| IPv4 addresses, you have 2^48 IPv4 address+port pairs.
| All these NAT middleboxes speak TCP and UDP only.
|
| Additionally, firewalls are also designed to filter out
| any weird packets. If the packet doesn't look like you
| wanted to receive it, it's dropped. A firewall usually does this
| by tracking open ports just like NAT, so many firewalls also
| don't trust custom protocols.
| lxgr wrote:
| It's effectively impossible to use anything other than
| TCP or UDP these days.
|
| Some people here will argue that it actually is possible, and
| that everybody experiencing issues is just on a really weird
| connection or using broken hardware, but those weird connections
| and that bad hardware make up the overwhelming majority of
| Internet connections these days.
| hylaride wrote:
| Using UDP means QUIC support is as "easy" as adding it to
| the browser and server software. To add it as a separate
| protocol would have involved all OS's needing to add
| support for it into their networking stacks and that
| would have taken ages and involved more politics. The
| main reason QUIC was created was so that Google could
| more effectively push ads and add tracking, remember. The
| incentives were not there for others to implement it.
| toast0 wrote:
| Yeah, so... You can do it. But only for some values of
| you. In a NAT world, the NAT needs to understand the
| protocol so that it can adjust the core multiplexing in
| order to adjust addresses. A best effort NAT could let
| one internal IP at a time connect to each external IP on
| an unknown protocol, but that wouldn't work for QUIC:
| Google expects multiple clients behind a NAT to connect
| to its service IPs. It can often work for IP tunneling
| protocols where at most one connection to an external IP
| isn't super restrictive. But even then, many NATs won't
| pass unknown IP protocols at all.
|
| Most firewalls will drop unknown IP protocols. Many will
| drop a lot of TCP; some drop almost all UDP. This is why
| so much stuff runs over tcp ports 80 and 443; it's almost
| always open. QUIC/HTTP/3 encourages opening of udp/443,
| so it's a good port to run unrelated things over too.
|
| Also, SCTP had similar goals to QUIC and never got much
| deployment or support in OSes, NATs, firewalls, etc. It's a
| clear win to just use UDP and get something that will just work
| on a large portion of networks.
| Hikikomori wrote:
| Can also NAT using IP protocol.
| Twisol wrote:
| As far as I'm aware, sure you can. TCP packets and UDP
| datagrams are wrapped in IP datagrams, and it's the job of an
| IP network to ship your data from point A (sender) to point B
| (receiver). Nodes along the way might do so-called "deep packet
| inspection" to snoop on the payload of your IP datagrams (for
| various reasons, not all nefarious), but they don't need to do
| that to do the basic job of routing. From a semantic
| standpoint, the information in the TCP and UDP headers (as part
| of the IP payload) is only there to govern interactions between
| the two endpoint parties. (For instance, the "port" of a TCP or
| UDP packet is a node-local identifier for one of many services
| that might exist at the IP address the packet was routed to,
| allowing many services to coexist at the same node.)
| stavros wrote:
| Hmm, I thought intermediate routers use the TCP packet's bits
| for congestion control, no? Though I guess they can probably
| just use the destination IP for that.
| Twisol wrote:
| They probably _can_ do deep/shallow packet inspection for that
| purpose (being one of the non-nefarious applications I alluded
| to), but that's not to say their correct functioning _relies_ on
| it. Those routers also need to support at least UDP, and UDP
| provides almost no extra information at that level -- just the
| source and destination ports (so, perhaps QoS prioritization)
| and the inner payload's length and checksum (so, perhaps
| dropping bad packets quickly).
|
| If middleware decides to do packet inspection, it had better
| make sure that any behavioral differences (relative to not doing
| any inspection) are strictly an optimization and do not impact
| the correctness of the link.
|
| Also, although I'm not a network operator by any stretch,
| my understanding is that TCP congestion control is
| primarily a function of the endpoints of the TCP link, not
| the IP routers along the way. As Wikipedia explains [0]:
|
| > Per the end-to-end principle, congestion control is
| largely a function of internet hosts, not the network
| itself.
|
| [0]: https://en.wikipedia.org/wiki/TCP_congestion_control
| toast0 wrote:
| _Most_ intermediate routers don't care much. Look up the
| destination IP in the routing table, forward to the next hop, no
| time for anything else.
|
| Classic congestion control is done on the sender alone. The
| router's job is simply to drop packets when the queue is
| too large.
|
| Maybe the router supports ECN, so _if_ there's a queue going to
| the next hop, it will look for protocol-specific ECN headers to
| manipulate.
|
| Some network elements do more than the usual routing work.
| A traffic shaper might have per-user queues with outbound
| bandwidth limits. A network accelerator may effectively
| reterminate TCP in hopes of increasing achievable bandwidth.
|
| Often, the router has an aggregated connection to the next hop,
| so it'll use a hash on the addresses in the packet to choose
| which of the underlying connections to use. That hash could be
| based on many things, but it's not uncommon to use TCP or UDP
| port numbers if available. This can also be used to choose
| between equally scored next hops, and that's why you often see
| several different paths during a traceroute. Using port numbers
| is helpful to balance connections from IP A to IP B over
| multiple links. If you use an unknown protocol, even if it is
| multiplexed into ports or similar (like TCP and UDP), the
| different streams will likely always hash onto the same link:
| you won't be able to exceed the bandwidth of a single link, and
| a damaged or congested link will affect all or none of your
| connections.
| HPsquared wrote:
| Huh. So it's literally "TCP over IP" like the name suggests.
| ilkkao wrote:
| You can definitely craft an IP packet by hand and send it. If
| it's IPv4, you need to put a number between 0 and 255 to the
| protocol field from this list:
| https://www.iana.org/assignments/protocol-numbers/protocol-n...
|
| Core routers don't inspect that field; NAT/ISP boxes can. I
| believe that with two suitable dedicated Linux servers it is
| very possible to send and receive a single custom IP packet
| between them, even using 253 or 254 (= use for experimentation
| and testing [RFC3692]) as the protocol number.
| Twisol wrote:
| > If it's IPv4, you need to put a number between 0 and 255 to
| the protocol field from this list:
|
| To save a skim (though it's an interesting list!), protocol
| codes 253 and 254 are suitable "for experimentation and
| testing".
| stavros wrote:
| Very interesting, thanks!
| inglor_cz wrote:
| This is an interesting list; it makes you appreciate just how
| many obscure protocols have died out in practice. Evolution
| in networks seems to mimic evolution in nature quite well.
| morcus wrote:
| What happens when the remaining 104 unassigned protocol
| numbers are exhausted?
| marcosdumay wrote:
| People will start overloading the numbers.
|
| I do hope we'll have stopped using IPv4 by then... But
| well, a decade after address exhaustion we are still on it,
| so who knows?
| kbolino wrote:
| IPv6 uses the exact same 8-bit codes as IPv4.
|
| It uses them a little differently -- in IPv4, there is
| one protocol per packet, while in IPv6, "protocols" can
| be chained in a mechanism called extension headers -- but
| this actually makes the problem of number exhaustion more
| acute.
| brewmarche wrote:
| What if extension headers made it better? We could come
| up with a protocol consisting solely of a larger Next
| Header field and chain this pseudo header with the actual
| payload whenever the protocol number is > 255. The same
| idea could also be used in IPv4.
| kbolino wrote:
| I didn't mean to imply otherwise. But, as you say, this
| is equally applicable to IPv4 and IPv6. There were a lot
| of issues solved by IPv6, but "have even more room for
| non-TCP/UDP transports" wasn't one of them (and didn't
| need to be, tbqh).
| hylaride wrote:
| We're about half-way to exhausted, but a huge chunk of the
| ones assigned are long deprecated and/or proprietary
| technologies and could conceivably be reassigned.
| Assignment now is obviously a lot more conservative than it
| was in the 1980s.
|
| There is sometimes drama with it, though. A while back, the
| OpenBSD guys created CARP as a fully open source router
| failover protocol, but couldn't get an official IP number
| and ended up using the same one as VRRP. There's also a lot
| of historical animosity that some companies got numbers for
| proprietary protocols (eg Cisco got one for its then-
| proprietary EIGRP).
|
| https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers
| Ekaros wrote:
| Probably some use of IP options. Up to 320 bits, so I think
| there is a reasonable amount of space there for a good while.
| Of course, this makes processing really messy, but with current
| hardware it's not impossible.
| rfmoz wrote:
| Playing with the protocol number usually results in "Protocol
| Unreachable" or "Malformed Packet" from your OS.
| LeoPanthera wrote:
| You know I've always wondered if you could run Kermit*-over-IP,
| without having TCP inbetween.
|
| *The protocol.
| Karrot_Kream wrote:
| If there's no form of NAT or transport layer processing along
| your path between endpoints, you shouldn't have an issue. But
| NAT and transport and application layer load balancing are very
| common on the net these days, so YMMV.
|
| You might have more luck with an IPv6 packet.
| NooneAtAll3 wrote:
| something like this?
|
| https://en.wikipedia.org/wiki/IP_over_Avian_Carriers
| Twisol wrote:
| That would be IP over some lower level physical layer, not
| some custom content stuffed into an IP packet :)
|
| (It's absolutely worth reading some of those old April Fools'
| RFCs, by the way [0]. I'm a big fan of RFC 7168, which
| introduced HTTP response code 418 "I'm a teapot".)
|
| [0]: https://en.wikipedia.org/wiki/April_Fools%27_Day_Request
| _for...
| immibis wrote:
| Yes but not if you or they are behind NAT. It's a shame port
| numbers aren't in IP.
| nly wrote:
| The reason you wouldn't do that is IP doesn't give you a
| mechanism to share an IP address with multiple processes on a
| host, it just gets your packets to a particular host.
|
| As soon as you start thinking about having multiple services on
| a host, you end up with the idea of having a service ID or
| "port".
|
| UDP or UDP Lite gives you exactly that at the cost of 8 bytes,
| so there's no real value in not just putting everything on top
| of UDP.
| xorcist wrote:
| > caring, at the very least, that it's either TCP or UDP.
|
| You left out ICMP, my favourite! (And a lot more important in
| IPv6 than in v4.)
|
| Another pretty well known protocol that is neither TCP nor UDP
| is IPsec. (Which is really two new IP protocols.) People really
| did design proper IP protocols still in the 90s.
|
| > Can I just make up a packet and send it to a host across the
| Internet?
|
| You should be able to. But if you are on a corporate network
| with a really strict firewalling router that only forwards
| traffic it likes, then likely not. There are also really crappy
| home routers which give similar problems from the other end of
| enterpriseness.
|
| NAT also destroyed much of the end-to-end principle. If you
| don't have a real IP address and rely on a NAT router to
| forward your data, it needs to be in a protocol the router
| recognizes.
|
| Anyway, for the past two decades people have grown tired of
| that and just pile hacks on top of TCP or UDP instead. That's
| sad. Or who am I kidding? Really it's on top of HTTP. HTTP will
| likely live on long past anything IP.
| gruturo wrote:
| > NAT also destroyed much of the end-to-end principle. If you
| don't have a real IP address and relies on a NAT router to
| forward your data, it needs to be in a protocol the router
| recognizes.
|
| Not necessarily. Many protocols can survive being NATed if
| they don't carry IP/port related information inside their
| payload. FTP is a famous counterexample - it uses a control
| channel (TCP21) which contains commands to open data channels
| (TCP20), and those commands specify IP:port pairs, so,
| depending on the protocol, a NAT router has to rewrite them
| and/or open ports dynamically and/or create NAT entries on
| the fly. A lot of other stuff has no need for that and will
| happily go through without any rewriting.
| lxgr wrote:
| Of course NAT allows application layer protocols _layered
| on TCP or UDP_ to pass through without the NAT
| understanding the application layer - otherwise, NATted
| networks would be entirely broken.
|
| The end-to-end principle at the IP layer (i.e. having the
| IP forwarding layer be agnostic to the transport layer
| protocols above it) is still violated.
| Hikikomori wrote:
| You can NAT on IP protocol as well, just not to more than
| one per external IP.
| brewmarche wrote:
| I guess most people mean NAPT/PAT when they say NAT
| xorcist wrote:
| I think we agree. Of course a NAT router with an
| application proxy such as FTP or SIP can relay and rewrite
| traffic as needed.
|
| TCP and UDP have port numbers that the NAT software can
| extract and keep state tables for, so we can send the
| return traffic to its intended destination.
|
| For unknown IP protocols that is not possible. It may at best
| act like a network diode, which is one way of violating the
| end-to-end principle.
| Hikikomori wrote:
| You can NAT on IP protocol as well, just not to more than
| one per external IP.
| gruturo wrote:
| Actually the observation about ports being mostly a
| TCP/UDP feature is a very good point I had failed to
| consider. This would indeed greatly limit the ability of
| a NAT gateway - it could keep just a state table of IP
| src/dst pairs and just direct traffic back to its source,
| but it's indeed very crude. Thanks for bringing it up!
| lxgr wrote:
| > You left out ICMP, my favourite!
|
| Even ICMP has a hard time traversing NATs and firewalls these
| days, for largely bad reasons. Try pinging anything in AWS,
| for example...
| 6031769 wrote:
| Have to say that I don't encounter any problems pinging
| hosts in AWS.
|
| If any host is firewalling out ICMP then it won't be
| pingable but that does not depend on the hosting provider.
| AWS is no better or worse than any other in that regard,
| IME.
| Hikikomori wrote:
| Doesn't really have anything to do with nat though.
| xyzzyz wrote:
| There is little point in inventing new protocols, given how
| low the overhead of UDP is. That's just 8 bytes per packet,
| and it enables going through NAT. Why come up with a new
| transport layer protocol, when you can just use UDP framing?
| mlhpdx wrote:
| Agreed. Building a custom protocol seems "hard" to many folks,
| the same folks who do it without any fear on top of HTTP. The
| wild shenanigans I've seen with headers, query params and
| JSON make me laugh a little. Everything as text is
| _actually_ hard.
|
| A part of the problem with UDP is the lack of good
| platforms and tooling. Examples as well. I'm trying to help
| with that, but it's an uphill battle for sure.
| GardenLetter27 wrote:
| Probably not, loads of routers are even blocking parts of ICMP.
| eqvinox wrote:
| That's firewalls (or others), not routers. If it blocks
| things, it's by definition not a router anymore.
| lxgr wrote:
| You can call the things mangling IP addresses and TCP/UDP
| ports what you want, but that will unfortunately not make
| them go away and stop throwing away non-TCP/UDP traffic.
| marcosdumay wrote:
| Both things come on the same box nowadays.
|
| There are many routers that don't care at all about what's
| going through them. But there aren't any firewalls that
| don't route anymore (not even at the endpoints).
| rubatuga wrote:
| And by your definition my home router is not a router since
| it does NAT? There's really no point in arguing semantics
| like this.
| eqvinox wrote:
| We're discussing nonstandard IP protocols. In that
| context, your home router is a CPE, and not described by
| the term "router" without further qualifiers, because
| that's the level the discussion is at. I'm happy to call
| it a router when talking to the neighbors, when I'm not
| discussing IP protocols with them.
| gruturo wrote:
| Yep it's full of IP protocols other than the well-known TCP,
| UDP and ICMP (and, if you ever had the displeasure of learning
| IPSEC, its AH and ESP).
|
| A bunch of multicast stuff (IGMP, PIM)
|
| A few routing protocols (OSPF, but notably not BGP which just
| uses TCP, and (usually) not MPLS which just goes over the wire
| - it sits at the same layer as IP and not above it)
|
| A few VPN/encapsulation solutions like GRE, IP-in-IP, L2TP and
| probably others I can't remember
|
| As usual, Wikipedia has got you covered, much better than my
| own recollection:
| https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers
| lxgr wrote:
| To GPs point, though, most of these will unfortunately be
| dropped by most middleboxes for various reasons.
|
| Behind a NA(P)T, you can obviously only use those protocols
| that the translator knows how to remap ports for.
| Hikikomori wrote:
| Can also do 1:1 NAT for IP protocols like ipsec, or your
| own protocol.
| lxgr wrote:
| Yes, but who else does? Network effects are important in
| a network.
| eqvinox wrote:
| > I'd think that all the intermediate routers would want to
| have an opinion about my packet, caring, at the very least,
| that it's either TCP or UDP.
|
| They absolutely don't. Routers are layer 3 devices; TCP & UDP
| are layer 4. The only impact is that the ECMP flow hashes will
| have less entropy, but that's purely an optimization thing.
|
| Note TCP, UDP and ICMP are nowhere near all the protocols
| you'll commonly see on the internet -- at minimum, SCTP, GRE,
| L2TP and ESP are reasonably widespread (even a tiny fraction of
| traffic is still a giant number considering internet scales).
|
| You can send whatever protocol number with whatever contents
| your heart desires. Whether the other end will do anything
| useful with it is another question.
| lxgr wrote:
| > They absolutely don't. Routers are layer 3 devices;
|
| Idealized routers are, yes.
|
| Actual IP paths these days usually involve at least one NAT,
| and these will absolutely throw away anything other than TCP,
| UDP, and if you're lucky ICMP.
| eqvinox wrote:
| See nearby comment about terminology. Either we're
| discussing odd IP protocols, then the devices you're
| describing aren't just "routers" (and particularly what
| you're describing is not part of a "router"), or we're not
| discussing IP protocols, then we're not having this thread.
|
| And note the GP talked about "intermediate routers". That's
| the ones in a telco service site or datacenter by my book.
| gsliepen wrote:
| If you start with the problem of how to create a reliable stream
| of data on top of an unreliable datagram layer, then the solution
| that comes out will look virtually identical to TCP. It just is
| the right solution for the job.
|
| The three drawbacks of the original TCP algorithm were the
| window size (the maximum value is just too small for today's
| speeds), poor handling of missing packets (addressed by
| extensions such as selective ACK), and the fact that it only
| manages one stream at a time, while some applications want
| multiple streams that don't block each other. You could use
| multiple TCP connections, but that adds its own overhead, so
| SCTP and QUIC were designed to address those issues.
|
| The congestion control algorithm is not part of the on-the-wire
| protocol, it's just some code on each side of the connection that
| decides when to (re)send packets to make the best use of the
| available bandwidth. Anything that implements a reliable stream
| on top of datagrams needs to implement such an algorithm. The
| original ones (Reno, Vegas, etc) were very simple but already did
| a good job, although back then network equipment didn't have
| large buffers. A lot of research is going into making better
| algorithms that handle large buffers, large roundtrip times,
| varying bandwidth needs and also being fair when multiple
| connections share the same bandwidth.
| bobmcnamara wrote:
| > If you start with the problem of how to create a reliable
| stream of data on top of an unreliable datagram layer, then the
| solution that comes out will look virtually identical to TCP.
|
| I'll add that at the time of TCP's writing, the telephone
| people far outnumbered everyone else in the packet switching vs
| circuit switching debate. TCP gives you a virtual circuit over
| a packet switched network as a pair of reliable-enough
| independent byte streams over IP. This idea, that the endpoints
| could implement reliability through retransmission, came from an
| earlier French network, Cyclades, and ended up becoming a core
| principle of IP networks.
| Karrot_Kream wrote:
| We're still "suffering" from the latency and jitter effects
| of the packet switching victory. (The debate happened before
| my time and I don't know if I would have really agreed with
| circuit switching.) Latency and jitter on the modern Internet
| are very much best effort, emphasis on "effort".
| lxgr wrote:
| True, but with circuit switching, we'd probably still be
| paying by the minute, so most of these
| jittery/bufferbloated connections would not exist in the
| first place.
| hylaride wrote:
| Also, circuit switching is harder (well, more expensive)
| to do at scale, especially with different providers
| (probably a reason the traditional telecoms pushed it so
| hard - to protect their traditional positions). Even
| modern circuit technologies like MPLS are mostly
| contained within a network (though there can be and is
| cross-network peering) and aren't as connection-oriented as
| previous circuit technologies like ATM or Frame Relay.
| bcrl wrote:
| Circuit switching is not harder to do, it's simply less
| efficient. In the PSTN and ISDN world, circuits consumed
| bandwidth regardless of whether they were actively in use or
| not. There was no statistical multiplexing as a result.
|
| Circuit switching packets means carrying metadata about
| the circuit rather than simply using the destination MAC
| or IP address to figure out routing along the way. ATM
| took this to an extreme with nearly 10% protocol overhead
| (48 bytes of payload in a 53 byte cell) and 22 bytes of
| wasted space in the last ATM cell for a 1500 byte
| ethernet packet. That inefficiency is what really hurt.
| Sadly the ATM legacy lives on in GPON and XGSPON -- EPON
| / 10GEPON are far better protocols. As a result, GPON and
| XGSPON require gobs of memory per port for frame
| reassembly (128 ONUs x 8 priorities x 9KB for jumbo
| frames = 9MB per port worst case), whereas EPON / 10GEPON
| do not.
|
| MPLS also has certain issues that are solved by using the
| IPv6 next header feature which avoids having to push /
| pop headers (modifying the size of the packet which has
| implications for buffering and the associated QoS issues
| making the hardware more complex) in the transport
| network. MPLS labels made sense at the time of
| introduction in the early 2000s when transport network
| hardware was able to utilize a small table to look up the
| next hop of a frame instead of doing a full route lookup.
| The hardware constraints of those early days requiring
| small SRAMs have effectively gone away since modern ASICs
| have billions of transistors which make on chip route
| tables sufficient for many use-cases.
| jandrese wrote:
| As someone who at one point was working with people who were
| trying to keep an ATM network reliable, there is a reason
| packet switching won.
| wmf wrote:
| L4S should improve latency and jitter.
| musicale wrote:
| The telephone people were basically right with their
| criticisms of TCP/IP such as:
|
| What about QoS? Jitter, bandwidth, latency, fairness
| guarantees? What about queuing delay? What about multiplexing
| and tunneling? Traffic shaping and engineering? What about
| long-haul performance? Easy integration with optical circuit
| networks? etc. ATM addressed these issues, but TCP/IP did
| not.
|
| All of these things showed up again once you tried to do VOIP
| and video conferencing, and in core ISPs as well as access
| networks, and they weren't (and in many cases still aren't)
| easy to solve.
| cachius wrote:
| How could a circuit switched network look like at today's
| scale?
| musicale wrote:
| The optical layer is still circuit-switched.
|
| Also MPLS is basically a virtual circuit network.
| hollerith wrote:
| If that is true, then why did the telcos rapidly move the
| entire backbone of the telephone network to IP in the
| 1990s?
|
| And why are they trying to persuade regulators to let them
| get rid of the remaining (peripheral) part of the old
| circuit-switched network, i.e., to phase out old-school
| telephone hardware, requiring all customers to have IP
| phone hardware?
| kragen wrote:
| Packet switching is cheaper; even though it can't make
| any guarantees about latency and bandwidth the way
| circuit switching could, it uses scarce long-haul
| bandwidth more efficiently. I _regularly_ see people
| falling off video calls, like, multiple times a week. So,
| in some ways, it's a worse product, but costs much less.
| musicale wrote:
| They moved to IP because it was improving faster in speed
| and commoditization vs. ATM. But in order to make it
| work, they had to figure out how to make QoS work on IP
| networks, which wasn't easy. It still isn't easy (see:
| crappy zoom calls.)
|
| Modern circuit switched networks use optics rather than
| the legacy copper circuits which date back to telegraphy.
| mulmen wrote:
| You can criticize something and still select it as the
| best option. I do this daily with Apple. If you can't
| find a flaw in a technical solution you probably aren't
| looking close enough.
| NooneAtAll3 wrote:
| > If you start with the problem of how to create a reliable
| stream of data on top of an unreliable datagram layer
|
| > poor handling of missing packets
|
| so it was poor at exact thing it was designed for?
| allarm wrote:
| It was a trade off at the time. Selective acknowledgments
| require more resources.
| silvestrov wrote:
| Poor for high-speed connections (*) or very unreliable
| connections.
|
| (*) compared to when TCP was invented.
|
| When I started at university, the FTP speed from the US during
| daytime was 500 bytes per second! You don't have many
| unacknowledged packets in such a connection.
|
| Back then even a 1 megabits/sec connection was super high
| speed and very expensive.
| rini17 wrote:
| Might be obvious in hindsight, but it was not at all clear back
| then that congestion was manageable this way. There were
| legitimate concerns that it would all just melt down.
| rkagerer wrote:
| _it only manages one stream at a time_
|
| I'll take flak for saying it, but I feel web developers are
| partially at fault for laziness on this one. I've often seen
| them trigger a swath of connections (e.g. for uncoordinated
| async events), when carefully managed multiplexing over one or
| a handful will do just fine.
|
| Eg. In prehistoric times I wrote a JavaScript library that let
| you queue up several downloads over one stream, with control
| over prioritization and cancelability.
|
| It was used in a GreaseMonkey script on a popular dating
| website, to fetch thumbnails and other details of all your
| matches in the background. Hovering over a match would bring up
| all their photos, and if some hadn't been retrieved yet they'd
| immediately move to the top of the queue. I intentionally
| wanted to limit the number of connections, to avoid
| oversaturating the server or the user's bandwidth. Idle time
| was used to prefetch all matches on the page (IIRC in a
| sensible order responsive to your scroll location). If you
| picked a large enough pagination, then stepped away to top up
| your coffee, by the time you got back you could browse through
| all of your recent matches instantly, without waiting for any
| server roundtrip lag.
|
| It was pretty slick. I realize these days modern stacks give
| you multiplexing for free, but to put in context this was
| created in the era before even JQuery was well-known.
|
| Funny story, I shared it with one of my matches and she found
| it super useful but was a bit surprised that, in a way, I was
| helping my competition. Turned out OK... we're still together
| nearly two decades later and now she generously jokes I
| invented Tinder before it was a thing.
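| A minimal sketch of that kind of prioritized, single-connection fetch
| queue (not the original library; the names and the hover-bump
| behavior here are illustrative):

```python
import heapq

class FetchQueue:
    """Heap-ordered download queue: lower priority number = sooner."""
    def __init__(self):
        self.heap = []
        self.counter = 0          # tie-breaker keeps FIFO order

    def enqueue(self, url, priority=10):
        heapq.heappush(self.heap, (priority, self.counter, url))
        self.counter += 1

    def bump(self, url):
        """Move url to the front of the queue (e.g. on mouse hover)."""
        self.heap = [e for e in self.heap if e[2] != url]
        heapq.heapify(self.heap)
        self.enqueue(url, priority=0)

    def next_url(self):
        """The next item to fetch over the single connection."""
        return heapq.heappop(self.heap)[2] if self.heap else None

q = FetchQueue()
for u in ["thumb1", "thumb2", "thumb3"]:   # idle-time prefetch order
    q.enqueue(u)
q.bump("thumb3")                           # user hovered match 3
print(q.next_url())                        # thumb3
print(q.next_url())                        # thumb1
```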
| xyzzyz wrote:
| Sure, you can reimplement multiplexing on the application
| level, but it just makes more sense to do it on the transport
| level, so that people don't have to do it in JavaScript.
| groundzeros2015 wrote:
| But unfortunately QUIC is a user space implementation over
| kernel UDP.
| MrDarcy wrote:
| How is that relevant? The user agent (browser) handles
| the transport.
| groundzeros2015 wrote:
| That's the problem. Browsers are billion dollar ventures
| and are operating systems unto themselves. So they like
| QUIC.
|
| But you have to include giant libraries and kernel can't
| see the traffic to better manage timing etc.
| adzm wrote:
| There is no real reason QUIC couldn't be implemented in
| the kernel though.
| karmakaze wrote:
| _[Not a web dev but]_ I thought each site gets a handful of
| connections (4) to each host and more requests would have to
| wait to use one of them. That's pretty close to what I'd
| want with a reasonably fast connection.
| rkagerer wrote:
| That's basically right. Back when I made this, many servers
| out there still limited you to just 2 (or sometimes even 1)
| concurrent connections. As sites became more media-heavy
| that number trended up. HTTP/2 can handle many concurrent
| streams on one connection; I'm not sure if you get as fine-
| grained control as with the library I wrote (maybe!).
| rishabhaiover wrote:
| This is wonderful to hear. I have a naive question. Is this
| the reason most websites/web servers absolutely need CDNs
| (apart from their edge capabilities): because they understand
| caching much better than a web developer does? But I would
| think the person closer to the user access pattern would
| know the optimal caching strategy.
| vbezhenar wrote:
| Most websites do not need CDNs.
|
| CDNs became popular back in the old days, when some people
| thought that if two websites are using jquery-1.2.3.min.js,
| CDN could cache it and second site would load quicker.
| These days, browsers don't do that: they ignore cached
| assets from other websites because it helps to
| protect user privacy, and they value privacy over
| performance in this case.
|
| There are some reasons CDNs might be helpful. Edge
| capability probably is the most important one. Another
| reason is that serving lots of static data might be a
| complicated task for a small website, so it makes sense to
| offload it to a specialised service. These days, CDNs went
| beyond static data. They can hide your backend, so public
| user won't know its address and can't DDoS it. They can
| handle TLS for you. They can filter bots, Tor, and people
| from countries you don't like. All in a few clicks in the
| dashboard, no need to implement complicated solutions.
|
| But nothing you couldn't write yourself in a few days,
| really.
| 29athrowaway wrote:
| I was excited about SCTP over 10 years ago but getting it to
| work was hard.
|
| The Linux kernel supports it but at least when I had tried this
| those modules were disabled on most distros.
| 1vuio0pswjnm7 wrote:
| "... some applications want multiple streams that don't block
| each other. You could use multiple TCP connections, but that
| adds its own overhead, so SCTP and QUIC were designed to
| address those issues."
|
| Other applications work just fine with a single TCP connection
|
| If I am using TCP for DNS, for example, and I am retrieving
| data from a single host such as a DNS cache, I can send
| multiple queries over a single TCP connection and receive
| multiple responses over the same single TCP connection,
| out of order. No blocking.^1 If the cache (application)
| supports it, this is much faster than receiving answers
| sequentially and it's more efficient and polite than opening
| multiple TCP connections
|
| 1. I do this every day outside the browser with DNS over TLS
| (DoT) using something like streamtcp from NLNet Labs. I'm not
| sure that QUIC is faster, server support for QUIC is much more
| limited, but QUIC may have other advantages
|
| I also do it with DNS over HTTPS (DoH), outside the browser,
| using HTTP/1.1 pipelining, but there I receive answers
| sequentially. I'm still not convinced that HTTP/2 is faster for
| this particular use case, i.e., downloading data from a single
| host using multiple HTTP requests (compared to something like
| integrating online advertising into websites, for example)
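| The out-of-order matching works because DNS over TCP (RFC 1035
| section 4.2.2) frames each message with a 2-byte length prefix, and
| the 16-bit ID at the start of each message lets a client pair
| responses with pipelined queries. A minimal offline sketch of the
| framing and ID matching (no real DNS payloads, just placeholders):

```python
import struct

def frame(msg_id, payload):
    """Prepend the DNS-over-TCP 2-byte length prefix to a message
    whose first two bytes are its 16-bit query ID."""
    msg = struct.pack("!H", msg_id) + payload
    return struct.pack("!H", len(msg)) + msg

def read_messages(stream):
    """Yield (id, payload) pairs from a byte stream of framed messages."""
    off = 0
    while off + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, off)
        msg = stream[off + 2 : off + 2 + length]
        (msg_id,) = struct.unpack_from("!H", msg)
        yield msg_id, msg[2:]
        off += 2 + length

# The server answers query 2 before query 1; IDs restore the pairing.
wire = frame(2, b"answer-two") + frame(1, b"answer-one")
print(dict(read_messages(wire)))
```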
| do_not_redeem wrote:
| > I can send multiple queries over a single TCP connection
| and receive multiple responses over the same single TCP
| connection, out of order. No blocking.
|
| You're missing the point. You have one TCP connection, and
| the sever sends you response1 and then response2. Now if
| response1 gets lost or delayed due to network conditions, you
| must wait for response1 to be retransmitted before you can
| read response2. That is blocking, no way around it. It has
| nothing to do with advertising(?), and the other protocols
| mentioned don't have this drawback.
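| A toy model of that head-of-line blocking: a TCP-like receiver can
| only release contiguous bytes to the application, so everything
| behind a hole waits for the retransmission:

```python
def deliverable(segments, next_seq):
    """segments: {seq: bytes} of arrived data. Returns (bytes the
    application may read in order, updated next expected seq)."""
    out = b""
    while next_seq in segments:
        data = segments.pop(next_seq)
        out += data
        next_seq += len(data)
    return out, next_seq

segs = {100: b"resp2"}            # seq 95..99 ("resp1") was lost
data, nxt = deliverable(segs, 95)
print(data)                        # b'' -- resp2 stalls behind the hole
segs[95] = b"resp1"                # retransmission finally arrives
segs[100] = b"resp2"
data, nxt = deliverable(segs, 95)
print(data)                        # b'resp1resp2'
```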
| maccard wrote:
| I work on an application that does a lot of high frequency
| networking in a tcp like custom framework. Our protocol
| guarantees ordering per "channel" so you can send request1
| on channel 1 and request2 on channel 2 and receive the
| responses in any order. (But if you send request 1 and then
| request 2 on the same channel you'll get them back in
| order)
|
| It's a trade off, and there's a surprising amount of
| application code involved on the receiving side in the
| application waiting for state to be updated on both
| channels. I definitely prefer it, but it's not without its
| tradeoffs.
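| That per-channel ordering can be sketched as independent reassembly
| buffers (the per-channel sequence numbering here is an assumed
| framing, not the actual protocol):

```python
from collections import defaultdict

class ChannelDemux:
    """Deliver messages in order within a channel; channels never
    block each other."""
    def __init__(self):
        self.expected = defaultdict(int)   # next seq per channel
        self.pending = defaultdict(dict)   # channel -> {seq: msg}

    def receive(self, channel, seq, msg):
        """Buffer msg; return all messages now deliverable on channel."""
        self.pending[channel][seq] = msg
        out = []
        while self.expected[channel] in self.pending[channel]:
            out.append(self.pending[channel].pop(self.expected[channel]))
            self.expected[channel] += 1
        return out

d = ChannelDemux()
print(d.receive(1, 1, "req1-part2"))   # [] -- waits for seq 0 on ch 1
print(d.receive(2, 0, "req2"))         # ['req2'] -- ch 2 not blocked
print(d.receive(1, 0, "req1-part1"))   # ['req1-part1', 'req1-part2']
```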
| tekne wrote:
| So, roll-your-own-QUIC?
| pverheggen wrote:
| > I can send multiple queries over a single TCP connection
| and receive multiple responses over the same single TCP
| connection, out of order.
|
| This is because DoT allows the DNS server to resolve queries
| concurrently and send query responses out of order.
|
| However, this is an application layer feature, not a
| transport layer one. The underlying TCP packets still have to
| arrive in order and therefore are subject to blocking.
| kragen wrote:
| There are a lot of design alternatives possible to TCP within
| the "create a reliable stream of data on top of an unreliable
| datagram layer" space:
|
| * Full-duplex connections are probably a good idea, but
| certainly are not the only way, or the most obvious way, to
| create a reliable stream of data on top of an unreliable
| datagram layer. TCP's predecessor NCP was half-duplex.
|
| * TCP itself also supports a half-duplex mode--even if one end
| sends FIN, the other end can keep transmitting as long as it
| wants. This was probably also a good idea, but it's certainly
| not the only obvious choice.
|
| * Sequence numbers on messages or on bytes?
|
| * Wouldn't it be useful to expose message boundaries to
| applications, the way 9P, SCTP, and some SNA protocols do?
|
| * If you expose message boundaries to applications, maybe you'd
| also want to include a message type field? Protocol-level
| message-type fields have been found to be very useful in
| Ethernet and IP, and in a sense the port-number field in UDP is
| also a message-type field.
|
| * Do you really need urgent data?
|
| * Do servers need different port numbers? TCPMUX is a
| straightforward way of giving your servers port _names_, like
| in CHAOSNET, instead of port numbers. It only creates extra
| overhead at connection-opening time, assuming you have the
| moral equivalent of file descriptor passing on your OS. The
| only limitation is that you have to use different client ports
| for multiple simultaneous connections to the same server host.
| But in TCP everyone uses different client ports for different
| connections _anyway_. TCPMUX itself incurs an extra round-trip
| time delay for connection establishment, because the requested
| server name can't be transmitted until the client's ACK
| packet, but if you incorporated it into TCP, you'd put the
| server name in the SYN packet. If you eliminate the server port
| number in every TCP header, you can expand the client port
| number to 24 or even 32 bits.
|
| * Alternatively, maybe network addresses should be assigned to
| server processes, as in Appletalk (or IP-based virtual hosting
| before HTTP/1.1's Host: header, or, for TLS, before SNI became
| widespread), rather than assigning network addresses to hosts
| and requiring port numbers or TCPMUX to distinguish multiple
| servers on the same host?
|
| * Probably SACK was actually a good idea and should have always
| been the default? SACK gets a lot easier if you ack message
| numbers instead of byte numbers.
|
| * _Why_ is acknowledgement reneging allowed in TCP? That was a
| terrible idea.
|
| * It turns out that measuring round-trip time is really
| important for retransmission, and TCP has no way of measuring
| RTT on retransmitted packets, which can pose real problems for
| correcting a ridiculously low RTT estimate, which results in
| excessive retransmission.
|
| * Do you really need a PUSH bit? C'mon.
|
| * A modest amount of overhead in the form of erasure-coding
| bits would permit recovery from modest amounts of packet loss
| without incurring retransmission timeouts, which is especially
| useful if your TCP-layer protocol requires a modest amount of
| packet loss for congestion control, as TCP does.
|
| * Also you could use a "congestion experienced" bit instead of
| packet loss to detect congestion in the usual case. (TCP did
| eventually acquire CWR and ECE, but not for many years.)
|
| * The fact that you can't resume a TCP connection from a
| different IP address, the way you can with a Mosh connection,
| is a serious flaw that seriously impedes nodes from moving
| around the network.
|
| * TCP's hardcoded timeout of 5 minutes is also a major flaw.
| Wouldn't it be better if the application could set that to 1
| hour, 90 minutes, 12 hours, or a week, to handle intermittent
| connectivity, such as with communication satellites? Similarly
| for very-long-latency datagrams, such as those relayed by
| single LEO satellites. Together this and the previous flaw have
| resulted in TCP largely being replaced for its original
| session-management purpose with new ad-hoc protocols such as
| HTTP magic cookies, protocols which use TCP, if at all, merely
| as a reliable datagram protocol.
|
| * Initial sequence numbers turn out not to be a very good
| defense against IP spoofing, because that wasn't their original
| purpose. Their original purpose was preventing the erroneous
| reception of leftover TCP segments from a previous incarnation
| of the connection that have been bouncing around routers ever
| since; this purpose would be better served by using a different
| client port number for each new connection. The ISN namespace
| is far too small for current LFNs anyway, so we had to patch
| over the hole in TCP with timestamps and PAWS.
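| The erasure-coding point can be sketched with a single XOR parity
| packet, which lets a receiver rebuild any one lost packet in a group
| without a retransmission timeout:

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Sender emits three data packets plus one parity packet.
packets = [b"aaaa", b"bbbb", b"cccc"]
parity = b"\x00" * 4
for p in packets:
    parity = xor(parity, p)

# Packet 1 is lost in transit; XOR of the survivors recovers it,
# because parity = p0 ^ p1 ^ p2 implies p1 = p0 ^ p2 ^ parity.
recovered = xor(xor(packets[0], packets[2]), parity)
print(recovered)   # b'bbbb'
```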
| musicale wrote:
| AppleTalk didn't get much love for its broadcast (or possibly
| multicast?) based service discovery protocol - but of course
| that is what inspired mDNS. I believe AppleTalk's LAN
| addresses were always dynamic (like 169.254.x.x IP addresses),
| simplifying administration and deployment.
|
| I tend to think that one of the reasons linux containers are
| needed for network services is that DNS traditionally only
| returns an IP address (rather than address + port) so each
| service process needs to have its own IP address, which in
| linux requires a container or at least a network namespace.
|
| AppleTalk also supported a reliable transaction (basically
| request-response RPC) protocol (ATP) and a session protocol,
| which I believe were used for Mac network services (printing,
| file servers, etc.) Certainly easier than
| serializing/deserializing byte streams.
| kragen wrote:
| Does "session protocol" mean that it provided packet
| retransmission and reordering, like TCP? How does that save
| you serializing and deserializing byte streams?
|
| I agree that, given the existing design of IP and TCP, you
| could get much of the benefit of first-class addresses for
| services by using, for example, DNS-SD, and that is what
| ZeroConf does. (It is not a coincidence that the DNS-SD RFC
| was written by a couple of Apple employees.) But, if that's
| the way you're going to be finding endpoints to initiate
| connections to, there's no benefit to having separate port
| numbers and IP addresses. And IP addresses are far scarcer
| than just requiring a Linux container or a network
| namespace: there are only 2^32 of them. But it is rare to
| find an IP address that is listening on more than 64 of its
| 2^16 TCP ports, so in an alternate history where you moved
| those 16 bits from the port number to the IP address, we
| would have one thousandth of the IP-address crunch that we
| do.
|
| Historically, possibly the reason that it wasn't done this
| way is that port numbers predated the DNS by about 10
| years.
| musicale wrote:
| The session protocol was for sessions with servers and
| was used for AFP (AppleShare file servers) I believe.
|
| The higher level protocols were built on ATP which was
| message based.
|
| ADSP was a stream protocol that could be used for remote
| terminal access or other applications where byte streams
| actually made sense.
|
| > Historically, possibly the reason that it wasn't done
| this way is that port numbers predated the DNS by about
| 10 years.
|
| Predated or postdated?
|
| My understanding is that DNS can potentially provide port
| numbers, but this is not widely used or supported.
| kragen wrote:
| DNS postdated port numbers.
|
| Mockapetris's DNS RFCs are from 01983, although I think
| I've talked to people who installed DNS a year or two
| before that. Port numbers were first proposed in RFC 38
| in 01970 https://datatracker.ietf.org/doc/html/rfc38
|
| > _The END and RDY must specify relevant sockets in
| addition to the link number. Only the local socket name
| need be supplied_
|
| and given actual numbers in RFC 54, also in 01970
| https://datatracker.ietf.org/doc/html/rfc54
|
| > _Connections are named by a pair of sockets. Sockets
| are 40 bit names which are known throughout the network.
| Each host is assigned a private subset of these names,
| and a command which requests a connection names one
| socket which is local to the requesting host and one
| local to the receiver of the request._
|
| > _Sockets are polarized; even numbered sockets are
| receive sockets; odd numbered ones are send sockets. One
| of each is required to make a connection._
|
| In RFC 129 in 01971 we see discussion about whether
| socketnames should include host numbers and/or user
| numbers, still with the low-order bit indicating the
| socket's gender (emissive or receptive).
| https://datatracker.ietf.org/doc/html/rfc129
|
| RFC 147 later that year
| https://datatracker.ietf.org/doc/html/rfc147 discusses
| within-machine port numbers and how they should or should
| not relate to the socketnames transmitted in NCP packets:
|
| > _Previous network papers postulated that a process
| running under control of the host 's operating system
| would have access to a number of ports. A port might be a
| physical input or output device, or a logical I/O device
| (...)_
|
| > _A socket has been defined to be the identification of
| a port for machine to machine communication through the
| ARPA network. Sockets allocated to each host must be
| uniquely associated with a known process or be undefined.
| The name of some sockets must be universally known and
| associated with a known process operating with a
| specified protocol. (e.g., a logger socket, RJE socket, a
| file transfer socket). The name of other sockets might
| not be universally known, but given in a transmission
| over a universally known socket, (e.g. the socket pair
| specified by the transmission over the logger socket
| under the Initial Connection Protocol (ICP). In any case,
| communication over the network is from one socket to
| another socket, each socket being identified with a
| process running at a known host._
|
| RFC 167 the same year
| https://datatracker.ietf.org/doc/html/rfc167 proposes
| that socketnames not be required to be unique network-
| wide but just within a host. It also points out that you
| really only need the socketname during the initial
| connection process, if you have some other way of knowing
| which packets belong to which connections:
|
| > _Although fields will be helpful in dealing with socket
| number allocation, it is not essential that such field
| designations be uniform over the network. In all network
| transactions the 32-bit socket number is handled with its
| 8-bit host number. Thus, if hosts are able to maintain
| uniqueness and repeatability internally, socket numbers
| in the network as a whole will also be unique and
| repeatable. If a host fails to do so, only connections
| with that offending host are affected._
|
| > _Because the size, use, and character of systems on the
| network are so varied, it would be difficult if not
| impossible to come up with an agreed upon particular
| division of the 32-bit socket number. Hosts have
| different internal restrictions on the number of users,
| processes per user, and connections per process they will
| permit._
|
| > _It has been suggested that it may not be necessary to
| maintain socket uniqueness. It is contended that there is
| really no significant use made of the socket number after
| a connection has been established. The only reason a host
| must now save a socket number for the life of a
| connection is to include it in the CLOSE of that
| connection._
|
| RFC 172 in June
| https://datatracker.ietf.org/doc/html/rfc172 proposes
| using port 3 for the second version of FTP:
|
| > _[6] It seems that socket 1 has been assigned to
| logger. Socket 3 seems a reasonable choice for File
| Transfer._
|
| This updates the first version in RFC 114 in April
| https://datatracker.ietf.org/doc/html/rfc114 which said:
|
| > _[16] It seems that socket 1 has been assigned to
| logger and socket 5 to NETRJS. Socket 3 seems a
| reasonable choice for the file transfer process._
|
| RFC 196 the same year
| https://datatracker.ietf.org/doc/html/rfc196 proposes to
| use port 5 to receive mail and/or print jobs:
|
| > _Initial Connection will be as per the Official Initial
| Connection Protocol, Documents #2, NIC 7101, to a
| standard socket not yet assigned. A candidate socket
| number would be socket #5._
|
| In RFC204 in August https://www.rfc-
| editor.org/rfc/rfc204.html Postel publishes the first
| list of port number assignments:
|
| > _I would like to collect information on the use of
| socket numbers for "standard" service programs. For
| example Loggers (telnet servers) Listen on socket 1. What
| sockets at your host are Listened to by what programs?_
|
| > _Recently Dick Watson suggested assigning socket 5 for
| use by a mail-box protocol (RFC196). Does any one object
| ? Are there any suggestions for a method of assigning
| sockets to standard programs? Should a subset of the
| socket numbers be reserved for use by future standard
| protocols?_
|
| > _Please phone or mail your answers and comments to
| (...)_
|
| Amusingly in retrospect, Postel did not include an email
| address, presumably because they didn't have email
| working yet.
|
| FTP's assignment to port 3 was confirmed in RFC 265 in
| November:
|
| > _Socket 3 is the standard preassigned socket number on
| which the cooperating file transfer process at the
| serving host should "listen". (*)The connection
| establishment will be in accordance with the standard
| initial connection protocol, (*)establishing a full-
| duplex connection._
|
| In May of 01972 Postel published a list as RFC 349
| https://www.rfc-editor.org/rfc/rfc349.html:
|
| > _I propose that there be a czar (me ?) who hands out
| official socket numbers for use by standard protocols.
| This czar should also keep track of and publish a list of
| those socket numbers where host specific services can be
| obtained. I further suggest that the initial allocation
| be as follows:_
|
|               Sockets     Assignment
|               0-63        Network wide standard functions
|               64-127      Host specific functions
|               128-239     Reserved for future use
|               240-255     Any experimental function
|
| > _and within the network wide standard functions the
| following particular assignment be made:_
|
|               Socket      Assignment
|               1           Telnet
|               3           File Transfer
|               5           Remote Job Entry
|               7           Echo
|               9           Discard
|
| Note that ports 7 and 9 are _still_ assigned to echo and
| discard in /etc/services, although Telnet and FTP got
| moved to ports 23 and 21, respectively.
|               tcpmux    1/tcp    # TCP port service multiplexer
|               echo      7/tcp
|               echo      7/udp
|               discard   9/tcp    sink null
|               discard   9/udp    sink null
|               systat    11/tcp   users
|               daytime   13/tcp
|               daytime   13/udp
|               netstat   15/tcp
|               qotd      17/tcp   quote
|               chargen   19/tcp   ttytst source
|               chargen   19/udp   ttytst source
|               ftp-data  20/tcp
|               ftp       21/tcp
|               fsp       21/udp   fspd
|               ssh       22/tcp   # SSH Remote Login Protocol
|               telnet    23/tcp
| So, internet port numbers in their current form are from
| 01971 (several years before the split between TCP and
| IP), and DNS is from about 01982.
|
| In December of 01972, Postel published RFC 433
| https://www.rfc-editor.org/rfc/rfc433.html, obsoleting
| the RFC 349 list with a list including chargen and some
| other interesting services:
|
|               Socket      Assignment
|               1           Telnet
|               3           File Transfer
|               5           Remote Job Entry
|               7           Echo
|               9           Discard
|               19          Character Generator [e.g. TTYTST]
|               65          Speech Data Base @ ll-tx-2 (74)
|               67          Datacomputer @ cca (31)
|               241         NCP Measurement
|               243         Survey Measurement
|               245         LINK
|
| The gap between 9 and 19 is unexplained.
|
| RFC 503 https://www.rfc-editor.org/rfc/rfc503.html from
| 01973 has a longer list (including systat, datetime, and
| netstat), but _also_ listing which services were running
| on which ARPANet hosts, 33 at that time. So RFC 503
| contained a list of every server process running on what
| would later become the internet.
|
| Skipping RFC 604, RFC 739 from 01977 https://www.rfc-
| editor.org/rfc/rfc739.html is the first one that shows
| the modern port number assignments (still called "socket
| numbers") for FTP and Telnet, though those presumably
| dated back a couple of years at that point:
|               Specific Assignments:
|
|               Decimal  Octal  Description                      References
|               -------  -----  -----------                      ----------
|               Network Standard Functions
|               1        1      Old Telnet                       [6]
|               3        3      Old File Transfer                [7,8,9]
|               5        5      Remote Job Entry                 [10]
|               7        7      Echo                             [11]
|               9        11     Discard                          [12]
|               11       13     Who is on or SYSTAT
|               13       15     Date and Time
|               15       17     Who is up or NETSTAT
|               17       21     Short Text Message
|               19       23     Character generator or TTYTST    [13]
|               21       25     New File Transfer                [1,14,15]
|               23       27     New Telnet                       [1,16,17]
|               25       31     Distributed Programming System   [18,19]
|               27       33     NSW User System w/COMPASS FE     [20]
|               29       35     MSG-3 ICP                        [21]
|               31       37     MSG-3 Authentication             [21]
|
| Etc. This time I have truncated the list. It also has
| Finger on port 79.
|
| You say, "My understanding is that DNS can potentially
| provide port numbers, but this is not widely used or
| supported." DNS SRV records have existed since 01996
| (proposed by Troll Tech and Paul Vixie in RFC 2052
| https://www.rfc-editor.org/rfc/rfc2052), but they're
| really only widely used in XMPP, in SIP, and in ZeroConf,
| which was Apple's attempt to provide the facilities of
| AppleTalk on top of TCP/IP.
| musicale wrote:
| > The fact that you can't resume a TCP connection from a
| different IP address, the way you can with a Mosh connection,
| is a serious flaw that seriously impedes nodes from moving
| around the network
|
| This 100% !! And basically the reason mosh had to be created
| in the first place (and it probably wasn't easy.)
| Unfortunately mosh only solves the problem for ssh. Exposing
| fixed IP addresses to the application layer probably doesn't
| help either.
|
| So annoying that TCP tends to break whenever you switch wi-fi
| networks or switch from wi-fi to cellular. (On iPhones at
| least you have MPTCP, but that requires server-side support.)
| Animats wrote:
| * Full-duplex connections are probably a good idea, but
| certainly are not the only way, or the most obvious way, to
| create a reliable stream of data on top of an unreliable
| datagram layer. TCP itself also supports a half-duplex mode--
| even if one end sends FIN, the other end can keep
| transmitting as long as it wants. This was probably also a
| good idea, but it's certainly not the only obvious choice.
|
| Much of that comes from the original applications being FTP
| and TELNET.
|
| * Sequence numbers on messages or on bytes?
|
| Bytes, because the whole TCP message might not fit in an IP
| packet. This is the MTU problem.
|
| * Wouldn't it be useful to expose message boundaries to
| applications, the way 9P, SCTP, and some SNA protocols do?
|
| Early on, there were some message-oriented, rather than
| stream-oriented, protocols on top of IP. Most of them died
| out. RDP was one such. Another was QNet.[2] Both still have
| assigned IP protocol numbers, but I doubt that a RDP packet
| would get very far across today's internet.
|
| This was a lack. TCP is not a great message-oriented
| protocol.
|
| * Do you really need urgent data?
|
| The purpose of urgent data is so that when your slow Teletype
| is typing away, and the recipient wants it to stop, there's a
| way to break in. See [1], p. 8.
|
| * It turns out that measuring round-trip time is really
| important for retransmission, and TCP has no way of measuring
| RTT on retransmitted packets, which can pose real problems
| for correcting a ridiculously low RTT estimate, which results
| in excessive retransmission.
|
| Yes, reliable RTT is a problem.
|
| * Do you really need a PUSH bit? C'mon.
|
| It's another legacy thing to make TELNET work on slow links.
| Is it even supported any more?
|
| * A modest amount of overhead in the form of erasure-coding
| bits would permit recovery from modest amounts of packet loss
| without incurring retransmission timeouts, which is
| especially useful if your TCP-layer protocol requires a
| modest amount of packet loss for congestion control, as TCP
| does.
|
| * Also you could use a "congestion experienced" bit instead
| of packet loss to detect congestion in the usual case. (TCP
| did eventually acquire CWR and ECE, but not for many years.)
|
| Originally, there was ICMP Source Quench for that, but
| Berkeley didn't put it in BSD, so nobody used it. Nobody was
| sure when to send it or what to do when it was received.
|
| * The fact that you can't resume a TCP connection from a
| different IP address, the way you can with a Mosh connection,
| is a serious flaw that seriously impedes nodes from moving
| around the network.
|
| That would require a security system to prevent hijacking
| sessions.
|
| [1] https://archive.org/stream/rfc854/rfc854.txt_djvu.txt
|
| [2] https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers
| musicale wrote:
| > how to create a reliable stream of data on top of an
| unreliable datagram layer, then the solution that comes out
| will look virtually identical to TCP. It just is the right
| solution for the job
|
| A stream of bytes made sense in the 1970s for remote terminal
| emulation. It still sort of makes sense for email, where a
| partial message is useful (though downloading headers in bulk
| followed by full message on demand probably makes more sense.)
|
| But in 2025 much of communication involves _messages_ that
| aren't useful if you only get part of them. It's also a pain to
| have to serialize messages into a byte stream and then
| deserialize the byte stream into messages (see: gRPC etc.) and
| the byte stream ordering is costly, doesn't work well with
| multipathing, and doesn't provide much benefit if you are only
| delivering complete messages.
|
| TCP without congestion control isn't particularly useful. As
| you note traditional TCP congestion control doesn't respond
| well to reordering. Also TCP's congestion control traditionally
| doesn't distinguish between intentional packet drops (e.g. due
| to buffer overflow) and packet loss (e.g. due to corruption).
| This means, for example that it can't be used directly over
| networks with wireless links (which is why wi-fi has its own
| link layer retransmission).
|
| TCP's traditional congestion control is designed to fill
| buffers up until packets are dropped, leading to undesirable
| buffer bloat issues.
|
| TCP's traditional congestion control algorithms (additive
| increase/multiplicative decrease on drop) also have the poor
| property that your data rate tends to drop as RTT increases.
|
| TCP wasn't designed for hardware offload, which can lead to
| software bottlenecks and/or increased complexity when you do
| try to offload it to hardware.
|
| TCP's three-way handshake is costly for one-shot RPCs, and slow
| start means that short flows may never make it out of slow
| start, neutralizing benefits from high-speed networks.
|
| TCP is also poor for mobility. A connection breaks when your IP
| address changes, and there is no easy way to migrate it. Most
| TCP APIs expose IP addresses at the application layer, which
| causes additional brittleness.
|
| Additionally, TCP is poorly suited for optical/WDM networks,
| which support dedicated bandwidth (signal/channel bandwidth as
| well as data rate), and are becoming more important in
| datacenters and as interconnects for GPU clusters.
|
| etc.
| kccqzy wrote:
| Yeah the fact that the congestion control algorithm isn't part
| of the wire protocol is very ahead of its time and gave the
| protocol flexibility that's much needed in retrospective. OTOH
| a lot of college courses about TCP don't really emphasize this
| fact and still many people I interacted with thought that TCP
| had a single defined congestion control algorithm.
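| The classic Reno-style AIMD rule (additive increase, multiplicative
| decrease) is a simple example of how congestion control is endpoint
| policy rather than wire format; a toy simulation:

```python
# AIMD sketch: the sender's window evolves purely from local events
# (ACKed round trips and losses), with nothing in the TCP header
# naming the algorithm -- which is why endpoints can swap it out.
def aimd(events, cwnd=1.0):
    """events: 'a' = one RTT of ACKs, 'l' = loss. Returns final cwnd."""
    for e in events:
        if e == "a":
            cwnd += 1.0                   # additive increase: +1 MSS/RTT
        else:
            cwnd = max(1.0, cwnd / 2)     # multiplicative decrease
    return cwnd

print(aimd("aaaal"))   # 2.5 -- grew 1->5, then halved on loss
```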
| o11c wrote:
| TCP has another unfixable flaw - it cannot be properly secured.
| Writing a security layer on top of TCP can at most _detect_,
| not _avoid_, attacks.
|
| It is very easy for a malicious actor anywhere in the network
| to _inject_ data into a connection. By contrast, it is much
| harder for a malicious actor to _break_ the legitimate traffic
| flow ... except for the fact that TCP RST grants any rando the
| power to upgrade "inject" to "break". This is quite common in
| the wild for any traffic that does not look like HTTP, even
| when both endpoints are perfectly healthy.
|
| Blocking TCP RST packets using your firewall will significantly
| improve reliability, but this still does not protect you from
| more advanced attackers which cause a desynchronization due to
| forged sequence numbers with nonempty payload.
|
| As a result, it is _mandatory_ for every application to support
| a full-blown "resume on a separate connection" operation,
| which is complicated and hairy and also immediately runs into
| the additional flaw that TCP is very slow to start.
|
| ---
|
| While not an outright _flaw_ , I also think it has become clear
| by now that it is highly suboptimal for "address" and "port" to
| be separate notions.
| iberator wrote:
| It's trivial to develop your own protocols on top of IP. It was
| trivial 15 years ago in Python (without any libraries), just
| handcrafted packets (ARP, IP, etc.).
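| A sketch of such a handcrafted packet: building and checksumming an
| IPv4 header (RFC 791) by hand. Actually putting it on the wire would
| need a raw socket and root privileges, so here we only construct and
| verify it; the addresses are placeholders.

```python
import socket
import struct

def checksum(data):
    """RFC 1071 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src, dst, proto, payload_len):
    """Build a minimal 20-byte IPv4 header with a valid checksum."""
    ver_ihl = (4 << 4) | 5                      # IPv4, 5 x 32-bit words
    hdr = struct.pack("!BBHHHBBH4s4s",
                      ver_ihl, 0, 20 + payload_len,  # tos, total length
                      0, 0,                          # id, flags/fragment
                      64, proto, 0,                  # ttl, proto, cksum=0
                      src, dst)
    return hdr[:10] + struct.pack("!H", checksum(hdr)) + hdr[12:]

# Protocol 253 is reserved for experimentation (RFC 3692).
hdr = ipv4_header(socket.inet_aton("10.0.0.1"),
                  socket.inet_aton("10.0.0.2"), 253, 0)
print(len(hdr))        # 20
print(checksum(hdr))   # 0 -- a valid IPv4 header checksums to zero
```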
| acosmism wrote:
| i have an idea for a new javascript framework
| cynicalsecurity wrote:
| I can easily spot it's an AI written article, because it actually
| explains the technology in understandable human language. A human
| would have written it the way it was either presented to them in
| university or in bloated IT books: absolutely useless.
| omnimus wrote:
| I can easily spot it's an AI written comment, because it
| actually explains their idea in understandable human language
| and brings nothing to the discussion. A human would have
| written it the way they understand it and bring their opinions
| along: absolutely useless.
| pjjpo wrote:
| At first I wanted to give the benefit of the doubt that this is
| sarcasm, but a skim through their history suggests it's just a
| committed anti-AI agenda.
|
| Personally I found the tone of the article quite genuine, and
| the video at the end made a compelling case for it. Well, I'll
| assume you commented having actually read it.
|
| Edit: I can't downvote but if I could it probably would have
| been better than this comment!
| shevy-java wrote:
| > The internet is incredible. It's nearly impossible to keep
| people away from.
|
| Well ... he seems very motivated. I am more skeptical.
|
| For instance, Google via chrome controls a lot of the internet,
| even more so via its search engine, AI, youtube and so forth.
|
| Even aside from this people's habits changed. In the 1990s
| everyone and their Grandma had a website. Nowadays ... it is a
| bit different. We suddenly have horrible blogging sites such as
| medium.com, pestering people with popups. Of course we also had
| popups in the 1990s, but the diversity was simply higher.
| Everything today is much more streamlined it seems. And top-down
| controlled. Look at Twitter, owned by a greedy and selfish
| billionaire. And the US president? Super-selfish too. We lost
| something here over the last 25 or so years.
| __MatrixMan__ wrote:
| You're talking about the web, which is merely an app with the
| internet as its platform. We can scrap it and still use the
| internet to build a different one.
| FrankWilhoit wrote:
| TCP is one of the great works of the human mind, but it did not
| envision the dominance of semiconnected networks.
| cpach wrote:
| Are you referring to NAT?
| FrankWilhoit wrote:
| No. TCP _likes_ zero packet loss (connected), and it
| _understands_ 100% packet loss (disconnected). Its weakness
| is scenarios (semiconnected) in which packet loss is
| constantly fluctuating between substantial and nearly-total.
| It doesn't know what is going on, and it may cope or it may
| not, because its designers did not envision a future in which
| most networks have a semiconnected last mile; but that is
| where we are. Without things like forward error correction,
| TCP would be nearly useless over wireless. It is interesting
| to envision a layer-4 protocol that would incorporate FEC-
| like capabilities.
| convolvatron wrote:
| if you went back to 1981 and said 'yeah, this is great. but
| what we really want to do is not have an internet, but kind of
| a piecewise internet. instead of a global address we'll use
| addresses that have a narrower scope. and naturally as
| consequence of this we'll need to start distinguishing between
| nodes that everyone can reach, service nodes, and nodes that no
| one can reach - client nodes. and as a consequence of this
| we'll start building links that are asymmetric in bandwidth,
| since one direction is only used for requests and acks and not
| any data volume.'
|
| they would have looked at you and asked straight out what you
| hoped to gain by making these things distinguished, because it
| certainly complicates things.
| FrankWilhoit wrote:
| Wireless networks are always going to have asymmetries of
| transmit power. Everything flows from that. ALOHAnet was
| 1971.
| api wrote:
| It's worth considering how the tiny computers of the era forced a
| simple clean design. IPv6 was designed starting in the early 90s
| and they couldn't resist loading it up with extensions, though
| the core protocol remains fine and is just IP with more bits.
| (Many of the extensions are rarely if ever used.)
|
| If the net were designed today it would be some complicated
| monstrosity where every packet was reminiscent of X.509 in terms
| of arcane complexity. It might even have JSON in it. It would be
| incredibly high overhead and we'd see tons of articles about how
| someone made it fast by leveraging CPU vector instructions or a
| GPU to parse it.
|
| This is called Eroom's law, or Moore's law backwards, and it is
| very real. Bigger machines let programmers and designers loose to
| indulge their desire to make things complicated.
| rubatuga wrote:
| What are some extensions? just curious.
| api wrote:
| IPSec was a big one that's now borderline obsolete, though it
| is still used for VPNs and was back ported to IPv4.
|
| Many networking folks including myself consider IPv6 router
| advertisements and SLAAC to be inferior, in practice, to
| DHCPv6, and that it would be better if we'd just left IP
| assignment out of the spec like it was in V4. Right now we
| have this mess where a lot of nets prefer or require DHCPv6
| but some vendors, like apparently Android, refuse to support
| it.
|
| The rules about how V6 addresses are chopped up and assigned
| are wasteful and dumb. The entire V4 space could have been
| mapped onto /32 and an encapsulation protocol made to allow
| V4 to carry V6, providing a seamless upgrade path that does
| not require full upgrade of the whole core, but that would
| have been too logical. Every machine should get like a /96 so
| it can use 32 bits of space to address apps, VMs, containers,
| etc. As it stands we waste 64 bits of the space to make SLAAC
| possible, as near as I can tell. The SLAAC tail must have
| wagged the dog in that people thought this feature was cool
| enough to waste 8 bytes per packet.
|
| The V6 header allows extension bits that are never used and
| blocked by most firewalls. There's really no point in them
| existing since middle boxes effectively freeze the base
| protocol in stone.
|
| Those are some of the big ones.
|
| Basically all they should have done was make IPs 64 or 128
| bits and left everything else alone. But I think there was a
| committee.
|
| As it stands we have what we have and we should just treat V6
| as IP128 and ignore the rest. I'm still in favor of the
| upgrade. V4 is too small, full stop. If we don't enlarge the
| addresses we will completely lose end to end connectivity as
| a supported feature of the network.
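The 64-bit interface-identifier cost mentioned above comes from SLAAC's modified EUI-64 scheme (RFC 4291), which stretches a 48-bit MAC address into the low 64 bits of the IPv6 address. A sketch of that derivation:

```python
def eui64_interface_id(mac: str) -> str:
    """Modified EUI-64 (RFC 4291): split the MAC in half, insert
    ff:fe between the halves, and flip the universal/local bit of
    the first octet. The result fills the low 64 bits of a SLAAC
    address, which is why the host part can't be any smaller."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                          # flip the U/L bit
    iid = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = ["%02x%02x" % (iid[i], iid[i + 1]) for i in range(0, 8, 2)]
    return ":".join(groups)
```

For example, MAC `00:25:96:12:34:56` becomes the interface identifier `0225:96ff:fe12:3456` (modern stacks often use randomized identifiers instead, per RFC 7217/4941, but the 64-bit host-part boundary remains).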
| toast0 wrote:
| > Every machine should get like a /96 so it can use 32 bits
| of space to address apps, VMs, containers, etc.
|
| You can just SLAAC some more addresses for whatever you
| want. Although hopefully you don't use more than the ~ARP~
| NDP table size on your router; then things get nasty. This
| should be trivial for VMs, and could be made possible for
| containers and apps.
|
| > The V6 header allows extension bits that are never used
| and blocked by most firewalls. [...] Basically all they
| should have done was make IPs 64 or 128 bits and left
| everything else alone.
|
| This feels contradictory... IPv4 also had extension headers
| that were mostly unused and disallowed. V6 changed the
| header extension mechanism, but offers the same
| opportunities to try things that might work on one network
| but probably won't work everywhere.
| throw0101a wrote:
| Any love for SCTP?
|
| > _The Stream Control Transmission Protocol (SCTP) is a computer
| networking communications protocol in the transport layer of the
| Internet protocol suite. Originally intended for Signaling System
| 7 (SS7) message transport in telecommunication, the protocol
| provides the message-oriented feature of the User Datagram
| Protocol (UDP) while ensuring reliable, in-sequence transport of
| messages with congestion control like the Transmission Control
| Protocol (TCP). Unlike UDP and TCP, the protocol supports
| multihoming and redundant paths to increase resilience and
| reliability._
|
| [...]
|
| > _SCTP may be characterized as message-oriented, meaning it
| transports a sequence of messages (each being a group of bytes),
| rather than transporting an unbroken stream of bytes as in TCP.
| As in UDP, in SCTP a sender sends a message in one operation, and
| that exact message is passed to the receiving application process
| in one operation. In contrast, TCP is a stream-oriented protocol,
| transporting streams of bytes reliably and in order. However TCP
| does not allow the receiver to know how many times the sender
| application called on the TCP transport passing it groups of
| bytes to be sent out. At the sender, TCP simply appends more
| bytes to a queue of bytes waiting to go out over the network,
| rather than having to keep a queue of individual separate
| outbound messages which must be preserved as such._
|
| > _The term multi-streaming refers to the capability of SCTP to
| transmit several independent streams of chunks in parallel, for
| example transmitting web page images simultaneously with the web
| page text. In essence, it involves bundling several connections
| into a single SCTP association, operating on messages (or chunks)
| rather than bytes._
|
| * https://en.wikipedia.org/wiki/Stream_Control_Transmission_Pr...
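The stream-vs-message distinction in the quoted passage is easy to demonstrate with plain TCP over loopback: two `send()` calls on one side arrive as an undifferentiated byte stream on the other, with no record of where one "message" ended (a minimal sketch; SCTP's `SOCK_SEQPACKET` mode would preserve the boundaries):

```python
import socket
import threading

def run_demo() -> bytes:
    """Show that TCP delivers a byte stream, not discrete messages."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    received = bytearray()

    def server():
        conn, _ = srv.accept()
        while len(received) < 10:          # read until both "messages" arrive
            received.extend(conn.recv(1024))
        conn.close()

    t = threading.Thread(target=server)
    t.start()
    cli = socket.create_connection(srv.getsockname())
    cli.sendall(b"hello")                  # two distinct application writes...
    cli.sendall(b"world")
    t.join()
    cli.close()
    srv.close()
    return bytes(received)                 # ...arrive as one contiguous stream
```

The receiver gets `b"helloworld"` with no indication of how many sends produced it; any framing (length prefixes, delimiters) must be layered on top, which is exactly what SCTP's chunk model makes unnecessary.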
| nesarkvechnep wrote:
| As a BSD enjoyer and paid to write Erlang, I have nothing but
| love for SCTP.
| o11c wrote:
| No, SCTP only fixes half of a problem, but also gratuitously
| introduces several additional flaws, even ignoring the "router
| support" problem.
|
| The only good answer is "a reliability layer on top of UDP";
| fortunately everybody is now rallying around QUIC as the choice
| for that.
| mlhpdx wrote:
| TCP being the "default" meant it was chosen when the need for
| ordering and uniform reliability wasn't there. That was fine but
| left systems working less well than they could have with more
| carefully chosen underpinnings. With HTTP/3 gaining traction, and
| HTTP being the "next level up default choice" things potentially
| get better. The issue I see is that QUIC is far more complex, and
| the new power is fantastic for a few but irrelevant to most.
|
| UDP has its place as well, and if we have more simple and
| effective solutions like WireGuard's handshake and encryption on
| top of it we'd be better off as an industry.
| bmacho wrote:
| Otherwise please use the original title, unless it is misleading
| or linkbait; don't editorialize.
|
| https://news.ycombinator.com/newsguidelines.html
| cosmic_quanta wrote:
| Strange, the title was definitely the original title earlier
| today
| rfmoz wrote:
| RUDP from Plan9 was a nice step between TCP and UDP -
| https://en.wikipedia.org/wiki/Reliable_User_Datagram_Protoco...
| tolerance wrote:
| For the record I thought the TLD for this page was 'cerfbound',
| which sounds like the name for the race horse of the internet.
| brcmthrowaway wrote:
| How much energy does the internet use?
| brcmthrowaway wrote:
| Do extraterrestrial civilizations also use TCP?
| jccx70 wrote:
| Crap, that thing, the code, etc. has been posted thousands of
| times on the internet. The final quote "Oh I am so happy this
| works", ok thanks bye.
| jiggawatts wrote:
| The congestion control algorithm in TCP has some interesting
| effects on throughput that a lot of developers aren't aware of.
|
| For example, sending some data on a fresh TCP connection is
| _slow_, and the "ramp up time" to the bandwidth of the network
| is almost entirely determined by the _latency_.
|
| Amazing speed ups can be achieved in a data centre network by
| shaving microseconds off the round trip time!
|
| Similarly, many (all?) TCP stacks count segments, not bytes, when
| determining this ramp up rate. This means that jumbo frames can
| provide 6x the bandwidth during this period!
|
| If you read about the network design of AWS, they put a lot of
| effort into low switching latency and enabling jumbo frames.
|
| The real pros do this kind of network tuning, everyone else
| wonders why they don't get anywhere near 10 Gbps through a 10
| Gbps link.
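The ramp-up effect described above can be sketched with a toy slow-start model: the congestion window (counted in segments) roughly doubles every RTT, so time-to-bandwidth scales with round-trip latency, and a larger MSS (jumbo frames) reaches a byte target in fewer round trips. This is an illustrative simplification; real stacks differ (CUBIC, BBR, pacing, tuned initcwnd):

```python
def slow_start_rtts(target_bytes: int, mss: int, initcwnd: int = 10) -> int:
    """Count round trips for a doubling-per-RTT slow-start model to
    cumulatively deliver target_bytes. The window is in segments, so
    larger segments (jumbo frames) mean more bytes per round trip."""
    cwnd, rtts, sent = initcwnd, 0, 0
    while sent < target_bytes:
        sent += cwnd * mss                 # one window's worth per RTT
        cwnd *= 2                          # exponential growth phase
        rtts += 1
    return rtts

# Delivering 10 MB with a standard 1500-byte MTU (MSS ~1460) takes
# several more doublings than with 9000-byte jumbo frames (MSS ~8960),
# and each doubling costs one full round trip, so total ramp time is
# proportional to latency.
```

With this model, the jumbo-frame path needs noticeably fewer RTTs for the same transfer, which is why shaving microseconds off the round trip and enabling jumbo frames compound so well in a data-centre network.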
___________________________________________________________________
(page generated 2025-11-15 23:00 UTC)