[HN Gopher] TCP, the workhorse of the internet
___________________________________________________________________
TCP, the workhorse of the internet
Author : signa11
Score : 261 points
Date : 2025-11-15 06:37 UTC (16 hours ago)
(HTM) web link (cefboud.com)
(TXT) w3m dump (cefboud.com)
| zkmon wrote:
| I hate to think of the future of these nice blog posts, which
| need to struggle to convince readers of the _organic_ level of
| their content.
| stavros wrote:
| Wait, _can_ you actually just use IP? Can I just make up a packet
| and send it to a host across the Internet? I'd think that all
| the intermediate routers would want to have an opinion about my
| packet, caring, at the very least, that it's either TCP or UDP.
| gsliepen wrote:
| They shouldn't; the whole point is that the IP header is enough
| to route packets between endpoints, and only the endpoints
| should care about any higher layer protocols. But unfortunately
| some routers do, and if you have NAT then the NAT device needs
| to examine the TCP or UDP header to know how to forward those
| packets.
| jadamson wrote:
| Notably, QUIC (and thus HTTP/3) uses UDP instead of a new
| protocol number for this reason.
| stavros wrote:
| Yeah, this is basically what I was wondering, why QUIC used
| UDP instead of their own protocol if it's so
| straightforward. It seems like the answer may be "it's not
| as interference-free as they'd like it".
| Twisol wrote:
| UDP pretty much just tacks a source/destination port pair
| onto every IP datagram, so its primary function is to
| allow multiple independent UDP peers to coexist on the
| same IP host. (That is, UDP just multiplexes an IP link.)
| UDP as a protocol doesn't add any additional network
| guarantees or services on top of IP.
|
| QUIC is still "their own protocol", just implemented as
| another protocol nested inside a UDP envelope, the same
| way that HTTP is another protocol typically nested inside
| a TCP connection. It makes some sense that they'd
| piggyback on UDP, since (1) it doesn't require an
| additional IP protocol header code to be assigned by
| IANA, (2) QUIC definitely wants to coexist with other
| services on any given node, and (3) it allows whatever
| middleware analyses that exist for UDP to apply naturally
| to QUIC applications.
|
| (Regarding (3) specifically, I imagine NAT in particular
| requires cooperation from residential gateways, including
| awareness of both the IP and the TCP/UDP port. Allowing a
| well-known outer UDP header to surface port information,
| instead of re-implementing ports somewhere in the QUIC
| header, means all existing NAT implementations should
| work unchanged for QUIC.)
| conradludgate wrote:
| When it comes to QUIC: QUIC works best with unstable end-user
| internet connections (it was designed for HTTP/3 in the mobile
| age).
| Most end-user internet access is behind various layers of
| CGNAT. The way that NAT works is by using your port
| numbers to increase the address space. If you have 2^32
| IPv4 addresses, you have 2^48 IPv4 address+port pairs.
| All these NAT middleboxes speak TCP and UDP only.
|
| Additionally, firewalls are also designed to filter out
| any weird packets. If the packet doesn't look like you
| wanted to receive it, it's dropped. A firewall usually does this
| by tracking open ports just like NAT, so many firewalls also
| don't trust custom protocols.
| lxgr wrote:
| It's effectively impossible to use anything other than
| TCP or UDP these days.
|
| Some people here will argue that it actually is possible, and
| that everybody experiencing issues is just on a really weird
| connection or using broken hardware, but those weird connections
| and that bad hardware make up the overwhelming majority of
| Internet connections these days.
| hylaride wrote:
| Using UDP means QUIC support is as "easy" as adding it to
| the browser and server software. To add it as a separate
| protocol would have involved all OS's needing to add
| support for it into their networking stacks and that
| would have taken ages and involved more politics. The
| main reason QUIC was created was so that Google could
| more effectively push ads and add tracking, remember. The
| incentives were not there for others to implement it.
| toast0 wrote:
| Yeah, so... You can do it. But only for some values of
| you. In a NAT world, the NAT needs to understand the
| protocol so that it can adjust the core multiplexing in
| order to adjust addresses. A best effort NAT could let
| one internal IP at a time connect to each external IP on
| an unknown protocol, but that wouldn't work for QUIC:
| Google expects multiple clients behind a NAT to connect
| to its service IPs. It can often work for IP tunneling
| protocols where at most one connection to an external IP
| isn't super restrictive. But even then, many NATs won't
| pass unknown IP protocols at all.
|
| Most firewalls will drop unknown IP protocols. Many will
| drop a lot of TCP; some drop almost all UDP. This is why
| so much stuff runs over tcp ports 80 and 443; it's almost
| always open. QUIC/HTTP/3 encourages opening of udp/443,
| so it's a good port to run unrelated things over too.
|
| Also, SCTP had similar goals to QUIC and never got much
| deployment or support in OSes, NATs, firewalls, etc. It's a
| clear win to just use UDP and get something that will just work
| on a large portion of networks.
| Hikikomori wrote:
| Can also NAT using IP protocol.
| Twisol wrote:
| As far as I'm aware, sure you can. TCP packets and UDP
| datagrams are wrapped in IP datagrams, and it's the job of an
| IP network to ship your data from point A (sender) to point B
| (receiver). Nodes along the way might do so-called "deep packet
| inspection" to snoop on the payload of your IP datagrams (for
| various reasons, not all nefarious), but they don't need to do
| that to do the basic job of routing. From a semantic
| standpoint, the information in the TCP and UDP headers (as part
| of the IP payload) is only there to govern interactions between
| the two endpoint parties. (For instance, the "port" of a TCP or
| UDP packet is a node-local identifier for one of many services
| that might exist at the IP address the packet was routed to,
| allowing many services to coexist at the same node.)
| stavros wrote:
| Hmm, I thought intermediate routers use the TCP packet's bits
| for congestion control, no? Though I guess they can probably
| just use the destination IP for that.
| Twisol wrote:
| They probably _can_ do deep/shallow packet inspection for that
| purpose (being one of the non-nefarious applications I alluded
| to), but that's not to say their correct functioning _relies_ on
| it. Those routers also need to support at least UDP, and UDP
| provides almost no extra information at that level -- just the
| source and destination ports (so, perhaps QoS prioritization)
| and the inner payload's length and checksum (so, perhaps
| dropping bad packets quickly).
|
| If middleware decides to do packet inspection, it had better
| make sure that any behavioral differences (relative to not doing
| any inspection) are strictly an optimization and do not impact
| the correctness of the link.
|
| Also, although I'm not a network operator by any stretch,
| my understanding is that TCP congestion control is
| primarily a function of the endpoints of the TCP link, not
| the IP routers along the way. As Wikipedia explains [0]:
|
| > Per the end-to-end principle, congestion control is
| largely a function of internet hosts, not the network
| itself.
|
| [0]: https://en.wikipedia.org/wiki/TCP_congestion_control
| toast0 wrote:
| _Most_ intermediate routers don't care much. Look up the
| destination IP in the routing table, forward to the next hop, no
| time for anything else.
|
| Classic congestion control is done on the sender alone. The
| router's job is simply to drop packets when the queue is
| too large.
|
| Maybe the router supports ECN, so _if_ there's a queue going to
| the next hop, it will look for protocol-specific ECN headers to
| manipulate.
|
| Some network elements do more than the usual routing work.
| A traffic shaper might have per-user queues with outbound
| bandwidth limits. A network accelerator may effectively
| reterminate TCP in hopes of increasing achievable bandwidth.
|
| Often, the router has an aggregated connection to the next hop,
| so it'll use a hash on the addresses in the packet to choose
| which of the underlying connections to use. That hash could be
| based on many things, but it's not uncommon to use TCP or UDP
| port numbers if available. This can also be used to choose
| between equally scored next hops, and that's why you often see
| several different paths during a traceroute. Using port numbers
| is helpful to balance connections from IP A to IP B over
| multiple links. If you use an unknown protocol, even if it is
| multiplexed into ports or similar (like TCP and UDP), the
| different streams will likely always hash onto the same link:
| you won't be able to exceed the bandwidth of a single link, and
| a damaged or congested link will affect all or none of your
| connections.
| HPsquared wrote:
| Huh. So it's literally "TCP over IP" like the name suggests.
| ilkkao wrote:
| You can definitely craft an IP packet by hand and send it. If
| it's IPv4, you need to put a number between 0 and 255 to the
| protocol field from this list:
| https://www.iana.org/assignments/protocol-numbers/protocol-n...
|
| Core routers don't inspect that field; NAT/ISP boxes can. I
| believe that with two suitable dedicated Linux servers it is
| very possible to send and receive a single custom IP packet
| between them, even using 253 or 254 (= use for experimentation
| and testing [RFC3692]) as the protocol number.
| Twisol wrote:
| > If it's IPv4, you need to put a number between 0 and 255 to
| the protocol field from this list:
|
| To save a skim (though it's an interesting list!), protocol
| codes 253 and 254 are suitable "for experimentation and
| testing".
| stavros wrote:
| Very interesting, thanks!
| inglor_cz wrote:
| This is an interesting list; it makes you appreciate just how
| many obscure protocols have died out in practice. Evolution
| in networks seems to mimic evolution in nature quite well.
| morcus wrote:
| What happens when the remaining 104 unassigned protocol
| numbers are exhausted?
| marcosdumay wrote:
| People will start overloading the numbers.
|
| I do hope we'll have stopped using IPv4 by then... But
| well, a decade after address exhaustion we are still on it,
| so who knows?
| kbolino wrote:
| IPv6 uses the exact same 8-bit codes as IPv4.
|
| It uses them a little differently -- in IPv4, there is
| one protocol per packet, while in IPv6, "protocols" can
| be chained in a mechanism called extension headers -- but
| this actually makes the problem of number exhaustion more
| acute.
| brewmarche wrote:
| What if extension headers made it better? We could come
| up with a protocol consisting solely of a larger Next
| Header field and chain this pseudo header with the actual
| payload whenever the protocol number is > 255. The same
| idea could also be used in IPv4.
| kbolino wrote:
| I didn't mean to imply otherwise. But, as you say, this
| is equally applicable to IPv4 and IPv6. There were a lot
| of issues solved by IPv6, but "have even more room for
| non-TCP/UDP transports" wasn't one of them (and didn't
| need to be, tbqh).
| hylaride wrote:
| We're about half-way to exhausted, but a huge chunk of the
| ones assigned are long deprecated and/or proprietary
| technologies and could conceivably be reassigned.
| Assignment now is obviously a lot more conservative than it
| was in the 1980s.
|
| There is sometimes drama with it, though. A while back, the
| OpenBSD guys created CARP as a fully open source router
| failover protocol, but couldn't get an official IP number
| and ended up using the same one as VRRP. There's also a lot
| of historical animosity that some companies got numbers for
| proprietary protocols (eg Cisco got one for its then-
| proprietary EIGRP).
|
| https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers
| Ekaros wrote:
| Probably some use of IP options. Up to 320 bits, so I think
| there is a reasonable amount of space there for a good while.
| Of course, this makes processing really messy, but with current
| hardware it's not impossible.
| rfmoz wrote:
| Playing with the protocol number usually results in "Protocol
| Unreachable" or "Malformed Packet" from your OS.
| LeoPanthera wrote:
| You know I've always wondered if you could run Kermit*-over-IP,
| without having TCP inbetween.
|
| *The protocol.
| Karrot_Kream wrote:
| If there's no form of NAT or transport layer processing along
| your path between endpoints, you shouldn't have an issue. But
| NAT and transport and application layer load balancing are very
| common on the net these days, so YMMV.
|
| You might have more luck with an IPv6 packet.
| NooneAtAll3 wrote:
| something like this?
|
| https://en.wikipedia.org/wiki/IP_over_Avian_Carriers
| Twisol wrote:
| That would be IP over some lower level physical layer, not
| some custom content stuffed into an IP packet :)
|
| (It's absolutely worth reading some of those old April Fools'
| RFCs, by the way [0]. I'm a big fan of RFC 7168, which
| introduced HTTP response code 418 "I'm a teapot".)
|
| [0]: https://en.wikipedia.org/wiki/April_Fools%27_Day_Request
| _for...
| immibis wrote:
| Yes but not if you or they are behind NAT. It's a shame port
| numbers aren't in IP.
| nly wrote:
| The reason you wouldn't do that is IP doesn't give you a
| mechanism to share an IP address with multiple processes on a
| host, it just gets your packets to a particular host.
|
| As soon as you start thinking about having multiple services on
| a host, you end up with the idea of having a service ID or
| "port".
|
| UDP or UDP Lite gives you exactly that at the cost of 8 bytes,
| so there's no real value in not just putting everything on top
| of UDP.
| xorcist wrote:
| > caring, at the very least, that it's either TCP or UDP.
|
| You left out ICMP, my favourite! (And a lot more important in
| IPv6 than in v4.)
|
| Another pretty well known protocol that is neither TCP nor UDP
| is IPsec. (Which is really two new IP protocols.) People really
| did design proper IP protocols still in the 90s.
|
| > Can I just make up a packet and send it to a host across the
| Internet?
|
| You should be able to. But if you are on a corporate network
| with a really strict firewalling router that only forwards
| traffic it likes, then likely not. There are also really crappy
| home routers which give similar problems from the other end of
| enterpriseness.
|
| NAT also destroyed much of the end-to-end principle. If you
| don't have a real IP address and rely on a NAT router to
| forward your data, it needs to be in a protocol the router
| recognizes.
|
| Anyway, for the past two decades people have grown tired of
| that and just pile hacks on top of TCP or UDP instead. That's
| sad. Or who am I kidding? Really it's on top of HTTP. HTTP will
| likely live on long past anything IP.
| gruturo wrote:
| > NAT also destroyed much of the end-to-end principle. If you
| don't have a real IP address and relies on a NAT router to
| forward your data, it needs to be in a protocol the router
| recognizes.
|
| Not necessarily. Many protocols can survive being NATed if
| they don't carry IP/port related information inside their
| payload. FTP is a famous counterexample - it uses a control
| channel (TCP21) which contains commands to open data channels
| (TCP20), and those commands specify IP:port pairs, so,
| depending on the protocol, a NAT router has to rewrite them
| and/or open ports dynamically and/or create NAT entries on
| the fly. A lot of other stuff has no need for that and will
| happily go through without any rewriting.
| lxgr wrote:
| Of course NAT allows application layer protocols _layered
| on TCP or UDP_ to pass through without the NAT
| understanding the application layer - otherwise, NATted
| networks would be entirely broken.
|
| The end-to-end principle at the IP layer (i.e. having the
| IP forwarding layer be agnostic to the transport layer
| protocols above it) is still violated.
| Hikikomori wrote:
| You can NAT on IP protocol as well, just not to more than
| one per external IP.
| brewmarche wrote:
| I guess most people mean NAPT/PAT when they say NAT
| xorcist wrote:
| I think we agree. Of course a NAT router with an
| application proxy such as FTP or SIP can relay and rewrite
| traffic as needed.
|
| TCP and UDP have port numbers that the NAT software can
| extract and keep state tables for, so we can send the
| return traffic to its intended destination.
|
| For unknown IP protocols that is not possible. It may at best
| act like a network diode, which is one way of violating the
| end-to-end principle.
| Hikikomori wrote:
| You can NAT on IP protocol as well, just not to more than
| one per external IP.
| gruturo wrote:
| Actually the observation about ports being mostly a
| TCP/UDP feature is a very good point I had failed to
| consider. This would indeed greatly limit the ability of
| a NAT gateway - it could keep just a state table of IP
| src/dst pairs and just direct traffic back to its source,
| but it's indeed very crude. Thanks for bringing it up!
| lxgr wrote:
| > You left out ICMP, my favourite!
|
| Even ICMP has a hard time traversing NATs and firewalls these
| days, for largely bad reasons. Try pinging anything in AWS,
| for example...
| 6031769 wrote:
| Have to say that I don't encounter any problems pinging
| hosts in AWS.
|
| If any host is firewalling out ICMP then it won't be
| pingable but that does not depend on the hosting provider.
| AWS is no better or worse than any other in that regard,
| IME.
| Hikikomori wrote:
| Doesn't really have anything to do with nat though.
| xyzzyz wrote:
| There is little point in inventing new protocols, given how
| low the overhead of UDP is. That's just 8 bytes per packet,
| and it enables going through NAT. Why come up with a new
| transport layer protocol, when you can just use UDP framing?
| mlhpdx wrote:
| Agreed. Building a custom protocol seems "hard" to many folks,
| the same folks who do it without any fear on top of HTTP. The
| wild shenanigans I've seen with headers, query params and
| JSON make me laugh a little. Everything as text is
| _actually_ hard.
|
| A part of the problem with UDP is the lack of good
| platforms and tooling. Examples as well. I'm trying to help
| with that, but it's an uphill battle for sure.
| GardenLetter27 wrote:
| Probably not, loads of routers are even blocking parts of ICMP.
| eqvinox wrote:
| That's firewalls (or others), not routers. If it blocks
| things, it's by definition not a router anymore.
| lxgr wrote:
| You can call the things mangling IP addresses and TCP/UDP
| ports what you want, but that will unfortunately not make
| them go away and stop throwing away non-TCP/UDP traffic.
| marcosdumay wrote:
| Both things come on the same box nowadays.
|
| There are many routers that don't care at all about what's
| going through them. But there aren't any firewalls that
| don't route anymore (not even at the endpoints).
| rubatuga wrote:
| And by your definition my home router is not a router since
| it does NAT? There's really no point in arguing semantics
| like this.
| eqvinox wrote:
| We're discussing nonstandard IP protocols. In that
| context, your home router is a CPE, and not described by
| the term "router" without further qualifiers, because
| that's the level the discussion is at. I'm happy to call
| it a router when talking to the neighbors, when I'm not
| discussing IP protocols with them.
| gruturo wrote:
| Yep it's full of IP protocols other than the well-known TCP,
| UDP and ICMP (and, if you ever had the displeasure of learning
| IPSEC, its AH and ESP).
|
| A bunch of multicast stuff (IGMP, PIM)
|
| A few routing protocols (OSPF, but notably not BGP which just
| uses TCP, and (usually) not MPLS which just goes over the wire
| - it sits at the same layer as IP and not above it)
|
| A few VPN/encapsulation solutions like GRE, IP-in-IP, L2TP and
| probably others I can't remember
|
| As usual, Wikipedia has got you covered, much better than my
| own recollection:
| https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers
| lxgr wrote:
| To GPs point, though, most of these will unfortunately be
| dropped by most middleboxes for various reasons.
|
| Behind a NA(P)T, you can obviously only use those protocols
| that the translator knows how to remap ports for.
| Hikikomori wrote:
| Can also do 1:1 NAT for IP protocols like ipsec, or your
| own protocol.
| lxgr wrote:
| Yes, but who else does? Network effects are important in
| a network.
| eqvinox wrote:
| > I'd think that all the intermediate routers would want to
| have an opinion about my packet, caring, at the very least,
| that it's either TCP or UDP.
|
| They absolutely don't. Routers are layer 3 devices; TCP & UDP
| are layer 4. The only impact is that the ECMP flow hashes will
| have less entropy, but that's purely an optimization thing.
|
| Note TCP, UDP and ICMP are nowhere near all the protocols
| you'll commonly see on the internet -- at minimum, SCTP, GRE,
| L2TP and ESP are reasonably widespread (even a tiny fraction of
| traffic is still a giant number considering internet scales).
|
| You can send whatever protocol number with whatever contents
| your heart desires. Whether the other end will do anything
| useful with it is another question.
| lxgr wrote:
| > They absolutely don't. Routers are layer 3 devices;
|
| Idealized routers are, yes.
|
| Actual IP paths these days usually involve at least one NAT,
| and these will absolutely throw away anything other than TCP,
| UDP, and if you're lucky ICMP.
| eqvinox wrote:
| See nearby comment about terminology. Either we're
| discussing odd IP protocols, then the devices you're
| describing aren't just "routers" (and particularly what
| you're describing is not part of a "router"), or we're not
| discussing IP protocols, then we're not having this thread.
|
| And note the GP talked about "intermediate routers". That's
| the ones in a telco service site or datacenter by my book.
| gsliepen wrote:
| If you start with the problem of how to create a reliable stream
| of data on top of an unreliable datagram layer, then the solution
| that comes out will look virtually identical to TCP. It just is
| the right solution for the job.
|
| The three drawbacks of the original TCP algorithm were the
| window size (the maximum value is just too small for today's
| speeds), poor handling of missing packets (addressed by
| extensions such as selective ACK), and the fact that it only
| manages one stream at a time, while some applications want
| multiple streams that don't block each other. You could use
| multiple TCP connections, but that adds its own overhead, so
| SCTP and QUIC were designed to address those issues.
|
| The congestion control algorithm is not part of the on-the-wire
| protocol, it's just some code on each side of the connection that
| decides when to (re)send packets to make the best use of the
| available bandwidth. Anything that implements a reliable stream
| on top of datagrams needs to implement such an algorithm. The
| original ones (Reno, Vegas, etc) were very simple but already did
| a good job, although back then network equipment didn't have
| large buffers. A lot of research is going into making better
| algorithms that handle large buffers, large roundtrip times,
| varying bandwidth needs and also being fair when multiple
| connections share the same bandwidth.
| bobmcnamara wrote:
| > If you start with the problem of how to create a reliable
| stream of data on top of an unreliable datagram layer, then the
| solution that comes out will look virtually identical to TCP.
|
| I'll add that at the time of TCP's writing, the telephone
| people far outnumbered everyone else in the packet switching vs
| circuit switching debate. TCP gives you a virtual circuit over
| a packet switched network as a pair of reliable-enough
| independent byte streams over IP. This idea, that the endpoints
| could implement reliability through retransmission, came from an
| earlier French network, Cyclades, and ended up becoming a core
| principle of IP networks.
| Karrot_Kream wrote:
| We're still "suffering" from the latency and jitter effects
| of the packet switching victory. (The debate happened before
| my time and I don't know if I would have really agreed with
| circuit switching.) Latency and jitter on the modern Internet
| are very much best effort, emphasis on "effort".
| lxgr wrote:
| True, but with circuit switching, we'd probably still be
| paying by the minute, so most of these
| jittery/bufferbloated connections would not exist in the
| first place.
| hylaride wrote:
| Also, circuit switching is harder (well, more expensive)
| to do at scale, especially with different providers
| (probably a reason the traditional telecoms pushed it so
| hard - to protect their traditional positions). Even
| modern circuit technologies like MPLS are mostly
| contained within a network (though there can be and is
| cross-network peering) and aren't as connection-oriented as
| previous circuit technologies like ATM or Frame Relay.
| bcrl wrote:
| Circuit switching is not harder to do, it's simply less
| efficient. In the PSTN and ISDN world, circuits consumed
| bandwidth regardless of whether they were actively in use or
| not. There was no statistical multiplexing as a result.
|
| Circuit switching packets means carrying metadata about
| the circuit rather than simply using the destination MAC
| or IP address to figure out routing along the way. ATM
| took this to an extreme with nearly 10% protocol overhead
| (48 bytes of payload in a 53 byte cell) and 22 bytes of
| wasted space in the last ATM cell for a 1500 byte
| ethernet packet. That inefficiency is what really hurt.
| Sadly the ATM legacy lives on in GPON and XGSPON -- EPON
| / 10GEPON are far better protocols. As a result, GPON and
| XGSPON require gobs of memory per port for frame
| reassembly (128 ONUs x 8 priorities x 9KB for jumbo
| frames = 9MB per port worst case), whereas EPON / 10GEPON
| do not.
|
| MPLS also has certain issues that are solved by using the
| IPv6 next header feature which avoids having to push /
| pop headers (modifying the size of the packet which has
| implications for buffering and the associated QoS issues
| making the hardware more complex) in the transport
| network. MPLS labels made sense at the time of
| introduction in the early 2000s when transport network
| hardware was able to utilize a small table to look up the
| next hop of a frame instead of doing a full route lookup.
| The hardware constraints of those early days requiring
| small SRAMs have effectively gone away since modern ASICs
| have billions of transistors which make on chip route
| tables sufficient for many use-cases.
| jandrese wrote:
| As someone who at one point was working with people who were
| trying to keep an ATM network reliable, there is a reason
| packet switching won.
| wmf wrote:
| L4S should improve latency and jitter.
| musicale wrote:
| The telephone people were basically right with their
| criticisms of TCP/IP such as:
|
| What about QoS? Jitter, bandwidth, latency, fairness
| guarantees? What about queuing delay? What about multiplexing
| and tunneling? Traffic shaping and engineering? What about
| long-haul performance? Easy integration with optical circuit
| networks? etc. ATM addressed these issues, but TCP/IP did
| not.
|
| All of these things showed up again once you tried to do VOIP
| and video conferencing, and in core ISPs as well as access
| networks, and they weren't (and in many cases still aren't)
| easy to solve.
| cachius wrote:
| How could a circuit switched network look like at today's
| scale?
| musicale wrote:
| The optical layer is still circuit-switched.
|
| Also MPLS is basically a virtual circuit network.
| hollerith wrote:
| If that is true, then why did the telcos rapidly move the
| entire backbone of the telephone network to IP in the
| 1990s?
|
| And why are they trying to persuade regulators to let them
| get rid of the remaining (peripheral) part of the old
| circuit-switched network, i.e., to phase out old-school
| telephone hardware, requiring all customers to have IP
| phone hardware?
| kragen wrote:
| Packet switching is cheaper; even though it can't make
| any guarantees about latency and bandwidth the way
| circuit switching could, it uses scarce long-haul
| bandwidth more efficiently. I _regularly_ see people
| falling off video calls, like, multiple times a week. So,
| in some ways, it's a worse product, but costs much less.
| musicale wrote:
| They moved to IP because it was improving faster in speed
| and commoditization vs. ATM. But in order to make it
| work, they had to figure out how to make QoS work on IP
| networks, which wasn't easy. It still isn't easy (see:
| crappy zoom calls.)
|
| Modern circuit switched networks use optics rather than
| the legacy copper circuits which date back to telegraphy.
| mulmen wrote:
| You can criticize something and still select it as the
| best option. I do this daily with Apple. If you can't
| find a flaw in a technical solution you probably aren't
| looking close enough.
| NooneAtAll3 wrote:
| > If you start with the problem of how to create a reliable
| stream of data on top of an unreliable datagram layer
|
| > poor handling of missing packets
|
| so it was poor at exact thing it was designed for?
| allarm wrote:
| It was a trade off at the time. Selective acknowledgments
| require more resources.
| silvestrov wrote:
| Poor for high-speed connections (*) or very unreliable
| connections.
|
| (*) compared to when TCP was invented.
|
| When I started at university, the FTP speed from the US during
| daytime was 500 bytes per second! You don't have many
| unacknowledged packets in such a connection.
|
| Back then even a 1 megabits/sec connection was super high
| speed and very expensive.
| rini17 wrote:
| Might be obvious in hindsight, but it was not at all clear back
| then that congestion was manageable this way. There were
| legitimate concerns that it would all just melt down.
| rkagerer wrote:
| _it only manages one stream at a time_
|
| I'll take flak for saying it, but I feel web developers are
| partially at fault for laziness on this one. I've often seen
| them trigger a swath of connections (e.g. for uncoordinated
| async events), when carefully managed multiplexing over one or
| a handful will do just fine.
|
| Eg. In prehistoric times I wrote a JavaScript library that let
| you queue up several downloads over one stream, with control
| over prioritization and cancelability.
|
| It was used in a GreaseMonkey script on a popular dating
| website, to fetch thumbnails and other details of all your
| matches in the background. Hovering over a match would bring up
| all their photos, and if some hadn't been retrieved yet they'd
| immediately move to the top of the queue. I intentionally
| wanted to limit the number of connections, to avoid
| oversaturating the server or the user's bandwidth. Idle time
| was used to prefetch all matches on the page (IIRC in a
| sensible order responsive to your scroll location). If you
| picked a large enough pagination, then stepped away to top up
| your coffee, by the time you got back you could browse through
| all of your recent matches instantly, without waiting for any
| server roundtrip lag.
|
| It was pretty slick. I realize these days modern stacks give
| you multiplexing for free, but to put in context this was
| created in the era before even JQuery was well-known.
|
| Funny story, I shared it with one of my matches and she found
| it super useful but was a bit surprised that, in a way, I was
| helping my competition. Turned out OK... we're still together
| nearly two decades later and now she generously jokes I
| invented Tinder before it was a thing.
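| A minimal sketch of that kind of prioritized, single-connection fetch
| queue (not the original library; the names and the hover-bump
| behavior here are illustrative):

```python
import heapq

class FetchQueue:
    """Heap-ordered download queue: lower priority number = sooner."""
    def __init__(self):
        self.heap = []
        self.counter = 0          # tie-breaker keeps FIFO order

    def enqueue(self, url, priority=10):
        heapq.heappush(self.heap, (priority, self.counter, url))
        self.counter += 1

    def bump(self, url):
        """Move url to the front of the queue (e.g. on mouse hover)."""
        self.heap = [e for e in self.heap if e[2] != url]
        heapq.heapify(self.heap)
        self.enqueue(url, priority=0)

    def next_url(self):
        """The next item to fetch over the single connection."""
        return heapq.heappop(self.heap)[2] if self.heap else None

q = FetchQueue()
for u in ["thumb1", "thumb2", "thumb3"]:   # idle-time prefetch order
    q.enqueue(u)
q.bump("thumb3")                           # user hovered match 3
print(q.next_url())                        # thumb3
print(q.next_url())                        # thumb1
```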
| xyzzyz wrote:
| Sure, you can reimplement multiplexing on the application
| level, but it just makes more sense to do it on the transport
| level, so that people don't have to do it in JavaScript.
| groundzeros2015 wrote:
| But unfortunately QUIC is a user space implementation over
| kernel UDP.
| MrDarcy wrote:
| How is that relevant? The user agent (browser) handles
| the transport.
| groundzeros2015 wrote:
| That's the problem. Browsers are billion dollar ventures
| and are operating systems unto themselves. So they like
| QUIC.
|
| But you have to include giant libraries and kernel can't
| see the traffic to better manage timing etc.
| adzm wrote:
| There is no real reason QUIC couldn't be implemented in
| the kernel though.
| karmakaze wrote:
| _[Not a web dev but]_ I thought each site gets a handful of
| connections (4) to each host and more requests would have to
| wait to use one of them. That's pretty close to what I'd
| want with a reasonably fast connection.
| rkagerer wrote:
| That's basically right. Back when I made this, many servers
| out there still limited you to just 2 (or sometimes even 1)
| concurrent connections. As sites became more media-heavy
| that number trended up. HTTP/2 can handle many concurrent
| streams on one connection; I'm not sure if you get as fine-
| grained control as with the library I wrote (maybe!).
| rishabhaiover wrote:
| This is wonderful to hear. I have a naive question. Is this
| the reason most websites/web servers absolutely need CDNs
| (apart from their edge capabilities): because they understand
| caching much better than a web developer does? But I would
| think the person closer to the user access pattern would
| know the optimal caching strategy.
| vbezhenar wrote:
| Most websites do not need CDNs.
|
| CDNs became popular back in the old days, when some people
| thought that if two websites are using jquery-1.2.3.min.js,
| CDN could cache it and second site would load quicker.
| These days, browsers don't do that: they ignore cached
| assets from other websites because it helps to
| protect user privacy, and they value privacy over
| performance in this case.
|
| There are some reasons CDNs might be helpful. Edge
| capability probably is the most important one. Another
| reason is that serving lots of static data might be a
| complicated task for a small website, so it makes sense to
| offload it to a specialised service. These days, CDNs went
| beyond static data. They can hide your backend, so public
| user won't know its address and can't DDoS it. They can
| handle TLS for you. They can filter bots, Tor, and people
| from countries you don't like. All in a few clicks in the
| dashboard, no need to implement complicated solutions.
|
| But nothing you couldn't write yourself in a few days,
| really.
| 29athrowaway wrote:
| I was excited about SCTP over 10 years ago but getting it to
| work was hard.
|
| The Linux kernel supports it but at least when I had tried this
| those modules were disabled on most distros.
| 1vuio0pswjnm7 wrote:
| "... some applications want multiple streams that don't block
| each other. You could use multiple TCP connections, but that
| adds its own overhead, so SCTP and QUIC were designed to
| address those issues."
|
| Other applications work just fine with a single TCP connection
|
| If I am using TCP for DNS, for example, and I am retrieving
| data from a single host such as a DNS cache, I can send
| multiple queries over a single TCP connection and receive
| multiple responses over the same single TCP connection,
| out of order. No blocking.^1 If the cache (application)
| supports it, this is much faster than receiving answers
| sequentially and it's more efficient and polite than opening
| multiple TCP connections
|
| 1. I do this every day outside the browser with DNS over TLS
| (DoT) using something like streamtcp from NLNet Labs. I'm not
| sure that QUIC is faster, server support for QUIC is much more
| limited, but QUIC may have other advantages
|
| I also do it with DNS over HTTPS (DoH), outside the browser,
| using HTTP/1.1 pipelining, but there I receive answers
| sequentially. I'm still not convinced that HTTP/2 is faster for
| this particular use case, i.e., downloading data from a single
| host using multiple HTTP requests (compared to something like
| integrating online advertising into websites, for example)
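| The out-of-order matching works because DNS over TCP (RFC 1035
| section 4.2.2) frames each message with a 2-byte length prefix, and
| the 16-bit ID at the start of each message lets a client pair
| responses with pipelined queries. A minimal offline sketch of the
| framing and ID matching (no real DNS payloads, just placeholders):

```python
import struct

def frame(msg_id, payload):
    """Prepend the DNS-over-TCP 2-byte length prefix to a message
    whose first two bytes are its 16-bit query ID."""
    msg = struct.pack("!H", msg_id) + payload
    return struct.pack("!H", len(msg)) + msg

def read_messages(stream):
    """Yield (id, payload) pairs from a byte stream of framed messages."""
    off = 0
    while off + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, off)
        msg = stream[off + 2 : off + 2 + length]
        (msg_id,) = struct.unpack_from("!H", msg)
        yield msg_id, msg[2:]
        off += 2 + length

# The server answers query 2 before query 1; IDs restore the pairing.
wire = frame(2, b"answer-two") + frame(1, b"answer-one")
print(dict(read_messages(wire)))
```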
| do_not_redeem wrote:
| > I can send multiple queries over a single TCP connection
| and receive multiple responses over the same single TCP
| connection, out of order. No blocking.
|
| You're missing the point. You have one TCP connection, and
| the sever sends you response1 and then response2. Now if
| response1 gets lost or delayed due to network conditions, you
| must wait for response1 to be retransmitted before you can
| read response2. That is blocking, no way around it. It has
| nothing to do with advertising(?), and the other protocols
| mentioned don't have this drawback.
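| A toy model of that head-of-line blocking: a TCP-like receiver can
| only release contiguous bytes to the application, so everything
| behind a hole waits for the retransmission:

```python
def deliverable(segments, next_seq):
    """segments: {seq: bytes} of arrived data. Returns (bytes the
    application may read in order, updated next expected seq)."""
    out = b""
    while next_seq in segments:
        data = segments.pop(next_seq)
        out += data
        next_seq += len(data)
    return out, next_seq

segs = {100: b"resp2"}            # seq 95..99 ("resp1") was lost
data, nxt = deliverable(segs, 95)
print(data)                        # b'' -- resp2 stalls behind the hole
segs[95] = b"resp1"                # retransmission finally arrives
segs[100] = b"resp2"
data, nxt = deliverable(segs, 95)
print(data)                        # b'resp1resp2'
```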
| maccard wrote:
| I work on an application that does a lot of high frequency
| networking in a tcp like custom framework. Our protocol
| guarantees ordering per "channel" so you can send request1
| on channel 1 and request2 on channel 2 and receive the
| responses in any order. (But if you send request 1 and then
| request 2 on the same channel you'll get them back in
| order)
|
| It's a trade off, and there's a surprising amount of
| application code involved on the receiving side in the
| application waiting for state to be updated on both
| channels. I definitely prefer it, but it's not without its
| tradeoffs.
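| That per-channel ordering can be sketched as independent reassembly
| buffers (the per-channel sequence numbering here is an assumed
| framing, not the actual protocol):

```python
from collections import defaultdict

class ChannelDemux:
    """Deliver messages in order within a channel; channels never
    block each other."""
    def __init__(self):
        self.expected = defaultdict(int)   # next seq per channel
        self.pending = defaultdict(dict)   # channel -> {seq: msg}

    def receive(self, channel, seq, msg):
        """Buffer msg; return all messages now deliverable on channel."""
        self.pending[channel][seq] = msg
        out = []
        while self.expected[channel] in self.pending[channel]:
            out.append(self.pending[channel].pop(self.expected[channel]))
            self.expected[channel] += 1
        return out

d = ChannelDemux()
print(d.receive(1, 1, "req1-part2"))   # [] -- waits for seq 0 on ch 1
print(d.receive(2, 0, "req2"))         # ['req2'] -- ch 2 not blocked
print(d.receive(1, 0, "req1-part1"))   # ['req1-part1', 'req1-part2']
```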
| tekne wrote:
| So, roll-your-own-QUIC?
| pverheggen wrote:
| > I can send multiple queries over a single TCP connection
| and receive multiple responses over the same single TCP
| connection, out of order.
|
| This is because DoT allows the DNS server to resolve queries
| concurrently and send query responses out of order.
|
| However, this is an application layer feature, not a
| transport layer one. The underlying TCP packets still have to
| arrive in order and therefore are subject to blocking.
| kragen wrote:
| There are a lot of design alternatives possible to TCP within
| the "create a reliable stream of data on top of an unreliable
| datagram layer" space:
|
| * Full-duplex connections are probably a good idea, but
| certainly are not the only way, or the most obvious way, to
| create a reliable stream of data on top of an unreliable
| datagram layer. TCP's predecessor NCP was half-duplex.
|
| * TCP itself also supports a half-duplex mode--even if one end
| sends FIN, the other end can keep transmitting as long as it
| wants. This was probably also a good idea, but it's certainly
| not the only obvious choice.
|
| * Sequence numbers on messages or on bytes?
|
| * Wouldn't it be useful to expose message boundaries to
| applications, the way 9P, SCTP, and some SNA protocols do?
|
| * If you expose message boundaries to applications, maybe you'd
| also want to include a message type field? Protocol-level
| message-type fields have been found to be very useful in
| Ethernet and IP, and in a sense the port-number field in UDP is
| also a message-type field.
|
| * Do you really need urgent data?
|
| * Do servers need different port numbers? TCPMUX is a
| straightforward way of giving your servers port _names_, like
| in CHAOSNET, instead of port numbers. It only creates extra
| overhead at connection-opening time, assuming you have the
| moral equivalent of file descriptor passing on your OS. The
| only limitation is that you have to use different client ports
| for multiple simultaneous connections to the same server host.
| But in TCP everyone uses different client ports for different
| connections _anyway_. TCPMUX itself incurs an extra round-trip
| time delay for connection establishment, because the requested
| server name can't be transmitted until the client's ACK
| packet, but if you incorporated it into TCP, you'd put the
| server name in the SYN packet. If you eliminate the server port
| number in every TCP header, you can expand the client port
| number to 24 or even 32 bits.
|
| * Alternatively, maybe network addresses should be assigned to
| server processes, as in Appletalk (or IP-based virtual hosting
| before HTTP/1.1's Host: header, or, for TLS, before SNI became
| widespread), rather than assigning network addresses to hosts
| and requiring port numbers or TCPMUX to distinguish multiple
| servers on the same host?
|
| * Probably SACK was actually a good idea and should have always
| been the default? SACK gets a lot easier if you ack message
| numbers instead of byte numbers.
|
| * _Why_ is acknowledgement reneging allowed in TCP? That was a
| terrible idea.
|
| * It turns out that measuring round-trip time is really
| important for retransmission, and TCP has no way of measuring
| RTT on retransmitted packets, which can pose real problems for
| correcting a ridiculously low RTT estimate, which results in
| excessive retransmission.
|
| * Do you really need a PUSH bit? C'mon.
|
| * A modest amount of overhead in the form of erasure-coding
| bits would permit recovery from modest amounts of packet loss
| without incurring retransmission timeouts, which is especially
| useful if your TCP-layer protocol requires a modest amount of
| packet loss for congestion control, as TCP does.
|
| * Also you could use a "congestion experienced" bit instead of
| packet loss to detect congestion in the usual case. (TCP did
| eventually acquire CWR and ECE, but not for many years.)
|
| * The fact that you can't resume a TCP connection from a
| different IP address, the way you can with a Mosh connection,
| is a serious flaw that seriously impedes nodes from moving
| around the network.
|
| * TCP's hardcoded timeout of 5 minutes is also a major flaw.
| Wouldn't it be better if the application could set that to 1
| hour, 90 minutes, 12 hours, or a week, to handle intermittent
| connectivity, such as with communication satellites? Similarly
| for very-long-latency datagrams, such as those relayed by
| single LEO satellites. Together this and the previous flaw have
| resulted in TCP largely being replaced for its original
| session-management purpose with new ad-hoc protocols such as
| HTTP magic cookies, protocols which use TCP, if at all, merely
| as a reliable datagram protocol.
|
| * Initial sequence numbers turn out not to be a very good
| defense against IP spoofing, because that wasn't their original
| purpose. Their original purpose was preventing the erroneous
| reception of leftover TCP segments from a previous incarnation
| of the connection that have been bouncing around routers ever
| since; this purpose would be better served by using a different
| client port number for each new connection. The ISN namespace
| is far too small for current LFNs anyway, so we had to patch
| over the hole in TCP with timestamps and PAWS.
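| The erasure-coding point can be sketched with a single XOR parity
| packet, which lets a receiver rebuild any one lost packet in a group
| without a retransmission timeout:

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Sender emits three data packets plus one parity packet.
packets = [b"aaaa", b"bbbb", b"cccc"]
parity = b"\x00" * 4
for p in packets:
    parity = xor(parity, p)

# Packet 1 is lost in transit; XOR of the survivors recovers it,
# because parity = p0 ^ p1 ^ p2 implies p1 = p0 ^ p2 ^ parity.
recovered = xor(xor(packets[0], packets[2]), parity)
print(recovered)   # b'bbbb'
```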
| musicale wrote:
| AppleTalk didn't get much love for its broadcast (or possibly
| multicast?) based service discovery protocol - but of course
| that is what inspired mDNS. I believe AppleTalk's LAN
| addresses were always dynamic (like 169.254.x.x IP addresses),
| simplifying administration and deployment.
|
| I tend to think that one of the reasons linux containers are
| needed for network services is that DNS traditionally only
| returns an IP address (rather than address + port) so each
| service process needs to have its own IP address, which in
| linux requires a container or at least a network namespace.
|
| AppleTalk also supported a reliable transaction (basically
| request-response RPC) protocol (ATP) and a session protocol,
| which I believe were used for Mac network services (printing,
| file servers, etc.) Certainly easier than
| serializing/deserializing byte streams.
| kragen wrote:
| Does "session protocol" mean that it provided packet
| retransmission and reordering, like TCP? How does that save
| you serializing and deserializing byte streams?
|
| I agree that, given the existing design of IP and TCP, you
| could get much of the benefit of first-class addresses for
| services by using, for example, DNS-SD, and that is what
| ZeroConf does. (It is not a coincidence that the DNS-SD RFC
| was written by a couple of Apple employees.) But, if that's
| the way you're going to be finding endpoints to initiate
| connections to, there's no benefit to having separate port
| numbers and IP addresses. And IP addresses are far scarcer
| than just requiring a Linux container or a network
| namespace: there are only 2^32 of them. But it is rare to
| find an IP address that is listening on more than 64 of its
| 2^16 TCP ports, so in an alternate history where you moved
| those 16 bits from the port number to the IP address, we
| would have one thousandth of the IP-address crunch that we
| do.
|
| Historically, possibly the reason that it wasn't done this
| way is that port numbers predated the DNS by about 10
| years.
| musicale wrote:
| The session protocol was for sessions with servers and
| was used for AFP (AppleShare file servers) I believe.
|
| The higher level protocols were built on ATP which was
| message based.
|
| ADSP was a stream protocol that could be used for remote
| terminal access or other applications where byte streams
| actually made sense.
|
| > Historically, possibly the reason that it wasn't done
| this way is that port numbers predated the DNS by about
| 10 years.
|
| Predated or postdated?
|
| My understanding is that DNS can potentially provide port
| numbers, but this is not widely used or supported.
| kragen wrote:
| DNS postdated port numbers.
|
| Mockapetris's DNS RFCs are from 01983, although I think
| I've talked to people who installed DNS a year or two
| before that. Port numbers were first proposed in RFC 38
| in 01970 https://datatracker.ietf.org/doc/html/rfc38
|
| > _The END and RDY must specify relevant sockets in
| addition to the link number. Only the local socket name
| need be supplied_
|
| and given actual numbers in RFC 54, also in 01970
| https://datatracker.ietf.org/doc/html/rfc54
|
| > _Connections are named by a pair of sockets. Sockets
| are 40 bit names which are known throughout the network.
| Each host is assigned a private subset of these names,
| and a command which requests a connection names one
| socket which is local to the requesting host and one
| local to the receiver of the request._
|
| > _Sockets are polarized; even numbered sockets are
| receive sockets; odd numbered ones are send sockets. One
| of each is required to make a connection._
|
| In RFC 129 in 01971 we see discussion about whether
| socketnames should include host numbers and/or user
| numbers, still with the low-order bit indicating the
| socket's gender (emissive or receptive).
| https://datatracker.ietf.org/doc/html/rfc129
|
| RFC 147 later that year
| https://datatracker.ietf.org/doc/html/rfc147 discusses
| within-machine port numbers and how they should or should
| not relate to the socketnames transmitted in NCP packets:
|
| > _Previous network papers postulated that a process
| running under control of the host 's operating system
| would have access to a number of ports. A port might be a
| physical input or output device, or a logical I/O device
| (...)_
|
| > _A socket has been defined to be the identification of
| a port for machine to machine communication through the
| ARPA network. Sockets allocated to each host must be
| uniquely associated with a known process or be undefined.
| The name of some sockets must be universally known and
| associated with a known process operating with a
| specified protocol. (e.g., a logger socket, RJE socket, a
| file transfer socket). The name of other sockets might
| not be universally known, but given in a transmission
| over a universally known socket, (e.g. the socket pair
| specified by the transmission over the logger socket
| under the Initial Connection Protocol (ICP). In any case,
| communication over the network is from one socket to
| another socket, each socket being identified with a
| process running at a known host._
|
| RFC 167 the same year
| https://datatracker.ietf.org/doc/html/rfc167 proposes
| that socketnames not be required to be unique network-
| wide but just within a host. It also points out that you
| really only need the socketname during the initial
| connection process, if you have some other way of knowing
| which packets belong to which connections:
|
| > _Although fields will be helpful in dealing with socket
| number allocation, it is not essential that such field
| designations be uniform over the network. In all network
| transactions the 32-bit socket number is handled with its
| 8-bit host number. Thus, if hosts are able to maintain
| uniqueness and repeatability internally, socket numbers
| in the network as a whole will also be unique and
| repeatable. If a host fails to do so, only connections
| with that offending host are affected._
|
| > _Because the size, use, and character of systems on the
| network are so varied, it would be difficult if not
| impossible to come up with an agreed upon particular
| division of the 32-bit socket number. Hosts have
| different internal restrictions on the number of users,
| processes per user, and connections per process they will
| permit._
|
| > _It has been suggested that it may not be necessary to
| maintain socket uniqueness. It is contended that there is
| really no significant use made of the socket number after
| a connection has been established. The only reason a host
| must now save a socket number for the life of a
| connection is to include it in the CLOSE of that
| connection._
|
| RFC 172 in June
| https://datatracker.ietf.org/doc/html/rfc172 proposes
| using port 3 for the second version of FTP:
|
| > _[6] It seems that socket 1 has been assigned to
| logger. Socket 3 seems a reasonable choice for File
| Transfer._
|
| This updates the first version in RFC 114 in April
| https://datatracker.ietf.org/doc/html/rfc114 which said:
|
| > _[16] It seems that socket 1 has been assigned to
| logger and socket 5 to NETRJS. Socket 3 seems a
| reasonable choice for the file transfer process._
|
| RFC 196 the same year
| https://datatracker.ietf.org/doc/html/rfc196 proposes to
| use port 5 to receive mail and/or print jobs:
|
| > _Initial Connection will be as per the Official Initial
| Connection Protocol, Documents #2, NIC 7101, to a
| standard socket not yet assigned. A candidate socket
| number would be socket #5._
|
| In RFC204 in August https://www.rfc-
| editor.org/rfc/rfc204.html Postel publishes the first
| list of port number assignments:
|
| > _I would like to collect information on the use of
| socket numbers for "standard" service programs. For
| example Loggers (telnet servers) Listen on socket 1. What
| sockets at your host are Listened to by what programs?_
|
| > _Recently Dick Watson suggested assigning socket 5 for
| use by a mail-box protocol (RFC196). Does any one object
| ? Are there any suggestions for a method of assigning
| sockets to standard programs? Should a subset of the
| socket numbers be reserved for use by future standard
| protocols?_
|
| > _Please phone or mail your answers and comments to
| (...)_
|
| Amusingly in retrospect, Postel did not include an email
| address, presumably because they didn't have email
| working yet.
|
| FTP's assignment to port 3 was confirmed in RFC 265 in
| November:
|
| > _Socket 3 is the standard preassigned socket number on
| which the cooperating file transfer process at the
| serving host should "listen". (*)The connection
| establishment will be in accordance with the standard
| initial connection protocol, (*)establishing a full-
| duplex connection._
|
| In May of 01972 Postel published a list as RFC 349
| https://www.rfc-editor.org/rfc/rfc349.html:
|
| > _I propose that there be a czar (me ?) who hands out
| official socket numbers for use by standard protocols.
| This czar should also keep track of and publish a list of
| those socket numbers where host specific services can be
| obtained. I further suggest that the initial allocation
| be as follows:_
|
|               Sockets     Assignment
|               0-63        Network wide standard functions
|               64-127      Host specific functions
|               128-239     Reserved for future use
|               240-255     Any experimental function
|
| > _and within the network wide standard functions the
| following particular assignment be made:_
|
|               Socket      Assignment
|               1           Telnet
|               3           File Transfer
|               5           Remote Job Entry
|               7           Echo
|               9           Discard
|
| Note that ports 7 and 9 are _still_ assigned to echo and
| discard in /etc/services, although Telnet and FTP got
| moved to ports 23 and 21, respectively.
|               tcpmux    1/tcp    # TCP port service multiplexer
|               echo      7/tcp
|               echo      7/udp
|               discard   9/tcp    sink null
|               discard   9/udp    sink null
|               systat    11/tcp   users
|               daytime   13/tcp
|               daytime   13/udp
|               netstat   15/tcp
|               qotd      17/tcp   quote
|               chargen   19/tcp   ttytst source
|               chargen   19/udp   ttytst source
|               ftp-data  20/tcp
|               ftp       21/tcp
|               fsp       21/udp   fspd
|               ssh       22/tcp   # SSH Remote Login Protocol
|               telnet    23/tcp
| So, internet port numbers in their current form are from
| 01971 (several years before the split between TCP and
| IP), and DNS is from about 01982.
|
| In December of 01972, Postel published RFC 433
| https://www.rfc-editor.org/rfc/rfc433.html, obsoleting
| the RFC 349 list with a list including chargen and some
| other interesting services:
|
|               Socket      Assignment
|               1           Telnet
|               3           File Transfer
|               5           Remote Job Entry
|               7           Echo
|               9           Discard
|               19          Character Generator [e.g. TTYTST]
|               65          Speech Data Base @ ll-tx-2 (74)
|               67          Datacomputer @ cca (31)
|               241         NCP Measurement
|               243         Survey Measurement
|               245         LINK
|
| The gap between 9 and 19 is unexplained.
|
| RFC 503 https://www.rfc-editor.org/rfc/rfc503.html from
| 01973 has a longer list (including systat, datetime, and
| netstat), but _also_ listing which services were running
| on which ARPANet hosts, 33 at that time. So RFC 503
| contained a list of every server process running on what
| would later become the internet.
|
| Skipping RFC 604, RFC 739 from 01977 https://www.rfc-
| editor.org/rfc/rfc739.html is the first one that shows
| the modern port number assignments (still called "socket
| numbers") for FTP and Telnet, though those presumably
| dated back a couple of years at that point:
|               Specific Assignments:
|
|               Decimal  Octal  Description                      References
|               -------  -----  -----------                      ----------
|               Network Standard Functions
|               1        1      Old Telnet                       [6]
|               3        3      Old File Transfer                [7,8,9]
|               5        5      Remote Job Entry                 [10]
|               7        7      Echo                             [11]
|               9        11     Discard                          [12]
|               11       13     Who is on or SYSTAT
|               13       15     Date and Time
|               15       17     Who is up or NETSTAT
|               17       21     Short Text Message
|               19       23     Character generator or TTYTST    [13]
|               21       25     New File Transfer                [1,14,15]
|               23       27     New Telnet                       [1,16,17]
|               25       31     Distributed Programming System   [18,19]
|               27       33     NSW User System w/COMPASS FE     [20]
|               29       35     MSG-3 ICP                        [21]
|               31       37     MSG-3 Authentication             [21]
|
| Etc. This time I have truncated the list. It also has
| Finger on port 79.
|
| You say, "My understanding is that DNS can potentially
| provide port numbers, but this is not widely used or
| supported." DNS SRV records have existed since 01996
| (proposed by Troll Tech and Paul Vixie in RFC 2052
| https://www.rfc-editor.org/rfc/rfc2052), but they're
| really only widely used in XMPP, in SIP, and in ZeroConf,
| which was Apple's attempt to provide the facilities of
| AppleTalk on top of TCP/IP.
| musicale wrote:
| > The fact that you can't resume a TCP connection from a
| different IP address, the way you can with a Mosh connection,
| is a serious flaw that seriously impedes nodes from moving
| around the network
|
| This 100% !! And basically the reason mosh had to be created
| in the first place (and it probably wasn't easy.)
| Unfortunately mosh only solves the problem for ssh. Exposing
| fixed IP addresses to the application layer probably doesn't
| help either.
|
| So annoying that TCP tends to break whenever you switch wi-fi
| networks or switch from wi-fi to cellular. (On iPhones at
| least you have MPTCP, but that requires server-side support.)
| Animats wrote:
| * Full-duplex connections are probably a good idea, but
| certainly are not the only way, or the most obvious way, to
| create a reliable stream of data on top of an unreliable
| datagram layer. TCP itself also supports a half-duplex mode--
| even if one end sends FIN, the other end can keep
| transmitting as long as it wants. This was probably also a
| good idea, but it's certainly not the only obvious choice.
|
| Much of that comes from the original applications being FTP
| and TELNET.
|
| * Sequence numbers on messages or on bytes?
|
| Bytes, because the whole TCP message might not fit in an IP
| packet. This is the MTU problem.
|
| * Wouldn't it be useful to expose message boundaries to
| applications, the way 9P, SCTP, and some SNA protocols do?
|
| Early on, there were some message-oriented, rather than
| stream-oriented, protocols on top of IP. Most of them died
| out. RDP was one such. Another was QNet.[2] Both still have
| assigned IP protocol numbers, but I doubt that a RDP packet
| would get very far across today's internet.
|
| This was a lack. TCP is not a great message-oriented
| protocol.
|
| * Do you really need urgent data?
|
| The purpose of urgent data is so that when your slow Teletype
| is typing away, and the recipient wants it to stop, there's a
| way to break in. See [1], p. 8.
|
| * It turns out that measuring round-trip time is really
| important for retransmission, and TCP has no way of measuring
| RTT on retransmitted packets, which can pose real problems
| for correcting a ridiculously low RTT estimate, which results
| in excessive retransmission.
|
| Yes, reliable RTT is a problem.
|
| * Do you really need a PUSH bit? C'mon.
|
| It's another legacy thing to make TELNET work on slow links.
| Is it even supported any more?
|
| * A modest amount of overhead in the form of erasure-coding
| bits would permit recovery from modest amounts of packet loss
| without incurring retransmission timeouts, which is
| especially useful if your TCP-layer protocol requires a
| modest amount of packet loss for congestion control, as TCP
| does.
|
| * Also you could use a "congestion experienced" bit instead
| of packet loss to detect congestion in the usual case. (TCP
| did eventually acquire CWR and ECE, but not for many years.)
|
| Originally, there was ICMP Source Quench for that, but
| Berkeley didn't put it in BSD, so nobody used it. Nobody was
| sure when to send it or what to do when it was received.
|
| * The fact that you can't resume a TCP connection from a
| different IP address, the way you can with a Mosh connection,
| is a serious flaw that seriously impedes nodes from moving
| around the network.
|
| That would require a security system to prevent hijacking
| sessions.
|
| [1] https://archive.org/stream/rfc854/rfc854.txt_djvu.txt
|
| [2] https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers
| musicale wrote:
| > how to create a reliable stream of data on top of an
| unreliable datagram layer, then the solution that comes out
| will look virtually identical to TCP. It just is the right
| solution for the job
|
| A stream of bytes made sense in the 1970s for remote terminal
| emulation. It still sort of makes sense for email, where a
| partial message is useful (though downloading headers in bulk
| followed by full message on demand probably makes more sense.)
|
| But in 2025 much of communication involves _messages_ that
| aren't useful if you only get part of them. It's also a pain to
| have to serialize messages into a byte stream and then
| deserialize the byte stream into messages (see: gRPC etc.) and
| the byte stream ordering is costly, doesn't work well with
| multipathing, and doesn't provide much benefit if you are only
| delivering complete messages.
|
| TCP without congestion control isn't particularly useful. As
| you note traditional TCP congestion control doesn't respond
| well to reordering. Also TCP's congestion control traditionally
| doesn't distinguish between intentional packet drops (e.g. due
| to buffer overflow) and packet loss (e.g. due to corruption).
| This means, for example that it can't be used directly over
| networks with wireless links (which is why wi-fi has its own
| link layer retransmission).
|
| TCP's traditional congestion control is designed to fill
| buffers up until packets are dropped, leading to undesirable
| buffer bloat issues.
|
| TCP's traditional congestion control algorithms (additive
| increase/multiplicative decrease on drop) also have the poor
| property that your data rate tends to drop as RTT increases.
|
| TCP wasn't designed for hardware offload, which can lead to
| software bottlenecks and/or increased complexity when you do
| try to offload it to hardware.
|
| TCP's three-way handshake is costly for one-shot RPCs, and slow
| start means that short flows may never make it out of slow
| start, neutralizing benefits from high-speed networks.
|
| TCP is also poor for mobility. A connection breaks when your IP
| address changes, and there is no easy way to migrate it. Most
| TCP APIs expose IP addresses at the application layer, which
| causes additional brittleness.
|
| Additionally, TCP is poorly suited for optical/WDM networks,
| which support dedicated bandwidth (signal/channel bandwidth as
| well as data rate), and are becoming more important in
| datacenters and as interconnects for GPU clusters.
|
| etc.
| kccqzy wrote:
| Yeah the fact that the congestion control algorithm isn't part
| of the wire protocol is very ahead of its time and gave the
| protocol flexibility that's much needed in retrospective. OTOH
| a lot of college courses about TCP don't really emphasize this
| fact and still many people I interacted with thought that TCP
| had a single defined congestion control algorithm.
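| The classic Reno-style AIMD rule (additive increase, multiplicative
| decrease) is a simple example of how congestion control is endpoint
| policy rather than wire format; a toy simulation:

```python
# AIMD sketch: the sender's window evolves purely from local events
# (ACKed round trips and losses), with nothing in the TCP header
# naming the algorithm -- which is why endpoints can swap it out.
def aimd(events, cwnd=1.0):
    """events: 'a' = one RTT of ACKs, 'l' = loss. Returns final cwnd."""
    for e in events:
        if e == "a":
            cwnd += 1.0                   # additive increase: +1 MSS/RTT
        else:
            cwnd = max(1.0, cwnd / 2)     # multiplicative decrease
    return cwnd

print(aimd("aaaal"))   # 2.5 -- grew 1->5, then halved on loss
```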
| o11c wrote:
| TCP has another unfixable flaw - it cannot be properly secured.
| Writing a security layer on top of TCP can at most _detect_,
| not _avoid_, attacks.
|
| It is very easy for a malicious actor anywhere in the network
| to _inject_ data into a connection. By contrast, it is much
| harder for a malicious actor to _break_ the legitimate traffic
| flow ... except for the fact that TCP RST grants any rando the
| power to upgrade "inject" to "break". This is quite common in
| the wild for any traffic that does not look like HTTP, even
| when both endpoints are perfectly healthy.
|
| Blocking TCP RST packets using your firewall will significantly
| improve reliability, but this still does not protect you from
| more advanced attackers which cause a desynchronization due to
| forged sequence numbers with nonempty payload.
|
| As a result, it is _mandatory_ for every application to support
| a full-blown "resume on a separate connection" operation,
| which is complicated and hairy and also immediately runs into
| the additional flaw that TCP is very slow to start.
|
| ---
|
| While not an outright _flaw_ , I also think it has become clear
| by now that it is highly suboptimal for "address" and "port" to
| be separate notions.
| iberator wrote:
| It's trivial to develop your own protocols on top of IP. It was
| trivial 15 years ago in Python (without any libraries), just
| handcrafted packets (ARP, IP, etc.).
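| A sketch of such a handcrafted packet: building and checksumming an
| IPv4 header (RFC 791) by hand. Actually putting it on the wire would
| need a raw socket and root privileges, so here we only construct and
| verify it; the addresses are placeholders.

```python
import socket
import struct

def checksum(data):
    """RFC 1071 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src, dst, proto, payload_len):
    """Build a minimal 20-byte IPv4 header with a valid checksum."""
    ver_ihl = (4 << 4) | 5                      # IPv4, 5 x 32-bit words
    hdr = struct.pack("!BBHHHBBH4s4s",
                      ver_ihl, 0, 20 + payload_len,  # tos, total length
                      0, 0,                          # id, flags/fragment
                      64, proto, 0,                  # ttl, proto, cksum=0
                      src, dst)
    return hdr[:10] + struct.pack("!H", checksum(hdr)) + hdr[12:]

# Protocol 253 is reserved for experimentation (RFC 3692).
hdr = ipv4_header(socket.inet_aton("10.0.0.1"),
                  socket.inet_aton("10.0.0.2"), 253, 0)
print(len(hdr))        # 20
print(checksum(hdr))   # 0 -- a valid IPv4 header checksums to zero
```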
| acosmism wrote:
| i have an idea for a new javascript framework
| cynicalsecurity wrote:
| I can easily spot it's an AI written article, because it actually
| explains the technology in understandable human language. A human
| would have written it the way it was either presented to them in
| university or in bloated IT books: absolutely useless.
| omnimus wrote:
| I can easily spot it's an AI written comment, because it
| actually explains their idea in understandable human language
| and brings nothing to the discussion. A human would have
| written it the way they understand it and bring their opinions
| along: absolutely useless.
| pjjpo wrote:
| At first I wanted to give the benefit of the doubt that this is
| sarcasm, but a skim through their history suggests it's just a
| committed anti-AI agenda.
|
| Personally I found the tone of the article quite genuine, and
| the video at the end made a compelling case for it. Well, I'll
| assume you commented having actually read it.
|
| Edit: I can't downvote but if I could it probably would have
| been better than this comment!
| shevy-java wrote:
| > The internet is incredible. It's nearly impossible to keep
| people away from.
|
| Well ... he seems very motivated. I am more skeptical.
|
| For instance, Google via chrome controls a lot of the internet,
| even more so via its search engine, AI, youtube and so forth.
|
| Even aside from this people's habits changed. In the 1990s
| everyone and their Grandma had a website. Nowadays ... it is a
| bit different. We suddenly have horrible blogging sites such as
| medium.com, pestering people with popups. Of course we also had
| popups in the 1990s, but the diversity was simply higher.
| Everything today is much more streamlined it seems. And top-down
| controlled. Look at Twitter, owned by a greedy and selfish
| billionaire. And the US president? Super-selfish too. We lost
| something here over the last 25 or so years.
| __MatrixMan__ wrote:
| You're talking about the web, which is merely an app with the
| internet as its platform. We can scrap it and still use the
| internet to build a different one.
| FrankWilhoit wrote:
| TCP is one of the great works of the human mind, but it did not
| envision the dominance of semiconnected networks.
| cpach wrote:
| Are you referring to NAT?
| FrankWilhoit wrote:
| No. TCP _likes_ zero packet loss (connected), and it
| _understands_ 100% packet loss (disconnected). Its weakness
| is scenarios (semiconnected) in which packet loss is
| constantly fluctuating between substantial and nearly-total.
| It doesn't know what is going on, and it may cope or it may
| not, because its designers did not envision a future in which
| most networks have a semiconnected last mile; but that is
| where we are. Without things like forward error correction,
| TCP would be nearly useless over wireless. It is interesting
| to envision a layer-4 protocol that would incorporate FEC-
| like capabilities.
| convolvatron wrote:
| if you went back to 1981 and said 'yeah, this is great. but
| what we really want to do is not have an internet, but kind of
| a piecewise internet. instead of a global address we'll use
| addresses that have a narrower scope. and naturally as
| consequence of this we'll need to start distinguishing between
| nodes that everyone can reach, service nodes, and nodes that no
| one can reach - client nodes. and as a consequence of this
| we'll start building links that are asymmetric in bandwidth,
| since one direction is only used for requests and acks and not
| any data volume.'
|
| they would have looked at you and asked straight out what you
| hoped to gain by making these things distinguished, because it
| certainly complicates things.
| FrankWilhoit wrote:
| Wireless networks are always going to have asymmetries of
| transmit power. Everything flows from that. ALOHAnet was
| 1971.
| api wrote:
| It's worth considering how the tiny computers of the era forced a
| simple clean design. IPv6 was designed starting in the early 90s
| and they couldn't resist loading it up with extensions, though
| the core protocol remains fine and is just IP with more bits.
| (Many of the extensions are rarely if ever used.)
|
| If the net were designed today it would be some complicated
| monstrosity where every packet was reminiscent of X.509 in terms
| of arcane complexity. It might even have JSON in it. It would be
| incredibly high overhead and we'd see tons of articles about how
| someone made it fast by leveraging CPU vector instructions or a
| GPU to parse it.
|
| This is called Eroom's law, or Moore's law backwards, and it is
| very real. Bigger machines let programmers and designers loose to
| indulge their desire to make things complicated.
| rubatuga wrote:
| What are some extensions? just curious.
| api wrote:
| IPSec was a big one that's now borderline obsolete, though it
| is still used for VPNs and was back ported to IPv4.
|
| Many networking folks including myself consider IPv6 router
| advertisements and SLAAC to be inferior, in practice, to
| DHCPv6, and that it would be better if we'd just left IP
| assignment out of the spec like it was in V4. Right now we
| have this mess where a lot of nets prefer or require DHCPv6
| but some vendors, like apparently Android, refuse to support
| it.
|
| The rules about how V6 addresses are chopped up and assigned
| are wasteful and dumb. The entire V4 space could have been
| mapped onto /32 and an encapsulation protocol made to allow
| V4 to carry V6, providing a seamless upgrade path that does
| not require full upgrade of the whole core, but that would
| have been too logical. Every machine should get like a /96 so
| it can use 32 bits of space to address apps, VMs, containers,
| etc. As it stands we waste 64 bits of the space to make SLAAC
| possible, as near as I can tell. The SLAAC tail must have
| wagged the dog in that people thought this feature was cool
| enough to waste 8 bytes per packet.
|
| The V6 header allows extension bits that are never used and
| blocked by most firewalls. There's really no point in them
| existing since middle boxes effectively freeze the base
| protocol in stone.
|
| Those are some of the big ones.
|
| Basically all they should have done was make IPs 64 or 128
| bits and left everything else alone. But I think there was a
| committee.
|
| As it stands we have what we have and we should just treat V6
| as IP128 and ignore the rest. I'm still in favor of the
| upgrade. V4 is too small, full stop. If we don't enlarge the
| addresses we will completely lose end to end connectivity as
| a supported feature of the network.
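The 64-bit interface-identifier cost mentioned above comes from SLAAC's modified EUI-64 scheme (RFC 4291), which stretches a 48-bit MAC address into the low 64 bits of the IPv6 address. A sketch of that derivation:

```python
def eui64_interface_id(mac: str) -> str:
    """Modified EUI-64 (RFC 4291): split the MAC in half, insert
    ff:fe between the halves, and flip the universal/local bit of
    the first octet. The result fills the low 64 bits of a SLAAC
    address, which is why the host part can't be any smaller."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                          # flip the U/L bit
    iid = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = ["%02x%02x" % (iid[i], iid[i + 1]) for i in range(0, 8, 2)]
    return ":".join(groups)
```

For example, MAC `00:25:96:12:34:56` becomes the interface identifier `0225:96ff:fe12:3456` (modern stacks often use randomized identifiers instead, per RFC 7217/4941, but the 64-bit host-part boundary remains).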
| toast0 wrote:
| > Every machine should get like a /96 so it can use 32 bits
| of space to address apps, VMs, containers, etc.
|
| You can just SLAAC some more addresses for whatever you
| want. Although hopefully you don't use more than the ~ARP~
| NDP table size on your router; then things get nasty. This
| should be trivial for VMs, and could be made possible for
| containers and apps.
|
| > The V6 header allows extension bits that are never used
| and blocked by most firewalls. [...] Basically all they
| should have done was make IPs 64 or 128 bits and left
| everything else alone.
|
| This feels contradictory... IPv4 also had extension headers
| that were mostly unused and disallowed. V6 changed the
| header extension mechanism, but offers the same
| opportunities to try things that might work on one network
| but probably won't work everywhere.
| throw0101a wrote:
| Any love for SCTP?
|
| > _The Stream Control Transmission Protocol (SCTP) is a computer
| networking communications protocol in the transport layer of the
| Internet protocol suite. Originally intended for Signaling System
| 7 (SS7) message transport in telecommunication, the protocol
| provides the message-oriented feature of the User Datagram
| Protocol (UDP) while ensuring reliable, in-sequence transport of
| messages with congestion control like the Transmission Control
| Protocol (TCP). Unlike UDP and TCP, the protocol supports
| multihoming and redundant paths to increase resilience and
| reliability._
|
| [...]
|
| > _SCTP may be characterized as message-oriented, meaning it
| transports a sequence of messages (each being a group of bytes),
| rather than transporting an unbroken stream of bytes as in TCP.
| As in UDP, in SCTP a sender sends a message in one operation, and
| that exact message is passed to the receiving application process
| in one operation. In contrast, TCP is a stream-oriented protocol,
| transporting streams of bytes reliably and in order. However TCP
| does not allow the receiver to know how many times the sender
| application called on the TCP transport passing it groups of
| bytes to be sent out. At the sender, TCP simply appends more
| bytes to a queue of bytes waiting to go out over the network,
| rather than having to keep a queue of individual separate
| outbound messages which must be preserved as such._
|
| > _The term multi-streaming refers to the capability of SCTP to
| transmit several independent streams of chunks in parallel, for
| example transmitting web page images simultaneously with the web
| page text. In essence, it involves bundling several connections
| into a single SCTP association, operating on messages (or chunks)
| rather than bytes._
|
| * https://en.wikipedia.org/wiki/Stream_Control_Transmission_Pr...
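The stream-vs-message distinction in the quoted passage is easy to demonstrate with plain TCP over loopback: two `send()` calls on one side arrive as an undifferentiated byte stream on the other, with no record of where one "message" ended (a minimal sketch; SCTP's `SOCK_SEQPACKET` mode would preserve the boundaries):

```python
import socket
import threading

def run_demo() -> bytes:
    """Show that TCP delivers a byte stream, not discrete messages."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    received = bytearray()

    def server():
        conn, _ = srv.accept()
        while len(received) < 10:          # read until both "messages" arrive
            received.extend(conn.recv(1024))
        conn.close()

    t = threading.Thread(target=server)
    t.start()
    cli = socket.create_connection(srv.getsockname())
    cli.sendall(b"hello")                  # two distinct application writes...
    cli.sendall(b"world")
    t.join()
    cli.close()
    srv.close()
    return bytes(received)                 # ...arrive as one contiguous stream
```

The receiver gets `b"helloworld"` with no indication of how many sends produced it; any framing (length prefixes, delimiters) must be layered on top, which is exactly what SCTP's chunk model makes unnecessary.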
| nesarkvechnep wrote:
| As a BSD enjoyer and paid to write Erlang, I have nothing but
| love for SCTP.
| o11c wrote:
| No, SCTP only fixes half of a problem, but also gratuitously
| introduces several additional flaws, even ignoring the "router
| support" problem.
|
| The only good answer is "a reliability layer on top of UDP";
| fortunately everybody is now rallying around QUIC as the choice
| for that.
| mlhpdx wrote:
| TCP being the "default" meant it was chosen when the need for
| ordering and uniform reliability wasn't there. That was fine but
| left systems working less well than they could have with more
| carefully chosen underpinnings. With HTTP/3 gaining traction, and
| HTTP being the "next level up default choice" things potentially
| get better. The issue I see is that QUIC is far more complex, and
| the new power is fantastic for a few but irrelevant to most.
|
| UDP has its place as well, and if we have more simple and
| effective solutions like WireGuard's handshake and encryption on
| top of it we'd be better off as an industry.
| bmacho wrote:
| Otherwise please use the original title, unless it is misleading
| or linkbait; don't editorialize.
|
| https://news.ycombinator.com/newsguidelines.html
| cosmic_quanta wrote:
| Strange, the title was definitely the original title earlier
| today
| rfmoz wrote:
| RUDP from Plan9 was a nice step between TCP and UDP -
| https://en.wikipedia.org/wiki/Reliable_User_Datagram_Protoco...
| tolerance wrote:
| For the record I thought the TLD for this page was 'cerfbound',
| which sounds like the name for the race horse of the internet.
| brcmthrowaway wrote:
| How much energy does the internet use?
| brcmthrowaway wrote:
| Do extraterrestrial civilizations also use TCP?
| jccx70 wrote:
| Crap, that thing, the code, etc. has been posted thousands of
| times on the internet. The final quote "Oh I am so happy this
| works", ok thanks bye.
| jiggawatts wrote:
| The congestion control algorithm in TCP has some interesting
| effects on throughput that a lot of developers aren't aware of.
|
| For example, sending some data on a fresh TCP connection is
| _slow_, and the "ramp up time" to the bandwidth of the network
| is almost entirely determined by the _latency_.
|
| Amazing speed ups can be achieved in a data centre network by
| shaving microseconds off the round trip time!
|
| Similarly, many (all?) TCP stacks count segments, not bytes, when
| determining this ramp up rate. This means that jumbo frames can
| provide 6x the bandwidth during this period!
|
| If you read about the network design of AWS, they put a lot of
| effort into low switching latency and enabling jumbo frames.
|
| The real pros do this kind of network tuning, everyone else
| wonders why they don't get anywhere near 10 Gbps through a 10
| Gbps link.
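The ramp-up effect described above can be sketched with a toy slow-start model: the congestion window (counted in segments) roughly doubles every RTT, so time-to-bandwidth scales with round-trip latency, and a larger MSS (jumbo frames) reaches a byte target in fewer round trips. This is an illustrative simplification; real stacks differ (CUBIC, BBR, pacing, tuned initcwnd):

```python
def slow_start_rtts(target_bytes: int, mss: int, initcwnd: int = 10) -> int:
    """Count round trips for a doubling-per-RTT slow-start model to
    cumulatively deliver target_bytes. The window is in segments, so
    larger segments (jumbo frames) mean more bytes per round trip."""
    cwnd, rtts, sent = initcwnd, 0, 0
    while sent < target_bytes:
        sent += cwnd * mss                 # one window's worth per RTT
        cwnd *= 2                          # exponential growth phase
        rtts += 1
    return rtts

# Delivering 10 MB with a standard 1500-byte MTU (MSS ~1460) takes
# several more doublings than with 9000-byte jumbo frames (MSS ~8960),
# and each doubling costs one full round trip, so total ramp time is
# proportional to latency.
```

With this model, the jumbo-frame path needs noticeably fewer RTTs for the same transfer, which is why shaving microseconds off the round trip and enabling jumbo frames compound so well in a data-centre network.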
___________________________________________________________________
(page generated 2025-11-15 23:00 UTC)