[HN Gopher] Back to basics: Why we chose long-polling over webso...
       ___________________________________________________________________
        
       Back to basics: Why we chose long-polling over websockets
        
       Author : lunarcave
       Score  : 197 points
       Date   : 2025-01-05 07:14 UTC (15 hours ago)
        
 (HTM) web link (www.inferable.ai)
 (TXT) w3m dump (www.inferable.ai)
        
       | CharlieDigital wrote:
       | Would there be any technical benefit to this over using server
       | sent events (SSE)?
       | 
       | Both are similar in that they hold the HTTP connection open and
       | have the benefit of being simply HTTP (the big plus here). SSE
       | (at least to me) feels like it's more suitable for some use cases
       | where updates/results could be streamed in.
       | 
       | A fitting use case might be where you're monitoring all job IDs
       | on behalf of a given client. Then you could move the job
       | monitoring loop to the server side and continuously yield results
       | to the client.
        
         | lunarcave wrote:
         | Good point! We did consider SSE, but ultimately decided against
         | it due to the way we have to re-implement response payloads
         | (one for application/json and one for text/event-stream).
         | 
         | I've not personally witnessed this, but people on the internets
         | have said that _some_ proxies/LBs have problems with SSE due to
          | the way they handle buffering.
        
           | gorjusborg wrote:
           | > we have to re-implement response payloads (one for
           | application/json and one for text/event-stream)
           | 
            | I am curious about what you mean here. The 'text/event-
            | stream' allows for arbitrary event formats; it just provides
            | structure for EventSource to be able to parse.
            | 
            | You should only need one 'text/event-stream' and should be
            | able to send the same JSON via a normal or SSE response.
        
             | josephg wrote:
             | What the GP commenter might have meant is that websockets
             | support binary message bodies. SSE does not.
        
         | vindex10 wrote:
         | I got interested and found this nice thread on SO:
         | https://stackoverflow.com/a/5326159
         | 
          | One of the drawbacks, as I learned: SSE connections count
          | toward the browser's limit of ~6 open connections per domain.
          | This can quickly become a limiting factor when you open the
          | same web page in multiple tabs.
        
           | bn-l wrote:
           | ...if you're using http/1.1. It's not an issue with 2+
        
           | Klonoar wrote:
           | Not an issue if you're using HTTP/2 due to how multiplexing
           | happens.
        
           | CharlieDigital wrote:
            | As the other two comments mentioned, this is a restriction
            | with HTTP/1.1, and it would apply to long-polling connections
            | as well.
        
           | _heimdall wrote:
           | Syncing state across multiple tabs and windows is always a
           | bit tricky. For SSE, I'd probably reach for the
           | BroadcastChannel API. Open the SSE connection in the first
           | tab and have it broadcast events to any other open tab or
           | window.
        
           | bioneuralnet wrote:
           | I've gotten around this by using the Page Visibility API -
           | https://developer.mozilla.org/en-
           | US/docs/Web/API/Page_Visibi.... Close the SSE connection when
           | the page is hidden, and re-open when it becomes visible
           | again.
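            | 
            | Something like this (untested sketch; /events and
            | handleEvent() are placeholders):
            | 
            |     let source = null;
            | 
            |     function connect() {
            |       source = new EventSource("/events");
            |       source.onmessage = handleEvent;
            |     }
            | 
            |     document.addEventListener("visibilitychange", () => {
            |       if (document.hidden) {
            |         source?.close();   // drop the connection in background
            |         source = null;
            |       } else if (!source) {
            |         connect();         // re-open when the tab is visible
            |       }
            |     });
            | 
            |     connect();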
        
         | csumtin wrote:
         | I tried using SSE and found it didn't work for my use case, it
         | was broken on mobile. When users switched from the browser to
         | an app and back, the SSE connection was broken and they
         | wouldn't receive state updates. Was easier to do long polling
        
           | CharlieDigital wrote:
           | Not sure about the default `EventSource` object in
           | JavaScript, but the Microsoft polyfill that I use
           | (https://github.com/Azure/fetch-event-source) supports `POST`
           | and there's an option `openWhenHidden` which controls how it
           | reacts to users tabbing away.
        
           | josephg wrote:
           | The standard way to fix that is to send ping messages every
           | ~15 seconds or something over the SSE stream. If the client
            | doesn't get a ping in any 20-second window, assume the SSE
           | stream is broken somehow and restart it. It's complex but it
           | works.
           | 
            | The big downside of SSE in mobile Safari - at least a few
            | years ago - was that you got a constant loading spinner on
            | the page. That's bad UX.
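            | 
            | The client side of that ping/watchdog scheme is small
            | (untested sketch; the "ping" event name and reconnect() are
            | made up):
            | 
            |     let lastPing = Date.now();
            |     const source = new EventSource("/events");
            | 
            |     // The server emits an SSE "ping" event every ~15s.
            |     source.addEventListener("ping", () => {
            |       lastPing = Date.now();
            |     });
            | 
            |     // No ping for 20s? Assume the stream is dead.
            |     setInterval(() => {
            |       if (Date.now() - lastPing > 20_000) {
            |         source.close();
            |         reconnect(); // re-runs this whole setup
            |       }
            |     }, 5_000);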
        
       | bartvk wrote:
       | Refreshing to be reminded of a relatively simple alternative to
       | websockets. For a short time, I worked at a now-defunct startup
        | which had decided on websockets. It was an app that
       | would often be used on holiday so testing was done on hotel and
       | restaurant wifi. Websockets made that difficult.
        
         | ipnon wrote:
         | I feel like WebSockets are already as simple as it gets. It's
         | "just" an HTTP request with an indeterminate body. Just make an
         | HTTP request and don't close the connection. That's a
         | WebSocket.
        
           | bluepizza wrote:
           | It's surprisingly complex.
           | 
           | Connections are dropped all the time, and then your code, on
            | both client and server, needs to account for retries (will the
           | reconnection use a cached DNS entry? how will load balancing
           | affect long term connections?), potentially missed events
           | (now you need a delta between pings), DDoS protections (is
           | this the same client connecting from 7 IPs in a row or is
           | this a botnet), and so on.
           | 
            | Regular polling greatly reduces complexity on some of these
           | points.
        
             | slau wrote:
             | Long polling has nearly all the same disadvantages.
             | Disconnections are harder to track, DNS works exactly the
             | same for both techniques, as does load balancing, and DDoS
             | is specifically about different IPs trying to DoS your
             | system, not the same IP creating multiple connections, so
             | irrelevant to this discussion.
             | 
             | Yes, WS is complex. Long polling is not much better.
             | 
             | I can't help but think that if front end connections are
             | destroying your database, then your code is not structured
             | correctly. You can accept both WS and long polls without
             | touching your DB, having a single dispatcher then send the
             | jobs to the waiting connections.
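              | 
              | In Node terms that dispatcher can be little more than a map
              | of parked requests (untested sketch; names are made up):
              | 
              |     const waiters = new Map(); // jobId -> resolve callback
              | 
              |     // Long-poll handler parks here, no DB polling.
              |     function waitForJob(jobId, timeoutMs = 60_000) {
              |       return new Promise((resolve) => {
              |         const timer = setTimeout(() => {
              |           waiters.delete(jobId);
              |           resolve(null); // timed out; client polls again
              |         }, timeoutMs);
              |         waiters.set(jobId, (result) => {
              |           clearTimeout(timer);
              |           waiters.delete(jobId);
              |           resolve(result);
              |         });
              |       });
              |     }
              | 
              |     // Whatever completes the job calls this once.
              |     function jobFinished(jobId, result) {
              |       waiters.get(jobId)?.(result);
              |     }
              | 
              | A websocket handler can register in the same map, so both
              | transports share one dispatch path.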
        
               | bluepizza wrote:
               | My understanding is that long polling has these issues
               | handled by assuming the connection will be regularly
               | dropped.
               | 
               | Clients using mobile phones tend to have their IPs
               | rapidly changed in sequence.
               | 
               | I didn't mention databases, so I can't comment on that
               | point.
        
               | josephg wrote:
               | Well, it's the same in both cases. You need to handle
               | disconnection and reconnection. You need a way to
               | transmit missed messages, if that's important to you.
               | 
               | But websockets also guarantee in-order delivery, which is
               | never guaranteed by long polling. And websockets play way
               | better with intermediate proxies - since nothing in the
               | middle will buffer the whole response before delivering
               | it. So you get better latency and better wire efficiency.
               | (No http header per message).
        
               | bluepizza wrote:
                | That very in-order guarantee is the issue. It can't know
                | exactly where the connection died, which means that the
                | client must report the last update it received, and the
                | server must then crawl back through a log to find the
                | pending messages and redispatch them.
               | 
               | At this point, long polling seems to carry more benefits,
               | IMHO. WebSockets seem to be excellent for stable
               | conditions, but not quite what we need for mobile.
        
             | sn0wtrooper wrote:
              | If a connection is closed, isn't it the browser's
              | responsibility to resolve DNS when you open it again?
        
       | valenterry wrote:
       | Using websockets with graphql, I feel like a lot of the
       | challenges are then already solved. From the post:
       | 
       | - Observability: WebSockets are more stateful, so you need to
       | implement additional logging and monitoring for persistent
       | connections: solved with graphql if the existing monitoring is
       | already sufficient.
       | 
       | - Authentication: You need to implement a new authentication
       | mechanism for incoming WebSocket connections: solved with
       | graphql.
       | 
       | - Infrastructure: You need to configure your infrastructure to
       | support WebSockets, including load balancers and firewalls: True,
       | firewalls need to be updated.
       | 
       | - Operations: You need to manage WebSocket connections and
       | reconnections, including handling connection timeouts and errors:
       | normally already solved by the graphql library. For errors, it's
       | basically the same though.
       | 
       | - Client Implementation: You need to implement a client-side
       | WebSocket library, including handling reconnections and state
       | management: Just have to _use_ a graphql library that comes with
       | websocket support (I think most of them do) and configure it
       | accordingly.
        
         | anonzzzies wrote:
          | I'd hope (never needed this) there are client implementations
          | that do all this for you and pick the best transport based on
          | what the client supports? Not sure why the transport is
          | interesting when/if you have freedom to choose.
        
           | josephg wrote:
            | Yeah, there are plenty of high-quality websocket client
            | libraries in all languages now. Support and features are
            | excellent. And they've been supported in all browsers for a
            | decade or something at this point.
           | 
           | I vomit in my mouth a bit whenever people reach for socket.io
           | or junk like that. You don't want or need the complexity and
           | bugs these libraries bring. They're obsolete.
        
         | _heimdall wrote:
          | Using graphql comes with its own list of challenges and issues
          | though. It's a good solution for some situations, but it isn't
          | so universal that you can just switch to it without a problem.
        
         | rednafi wrote:
         | Where did graphql come from? It doesn't solve any of the
         | problems mentioned here.
        
       | feverzsj wrote:
        | Why not just use chunked encoding and get rid of extra requests?
        
       | justinl33 wrote:
       | yeah, authentication complexity with WebSockets is severely
       | underappreciated. We ran into major RBAC headaches when clients
       | needed to switch between different privilege contexts mid-
       | session. Long polling with standard HTTP auth patterns eliminates
       | this entire class of problems.
        
         | watermelon0 wrote:
          | Couldn't you just disconnect and reconnect the websocket if
          | privileges change, since the same needs to be done with
          | long polling?
        
           | josephg wrote:
           | Yeah, and you can send cookies in the websocket connection
           | headers. This used to be a problem in some browsers iirc -
           | they wouldn't send cookies properly over websocket connection
           | requests.
           | 
           | As a workaround in one project I wrote JavaScript code which
           | manually sent cookies in the first websocket message from the
           | client as soon as a connection opened. But I think this
           | problem is now solved in all browsers.
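            | 
            | The workaround was something in this spirit (reconstructed,
            | not the actual code; the message shape is made up):
            | 
            |     const ws = new WebSocket("wss://example.com/socket");
            |     ws.onopen = () => {
            |       // First frame carries what the upgrade request should
            |       // have carried (only non-HttpOnly cookies are visible).
            |       ws.send(JSON.stringify({
            |         type: "auth",
            |         cookie: document.cookie,
            |       }));
            |     };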
        
       | baumschubser wrote:
       | I like long polling, it's easy to understand from start to finish
       | and from client perspective it just works like a very slow
        | connection. You have to keep track of retries and client-side
        | cancelled connections so that you have one, and only one (and
        | the right one), outstanding request at hand to answer.
       | 
       | One thing that seems clumsy in the code example is the loop that
       | queries the data again and again. Would be nicer if the data
       | update could also resolve the promise of the response directly.
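        | 
        | For instance, with Node's built-in EventEmitter (untested
        | sketch; the "job:<id>" event is made up and has to be emitted by
        | whatever actually updates the row):
        | 
        |     import { EventEmitter, once } from "node:events";
        | 
        |     const jobEvents = new EventEmitter();
        | 
        |     // Long-poll handler: park until the update arrives or we give up.
        |     async function waitForUpdate(jobId, timeoutMs = 60_000) {
        |       const ac = new AbortController();
        |       const timer = setTimeout(() => ac.abort(), timeoutMs);
        |       try {
        |         const [job] = await once(jobEvents, `job:${jobId}`,
        |                                  { signal: ac.signal });
        |         return job;
        |       } catch {
        |         return null; // timed out; the client simply polls again
        |       } finally {
        |         clearTimeout(timer);
        |       }
        |     }
        | 
        |     // Wherever the status actually changes:
        |     // jobEvents.emit(`job:${jobId}`, updatedJob);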
        
         | moribvndvs wrote:
         | You could have your job status update push an update into an
         | in-memory or distributed cache and check that in your long poll
         | rather than a DB lookup, but that may require adding a bunch of
         | complexity to wire the completion of the task to updating said
         | cache. If your database is tuned well and you don't have any
         | other restrictions (e.g. serverless where you pay by the IO),
         | it may be good enough and come out in the wash.
        
         | josephg wrote:
         | Hard disagree. Long polling can have complex message ordering
         | problems. You have completely different mechanisms for message
         | passing from client-to-server and server-to-client. And middle
         | boxes can and will stall long polled connections, stopping
         | incremental message delivery. (Or you can use one http query
         | per message - but that's insanely inefficient over the wire).
         | 
         | Websockets are simply a better technology. With long polling,
         | the devil is in the details and it's insanely hard to get those
         | details right in every case.
        
           | _nalply wrote:
            | One of them, back in 2001, was that Netscape didn't render
            | correctly while the connection was still open. Hah. I am sure
            | this issue has been fixed a long, long time ago, but perhaps
            | there are other issues.
           | 
           | Nowadays I prefer SSE to long polling and websockets.
           | 
           | The idea is: the client doesn't know that the server has new
           | data before it makes a request. With a very simple SSE the
           | client is told that new data is there then it can request new
           | data separately if it wants. This said, SSE has a few quirks,
           | one of them that on HTTP/1 the connection counts to the
           | maximum limit of 6 concurrent connections per browser and
           | domain, so if you have several tabs, you need a SharedWorker
           | to share the connection between the tabs. But probably this
            | quirk also applies to long polling and websockets. Another
           | quirk, SSE can't transmit binary data and has some
           | limitations in the textual data it represents. But for this
           | use case this doesn't matter.
           | 
           | I would use websockets only if you have a real bidirectional
           | data flow or need to transmit complex data.
        
             | Cheezmeister wrote:
             | Streaming SIMD Extensions?
             | 
             | Server-sent events.
        
               | crop_rotation wrote:
               | Streaming SIMD Extensions seems very unlikely to have any
               | relevance in the above statement, server-sent events is
               | the perfect fit.
        
             | zazaulola wrote:
             | WebSocket solves a very different problem. It may be only
             | partially related to organizing two-way communication, but
             | it has nothing to do with data complexity. Moreover, WS are
             | not good enough at transmitting binary data.
             | 
             | If you are using SSE and SW and you need to transfer some
             | binary data from client to server or from server to client,
             | the easiest solution is to use the Fetch API. `fetch()`
             | handles binary data perfectly well without transformations
             | or additional protocols.
             | 
              | If the data in SW is large enough to require displaying the
              | progress of the upload to the server, you will probably be
              | better served by `XMLHttpRequest`.
        
             | snackbroken wrote:
             | > if you have several tabs, you need a SharedWorker to
             | share the connection between the tabs.
             | 
             | You don't _have_ to use a SharedWorker, you can also do
             | domain sharding. Since the concurrent connection limit is
             | per domain, you can add a bunch of DNS records like
              | SSE1.example.org -> 2001:db8::f00; SSE2.example.org ->
             | 2001:db8::f00; SSE3.example.org -> 2001:db8::f00; and so
             | on. Then it's just a matter of picking a domain at random
             | on each page load. A couple hundred tabs ought to be enough
             | for anyone ;)
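              | 
              | Picking the shard is then a one-liner on page load
              | (illustrative; sse1..sse8.example.org are assumed to exist
              | and to send CORS headers):
              | 
              |     const shard = Math.floor(Math.random() * 8) + 1;
              |     const source = new EventSource(
              |       `https://sse${shard}.example.org/events`,
              |       { withCredentials: true } // only if you need cookies
              |     );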
        
           | wruza wrote:
           | _you can use one http query per message - but that's insanely
           | inefficient over the wire_
           | 
           | Use one http response per message queue snapshot. Send no
           | more than N messages at once. Send empty status if the queue
           | is empty for more than 30-60 seconds. Send cancel status to
           | an awaiting connection if a new connection opens successfully
           | (per channel singleton). If needed, send and accept "last"
           | id/timestamp. These are my usual rules for long-polling.
           | 
           | Prevents: connection overhead, congestion latency, connection
           | stalling, unwanted multiplexing, sync loss, respectively.
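            | 
            | A client loop for those rules can stay small (untested
            | sketch; the endpoint and response fields are made up):
            | 
            |     async function pollLoop(channel, onMessages) {
            |       let last = 0;
            |       while (true) {
            |         const res =
            |           await fetch(`/poll?channel=${channel}&last=${last}`);
            |         const body = await res.json();
            |         if (body.status === "cancel") return; // newer conn took over
            |         if (body.messages?.length) {
            |           onMessages(body.messages); // at most N per response
            |           last = body.last;          // sync point for next request
            |         }
            |         // "empty" responses (after 30-60s) just loop around
            |       }
            |     }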
           | 
           |  _You have completely different mechanisms for message
           | passing from client-to-server and server-to-client_
           | 
           | Is this a problem? Why should this even be symmetric?
        
             | josephg wrote:
              | You can certainly do all that. You also need to handle
              | retransmission. And often you also need a way for
             | the client to send back confirmations that each side
             | received certain messages. So, as well as sequence numbers
             | like you mentioned, you probably want acknowledgement
             | numbers in messages too. (Maybe - it depends on the
             | application).
             | 
              | Implementing a stable, in-order, exactly-once message
              | delivery system on top of long polling starts to look a lot
              | like implementing TCP on top of UDP. It's a solvable
              | problem. I've done it - 14 years ago I wrote the first
              | open-source implementation of (the server side of) Google's
              | Browserchannel protocol, from back before websockets
              | existed:
             | 
             | https://github.com/josephg/node-browserchannel
             | 
             | This supports long polling on browsers, all the way back to
             | IE5.5. It works even when XHR isn't available! I wrote it
             | in literate coffeescript, from back when that was a thing.
             | 
             | But getting all of those little details right is really
              | very difficult. It's a lot of code, and there are a lot of
             | very subtle bugs lurking in this kind of code if you aren't
             | careful. So you also need good, complex testing. You can
             | see in that repo - I ended up with over 1000 lines of
             | server code+comments (lib/server.coffee), and 1500 lines of
             | testing code (test/server.coffee).
             | 
             | And once you've got all that working, my implementation
             | really wanted server affinity. Which made load balancing &
              | failover across multiple application servers a huge
             | headache.
             | 
             | It sounds like your application allows you to simplify some
             | details of this network protocol code. You do you. I just
             | use websockets & server-sent events. Let TCP/IP handle all
              | the details of in-order message delivery. It's really quite
              | good.
        
               | wruza wrote:
               | This is a common library issue, it doesn't know and has
               | to be defensive and featureful at the same time.
               | 
               | Otoh, end-user projects usually know things and can make
               | simplifying decisions. These two are incomparable. I
               | respect the effort, but I also think that this level of
               | complexity is a wrong answer to the call in general. You
               | have to price-break requirements because they tend to
               | oversell themselves and rarely feature-intersect as much
                | as this library implies. In other words, when a client
                | asks for guarantees, statuses or something, we just tell
                | them to fetch from a suitable number of seconds ago and
                | see for themselves. Everyone works like this; if you need
                | something extra, track it yourself based on your own
                | metrics and our rate limits.
        
       | peheje wrote:
       | What about HTTP/2 Multiplexing, how does it hold up against long-
       | polling and websockets?
       | 
       | I have only tried it briefly when we use gRPC:
       | https://grpc.io/docs/what-is-grpc/core-concepts/#server-stre...
       | 
        | Here it's easy to specify that an endpoint is a "stream", and then
        | the code-generation tool gives you everything you need to just
        | keep serving the client with multiple responses. It looks
        | deceptively simple. We already have auth, logging and metrics set
        | up for gRPC, so I hope it just works off of that, maybe with minor
        | adjustments.
       | But I'm guessing you don't need the gRPC layer to use HTTP/2
       | Multiplexing?
        
         | toast0 wrote:
         | At least in a browser context, HTTP/2 doesn't address server to
          | client unsolicited messages. So you'd still need a polling
         | request open from the client.
         | 
         | HTTP/2 does specify a server push mechanism (PUSH_PROMISE), but
         | afaik, browsers don't accept them and even if they did, (again
         | afaik) there's no mechanism for a page to listen for them.
         | 
         | But if you control the client and the server, you could use it.
        
       | bigbones wrote:
       | I don't know how meaningful it is any more, but with long polling
       | with a short timeout and a gracefully ended request (i.e. chunked
       | encoding with an eof chunk sent rather than disconnection), the
       | browser would always end up with one spare idle connection to the
       | server, making subsequent HTTP requests for other parts of the UI
        | far more likely to be snappy, even if the app has been left
        | otherwise idle for half the day.
       | 
       | I guess at least this trick is still meaningful where HTTP/2 or
       | QUIC aren't in use
        
       | ipnon wrote:
       | Articles like this make me happy to use Phoenix and LiveView
       | every day. My app uses WebSockets and I don't think about them at
       | all.
        
         | leansensei wrote:
         | Same here. It truly is a godsend.
        
         | zacksiri wrote:
         | I was thinking this exact thing as I was reading the article.
        
         | tzumby wrote:
         | I came here to say exactly this! Elixir and OTP (and by
         | extension LiveView) are such a good match for the problem
         | described in the post.
        
           | j45 wrote:
            | I was kind of wondering how something hadn't solved this
            | already, compared to a solution not readily being on one's
            | path.
        
         | cultofmetatron wrote:
         | hah seriously. my app uses web sockets extensively but since we
          | are also using Phoenix, it's never been a source of conflict in
          | development. it really was just drop it in and scale to
          | thousands of users.
        
           | arrty88 wrote:
           | Why couldn't nodejs with uWS library or golang + gorilla
           | handle 10s of thousands of connections?
        
             | apitman wrote:
             | I think GP's point is that they feel Phoenix is simpler to
             | use than alternatives, not necessarily that it scales
             | better.
        
         | dugmartin wrote:
         | The icing on the cake is that you can also enable Phoenix
         | channels to fallback to longpolling in your endpoint config.
         | The generator sets it to false by default.
        
         | pipes wrote:
         | Is this similar to Microsoft's blazer?
        
           | pipes wrote:
           | Odd, I wonder why I got down voted for it, it was a genuine
           | question
        
             | NicoJuicy wrote:
             | Blazor? Raw guess
        
         | jtchang wrote:
         | This is using elixir right?
        
         | diggan wrote:
         | Articles like this make me happy to use Microsoft FrontPage and
         | cPanel, I don't think about HTTP or WebSockets at all.
        
         | wutwutwat wrote:
         | every websocket setup is painless when running on a single
         | server or handling very few connections...
         | 
         | I was on the platform/devops/man w/ many hats team for an
         | elixir shop running Phoenix in k8s. WS get complicated even in
          | elixir when you have 2+ app instances behind a round robin load
         | balancer. You now need to share broadcasts between app servers.
          | Here's a situation you have to solve for w/ any app at scale,
          | regardless of language:
         | 
         | app server #1 needs to send a publish/broadcast message out to
         | a user, but the user who needs that message isn't connected to
         | app server #1 that generated the message, that user is
         | currently connected to app server #2.
         | 
         | How do you get a message from one app server to the other one
         | which has the user's ws connection?
         | 
         | A bad option is sticky connections. User #1 always connects to
         | server #1. Server #1 only does work for users connected to it
         | directly. Why is this bad? Hot spots. Overloaded servers.
         | Underutilized servers. Scaling complications. Forecasting
         | problems. Goes against the whole concept of horizontal scaling
         | and load balancing. It doesn't handle side-effect messages, ie
         | user #1000 takes some action which needs to broadcast a message
         | to user #1 which is connected to who knows where.
         | 
         | The better option: You need to broadcast to a shared broker.
         | Something all app servers share a connection to so they can
         | themselves subscribe to messages they should handle, and then
         | pass it to the user's ws connection. This is a message broker.
         | postgres can be that broker, just look at oban for real world
         | proof. Throw in pg's listen/notify and you're off to the races.
          | But that's heavy from a resources-per-db-connection perspective,
          | so let's avoid the ACID db for this then. Ok. Redis is a good
         | option, or since this is elixir land, use the built in
         | distributed erlang stuff. But, we're not running raw elixir
         | releases on linux, we're running inside of containers, on top
         | of k8s. The whole distributed erlang concept goes to shit once
         | the erlang procs are isolated from each other and not in their
         | perfect Goldilocks getting started readme world. So ok, in
         | containers in k8s, so each app server needs to know about all
         | the other app servers running, so how do you do that? Hmm,
         | service discovery! Ok, well, k8s has service discovery already,
         | so how do I tell the erlang vm about the other nodes that I got
          | from k8s etcd? Ah, a hex package, cool. libcluster to the
         | rescue https://github.com/bitwalker/libcluster
         | 
         | So we'll now tie the boot process of our entire app to fetching
         | the other app server pod ips from k8s service discovery, then
         | get a ring of distributed erlang nodes talking to each other,
         | sharing message passing between them, this way no matter which
         | server the lb routes the user to, a broadcast from any one of
         | them will be seen by all of them, and the one who holds the ws
         | connection will then forward it down the ws to the user.
         | 
          | So now there's a non-trivial amount of complexity and risk that
         | was added here. More to reason about when debugging. More to
         | consider when working on features. More to understand when
         | scaling, deploying, etc. More things to potentially take the
         | service down or cause it not to boot. More things to have race
         | conditions, etc.
         | 
         | Nothing is ever so easy you don't have to think about it.
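          | 
          | The shape of that broker pattern, sketched outside of Elixir
          | just for illustration (untested; assumes the `redis` and `ws`
          | npm packages and a made-up userId handshake):
          | 
          |     import { createClient } from "redis";
          |     import { WebSocketServer } from "ws";
          | 
          |     const sub = createClient();
          |     await sub.connect();
          | 
          |     // Each app server only knows its own local connections.
          |     const localSockets = new Map(); // userId -> ws
          | 
          |     const wss = new WebSocketServer({ port: 8080 });
          |     wss.on("connection", (ws, req) => {
          |       const userId = new URL(req.url, "http://x")
          |         .searchParams.get("userId"); // stand-in for real auth
          |       localSockets.set(userId, ws);
          |       ws.on("close", () => localSockets.delete(userId));
          |     });
          | 
          |     // Every server subscribes; whichever one holds the target
          |     // user's socket forwards the message, the rest ignore it.
          |     await sub.subscribe("broadcast", (raw) => {
          |       const { userId, payload } = JSON.parse(raw);
          |       localSockets.get(userId)?.send(JSON.stringify(payload));
          |     });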
        
       | sgarland wrote:
       | The full schema isn't listed, but the indices don't make sense to
       | me.
       | 
       | (id, cluster_id) sounds like it could / should be the PK
       | 
        | If the jobs are cleared once they've succeeded, and presumably
        | retried if they've failed or stalled, then the table should be
        | quite small; so small that:
        | 
        | a. The query planner is unlikely to use the partial index on
        | (status)
        | 
        | b. The bloat from the rapidity of DELETEs likely overshadows the
        | live tuple size.
        
       | DougN7 wrote:
       | I implemented a long polling solution in desktop software over 20
       | years ago and it's still working great. It can even be used as a
       | tunnel to stream RDP sessions, through which YouTube can play
       | without a hiccup. Big fan of long polling, though I admit I
       | didn't get a chance to try web sockets back then.
        
         | jclarkcom wrote:
          | I did the same, were you at VMware by any chance? At the time
          | it was the only way to get compatibility with older browsers.
        
       | mojuba wrote:
       | Can someone explain why TTL = 60s is a good choice? Why not more,
       | or less?
        
         | notatoad wrote:
         | i can't speak for why the author chose it, but if you're
         | operating behind AWS cloudfront then http requests have a
         | maximum timeout of 60s - if you don't close the request within
         | 60s, cloudfront will close it for you.
         | 
         | i suspect other firewalls, cdns, or reverse proxy products will
         | all do something similar. for me, this is one of the biggest
         | benefits of websockets over long-polling: it's a standard way
         | to communicate to proxies and firewalls "this connection is
         | supposed to stay open, don't close it on me"
        
       | k__ wrote:
       | Half-OT:
       | 
       | What's the most resource efficient way to push data to clients
       | over HTTP?
       | 
       | I can send data to a server via HTTP request, I just need a way
       | to notify a client about a change and would like to avoid polling
       | for it.
       | 
       | I heard talk about SSE, WebSockets, and now long-polling.
       | 
       | Is there something else?
       | 
       | What requires the least resources on the server?
        
         | mojuba wrote:
         | I don't think any of the methods give any significant advantage
         | since in the end you need to maintain a connection per each
         | client. The difference between the methods boils down to
         | complexity of implementation and reliability.
         | 
         | If you want to reduce server load then you'd have to sacrifice
         | responsiveness, e.g. you perform short polls at certain
         | intervals, say 10s.
        
           | k__ wrote:
           | Okay, thanks.
           | 
           | What's the least complex to implement then?
        
             | mojuba wrote:
             | For the browser and if you need only server-to-client
             | sends, I assume SSE would be the best option.
             | 
             | For other clients, such as mobile apps, I think long poll
             | would be the simplest.
        
       | amelius wrote:
       | > Corporate firewalls blocking WebSocket connections was one of
       | our other worries. Some of our users are behind firewalls, and we
       | don't need the IT headache of getting them to open up WebSockets.
       | 
       | Don't websockets look like ordinary https connections?
        
         | doublerabbit wrote:
          | It does. However, DPI firewalls look at and block the upgrade
          | handshake:
          | 
          |     Connection: Upgrade
          |     Upgrade: websocket
        
         | toast0 wrote:
         | Some corporate firewalls MITM all https connections. Websocket
         | does not look normal once you've terminated TLS.
        
           | amelius wrote:
           | Can websites detect this?
        
             | toast0 wrote:
             | AFAIK, only by symptoms. If https fetches work and
             | websockets don't, that's a sign. HSTS and assorted
             | reporting can help a bit in aggregate, but not if the
             | corporate MITM CA has been inserted into the browser's
             | trusted CA list. I don't think there's an API to get
             | certificate details from the browser side to compare.
             | 
             | A proxy may have a different TLS handshake than a real
             | browser would, depending on how good the MITM is, but the
             | better they are, the more likely it is that websockets
             | work.
        
       | yuliyp wrote:
       | I think this article is tying a lot of unrelated decisions to
       | "Websocket" vs "Long-polling" when they're actually independent.
       | A long-polling server could handle a websocket client with just a
       | bit of extra work to handle keep-alive.
       | 
       | For the other direction, to support long-polling clients if your
       | existing architecture is websockets which get data pushed to them
       | by other parts of the system, just have two layers of servers:
       | one which maintains the "state" of the connection, and then the
       | HTTP server which receives the long polling request can connect
       | to the server that has the connection state and wait for data
       | that way.
        
         | harrall wrote:
         | It sounded like the author(s) just had existing request-
         | oriented code and didn't want to rewrite it to be connection-
         | oriented.
         | 
          | Personally I would have enjoyed solving that problem instead of
          | hacking around it, but that's me.
        
         | lunarcave wrote:
         | Author here.
         | 
         | Having done this, I don't think I'd reduce it to "just a little
         | bit of work" to make it hum in production.
         | 
          | Everything in between your UI components and the database layer
          | needs to be reworked to fit the connection-oriented (WebSockets)
          | model of the world vs the request-oriented one.
        
       | vitus wrote:
       | I would appreciate if the article spent more time actually
       | discussing the benefits of websockets (and/or more modern
       | approaches to pushing data from server -> browser) and why the
       | team decided those benefits were not worth the purported
       | downsides. I could see the same simplicity argument being applied
       | to using unencrypted HTTP/1.1 instead of HTTP/2, or TCP Reno
       | instead of CUBIC.
       | 
       | The section at the end talking about "A Case for Websockets"
       | really only rehashes the arguments made in "Hidden Benefits of
       | Long-Polling" stating that you need to reimplement these various
       | mechanisms (or just use a library for it).
       | 
       | My experience in this space is from 2011, when websockets were
       | just coming onto the scene. Tooling / libraries were much more
       | nascent, websockets had much lower penetration (we still had to
       | support IE6 in those days!), and the API was far less stable
       | prior to IETF standardization. But we still wanted to use them
       | when possible, since they provided much better user experience
       | (lower latency, etc) and lower server load.
        
       | tguvot wrote:
       | Another reason: there is a patent troll suing companies over
       | usage of websockets.
        
       | vouwfietsman wrote:
       | The points mentioned against websockets are mostly fud, I've used
       | websockets in production for a very heavy global data streaming
       | application, and I would respond the following to the "upsides"
       | of not using websockets:
       | 
       | > Observability Remains Unchanged
       | 
        | Actually it doesn't; many standard interesting metrics will break
        | because long-polling is not a standard request either.
       | 
       | > Authentication Simplicity
       | 
       | Sure, auth is different than with http, but not more difficult.
       | You can easily pass a token.
       | 
       | > Infrastructure Compatibility
       | 
       | I'm sure you can find firewalls out there where websockets are
       | blocked, however for my use case I have never seen this reported.
       | I think this is outdated, for sure you don't need "special proxy
       | configurations or complex infrastructure setups".
       | 
       | > Operational Simplicity
       | 
        | Restarts will drop any persistent connection; state can exist (or
        | not) whether you use WS or LP, so it doesn't matter which you use.
       | 
       | > Client implementation
       | 
       | It mentions "no special WebSocket libraries needed" and also "It
       | works with any HTTP client". Guess what, websockets will work
       | with any websocket client! Who knew!
       | 
       | Finally, in the conclusion:
       | 
       | > For us, staying close to the metal with a simple HTTP long
       | polling implementation was the right choice
       | 
       | Calling simple HTTP long polling "close to the metal" in
       | comparison to websockets is weird. I wouldn't be surprised if
       | websockets scale much better and give much more control depending
        | on the type of data, but that's beside the point. If you want to
        | use long polling because you prefer it, go ahead. It's a great
        | way to stick to request/response-style semantics that web devs
        | are familiar with. It's not necessary to regurgitate a bunch of
        | random hearsay arguments that may influence people in the wrong
        | way.
       | 
       | Try to actually leave the reader with some notion of when to use
       | long polling vs when to use websockets, not a post-hoc
       | justification of your decision based on generalized arguments
       | that do not apply.
        
         | amatuer_sodapop wrote:
         | > > Observability Remains Unchanged
         | 
         | > Actually it doesn't, many standard interesting metrics will
         | break because long-polling is not a standard request either.
         | 
         | As a person who works in a large company handling millions of
         | websockets, I fundamentally disagree with discounting the
         | observability challenges. WebSockets completely transform your
         | observability stack - they require different logging patterns,
         | new debugging approaches, different connection tracking, and
         | change how you monitor system health at scale. Observability is
         | far more than metrics, and handwaving away these architectural
         | differences doesn't make the implementation easier.
        
       | Cort3z wrote:
       | I think they are mixing some problems here. They could probably
        | have used their original setup with Postgres NOTIFY+triggers
        | instead of polling, and only have one "pickup poller" to catch
        | any missed events/jobs. In my opinion the transport medium should
        | not be linked to how the data is managed internally, but I know
        | from experience that this separation is often hard to achieve in
        | practice.
        
       | emilio1337 wrote:
       | The article does discuss a lot of mixed concepts. I would prefer
       | one process polling new jobs/state and one process handling http
        | connections/websockets. That way the database isn't flooded, and
        | it's completely scalable from the client side. The database
        | process pushes everything downstream via some queue while the
        | other process/server handles those and sends them to the
        | respective clients.
        
       | imglorp wrote:
       | Since the article mentioned Postgres by name, isn't this a case
       | for using its asynchronous notification features? Servers can
       | LISTEN to a channel and PG can TRIGGER and NOTIFY them when the
       | data changes.
       | 
       | No polling needed, regardless of the frontend channel.
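        | 
        | With node-postgres, the listening side looks roughly like this
        | (untested; the channel name and wakeWaitersFor() are made up, and
        | you still need an AFTER UPDATE trigger that calls
        | pg_notify('job_updates', ...)):
        | 
        |     import pg from "pg";
        | 
        |     const client = new pg.Client();
        |     await client.connect();
        |     await client.query("LISTEN job_updates");
        | 
        |     client.on("notification", (msg) => {
        |       // msg.payload is whatever the trigger passed to pg_notify()
        |       wakeWaitersFor(msg.payload); // resolve the parked long poll(s)
        |     });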
        
         | lunarcave wrote:
         | Yes, but the problems of detecting that changeset and
          | delivering it to the right connection remain to be solved in
         | the app layer.
        
         | cluckindan wrote:
         | It would be easier to run Hypermode's Dgraph as the database
         | and use GraphQL subscriptions from the frontend. But nobody
         | ever got fired for choosing postgres.
        
           | j45 wrote:
            | I have relatively recently taken steps towards Postgres for
            | its ability to be at the center of so much until a project
            | outgrows it.
           | 
           | In terms of not getting fired - Postgres is a lot more
           | innovative than most databases, and the insinuation of IBM.
           | 
            | By innovative I mean uniquely putting in performance-related
            | improvements over the last 10-20 years.
        
       | wereHamster wrote:
        | Unrelated to the topic in the article...
        | 
        |     await new Promise(resolve => setTimeout(resolve, 500));
        | 
        | In Node.js context, it's easier to:
        | 
        |     import { setTimeout } from "node:timers/promises";
        |     await setTimeout(500);
        
         | treve wrote:
         | Is that easier? The first snippet is shorter and works on any
         | runtime.
        
           | joshmanders wrote:
           | In the context of Node.js, where op said, yes it is easier.
           | But it's a new thing and most people don't realize timers in
           | Node are awaitable yet, so the other way is less about "works
           | everywhere" and more "this is just what I know"
        
             | wereHamster wrote:
             | I guess most Node.js developers also don't realize that
             | there's "node:fs/promises" so you don't have to use
             | callbacks or manually wrap functions from "node:fs" with
             | util.promisify(). Doesn't mean need to stick with old
             | patterns forever.
             | 
             | When I said 'in the context of Node.js' I meant if you are
             | in a JS module where you already import other node:
             | modules, ie. when it's clear that code runs in a Node.js
             | runtime and not in a browser. Of course when you are
             | writing code that's supposed to be portable, don't use it.
             | Or don't use setTimeout at all because it's not guaranteed
             | to be available in all runtimes - it's not part of the
             | ECMA-262 language specification after all.
        
         | hombre_fatal wrote:
         | I haven't used that once since I found out that it exists.
         | 
         | I just don't see the point. It doesn't work in the browser and
         | it shadows global.setTimeout which is confusing. Meanwhile the
         | idiom works everywhere.
        
           | joshmanders wrote:
            | You can alias it if you're worried about shadowing:
            | 
            |     import { setTimeout as loiter } from "node:timers/promises";
            |     await loiter(500);
        
             | hombre_fatal wrote:
             | Sure, and that competes with a universal idiom.
             | 
             | To me it's kinda like adding a shallowClone(old) helper
             | instead of writing const obj = { ...old }.
             | 
             | But no point in arguing about it forever.
        
       | bob1029 wrote:
       | I think we could have the best of both worlds. E.g.:
       | 
       | https://socket.io/docs/v4/how-it-works/#upgrade-mechanism
        
         | j45 wrote:
         | I wonder if people only look for solutions in their language
         | when a tech like websockets is language independent.
        
       | rednafi wrote:
       | Neither Server-Sent Events nor WebSockets have replaced all use
       | cases of long polling reliably. The connection limit of SSE comes
       | up a lot, even if you're using HTTP/2. WebSockets, on the other
       | hand, are unreliable as hell in most environments. Also, WS is
       | hard to debug, and many of our prod issues with WS couldn't even
       | be reproduced locally.
       | 
       | Detecting changes in the backend and propagating them to the
       | right client is still an unsolved problem. Until then, long
        | polling is a surprisingly simple and robust solution that works.
        
         | pas wrote:
         | Robust WS solutions need a fallback anyway, and unless you are
         | doing something like Discord long polling is a reasonable
         | option.
        
       | Animats wrote:
       | Long polling has some problems of its own.
       | 
       | Second Life has an HTTPS long polling channel between client and
       | server. It's used for some data that's too bulky for the UDP
       | connection, not too time sensitive, or needs encryption. This has
       | caused much grief.
       | 
       | On the client side, the poller uses libcurl. Libcurl has
       | timeouts. If the server has nothing to send for a while, libcurl
       | times out. The client then makes the request again. This results
       | in a race condition if the server wants to send something between
       | timeout and next request. Messages get lost.
       | 
       | On top of that, the real server is front-ended by an Apache
       | server. This just passes through relevant requests, blocking the
       | endless flood of junk HTTP requests from scrapers, attacks, and
       | search engines. Apache has a timeout, and may close a connection
       | that's in a long poll and not doing anything.
       | 
       | Additional trouble can come from middle boxes and proxy servers
       | that don't like long polling.
       | 
       | There are a lot of things out there that just don't like holding
       | an HTTP connection open. Years ago, a connection idle for a
       | minute was fine. Today, hold a connection open for ten seconds
       | without sending any data and something is likely to disconnect
       | it.
       | 
       | The end result is an unreliable message channel. It has to have
       | sequence numbers to detect duplicates, and can lose messages. For
       | a long time, nobody had discovered that, and there were
       | intermittent failures that were not understood.
       | 
       | In the original article, the chart section labelled "loop"
       | doesn't mention timeout handling. That's not good. If you do long
       | polling, you probably need to send something every few seconds to
       | keep the connection alive. Not clear what a safe number is.
        
         | rednafi wrote:
         | Yeah, some servers close connections when there's no data
         | transfer. When the backend holds the connection while polling
         | the database until a timeout occurs or the database returns
         | data, it needs to send something back to the client to keep the
         | connection alive. I wonder what could be sent in this case and
         | whether it would require special client-side logic.
        
         | wutwutwat wrote:
         | Every problem you just listed is 100% in your control and able
         | to be configured, so the issue isn't long polling, it's your
         | setup/configs. If your client (libcurl) times out a request,
         | set the timeout higher. If apache is your web server and it
         | disconnects idle clients, increase the timeout, tell it not to
         | buffer the request and to pass it straight back to the app
         | server. If there's a cloud lb somewhere (sounds like it because
         | alb defaults to a 10s idle timeout), increase the timeouts...
         | 
         | Every timeout in every hop of the chain is within your control
         | to configure. Setup a subdomain and send long polling requests
         | through that so the timeouts can be set higher and not impact
         | regular http requests or open yourself up to slow client ddos.
         | 
         | Why would you try to do long polling and not configure your
         | request chain to be able to handle them without killing idle
         | connections? The problems you have only exist because you're
         | allowing them to exist. Set your idle timeouts higher. Send
         | keepalives more often. Tell your web servers to not do request
         | buffering, etc.
         | 
          | All of that is extremely easy to test and verify.
         | Does the request live longer than your polling interval? Yes?
         | Great you're done! No? Tune some more timeouts and log the
         | request chain everywhere you can until you know where the
         | problems lie. Knock them out one by one going back to the
         | origin until you get what you want.
         | 
         | Long polling is easy to get right from an operations
         | perspective.
        
           | moritonal wrote:
           | Whilst it's possible you may be correct, I do have to point
           | out you are, I believe, lecturing John Nagle, known for
           | Nagle's algorithm, used in most TCP stacks in the world.
        
             | motorest wrote:
             | > Whilst it's possible you may be correct, I do have to
             | point out you are, I believe, lecturing John Nagle, known
             | for Nagle's algorithm, used in most TCP stacks in the
             | world.
             | 
             | Thank you for pointing that out. This thread alone is bound
             | to become a meme.
        
               | wutwutwat wrote:
               | Oh fuck a famous person!? I told a famous person to
               | adjust their server timeouts?!!
               | 
               | This changes everything!
               | 
               | I better go kill myself from this embarrassment so my
               | family doesn't have to live with my shame! Hopefully with
               | enough generations of time passing my family will rise
               | above the stain I've created for them!
        
               | motorest wrote:
               | > I better go kill myself from this embarrassment so my
               | family doesn't have to live with my shame!
               | 
               | There's no need to go to extremes, no matter how
               | embarrassing and notably laughable your comment was. I'd
               | say enjoy your fame.
        
             | wutwutwat wrote:
             | All the more reason that the poster should know that the
             | things they are complaining about are self inflicted
             | then...?
             | 
             | The comment makes long polling out to be the issue, but the
             | issue is the timeouts of the user's software, which are
             | preventing the long polling from being effective.
             | 
             | Names mean very little to me. Am I supposed to feel
             | embarrassed or stupid now that you've pointed out that I
              | gave advice to a tech OG that knows his shit? I don't care
             | who the person is, and in this case, the way I read their
             | message, they didn't know their shit.
             | 
             | That's the problem with attaching a name to things, because
             | if that name is known for being some God like person, they
             | then become incapable of being a human being capable of
             | saying something less informed than anyone else in the
             | room, even if they might say something that others could
              | give them some advice or insight on. Nope. This person's
              | name is known, so there's no way this other no-name person
              | could possibly be able to offer advice to them. That's not
              | how it works. The known name is the smartest one in the
             | room and that's just a fact of the universe, duh!
        
             | exe34 wrote:
             | Oh my gosh :-D
        
             | rezmason wrote:
             | I bet there's an online college credit transfer program
             | that'll accept this as a doctoral defense. Depending on how
             | Nagle's finagled.
        
           | tbillington wrote:
           | > Every timeout in every hop of the chain is within your
           | control to configure.
           | 
           | lol
        
             | wutwutwat wrote:
              | I wasn't talking about network switch hops, and if you're
              | trying to do long polling and don't have control over the
              | web servers going back to your systems, then wtf are you
              | trying to do long polling for anyway.
             | 
             | I don't try to run red lights because I don't have control
             | over the lights on the road.
        
         | interroboink wrote:
         | I'm new to websockets, please forgive my ignorance -- how is
         | sending some "heartbeat" data over long polling different from
         | the ping/pong mechanism in websockets?
         | 
         | I mean, in both cases, it's a TCP connection over (eg) port 443
         | that's being kept open, right? Intermediaries can't snoop the
         | data if its SSL, so all they know is "has some data been sent
         | recently?" Why would they kill long-polling sessions after
         | 10sec and not web socket ones?
        
         | mike-cardwell wrote:
         | That race condition has nothing to do with long polling, it's
         | just poor design. The sender should stick the message in a
         | queue and the client reads from that queue. Perhaps with the
         | client specifying the last id it saw.
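          | 
          | The server side of that is basically (illustrative; storage
          | and route are made up):
          | 
          |     const queue = []; // [{ id, body }], ids strictly increasing
          |     let nextId = 1;
          | 
          |     function enqueue(body) {
          |       queue.push({ id: nextId++, body });
          |     }
          | 
          |     // Handler for GET /messages?after=<lastSeenId>
          |     function messagesAfter(lastSeenId) {
          |       return queue.filter((m) => m.id > lastSeenId);
          |     }
          | 
          | Re-delivery after a dropped connection is then just the client
          | asking again with the same id, so the race the parent comment
          | describes can't lose anything.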
        
       | sneak wrote:
       | Given long polling, I have never ever understood why websockets
       | became a thing. I've never implemented them and never will, it's
       | a protocol extension where none is necessary.
        
         | ekkeke wrote:
         | Websockets can operate outside the request/response model used
         | in this long polling example, and allow you to stream data
         | continuously. They're also a lot more efficient in terms of
         | framing and connections if there are a lot of individual pieces
          | of data to push, as you don't need to spin up a connection +
         | request for each bit.
        
       | LeicaLatte wrote:
       | Long polling is my choice for simple, reliable and plug and play
       | like interfaces. HTTP requests tend to be standard and simplify
       | authentication as well. Systems with frequent but not constant
       | updates are ideal. Text yes. Voice maybe not.
       | 
       | Personal Case Study: I built mobile apps which used Flowise
        | assistants for RAG and found websockets completely out of line
       | with the rest of my system and interactions. Suddenly I was
       | fitting a round peg in a square hole. I switched to OpenAI
       | assistants and their polling system felt completely "natural" to
       | integrate.
        
       ___________________________________________________________________
       (page generated 2025-01-05 23:00 UTC)