[HN Gopher] An Analysis of the Performance of WebSockets in Vari...
       ___________________________________________________________________
        
       An Analysis of the Performance of WebSockets in Various Programming
       Languages (2021)
        
       Author : max0563
       Score  : 81 points
       Date   : 2024-11-23 03:34 UTC (19 hours ago)
        
 (HTM) web link (www.researchgate.net)
 (TXT) w3m dump (www.researchgate.net)
        
       | paulgb wrote:
       | The SSRN link doesn't have a login-wall:
       | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3778525
        
         | chrisweekly wrote:
         | Thanks! Here's the direct link to the ungated PDF:
         | https://download.ssrn.com/21/02/03/ssrn_id3778525_code456891...
         | 
         | TLDR; NodeJS is the clear winner, and Python far and away the
         | worst of the bunch.
        
       | 5Qn8mNbc2FNCiVV wrote:
       | Too bad that uWebsockets was used for Node because a lot of
       | higher level libraries are built on top of
       | https://www.npmjs.com/package/ws
        
         | windlep wrote:
         | I was able to make a uWebsockets adapter for NestJS pretty
         | easily. It's a bit sensitive of a library to integrate though,
         | a single write when the connection is gone and you get a
         | segfault, which means a lot of checking before writing if
         | you've yielded since you last checked. This was a few years
         | ago, perhaps they fixed that.
        
       | travisgriggs wrote:
       | Thanks for the free access links. I did read through a bit.
       | 
       | The title is misleading because exactly one implementation was
       | chosen for each of the tested languages. They conclude "do not
       | use Python" because the Python websockets library performs pretty
       | poorly.
       | 
       | Each language is scored based on the library chosen. I have to
       | believe there are more options for some of these languages.
       | 
       | As someone who is implementing an Elixir LiveView app right now,
       | I was particularly curious to see how Elixir performed given
       | LiveView's reliance on websockets, but Elixir didn't make the
       | cut.
        
         | nelsonic wrote:
         | Was also surprised they omitted Elixir/Erlang from the list of
         | languages. Crazy considering how many messaging apps use OTP on
         | the backend.
        
         | Terretta wrote:
         | _> The title is misleading because exactly one implementation
         | was chosen for each of the tested languages. They conclude "do
         | not use Python" because the Python websockets library performs
         | pretty poorly._
         | 
         | On the contrary, they tried autobahn and aiohttp as well:
         | 
         |  _For the Python websocket, a generic module is used which is
         | simply named "websockets". ... This is most likely a module
         | that offers the simplest of websocket functionality. Now, it
         | was mentioned that this only partly explains the poor
         | performance. While writing this report, it seemed unjust not to
         | give Python a fighting chance. So, the websocket server has
         | been rebuilt with the more trusted Autobahn library and the
         | benchmark test has been rerun. This new server does lead to
         | better results ... still unable to finish the benchmark
         | test.... [T]he Python server is rebuilt one more time, this
         | time with a library by the name of "aiohttp." At last, all 100
         | rounds of the benchmark are able to be completed, though not
         | very well. Aiohttp still takes longer than Go, and becomes
         | substantially unreliable after round 50, dropping anywhere from
         | 30-50% of the messages. It can only be concluded that the
         | reason for this dreadful performance is Python itself._
        
       | latch wrote:
       | Their explanation for why Go performs badly didn't make any sense
       | to me. I'm not sure if they don't understand how goroutines work,
       | if I don't understand how goroutines work or if I just don't
       | understand their explanation.
       | 
       | Also, in the end, they didn't use the JSON payload. It would have
       | been interesting if they had just written a static string. I'm
       | curious how much of this is really measuring JSON
       | [de]serialization performance.
       | 
       | Finally, it's worth pointing out that WebSocket is a standard.
       | It's possible that some of these implementations follow the
       | standard better than others. For example, WebSocket requires that
       | a text message be valid UTF8. Personally, I think that's a dumb
       | requirement (and in my own websocket server implementation for
       | Zig, I don't enforce this - if the application wants to, it can).
       | But it's completely possible that some implementations enforce
       | this and others don't, and that (along with every other check)
       | could make a difference.
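
To illustrate the UTF-8 point above, here is a minimal Python sketch of the check a server may or may not run on each text message. The helper name `is_valid_text_payload` is hypothetical and is not taken from any of the benchmarked libraries; it only shows that strict validation means one extra pass over every payload.

```python
def is_valid_text_payload(payload: bytes) -> bool:
    """Return True if the payload is well-formed UTF-8.

    Hypothetical helper: a server that enforces the text-message rule
    would run a check like this on every text message it receives.
    """
    try:
        payload.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_text_payload("héllo".encode("utf-8")))  # multi-byte but valid
    print(is_valid_text_payload(b"\xff\xfe"))              # 0xFF never occurs in UTF-8
```

An implementation that skips this step does less work per message, which is one way two otherwise similar servers could end up with different benchmark numbers.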
        
         | vandot wrote:
         | They didn't use goroutines, which explains the poor perf.
         | https://github.com/matttomasetti/Go-Gorilla_Websocket-Benchm...
         | 
         | Also, this paper is from Feb 2021.
        
           | windlep wrote:
           | I was under the impression that the underlying net/http
           | library uses a new goroutine for every connection, so each
           | websocket gets its own goroutine. Or is there somewhere else
           | you were expecting goroutines in addition to the one per
           | connection?
        
             | donjoe wrote:
             | Which is perfectly fine. However, you will be able to
             | process only a single message per connection at once.
             | 
             | What you would do in go is:
             | 
             | - either a new goroutine per message
             | 
             | - or installing a worker pool with a predefined goroutine
             | size accepting messages for processing
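
The worker-pool option above can be sketched as follows. This is a Python asyncio analogue, not Go: tasks and an `asyncio.Queue` stand in for goroutines and a channel, and the names `handle`, `worker` and `run_pool` are made up for illustration. It is not how the benchmarked Go server is written.

```python
import asyncio


async def handle(message: str) -> str:
    # Placeholder for per-message work (parsing, lookups, I/O, ...).
    await asyncio.sleep(0)
    return message.upper()


async def worker(queue: asyncio.Queue, results: list) -> None:
    # Each worker pulls messages from the shared queue until cancelled.
    while True:
        message = await queue.get()
        try:
            results.append(await handle(message))
        finally:
            queue.task_done()


async def run_pool(messages: list, pool_size: int = 4) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    workers = [asyncio.create_task(worker(queue, results))
               for _ in range(pool_size)]
    for message in messages:
        queue.put_nowait(message)
    await queue.join()              # wait until every message is processed
    for task in workers:
        task.cancel()               # stop the now idle workers
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(sorted(asyncio.run(run_pool(["a", "b", "c"]))))
```

With more than one worker, the order of results is not guaranteed to match the order of arrival, so a server that must reply in order needs extra bookkeeping.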
        
               | jand wrote:
               | Another option is to have a read-, and a write-pump
               | goroutine associated with each gorilla ws client. I found
               | this useful for gateways wss <--> *.
        
           | initplus wrote:
           | http.ListenAndServe is implemented under the hood with a new
           | goroutine per incoming connection. You don't have to
           | explicitly use goroutines here, it's the default behaviour.
        
             | necrobrit wrote:
             | Yes _however_ the nodejs benchmark at least is handling
             | each message asynchronously, whereas the go implementation
             | is only handling connections asynchronously.
             | 
             | The client fires off all the requests before waiting for a
             | response:
             | https://github.com/matttomasetti/NodeJS_Websocket-
             | Benchmark-... so the comparison isn't quite apples to
             | apples.
             | 
             | Edit to add: looks like the same goes for the c++ and rust
             | implementations. So I think what we might be seeing in this
             | benchmark (particularly the node vs c++ since it is the
             | same library) is that asynchronously handling each message
             | is beneficial, and the Go standard library's JSON parser is
             | slow.
             | 
             | Edit 2: Actually I think the c++ version is async for each
             | message! Don't know how to explain that then.
        
               | josephg wrote:
               | Well, tcp streams are purely sequential. It's the ideal
               | use case for a single process, since messages can't be
               | received out of order. There's no computational advantage
               | to "handling each message asynchronously" unless the
               | message handling code itself does IO or something. And
               | that's not the responsibility of the websocket library.
        
               | necrobrit wrote:
               | Good point!
        
         | ikornaselur wrote:
         | Yeah I thought this looked familiar.. I went through this
         | article about a year and a half ago when exploring WebSockets
         | in Python for work. With some tuning and using different
         | libraries + libuv we were easily able to get similar
         | performance to NodeJS.
         | 
         | I had a blog post somewhere to show the testing and results,
         | but can't seem to find it at the moment though.
        
         | tgv wrote:
         | > I'm curious how much of this is really measuring JSON
         | [de]serialization performance.
         | 
         | Well, they did use the standard library for that, so quite a
         | bit, I suppose. That thing is slow. I've got no idea how fast
         | those functions are in other languages, but you're right that
         | it would ruin the idea behind the benchmark.
        
           | bryancoxwell wrote:
           | Are you referring to Go's stdlib?
        
         | klabb3 wrote:
         | > Their explanation for why Go performs badly didn't make any
         | sense to me.
         | 
         | To me, the whole paper is full of misunderstanding, at least
         | the analysis. There's just speculation based on caricatures of
         | the language, like "node is async", "c++ is low level" etc. The
         | fact that their C++ impl using uWebSocket was _significantly_
         | slower than Node, which used uWebSocket bindings, should
         | have led them to question the test setup (they probably used
         | threads, which defeats the purpose of uWebSocket).
         | 
         | Anyway.. The "connection time" is just HTTP handshake. It could
         | be included as a side note. What's important in WS deployments
         | are:
         | 
         | - Unique message throughput (the only thing measured afaik).
         | 
         | - Broadcast/"multicast" throughput, i.e. say you have 1k
         | subscribers you wanna send the _same_ message.
         | 
         | - Idle memory usage (for say chat apps that have low traffic -
         | how many peers can a node maintain)
         | 
         | To me, the champion is uWebSocket. That's the _entire_ reason
         | why  "Node" wins - those language bindings were written by the
         | same genius who wrote that lib. Note that uWebSocket doesn't
         | have TLS support, so whatever reverse proxy you put in front is
         | gonna dominate usage because all of them have higher overheads,
         | even nginx.
         | 
         | Interesting to note is that uWebSocket perf (especially memory
         | footprint) can't be achieved even in Go, because of the
         | goroutine overhead (there's _no_ way in Go to read/write from
         | multiple sockets from a single goroutine, so you have to spend
         | 2 goroutines for realtime r/w). It could probably be achieved
         | with Tokio though.
        
           | Svenskunganka wrote:
           | The whole paper is not only full of misunderstandings, it is
           | full of errors and contradictions with the implementations.
           | 
           | - Rust is run in debug mode, by omitting the --release flag.
           | This is a very basic mistake.
           | 
           | - Some implementations are logging to stdout on each message,
           | which will lead to a lot of noise not only due to the
           | overhead of doing so, but also due to lock contention for
           | multi-threaded benchmarks.
           | 
           | - It states that the Go implementation is blocking and
           | single-threaded, while it in fact is non-blocking and multi-
           | threaded (concurrent).
           | 
           | - It implies the Rust implementation is not multi-threaded,
           | while it in fact is because the implementation spawns a
           | thread per connection. On that note, why not use an async
           | websocket library for Rust instead? They're used much more.
           | 
           | - Gives VM-based languages zero time to warm up, giving them
           | very little chance to do one of their jobs; runtime
           | optimizations.
           | 
           | - It is not benchmarking websocket implementations
           | specifically, it is benchmarking websocket implementations,
           | JSON serialization and stdout logging all at once. This adds
           | so much noise to the result that the result should be
           | considered entirely invalid.
           | 
           | > To me, the champion is uWebSocket. That's the entire reason
           | why "Node" wins [...]
           | 
           | A big part of why Node wins is because its implementation is
           | not logging to stdout on each message like the other
           | implementations do. Add a console.log in there and its
           | performance tanks.
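
One way to see how much of a per-message budget goes to JSON and logging rather than to the WebSocket layer is to time the handler body on its own. The sketch below is a hypothetical Python micro-benchmark, not the paper's harness: the handler names, the payload and the round count are all assumptions, and the absolute numbers will vary by machine.

```python
import io
import json
import sys
import time


def time_handler(handler, message: str, rounds: int) -> float:
    """Return the seconds spent calling handler(message) `rounds` times."""
    start = time.perf_counter()
    for _ in range(rounds):
        handler(message)
    return time.perf_counter() - start


def make_handlers(log_stream=sys.stdout) -> dict:
    def echo(message):              # baseline: no parsing, no logging
        return message

    def json_round_trip(message):   # adds JSON decode + encode
        return json.dumps(json.loads(message))

    def json_and_log(message):      # adds one log line per message
        reply = json.dumps(json.loads(message))
        print(reply, file=log_stream)
        return reply

    return {"echo": echo, "json": json_round_trip, "json+log": json_and_log}


def breakdown(message: str, rounds: int = 10_000, log_stream=sys.stdout) -> dict:
    """Time each handler variant over the same message."""
    return {name: time_handler(fn, message, rounds)
            for name, fn in make_handlers(log_stream).items()}


if __name__ == "__main__":
    sample = json.dumps({"c": 1, "ts": 0})  # made-up payload
    # Log to an in-memory stream here; pass sys.stdout to include terminal cost.
    timings = breakdown(sample, rounds=1_000, log_stream=io.StringIO())
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f}s", file=sys.stderr)
```

Comparing the three timings gives a rough idea of how much of a benchmark result would come from serialization and logging choices instead of the library under test.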
        
           | austin-cheney wrote:
           | There is no HTTP handshake in RFC6455. A client sends a text
           | with a pseudo unique key. The server sends a text with a key
           | transform back to the client. The client then opens a socket
           | to the server.
           | 
           | The distinction is important because assuming HTTP implies
           | WebSockets is a channel riding over an HTTP server. Neither
           | the client or server cares if you provide any support for
           | HTTP so long as the connection is achieved. This is easily
           | provable.
           | 
           | It also seems you misunderstand the relationship between
           | WebSockets and TLS. TLS is TCP layer 4 while WebSockets is
           | TCP layers 5 and 6. As such WebSockets work the same way
           | regardless of TLS but TLS does provide an extra step of
           | message fragmentation.
           | 
           | There is a difference in interpreting how a thing works and
           | building a thing that does work.
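
For reference, the key transform mentioned above is small enough to show in full. In RFC 6455 the client's key arrives in the `Sec-WebSocket-Key` header and the server's reply goes in `Sec-WebSocket-Accept`; the function name `accept_key` below is made up, while the GUID and the sample key/response pair come from the RFC.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the key transform.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the server's reply to a client key.

    The transform is base64(SHA-1(client_key + GUID)).
    """
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the RFC 6455 text.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

This runs once per connection, so its cost shows up in connection-time measurements and not in message throughput.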
        
       | emmanueloga_ wrote:
       | If the author is reading this, I think a single repository would
       | be more appropriate than multiple repos [1]. It would be nice to
       | set things up so we can simply git pull, docker run, and execute
       | the benchmarks for each language sequentially.
       | 
       | Something that stood out to me is the author's conclusion that
       | "Node.js wins." However, both the Node.js and C++ versions use
       | the same library, uWebSockets! I suspect the actual takeaway is
       | this:
       | 
       | "uWebSockets wins, and the uWebSockets authors know their library
       | well enough that even their JavaScript wrapper outperforms my own
       | implementation in plain C++ using the same library!" :-p
       | 
       | Makes me wonder if there's something different that could be done
       | in Go to achieve better performance. Alternatively, this may
       | highlight which language/library makes it easier to do the right
       | thing out of the box (for example, it seems easier to use
       | uWebsockets in nodejs than in C++). TechEmpower controversies
       | also come to mind, where "winning" implementations often don't
       | reflect how developers typically write code in a given language,
       | framework, or library.
       | 
       | --
       | 
       | 1:
       | https://github.com/matttomasetti?tab=repositories&q=websocke...
        
       | fnordpiglet wrote:
       | (2021) Was surprised it used a deprecated Rust crate until I
       | noticed how out of date it is
        
       | simpaticoder wrote:
       | Interesting that https://github.com/uNetworking/uWebSockets.js
       | (which is C++ with node bindings) outperforms the raw C++
       | uWebSockets implementation.
       | 
       | It's also interesting that https://github.com/websockets/ws does
       | not appear in this study, given that in the node ecosystem it is
       | ~3x more likely to be used (not a perfect measurement but ws has
       | 28k github stars vs uWebSockets 8k stars)
        
       | zo1 wrote:
       | Was this published as-is to some sort of prominent CS journal? I
       | honestly can't tell from the link. If that's the case, I'm very
       | disappointed and would have a few choice words about the state of
       | "academia".
        
         | ndusart wrote:
         | Yes, that would be concerning indeed...
         | 
         | The author couldn't tell why he didn't manage to get the C or
         | Python program to run, but figured the language is probably to
         | blame for some obscure reason.
         | 
         | He also mentioned that he should have implemented
         | multithreading in C++ to be comparable with Node, but meh
         | that's probably also not his concern, let's compare them as is
         | ^^`
         | 
         | Also he doesn't mention the actual language of the library
         | used, but that would have voided the interest of the article,
         | so I quite may understand that omission :P
         | 
         | But at the end, nothing can be learned from this and it is hard
         | to believe it is what "research" can produce
        
           | josephg wrote:
           | Yeah it's a rubbish paper. It's just a comparison of some
           | websocket implementations at some particular point in time.
           | It tells you how fast some of the fastest WS implementations
           | are in absolute terms, but there are no broad conclusions you
           | can make other than the fact that there's more room for
           | optimisation in a few libraries. Whoopty doo. News at 11.
        
       | indulona wrote:
       | The DX for websockets in Go (gorilla) is horrible. But I do not
       | believe these numbers one bit.
        
       | wuschel wrote:
       | Is this a peer reviewed paper? It does not seem to be. At a first
       | glance, the researchgate URI and the way the title was formulated
       | made me think it would be the case.
        
       | frizlab wrote:
       | Not including Swift in such research seems to be a big
       | oversight to me.
        
       | cess11 wrote:
       | I'd like to know why Elixir and Erlang were excluded.
        
         | cess11 wrote:
         | Seems the author went silent after this, maybe he decided to
         | run a cafe or something instead.
        
       | austin-cheney wrote:
       | I have a home grown websocket library I wrote in TypeScript for
       | node.js. When I measured it a couple of years ago here were my
       | findings:
       | 
       | * I was able to send a little under 11x faster than I could
       | process the messages on the receiving end. I suspected this was
       | due to accounting for processing of frame headers with
       | consideration of the various forms of message fragmentation. I
       | also ran both send and receive operations on the same machine
       | which could have biased the numbers
       | 
       | * I was able to send messages on my hardware at 280,000 messages
       | per second. Bun claimed, at that time, a send rate of about
       | 780,000 messages per second. My hardware is old with DDR3 memory.
       | I suspect faster memory would increase those numbers more than
       | anything else, but I never validated that
       | 
       | * In real world practical use switching from HTTP for data
       | messaging to WebSockets made my big application about 8x faster
       | overall in test automation.
       | 
       | Things I suspect, my other assumptions:
       | 
       | * A WebSocket library can achieve superior performance if written
       | in a strongly typed language that is statically compiled and
       | without garbage collection. Bun achieved far superior numbers and
       | is written in Zig.
       | 
       | * I suspect that faster memory would lower the performance gap
       | between sending and receiving when perf testing on a single
       | machine
        
       | pier25 wrote:
       | I'm surprised at how well php is doing here. I'm guessing they
       | are using fibers?
        
         | alganet wrote:
         | It uses reactphp event-loop library:
         | 
         | https://github.com/reactphp/event-loop
         | 
         | That library can use either select, libuv, libev or libevent if
         | I'm not mistaken. Fibers are not used at this point, although
         | other libraries have explored the idea (revoltphp).
         | 
         | If we're assuming the paper author installed a typical PHP,
         | then it's using select for async I/O. It's the slowest
         | implementation of the event loop. Using something like swoole
         | would extract even more performance out of PHP for async io
         | scenarios.
        
       | fredtalty5 wrote:
       | The 2021 study titled "An Analysis of the Performance of
       | WebSockets in Various Programming Languages" benchmarks multiple
       | WebSocket implementations to determine which offers the best
       | performance. Key findings include:
       | 
       | Node.js emerged as the fastest option, primarily due to its
       | asynchronous capabilities, allowing for higher throughput during
       | concurrent requests.
       | 
       | Java and C# closely followed Node.js, demonstrating strong
       | performance in handling requests.
       | 
       | C++ and rust performed moderately well, while PHP lagged behind
       | them.
       | 
       | Python and C struggled significantly, with Python's websocket
       | library proving particularly inefficient, leading to high
       | failures during stress tests.
       | 
       | The analysis emphasises the importance of using asynchronous
       | libraries and suggests avoiding Python for web socket
       | implementations due to its performance limitations. The study
       | serves as a valuable resource for developers looking to select
       | the optimal programming language for WebSocket applications.
        
       | timkofu wrote:
       | It would be interesting to see this repeated with Starlette and
       | Granian on Python 3.13 (with GIL and JIT).
        
       | fastaguy88 wrote:
       | A meta comment: This paper gives an example of a "teaser
       | abstract". It says what was done, but does not say anything about
       | the actual results. This style is relatively common, but I find
       | it very annoying. There was certainly enough room in the abstract
       | to provide a concise summary of the actual results, which would
       | both inform the reader and perhaps encourage more people to read
       | the entire paper.
        
       ___________________________________________________________________
       (page generated 2024-11-23 23:01 UTC)