[HN Gopher] 4-10x faster in-process pub/sub for Go
       ___________________________________________________________________
        
       4-10x faster in-process pub/sub for Go
        
       Author : kelindar
       Score  : 90 points
       Date   : 2025-06-29 15:19 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | kelindar wrote:
        | This might be useful to some of you if you need a very light
        | pub/sub inside one process.
       | 
       | I was building a small multiplayer game in Go. Started with a
       | channel fan-out but (for no particular reason) wanted to see if
        | we could do better. I put together this tiny event bus to test, and
       | on my i7-13700K it delivers events in 10-40ns, roughly 4-10x
       | faster than the plain channel loop, depending on the
       | configuration.
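        | 
        | For context, the plain channel fan-out baseline was roughly this
        | shape (a simplified sketch, not the exact benchmark code):
        | 
        |     package main
        | 
        |     import (
        |         "fmt"
        |         "sync"
        |     )
        | 
        |     type Event struct{ ID int }
        | 
        |     func main() {
        |         // One buffered channel per subscriber.
        |         subs := make([]chan Event, 3)
        |         var wg sync.WaitGroup
        |         for i := range subs {
        |             subs[i] = make(chan Event, 128)
        |             wg.Add(1)
        |             go func(ch chan Event) {
        |                 defer wg.Done()
        |                 for ev := range ch {
        |                     _ = ev // handle the event
        |                 }
        |             }(subs[i])
        |         }
        | 
        |         // Publisher: fan each event out to every subscriber.
        |         for n := 0; n < 1000; n++ {
        |             ev := Event{ID: n}
        |             for _, ch := range subs {
        |                 ch <- ev // blocks if a subscriber's buffer is full
        |             }
        |         }
        |         for _, ch := range subs {
        |             close(ch)
        |         }
        |         wg.Wait()
        |         fmt.Println("all events delivered")
        |     }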
        
       | zx2c4 wrote:
       | > about 4x to 10x faster than channels.
       | 
       | I'd be interested to learn why/how and what the underlying
       | structural differences are that make this possible.
        
         | MathMonkeyMan wrote:
         | I didn't look, but I don't think of channels as a pub/sub
         | mechanism. You can have a producer close() a channel to notify
         | consumers of a value available somewhere else, or you can loop
         | through a bunch of buffered channels and do nonblocking sends.
         | 
         | A different design, without channels, could improve on those.
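          | 
          | For concreteness, those two patterns look roughly like this
          | (just a sketch, not tied to any particular library):
          | 
          |     package main
          | 
          |     import "fmt"
          | 
          |     type Event struct{ ID int }
          | 
          |     // Nonblocking fan-out over buffered channels: a subscriber
          |     // whose buffer is full just misses the event instead of
          |     // blocking the producer.
          |     func publish(subs []chan Event, ev Event) {
          |         for _, ch := range subs {
          |             select {
          |             case ch <- ev:
          |             default: // buffer full, drop
          |             }
          |         }
          |     }
          | 
          |     func main() {
          |         subs := []chan Event{make(chan Event, 4), make(chan Event, 4)}
          |         publish(subs, Event{ID: 1})
          |         fmt.Println(len(subs[0]), len(subs[1])) // 1 1
          | 
          |         // close() as a one-shot broadcast: every receiver of
          |         // ready unblocks once the value is available elsewhere.
          |         var result string
          |         ready := make(chan struct{})
          |         go func() {
          |             result = "value stored somewhere else"
          |             close(ready)
          |         }()
          |         <-ready
          |         fmt.Println(result)
          |     }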
        
           | atombender wrote:
           | I prefer to think of channels as a memory-sharing mechanism.
           | 
           | In most cases where you want to send data between concurrent
           | goroutines, channels are a better primitive, as they allow
           | the sender and receiver to safely and concurrently process
           | data without needing explicit locks. (Internally, channels
           | are protected with mutexes, but that's a single, battle-
           | tested and likely bug-free implementation shared by all users
           | of channels.)
           | 
            | The fact that channels also block on send/receive and
            | support buffering means that there's a lot more to them, but
            | that's how you should think of them. The fact that channels
           | look like a queue if you squint is a red herring that has
           | caused many a junior developer to abuse them for that
           | purpose, but they are a surprisingly poor fit for that. Even
           | backpressure tends to be something you want to control
           | manually (using intermediate buffers and so on), because
           | channels can be fiendishly hard to debug once you chain more
            | than a couple of them. Something forgets to close a
           | channel, and your whole pipeline can stall. Channels are also
           | slow, requiring mutex locking even in scenarios where data
           | isn't in need of locking and could just be passed directly
           | between functions.
           | 
           | Lots of libraries (such as Rill and go-stream) have sprung up
           | that wrap channels to model data pipelines (especially with
           | generics it's become easier to build generic operators like
           | deduping, fan-out, buffering and so on), but I've found them
           | to be a bad idea. Channels should remain a low-level
           | primitive to build pipelines, but they're not what you should
           | use as your main API surface.
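            | 
            | The "forgot to close" failure mode is easy to reproduce; a
            | contrived sketch:
            | 
            |     package main
            | 
            |     import "fmt"
            | 
            |     // Two-stage pipeline: stage1 -> stage2 -> main. If
            |     // stage1 forgets to close its output, stage2's range
            |     // loop never exits and the whole pipeline quietly
            |     // stalls.
            |     func stage1(n int) <-chan int {
            |         out := make(chan int)
            |         go func() {
            |             defer close(out) // drop this and everything hangs
            |             for i := 0; i < n; i++ {
            |                 out <- i
            |             }
            |         }()
            |         return out
            |     }
            | 
            |     func stage2(in <-chan int) <-chan int {
            |         out := make(chan int)
            |         go func() {
            |             defer close(out)
            |             for v := range in { // exits only when in is closed
            |                 out <- v * v
            |             }
            |         }()
            |         return out
            |     }
            | 
            |     func main() {
            |         for v := range stage2(stage1(5)) {
            |             fmt.Println(v)
            |         }
            |     }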
        
             | MathMonkeyMan wrote:
             | > Channels should remain a low-level primitive to build
             | pipelines, but they're not what you should use as your main
             | API surface.
             | 
             | I remember hearing (not sure where) that this is a lesson
             | that was learned early on in Go. Channels were the new
             | hotness, so let's use them to do things that were not
             | possible before. But it turned out that Go was better for
              | doing what was already possible before, just more cleanly.
        
       | absolute_unit22 wrote:
       | > High Performance: Processes millions of events per second,
       | about 4x to 10x faster than channels.
       | 
       | Wow - that's a pretty impressive accomplishment. I've been
        | meaning to move some workers I have over to a pub/sub setup on
        | https://www.typequicker.com.
       | 
        | I might try using this in prod. I don't really need the insane
        | performance benefits as I don't have much traffic lol - but I
        | always like experimenting with new open source libraries -
        | especially while the site isn't very large yet.
        
         | MunishMummadi wrote:
          | That's a cool site - you'll see me more frequently. BTW, do
          | some tech Twitter promos.
        
       | hinkley wrote:
       | It's always worth discussing what features were thrown out to get
       | the performance boost, whether it's fair for those features to
       | impose a tax on all users who don't or rarely use those features,
       | and whether there's a way to rearrange the code so that the
       | lesser used features are a low cost abstraction, one that you
       | mostly only pay if you use those features and are cheap if not
       | free if you don't.
       | 
        | There are a lot of spinoff libraries out there that have
        | provoked a reaction from the core team, cutting the cost of
        | their implementation by 25-50%. And that's a rising tide that
        | lifts all boats.
        
       | tombert wrote:
       | Interesting, I need to dig into the guts of this because this
       | seems cool.
       | 
        | I'm a bit out of practice with Go, but I never thought channels
        | were "slow", so getting 4-10x the speed is pretty
       | impressive. I wonder if it shares any design with LMAX
       | Disruptor...
        
         | bob1029 wrote:
         | > I wonder if it shares any design with LMAX Disruptor...
         | 
         | I've recently switched from using Disruptor.NET to Channel<T>
         | in many of my .NET implementations that require inter-thread
         | sync primitives. Disruptor can be faster, but I really like the
         | semantics of the built-in types.
         | 
         | https://learn.microsoft.com/en-us/dotnet/core/extensions/cha...
         | 
         | https://learn.microsoft.com/en-us/dotnet/api/system.threadin...
        
           | tombert wrote:
           | I've never used Disruptor.NET, only the Java version.
           | 
            | I personally use a traditional Java BlockingQueue for about
            | 95% of stuff, since it's built in and more than fast
           | enough for nearly everything, but Disruptor kicks its ass
           | when dealing with high-throughput stuff.
        
       | minaguib wrote:
       | OP: the readme could really benefit from a section describing the
       | underlying methodology, and comparing it to other approaches (Go
       | channels, LMAX, etc...)
        
         | karel-3d wrote:
          | The actual code and the actual bench are very short.
        
       | singron wrote:
       | After a brief skim, it looks like this implementation is highly
        | optimized for throughput and broadcasts, whereas a channel has
        | many other use cases.
       | 
       | Consumers subscribing to the same event type are placed in a
       | group. There is a single lock for the whole group. When
       | publishing, the lock is taken once and the event is replicated to
       | each consumer's queue. Consumers take the lock and swap their
       | entire queue buffer, which lets them consume up to 128 events per
       | lock/unlock.
       | 
       | Since channels each have a lock and only take 1 element at a
       | time, they would require a lot more locking and unlocking.
       | 
        | There is also some frequent polling to maintain group metadata,
        | so this could be less ideal in low-volume workloads where you
        | want CPU usage to drop to 0%.
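        | 
        | In other words, roughly this pattern (an illustrative sketch,
        | not the library's actual code; per the above, the real thing
        | caps each batch at 128 events):
        | 
        |     package main
        | 
        |     import (
        |         "fmt"
        |         "sync"
        |     )
        | 
        |     // One lock per subscriber group, one pending-event slice
        |     // per consumer.
        |     type group struct {
        |         mu     sync.Mutex
        |         queues [][]int
        |     }
        | 
        |     // Publish locks the group once and replicates the event
        |     // into every consumer's queue.
        |     func (g *group) Publish(ev int) {
        |         g.mu.Lock()
        |         for i := range g.queues {
        |             g.queues[i] = append(g.queues[i], ev)
        |         }
        |         g.mu.Unlock()
        |     }
        | 
        |     // Drain swaps consumer i's entire queue for an empty one,
        |     // so a single lock/unlock hands back a whole batch that is
        |     // then processed without holding the lock.
        |     func (g *group) Drain(i int) []int {
        |         g.mu.Lock()
        |         batch := g.queues[i]
        |         g.queues[i] = nil
        |         g.mu.Unlock()
        |         return batch
        |     }
        | 
        |     func main() {
        |         g := &group{queues: make([][]int, 2)}
        |         for ev := 0; ev < 5; ev++ {
        |             g.Publish(ev)
        |         }
        |         fmt.Println(g.Drain(0)) // [0 1 2 3 4], one lock round-trip
        |         fmt.Println(g.Drain(1)) // [0 1 2 3 4]
        |     }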
        
       | qudat wrote:
       | This is pretty neat, code looks minimal as well. At pico.sh we
       | wrote our own pubsub impl in Go that leveraged channels. We
       | primarily built it to use with https://pipe.pico.sh
       | 
       | https://github.com/picosh/pubsub
       | 
        | With this impl, can you stream data, or is it just for
        | individual events?
        
       | st3fan wrote:
       | "Processes millions of events per second" - yes, sure, when there
       | is nothing to process. But that is not representative of a real
       | app.
       | 
        | Add a database call or some simple data processing and then
        | show some numbers comparing throughput against channels.
        | 
        | I hate these kinds of claims. It's the same with web
        | frameworks that show reqs/s for an empty method.
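        | 
        | Even something as crude as this would be more telling (a rough
        | sketch of a Go benchmark in a _test.go file; the 50us sleep is
        | just a stand-in for real per-event work):
        | 
        |     package bench
        | 
        |     import (
        |         "testing"
        |         "time"
        |     )
        | 
        |     // Stand-in for real work: a DB call, parsing, etc.
        |     func handle(ev int) {
        |         time.Sleep(50 * time.Microsecond)
        |     }
        | 
        |     func BenchmarkChannelWithWork(b *testing.B) {
        |         ch := make(chan int, 128)
        |         done := make(chan struct{})
        |         go func() {
        |             for ev := range ch {
        |                 handle(ev)
        |             }
        |             close(done)
        |         }()
        |         b.ResetTimer()
        |         for i := 0; i < b.N; i++ {
        |             ch <- i
        |         }
        |         close(ch)
        |         <-done
        |     }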
        
       ___________________________________________________________________
       (page generated 2025-06-29 23:00 UTC)