[HN Gopher] 4-10x faster in-process pub/sub for Go
___________________________________________________________________
4-10x faster in-process pub/sub for Go
Author : kelindar
Score : 90 points
Date : 2025-06-29 15:19 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| kelindar wrote:
| This might be useful to some if you need a very light pub/sub
| inside one process.
|
| I was building a small multiplayer game in Go. Started with a
| channel fan-out but (for no particular reason) wanted to see if
| we could do better. Put together this tiny event bus to test, and
| on my i7-13700K it delivers events in 10-40ns, roughly 4-10x
| faster than the plain channel loop, depending on the
| configuration.
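|
| For reference, the fan-out baseline was essentially this shape
| (a simplified illustrative sketch, not the exact benchmark
| code):
|
|     package main
|
|     import "fmt"
|
|     func main() {
|         // one buffered channel per subscriber
|         subs := make([]chan int, 4)
|         for i := range subs {
|             subs[i] = make(chan int, 128)
|         }
|
|         // publish: one channel send (one lock) per subscriber
|         for ev := 0; ev < 3; ev++ {
|             for _, ch := range subs {
|                 ch <- ev
|             }
|         }
|
|         // each subscriber drains its own queue
|         for i, ch := range subs {
|             close(ch)
|             for ev := range ch {
|                 fmt.Printf("sub %d got %d\n", i, ev)
|             }
|         }
|     }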
| zx2c4 wrote:
| > about 4x to 10x faster than channels.
|
| I'd be interested to learn why/how and what the underlying
| structural differences are that make this possible.
| MathMonkeyMan wrote:
| I didn't look, but I don't think of channels as a pub/sub
| mechanism. You can have a producer close() a channel to notify
| consumers of a value available somewhere else, or you can loop
| through a bunch of buffered channels and do nonblocking sends.
|
| A different design, without channels, could improve on those.
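|
| Minimal sketches of both patterns, for the sake of
| illustration:
|
|     package main
|
|     import "fmt"
|
|     func main() {
|         // Pattern 1: close() as a one-shot broadcast signal.
|         ready := make(chan struct{})
|         done := make(chan struct{})
|         go func() {
|             <-ready // unblocks once ready is closed
|             fmt.Println("notified")
|             close(done)
|         }()
|         close(ready)
|         <-done
|
|         // Pattern 2: nonblocking fan-out to buffered channels.
|         subs := []chan int{make(chan int, 1), make(chan int, 1)}
|         for _, ch := range subs {
|             select {
|             case ch <- 42: // delivered into the buffer
|             default: // buffer full: drop instead of blocking
|             }
|         }
|         fmt.Println(<-subs[0], <-subs[1])
|     }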
| atombender wrote:
| I prefer to think of channels as a memory-sharing mechanism.
|
| In most cases where you want to send data between concurrent
| goroutines, channels are a better primitive, as they allow
| the sender and receiver to safely and concurrently process
| data without needing explicit locks. (Internally, channels
| are protected with mutexes, but that's a single, battle-
| tested and likely bug-free implementation shared by all users
| of channels.)
|
| The fact that channels also block on send/receive and support
| buffering means that there's a lot more to them, but that's how
| you should think of them. The fact that channels look like a
| queue if you squint is a red herring that has caused many a
| junior developer to abuse them for that purpose, but they are a
| surprisingly poor fit for it. Even backpressure tends to be
| something you want to control manually (using intermediate
| buffers and so on), because channels can be fiendishly hard to
| debug once you chain more than a couple of them. Something
| forgets to close a channel, and your whole pipeline can stall.
| Channels are also slow, requiring mutex locking even in
| scenarios where the data doesn't need locking and could just be
| passed directly between functions.
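|
| A minimal, hypothetical version of that stall: a two-stage
| pipeline where the first stage forgets to close its output:
|
|     package main
|
|     import "fmt"
|
|     func main() {
|         stage1 := make(chan int)
|         stage2 := make(chan int)
|
|         go func() {
|             for i := 0; i < 3; i++ {
|                 stage1 <- i
|             }
|             // Bug: missing close(stage1), so the next
|             // stage's range loop never terminates.
|         }()
|
|         go func() {
|             for v := range stage1 {
|                 stage2 <- v * 2
|             }
|             close(stage2) // never reached
|         }()
|
|         // In a bigger program this stalls silently; here the
|         // runtime detects that all goroutines are asleep and
|         // panics with a deadlock error after printing 0, 2, 4.
|         for v := range stage2 {
|             fmt.Println(v)
|         }
|     }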
|
| Lots of libraries (such as Rill and go-stream) have sprung up
| that wrap channels to model data pipelines (especially with
| generics it's become easier to build generic operators like
| deduping, fan-out, buffering and so on), but I've found them
| to be a bad idea. Channels should remain a low-level
| primitive to build pipelines, but they're not what you should
| use as your main API surface.
| MathMonkeyMan wrote:
| > Channels should remain a low-level primitive to build
| pipelines, but they're not what you should use as your main
| API surface.
|
| I remember hearing (not sure where) that this is a lesson
| that was learned early on in Go. Channels were the new
| hotness, so let's use them to do things that were not
| possible before. But it turned out that Go was better for
| doing what was already possible before, but more cleanly.
| absolute_unit22 wrote:
| > High Performance: Processes millions of events per second,
| about 4x to 10x faster than channels.
|
| Wow - that's a pretty impressive accomplishment. I've been
| meaning to move some workers I have to a pub/sub on
| https://www.typequicker.com.
|
| I might try using this in prod. I don't really need the insane
| performance benefits as I don't have that much traffic lol - but
| I always like experimenting with new open source libraries -
| especially while the site isn't very large yet.
| MunishMummadi wrote:
| That's a cool site. You'll be seeing me more often. BTW, you
| should do some tech Twitter promos.
| hinkley wrote:
| It's always worth discussing which features were thrown out to
| get the performance boost, whether it's fair for those features
| to impose a tax on all the users who don't (or rarely) use
| them, and whether there's a way to rearrange the code so that
| the lesser-used features become a low-cost abstraction: one you
| mostly only pay for when you use those features, and that is
| cheap if not free when you don't.
|
| There are a lot of spinoff libraries out there that have
| provoked a reaction from the core team, cutting the cost of the
| original implementation by 25-50%. And that's a rising tide
| that lifts all boats.
| tombert wrote:
| Interesting, I need to dig into the guts of this because this
| seems cool.
|
| I'm a bit out of practice with Go, but I never thought of
| channels as "slow", so getting 4-10x the speed is pretty
| impressive. I wonder if it shares any design with LMAX
| Disruptor...
| bob1029 wrote:
| > I wonder if it shares any design with LMAX Disruptor...
|
| I've recently switched from using Disruptor.NET to Channel<T>
| in many of my .NET implementations that require inter-thread
| sync primitives. Disruptor can be faster, but I really like the
| semantics of the built-in types.
|
| https://learn.microsoft.com/en-us/dotnet/core/extensions/cha...
|
| https://learn.microsoft.com/en-us/dotnet/api/system.threadin...
| tombert wrote:
| I've never used Disruptor.NET, only the Java version.
|
| I personally will use a traditional Java BlockingQueue for
| about 95% of stuff, since it's built in and more than fast
| enough for nearly everything, but Disruptor kicks its ass when
| dealing with high-throughput stuff.
| minaguib wrote:
| OP: the readme could really benefit from a section describing the
| underlying methodology, and comparing it to other approaches (Go
| channels, LMAX, etc...)
| karel-3d wrote:
| The actual code and the actual bench are very short.
| singron wrote:
| After a brief skim, it looks like this implementation is highly
| optimized for throughput and broadcasts, whereas a channel has
| many other use cases.
|
| Consumers subscribing to the same event type are placed in a
| group. There is a single lock for the whole group. When
| publishing, the lock is taken once and the event is replicated to
| each consumer's queue. Consumers take the lock and swap their
| entire queue buffer, which lets them consume up to 128 events per
| lock/unlock.
|
| Since channels each have a lock and only take 1 element at a
| time, they would require a lot more locking and unlocking.
|
| There is also some frequent polling to maintain group metadata,
| so this could be less ideal in low volume workloads where you
| want CPU to go to 0%.
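|
| Roughly this shape, as a stripped-down sketch (my
| reconstruction from the skim, not the actual code):
|
|     package main
|
|     import (
|         "fmt"
|         "sync"
|     )
|
|     type group struct {
|         mu     sync.Mutex
|         queues [][]int // one pending buffer per consumer
|     }
|
|     // publish takes the group lock once and replicates the
|     // event into every consumer's queue.
|     func (g *group) publish(ev int) {
|         g.mu.Lock()
|         for i := range g.queues {
|             g.queues[i] = append(g.queues[i], ev)
|         }
|         g.mu.Unlock()
|     }
|
|     // drain swaps consumer i's whole buffer out under the
|     // lock, so one lock/unlock pays for a batch of events.
|     func (g *group) drain(i int) []int {
|         g.mu.Lock()
|         batch := g.queues[i]
|         g.queues[i] = nil // real impl recycles a spare buffer
|         g.mu.Unlock()
|         return batch
|     }
|
|     func main() {
|         g := &group{queues: make([][]int, 2)}
|         for ev := 0; ev < 5; ev++ {
|             g.publish(ev)
|         }
|         for i := range g.queues {
|             fmt.Printf("consumer %d: %v\n", i, g.drain(i))
|         }
|     }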
| qudat wrote:
| This is pretty neat, code looks minimal as well. At pico.sh we
| wrote our own pubsub impl in Go that leveraged channels. We
| primarily built it to use with https://pipe.pico.sh
|
| https://github.com/picosh/pubsub
|
| With this impl can you stream data or is it just for individual
| events?
| st3fan wrote:
| "Processes millions of events per second" - yes, sure, when there
| is nothing to process. But that is not representative of a real
| app.
|
| Add a database call or some simple data processing, and then
| show some numbers comparing throughput against channels.
|
| I hate these kinds of claims. Similar to web frameworks that
| show reqs/s for an empty method.
___________________________________________________________________
(page generated 2025-06-29 23:00 UTC)