[HN Gopher] Distributed Erlang
       ___________________________________________________________________
        
       Distributed Erlang
        
       Author : todsacerdoti
       Score  : 176 points
       Date   : 2024-12-03 02:51 UTC (20 hours ago)
        
 (HTM) web link (vereis.com)
 (TXT) w3m dump (vereis.com)
        
       | behnamoh wrote:
       | fn/n where n is parity of fn was confusing for me at first, but
       | then I started to like the idea, esp. in dynamic languages it's
       | helpful to have at least some sense of how things work by just
       | looking at them.
       | 
       | this made me think about a universal function definition syntax
       | that captures everything except implementation details of the
        | function. something like:
        | 
        |     fn (a: int, b: float, c: T) {foo: callable, bar: callable, d: int}
        |         -> maybe(bool | [str] | [T])
       | 
       | which shows that the function fn receives a (integer), b (float),
       | c (generic type T) and maybe returns something that's either a
       | boolean or a list of strings or a list of T, or it throws and
       | doesn't return anything. the function fn also makes use of foo
       | and bar (other functions) and variable d which are not given to
       | it as arguments but they exist in the "context" of fn.
        
         | jlarocco wrote:
         | Meh, if I need that much detail I'll just read the function.
         | 
         | And a function returning one of three types sounds like a
         | _TERRIBLE_ idea.
        
           | c0balt wrote:
           | > And a function returning one of three types sounds like a
           | TERRIBLE idea.
           | 
           | Should have been an enum :)
        
         | arlort wrote:
          | Minor nitpick: I think you meant arity. Sorry for the nitpick
          | if it's a typo, or if you had only ever heard it spoken out
          | loud before. Arity is the number of parameters (in the context
          | of functions); parity is whether a number is even or odd.
        
           | johnisgood wrote:
           | The distinctions are important, would not consider it a minor
           | nitpick. :P
        
         | elcritch wrote:
          | Well, Erlang and Elixir both have a type checker, dialyzer,
          | which has a type specification not too far from what you
          | proposed -- excluding the variable or function captures.
         | 
         | Elixir's syntax for declaring type-specs can be found at
         | https://hexdocs.pm/elixir/1.12/typespecs.html
        
           | the_duke wrote:
           | Elixir is also working on a proper type system with compile
           | time + runtime type checks.
        
         | itishappy wrote:
         | Looks similar to how languages with effect systems work. (Or
         | appear to work, I haven't actually sat down with one yet.) They
          | don't capture the exact functions, rather capturing the
         | capabilities of said functions, which are the important bit
         | anyway. The distinction between `print` vs `printline` doesn't
         | matter too much, the fact you're doing `IO` does. It seems to
         | be largely new languages trying this, but I believe there's a
         | few addons for languages such as Haskell as well.
         | 
         | https://koka-lang.github.io/koka/doc/index.html
         | 
         | https://www.unison-lang.org/docs/fundamentals/abilities/
         | 
         | https://dev.epicgames.com/documentation/en-us/uefn/verse-lan...
        
         | wbadart wrote:
         | As I understand it, Erlang inherited its arity notation from
         | Prolog (which early versions of Erlang were implemented in).
         | 
         | https://en.m.wikipedia.org/wiki/Erlang_(programming_language...
        
         | derefr wrote:
         | > fn/n where n is parity of fn was confusing for me at first,
         | but then I started to like the idea, esp. in dynamic languages
         | it's helpful to have at least some sense of how things work by
         | just looking at them.
         | 
         | To be clear, this syntax is not just "helpful", it's
         | semantically necessary given Erlang's compilation model.
         | 
         | In Erlang, foo/1 and foo/2 are separate functions; foo/2 could
         | be just as well named bar/1 and it wouldn't make a difference
         | to the compiler.
         | 
         | But on the other hand, `foo(X) when is_integer(X)` and `foo(X)
         | when is_binary(X)` _aren 't_ separate functions -- they're two
         | clause-heads of the same function foo/1, and they get unified
         | into a single function body during compilation.
         | 
         | So there's no way in Erlang for a (runtime) variable [i.e. a
         | BEAM VM register] to hold a handle/reference to "the `foo(X)
         | when is_integer(X)` part of foo/1" -- as that isn't a thing
         | that exists any more at runtime.
         | 
         | When you interrogate an Erlang module at runtime in the erl
         | REPL for the functions it exports
         | (`Module:module_info(exports)`), it gives you a list of
         | {FunctionName, Arity} tuples -- because those are the real
         | "names" of the functions in the module; you need both the
         | FunctionName and the Arity to uniquely reference/call the
         | function. But you _don 't_ need any kind of type information;
         | all the type information is lost from the "structure" of the
         | function, becoming just validation logic inside the function
         | body.
         | 
         | ---
         | 
         | In theory, you could have _compile-time_ type safety for
         | function handles, with type erasure at runtime ala Java 's
         | runtime erasure of compile-time type parameters. I feel like
         | Erlang is not the type of language to ever bother to add
         | support for this -- as it doesn't even accept or return
         | function handles from most system functions, instead accepting
         | or returning {FunctionName, Arity} tuples -- but in theory it
         | _could_.
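The two points above (each arity is a separate function; guard clauses within one arity are unified into a single body) can be mimicked in Python -- a loose analogy, not how the BEAM actually stores code:

```python
# Function identity is (Name, Arity). Guard clauses become branches
# inside one body, so individual clause-heads are not separately
# addressable at runtime.

def foo_1(x):
    # `foo(X) when is_integer(X)` and `foo(X) when is_binary(X)`,
    # unified into one body at "compile" time:
    if isinstance(x, int):
        return ("integer_clause", x)
    if isinstance(x, bytes):
        return ("binary_clause", x)
    raise TypeError("function_clause")

def foo_2(x, y):
    # Shares the name foo but is an entirely separate function;
    # it could just as well be called bar_1.
    return x + y

# Analogue of Module:module_info(exports): {Name, Arity} pairs are the
# real names; no type information survives in the "structure".
exports = {("foo", 1): foo_1, ("foo", 2): foo_2}
```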
        
           | toast0 wrote:
           | > But Erlang itself, as a language, doesn't even have
           | function-reference literals; you usually just pass
           | {FunctionName, Arity} tuples around.
           | 
           | I don't really understand what a literal is, but isn't fun
           | erlang:node/0 a literal? It operates the same way as 1, which
            | I'm quite sure is a numeric literal:
            | 
            |     Eshell V12.3.2.17  (abort with ^G)
            |     1> F = fun erlang:node/0.
            |     fun erlang:node/0
            |     2> X = 1.
            |     1
            |     3> F.
            |     fun erlang:node/0
            |     4> X.
            |     1
           | 
           | I set the variable to the thing, and then when I output the
           | variable, I see the same value I set. AFAIK, both 1 and fun
           | erlang:node/0 must both be literals, as they behave the same;
           | but I'm happy to learn otherwise.
        
             | derefr wrote:
             | You're right! Edited. (I think I never wrote Erlang for
             | long enough before switching to Elixir to find this syntax.
             | Also, I don't think it's ever come up in any Erlang code
             | I've read. It's kind of obscure!)
        
       | qart wrote:
       | What's with the font? It seems to be flickering. Especially the
       | headings. Is there a way to turn it off?
        
         | dan353hehe wrote:
          | I think it's meant to replicate a CRT display? I had a real
          | hard time with those headings buzzing slightly.
        
           | emmanueloga_ wrote:
           | Uses @keyframes and text-shadow to try and mimic a CRT effect
           | but makes the text unreadable (for me at least). The browser
           | readability mode does work on the page though.
        
             | dkersten wrote:
             | Makes it unreadable for me too.
        
           | johnisgood wrote:
            | I think so too, and it works on my old LCD monitor. If I focus
           | on it, I can see the subtle changes, but other than that, it
           | does not make it any less readable for me.
        
           | rbanffy wrote:
           | Speaking as someone who spent a lot of time in front of CRTs,
           | this is NOT what an average CRT looks like. This would be
           | reason to send it to maintenance back then - it looks like
           | breaking contacts or problems with the analog board.
           | 
           | A good CRT would show lines, but they'd be steady and not
           | flicker (unless you self-inflict an interlaced display on a
           | short phosphor screen). It might also show some color
           | convergence issues, but that, again, is adjustable (or you'd
           | send it to maintenance).
           | 
           | This looks like the kind of TV you wouldn't plug a VIC-20 to.
        
         | funkydata wrote:
         | If one of your browsers has this feature, just toggle reader
         | view.
        
         | leoff wrote:
         | type this in your console `document.body.style.cssText =
         | 'animation: none !important'`
        
         | desdenova wrote:
         | If a person considers putting this type of effect on text
         | reasonable, I really don't think they have anything of value to
         | say in the text content anyways.
        
         | dtquad wrote:
         | I think it looks cool. Of course there should be a
         | reader/accessibility mode but otherwise we need more creativity
         | on the web.
        
           | MisterTea wrote:
           | Maybe you need creativity but I just need information and
           | presentation isn't nearly as important as legibility.
        
         | penguin_booze wrote:
         | On Firefox, Ctrl+Alt+R - also known as Reader View.
        
         | amw wrote:
         | OP, if you're reading this, the animation pegs my CPU. Relying
         | on your readers to engage reader mode is going to turn away a
         | lot of people who might otherwise enjoy your content.
        
           | SrslyJosh wrote:
           | Unfortunately, scrolling is choppy for me even with reader
           | mode turned on. Great article, though.
        
       | openrisk wrote:
       | Quite readable even for those not familiar with Erlang.
       | 
       | Is there a list of projects that shows distributed erlang in
       | action (taking advantage of its strengths and avoiding the
        | pitfalls)?
        
         | zaik wrote:
         | ejabberd and RabbitMQ are written in Erlang
        
           | brabel wrote:
           | And CouchDB
        
         | vereis wrote:
         | I recommend taking a look at the various open source Riak
         | applications too! Might not be updated to any sort of recent
         | versions of erlang but was a great resource to me early on.
        
       | toast0 wrote:
       | > This can lead to scalability issues in large clusters, as the
       | number of connections that each node needs to maintain grows
       | quadratically with the number of nodes in the cluster.
       | 
       | No, the total number of dist connections grows quadratically with
       | the number of nodes, but the number of dist connections each node
        | makes grows linearly.
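The scaling difference is easy to check numerically (a quick sketch of full-mesh connection counts):

```python
def full_mesh(n):
    """Connection counts in a fully connected cluster of n nodes."""
    per_node = n - 1              # each node dials n-1 peers: linear
    total = n * (n - 1) // 2      # distinct pairs: quadratic
    return per_node, total

for n in (10, 100, 1000):
    print(n, full_mesh(n))
```

At 1000 nodes that's 999 connections per node but 499,500 across the cluster.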
       | 
       | > Not only that, in order to keep the cluster connected, each
       | node periodically sends heartbeat messages to every other node in
       | the cluster.
       | 
        | IIRC, heartbeats are once every 30 seconds by default.
       | 
       | > This can lead to a lot of network traffic in large clusters,
       | which can put a strain on the network.
       | 
        | Let's say I'm right about 30 seconds between heartbeats, and
       | you've got 1000 nodes. Every 30 seconds each node sends out 999
       | heartbeats (which almost certainly fit in a single tcp packet
       | each, maybe less if they're piggybacking on real data exchanges).
       | That's 999,000 packets every second, or 33k pps across your whole
       | cluster. For reference, GigE line rate with full 1500 mtu packets
       | is 80k pps. If you actually have 1000 nodes worth of work, the
       | heartbeats are not at all a big deal.
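Working those numbers through (999 heartbeats per node per 30-second tick, assuming one packet per heartbeat):

```python
nodes = 1000
heartbeats_per_tick = nodes * (nodes - 1)   # packets per 30 s interval
pps = heartbeats_per_tick / 30              # averaged per second

print(heartbeats_per_tick)  # 999000
print(round(pps))           # 33300
```

The ~33k pps figure is the cluster-wide average; per node it's only ~33 packets per second of heartbeat traffic.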
       | 
       | > Historically, a "large" cluster in Erlang was considered to be
       | around 50-100 nodes. This may have changed in recent years, but
       | it's still something to be aware of when designing distributed
       | Erlang systems.
       | 
       | I don't have recent numbers, but Rick Reed's presentation at
       | Erlang Factory in 2014 shows a dist cluster with 400 nodes. I'm
       | pretty sure I saw 1000+ node clusters too. I left WhatsApp in
       | 2019, and any public presentations from WA are less about raw
       | scale, because it's passe.
       | 
       | Really, 1000 dist connections is nothing when you're managing
       | 500k client connections. Dist connections weren't even a big deal
       | when we went to smaller nodes in FB.
       | 
       | It's good to have a solid backend network, and to try to bias
       | towards fewer larger nodes, rather than more smaller nodes. If
        | you want to play with large scale dist and spin up 1000 low
        | cpu, low memory VMs, you might have some trouble. It makes sense
       | to start with small nodes and whatever number makes you
       | comfortable for availability, and then when you run into limits,
       | reach for bigger nodes until you get to the point where adding
       | nodes is more cost effective: WA ran dual xeon 2690 servers
       | before the move to FB infra; facebook had better economics with
       | smaller single Xeon D nodes; I dunno what makes sense today,
       | maybe a single socket Epyc?
        
         | sausagefeet wrote:
         | > That's 999,000 packets every second, or 33k pps across your
         | whole cluster. For reference, GigE line rate with full 1500 mtu
         | packets is 80k pps. If you actually have 1000 nodes worth of
         | work, the heartbeats are not at all a big deal.
         | 
         | Using up almost half of your pps every 30 seconds for cluster
         | maintenance certainly seems like it's more than "not a big
         | deal", no?
        
           | desdenova wrote:
           | If you're at the point where you decided the 1000 nodes are
           | required, you should probably have already considered the
           | sensible alternatives first, and concluded this was somehow
           | better.
           | 
           | The network saturation is just a necessary cost of running
           | such a massive cluster.
           | 
           | I really have no idea what kind of system would require 1000
           | nodes, that couldn't be replaced by 100, 10x larger, nodes
           | instead. And at that point, you should probably be thinking
           | of ways to scale the network itself as well.
        
             | sausagefeet wrote:
             | I don't really understand what this comment is responding
             | to. The comment I responded to hand waved away consuming
             | almost 50% of your pps on heartbeats every 30 seconds as
             | "no big deal".
        
               | ricketycricket wrote:
               | > The comment I responded to hand waved away consuming
               | almost 50% of your pps on heartbeats every 30 seconds as
               | "no big deal".
               | 
               | > The network saturation is just a necessary cost of
               | running such a massive cluster.
               | 
               | I think this actually answers it perfectly.
               | 
               | 1. If you are running 1K distributed nodes, you have to
               | understand that means you have some overhead for running
               | such a large cluster. No one is hand waving this away,
               | it's just being acknowledged that this level of
               | complexity has a cost.
               | 
               | 2. If heartbeats are almost 50% of your pps, you are
               | trying to use 1Gbe to run a 1K-node cluster. No one would
               | do this in production and no one is claiming you should.
               | 
               | 3. If your system can tolerate it, change the heartbeat
               | interval to whatever you want.
               | 
               | 4. Don't use distributed Erlang if you don't have to.
               | Erlang/Elixir/Gleam work perfectly fine for non-
               | distributed workloads as do most languages that can't
               | distribute in the first place. But if you do need a
               | distributed system, you are unlikely to find a better way
               | to do it than the BEAM.
               | 
               | Basically, it seems you are taking issue with something
               | that 1) is that way because that's how things work, and
               | 2) is not how anyone would actually use it.
        
             | throwawaymaths wrote:
              | IIRC from a preso by them, WA runs an eye-popping number of
             | nodes in distribution and they are indeed fully connected,
             | and it's fine (though they do use a modified BEAM)
        
               | anonymousDan wrote:
               | Interesting, do you have a link by any chance?
        
               | toast0 wrote:
               | From 2014 there's a video link and slides here:
               | https://www.erlang-factory.com/sfbay2014/rick-reed FYI:
               | I've seen comments from after I left that wandist isn't
               | used anymore; I _think_ a lot of what we gained with that
               | was working around issues in pg2 that stem from global
               | locks not scaling... but the new pg doesn 't need global
                | locks at all. There were also some things in wandist to
               | work around difficulties communicating between SoftLayer
               | nodes and Facebook nodes, but that was a transitory need.
               | See the 2024 presentation, 40k nodes!
               | 
               | Fairly similar, but smaller numbers in 2012
               | http://www.erlang-
               | factory.com/conference/SFBay2012/speakers/...
               | 
               | The 2013 presentation is focused on MMS which I don't
               | remember if it was as impressive: http://www.erlang-
               | factory.com/conference/SFBay2013/speakers/... (note that
               | server side transcoding is from before end to end
               | encryption)
               | 
               | I don't think there were similar presentations on Erlang
               | in the large at WhatsApp after that. Big changes between
               | 2014 and 2019 (when I left) were
               | 
               | a) chat servers started doing a lot more, and clients per
               | server went down on the big SoftLayer boxes
               | 
               | b) hosting moved from SoftLayer to Facebook and much
               | smaller nodes --- also chat servers at SoftLayer were
               | individually addressable, using (augmented) round robin
               | DNS to select, at Facebook the chat servers did not have
               | public addresses, instead everything comes in through
               | load balancers
               | 
               | c) MMS was pretty much offloaded into a Facebook storage
               | service (c++); not because the Erlang wasn't sufficient,
               | but because MMS was loosely coupled with the rest of the
               | service, Facebook had a nice enough storage service, a
               | lot of storage, an awful lot of bandwidth, and it wasn't
               | a lot of work for that team to also support WhatsApp's
               | needs; also our Erlang MMS (and the PHP version before
               | it) was built around storing files on specific,
               | addressable nodes, but nodes at Facebook are much more
               | ephemeral and not easy to directly address by clients
               | 
               | d) some amount of data storage moved off of mnesia into
               | other Facebook data storage technology; again, not
               | because mnesia wasn't sufficient, but more ephemeral
               | nodes makes it cumbersome (addressable) and the available
               | hardware nodes at FB didn't really match --- there's a
               | very firm bias at FB towards using standard node sizes
               | and the available standard nodes were like a web machine
               | with not much ram or a big database machine with more ram
               | and tons of fast storage; WA mnesia wants lots of ram but
               | doesn't need a lot of storage (all data is in ram, and
               | dumped + logged to disk) so there was a mismatch there
               | --- things that stayed in mnesia needed much larger
               | clusters to manage data size
               | 
               | Presentations became less common because of more layers
               | to get approval, and also because it's less fun to share
               | how we built something on top of proprietary layers that
               | others don't really have access to. Anybody could have
               | gotten dual 2690 servers at SoftLayer and run a nice
               | Erlang cluster. Only a few people could run an even
               | bigger chat cluster in a Facebook like hosting
               | environment.
        
               | throwawaymaths wrote:
               | I remember seeing this when it hit the internet:
               | 
               | https://www.youtube.com/watch?v=A5bLRH-PoMY
               | 
               | 40,000 erlang nodes in a cluster
               | 
               | lots of specifics about what tweaks they use. Rewatching,
               | it seems like you don't have to really use the modified
               | BEAM except in a few small soft-code (replace OTP/stdlib
               | functionality) where they provide enough information that
               | you could probably write it yourself if you had to, and a
               | lot of the optimizations have been upstreamed into core
               | BEAM -- WA is a pretty good citizen of the ecosystem.
        
             | toast0 wrote:
             | > I really have no idea what kind of system would require
             | 1000 nodes, that couldn't be replaced by 100, 10x larger,
             | nodes instead.
             | 
             | Nodes only get so big. Way back when, a quad xeon 4650 v2
             | was definitely not 2x the throughput of a dual xeon 2690
             | v2; so you end up with two dual socket systems instead of
             | one quad socket. A quad socket server often costs
             | significantly more than two dual socket servers, and is
             | likely to take longer between order and delivery. There's
             | usually a point where you _can_ still scale up, but scaling
             | out is a better use of resources.
        
           | worthless-trash wrote:
            | It is possible to have a heartbeat network separate from the
            | data network if so required.
        
           | anoother wrote:
           | On a gigabit connection. It's hard to imagine clusters
           | running on anything below 10Gb as an absolute minimum.
        
           | immibis wrote:
           | That's for 1000 nodes all sharing a single 1Gbps connection
           | and sending 1500-byte heartbeats for some reason. Make them
           | 150 bytes (more realistic) and now it's almost half the
           | throughput of the 100Mbps router that for some reason you are
           | sending every single packet through, for 1000 nodes.
        
             | gatnoodle wrote:
             | I see, it makes more sense now.
        
           | davisp wrote:
           | > If you actually have 1000 nodes worth of work, the
           | heartbeats are not at all a big deal.
           | 
           | I think you're missing the fact that the heart beats will be
           | combined with existing packets. Hence the quoted bit. If
           | you've got 1000 nodes, they should be doing something with
           | that network such that an extra 50 bytes (or so) every 30s
           | would not be an issue.
        
             | gpderetta wrote:
             | They would be combined if each node was sending messages
             | each second to every other node. Is that realistic?
        
           | sgarland wrote:
           | If you're running 1Gbe in prod at any kind of scale,
           | something has gone horribly awry.
        
           | crabbone wrote:
           | I doubt these are even sent if there's actual traffic going
           | through the network. I mean, why do you need to ping nodes,
           | if they are already talking to you?
        
           | toast0 wrote:
           | (Oops, I meant to say 999k packets every 30 seconds. Thanks
           | everyone for running with the pps number)
           | 
           | If your switching fabric can only deal with 1Gbps, yes,
           | you've used it halfway up with heartbeats. But if your
           | network is 1x 48 port 1G switch and 44x 24 port 1G switches,
           | you won't bottleneck on heartbeats, because that spine switch
           | should be able to simultaneously send and receive at line
            | rate on all ports, which is plenty of bandwidth. You might
           | well bottleneck on other transmissions, but the nice thing
           | about dist heartbeats is on a connection, each node is
           | sending heartbeats on a timer and will close the connection
           | if it doesn't see a heartbeat in some timeframe; it's a
           | requirement for progress, it's not a requirement for a timely
           | response, so you can end up with epic round trip times for
           | net_adm:ping ... I've seen on the order of an hour once over
           | a long distance dist connection with an unexpected bandwidth
           | constraint.
           | 
           | It would probably be a lot more comfortable if your spine
           | switch was 10g and your node switches had a 10g uplink, and
           | you may want to consider LACP and double up all the
           | connections. You might also want to consider other
           | topologies, but this is just an illustration.
        
           | fwip wrote:
           | Wouldn't you have a lot more than one gigabit connection? I'm
           | struggling to imagine the 1000-node cluster layout that sends
           | all intra-cluster traffic through a single cable.
        
         | vereis wrote:
          | ahoy!! thanks for spotting the issue -- I wrote the post in a
          | stream of consciousness after a long day! I'll make that edit
         | and call it out!
         | 
         | the statement about what historically constitutes a large
          | erlang cluster was an anecdote told to me by Francesco Cesarini
         | during lunch a few years ago -- I'm not actually sure of the
         | time frame (or my memory!)
         | 
         | likewise I'll update the post to reflect that! thanks ( * _ * )
        
       | anacrolix wrote:
       | awful font.
       | 
       | distributed Erlang is like saying ATM machine
        
         | hsavit1 wrote:
         | yeah, isolated + distributed processes is THE THING about
         | Erlang, OTP
        
           | throwawaymaths wrote:
           | I mean you can certainly run one off scripts in elixir and
           | Erlang.
        
           | vereis wrote:
           | my personal reality is that the majority of projects I've
           | consulted on have seldom actually leveraged distributed
           | erlang for anything. the concurrency part yes, clustering for
           | the sake of availability or spreading load yes, but actually
           | doing anything more complex than that has been the exception!
           | ymmv tho!
        
         | vereis wrote:
         | honest q: how do you distinguish between single node erlang
         | applications vs clustered erlang applications?
        
           | toast0 wrote:
           | This returns true if you're on a single node erlang
           | application :P                   erlang:node() ==
           | nonode@nohost andalso erlang:nodes(known) == [nonode@nohost]
           | 
           | A single node erlang application would be one that doesn't
           | use dist at all. Although, if it includes anything gen_event
           | or similar, and it happens to be dist connected, unless it
           | specifically checks, it will happily reply to remote Erlang
           | processes.
        
       ___________________________________________________________________
       (page generated 2024-12-03 23:01 UTC)