[HN Gopher] Distributed Erlang
___________________________________________________________________
Distributed Erlang
Author : todsacerdoti
Score : 176 points
Date : 2024-12-03 02:51 UTC (20 hours ago)
(HTM) web link (vereis.com)
(TXT) w3m dump (vereis.com)
| behnamoh wrote:
| fn/n where n is parity of fn was confusing for me at first, but
| then I started to like the idea, esp. in dynamic languages it's
| helpful to have at least some sense of how things work by just
| looking at them.
|
| this made me think about a universal function definition syntax
| that captures everything except implementation details of the
| function. something like: fn (a: int, b: float,
| c: T) {foo: callable, bar: callable, d: int} -> maybe(bool |
| [str] | [T])
|
| which shows that the function fn receives a (integer), b (float),
| c (generic type T) and maybe returns something that's either a
| boolean or a list of strings or a list of T, or it throws and
| doesn't return anything. the function fn also makes use of foo
| and bar (other functions) and variable d which are not given to
| it as arguments but they exist in the "context" of fn.
| jlarocco wrote:
| Meh, if I need that much detail I'll just read the function.
|
| And a function returning one of three types sounds like a
| _TERRIBLE_ idea.
| c0balt wrote:
| > And a function returning one of three types sounds like a
| TERRIBLE idea.
|
| Should have been an enum :)
| arlort wrote:
| minor nitpick, I think you meant arity. If it's a typo sorry
| for the nitpick if you had only ever heard it spoken out loud
| before: arity is the number of parameters (in the context of
| functions), parity is whether a number is even or odd
| johnisgood wrote:
| The distinctions are important, would not consider it a minor
| nitpick. :P
| elcritch wrote:
| Well Erlang and Elixir both have a type checker, dialyzer,
| which has a type specification not too far from what you
| proposed, excluding the variable or function captures.
|
| Elixir's syntax for declaring type-specs can be found at
| https://hexdocs.pm/elixir/1.12/typespecs.html
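| For instance, a spec in dialyzer's notation looks roughly like
| this (the function name and types here are made up):

```erlang
%% Illustrative only: declares that lookup/2 takes an atom key and a
%% map, and returns either {ok, Value} or the atom 'error'.
-spec lookup(Key :: atom(), Map :: map()) -> {ok, term()} | error.
```

| Like the proposal above, it captures the shape of inputs and
| outputs but says nothing about the implementation.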
| the_duke wrote:
| Elixir is also working on a proper type system with compile
| time + runtime type checks.
| itishappy wrote:
| Looks similar to how languages with effect systems work. (Or
| appear to work, I haven't actually sat down with one yet.) They
| don't capture the exact functions, rather capturing the
| capabilities of said functions, which are the important bit
| anyway. The distinction between `print` vs `printline` doesn't
| matter too much, the fact you're doing `IO` does. It seems to
| be largely new languages trying this, but I believe there's a
| few addons for languages such as Haskell as well.
|
| https://koka-lang.github.io/koka/doc/index.html
|
| https://www.unison-lang.org/docs/fundamentals/abilities/
|
| https://dev.epicgames.com/documentation/en-us/uefn/verse-lan...
| wbadart wrote:
| As I understand it, Erlang inherited its arity notation from
| Prolog (which early versions of Erlang were implemented in).
|
| https://en.m.wikipedia.org/wiki/Erlang_(programming_language...
| derefr wrote:
| > fn/n where n is parity of fn was confusing for me at first,
| but then I started to like the idea, esp. in dynamic languages
| it's helpful to have at least some sense of how things work by
| just looking at them.
|
| To be clear, this syntax is not just "helpful", it's
| semantically necessary given Erlang's compilation model.
|
| In Erlang, foo/1 and foo/2 are separate functions; foo/2 could
| be just as well named bar/1 and it wouldn't make a difference
| to the compiler.
|
| But on the other hand, `foo(X) when is_integer(X)` and `foo(X)
| when is_binary(X)` _aren't_ separate functions -- they're two
| clause-heads of the same function foo/1, and they get unified
| into a single function body during compilation.
|
| So there's no way in Erlang for a (runtime) variable [i.e. a
| BEAM VM register] to hold a handle/reference to "the `foo(X)
| when is_integer(X)` part of foo/1" -- as that isn't a thing
| that exists any more at runtime.
|
| When you interrogate an Erlang module at runtime in the erl
| REPL for the functions it exports
| (`Module:module_info(exports)`), it gives you a list of
| {FunctionName, Arity} tuples -- because those are the real
| "names" of the functions in the module; you need both the
| FunctionName and the Arity to uniquely reference/call the
| function. But you _don't_ need any kind of type information;
| all the type information is lost from the "structure" of the
| function, becoming just validation logic inside the function
| body.
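| A hypothetical module to make that concrete (the clause heads of
| foo/1 collapse into one compiled function, while foo/2 is a
| separate function despite the shared name):

```erlang
-module(demo).
-export([foo/1, foo/2]).

%% Two clause-heads, one function: foo/1. The guards become
%% dispatch logic inside a single compiled function body.
foo(X) when is_integer(X) -> integer;
foo(X) when is_binary(X)  -> binary.

%% An entirely distinct function: foo/2.
foo(X, Y) -> {X, Y}.

%% demo:module_info(exports) would include {foo,1} and {foo,2}
%% (alongside the auto-exported module_info functions).
```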
|
| ---
|
| In theory, you could have _compile-time_ type safety for
| function handles, with type erasure at runtime a la Java's
| runtime erasure of compile-time type parameters. I feel like
| Erlang is not the type of language to ever bother to add
| support for this -- as it doesn't even accept or return
| function handles from most system functions, instead accepting
| or returning {FunctionName, Arity} tuples -- but in theory it
| _could_.
| toast0 wrote:
| > But Erlang itself, as a language, doesn't even have
| function-reference literals; you usually just pass
| {FunctionName, Arity} tuples around.
|
| I don't really understand what a literal is, but isn't fun
| erlang:node/0 a literal? It operates the same way as 1, which
| I'm quite sure is a numeric literal:
|
|     Eshell V12.3.2.17 (abort with ^G)
|     1> F = fun erlang:node/0.
|     fun erlang:node/0
|     2> X = 1.
|     1
|     3> F.
|     fun erlang:node/0
|     4> X.
|     1
|
| I set the variable to the thing, and then when I output the
| variable, I see the same value I set. AFAIK, both 1 and fun
| erlang:node/0 must both be literals, as they behave the same;
| but I'm happy to learn otherwise.
| derefr wrote:
| You're right! Edited. (I think I never wrote Erlang for
| long enough before switching to Elixir to find this syntax.
| Also, I don't think it's ever come up in any Erlang code
| I've read. It's kind of obscure!)
| qart wrote:
| What's with the font? It seems to be flickering. Especially the
| headings. Is there a way to turn it off?
| dan353hehe wrote:
| I think it's meant to replicate a CRT display? I had a real
| hard time with those headings buzzing slightly.
| emmanueloga_ wrote:
| Uses @keyframes and text-shadow to try and mimic a CRT effect
| but makes the text unreadable (for me at least). The browser
| readability mode does work on the page though.
| dkersten wrote:
| Makes it unreadable for me too.
| johnisgood wrote:
| I think so too, and works on my old LCD monitor. If I focus
| on it, I can see the subtle changes, but other than that, it
| does not make it any less readable for me.
| rbanffy wrote:
| Speaking as someone who spent a lot of time in front of CRTs,
| this is NOT what an average CRT looks like. This would be
| reason to send it to maintenance back then - it looks like
| breaking contacts or problems with the analog board.
|
| A good CRT would show lines, but they'd be steady and not
| flicker (unless you self-inflict an interlaced display on a
| short phosphor screen). It might also show some color
| convergence issues, but that, again, is adjustable (or you'd
| send it to maintenance).
|
| This looks like the kind of TV you wouldn't plug a VIC-20 to.
| funkydata wrote:
| If one of your browsers has this feature, just toggle reader
| view.
| leoff wrote:
| type this in your console `document.body.style.cssText =
| 'animation: none !important'`
| desdenova wrote:
| If a person considers putting this type of effect on text
| reasonable, I really don't think they have anything of value to
| say in the text content anyways.
| dtquad wrote:
| I think it looks cool. Of course there should be a
| reader/accessibility mode but otherwise we need more creativity
| on the web.
| MisterTea wrote:
| Maybe you need creativity but I just need information and
| presentation isn't nearly as important as legibility.
| penguin_booze wrote:
| On Firefox, Ctrl+Alt+R - also known as Reader View.
| amw wrote:
| OP, if you're reading this, the animation pegs my CPU. Relying
| on your readers to engage reader mode is going to turn away a
| lot of people who might otherwise enjoy your content.
| SrslyJosh wrote:
| Unfortunately, scrolling is choppy for me even with reader
| mode turned on. Great article, though.
| openrisk wrote:
| Quite readable even for those not familiar with Erlang.
|
| Is there a list of projects that shows distributed erlang in
| action (taking advantage of its strengths and avoiding the
| pitfals)?
| zaik wrote:
| ejabberd and RabbitMQ are written in Erlang
| brabel wrote:
| And CouchDB
| vereis wrote:
| I recommend taking a look at the various open source Riak
| applications too! Might not be updated to any sort of recent
| versions of erlang but was a great resource to me early on.
| toast0 wrote:
| > This can lead to scalability issues in large clusters, as the
| number of connections that each node needs to maintain grows
| quadratically with the number of nodes in the cluster.
|
| No, the total number of dist connections grows quadratically with
| the number of nodes, but the number of dist connections each node
| makes grows linearly.
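| The arithmetic, as a quick sketch (full mesh of n nodes):

```python
def per_node_connections(n: int) -> int:
    """Dist connections a single node maintains in a full mesh."""
    return n - 1  # grows linearly with cluster size

def total_connections(n: int) -> int:
    """Distinct connections across the whole cluster."""
    return n * (n - 1) // 2  # grows quadratically with cluster size

print(per_node_connections(1000))  # 999
print(total_connections(1000))     # 499500
```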
|
| > Not only that, in order to keep the cluster connected, each
| node periodically sends heartbeat messages to every other node in
| the cluster.
|
| IIRC, heartbeats are sent once every 30 seconds by default.
|
| > This can lead to a lot of network traffic in large clusters,
| which can put a strain on the network.
|
| Let's say I'm right about 30 seconds between heartbeats, and
| you've got 1000 nodes. Every 30 seconds each node sends out 999
| heartbeats (which almost certainly fit in a single tcp packet
| each, maybe less if they're piggybacking on real data exchanges).
| That's 999,000 packets every second, or 33k pps across your whole
| cluster. For reference, GigE line rate with full 1500 mtu packets
| is 80k pps. If you actually have 1000 nodes worth of work, the
| heartbeats are not at all a big deal.
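| Working those numbers (the 30-second interval is a recollection;
| 999,000 is the per-interval total, which averages out to the 33k
| pps figure):

```python
nodes = 1000
interval_s = 30  # assumed heartbeat spacing, per the comment above

# Every node heartbeats every other node once per interval.
per_interval = nodes * (nodes - 1)

# Averaged packet rate across the whole cluster.
pps = per_interval / interval_s

print(per_interval)  # 999000
print(round(pps))    # 33300
```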
|
| > Historically, a "large" cluster in Erlang was considered to be
| around 50-100 nodes. This may have changed in recent years, but
| it's still something to be aware of when designing distributed
| Erlang systems.
|
| I don't have recent numbers, but Rick Reed's presentation at
| Erlang Factory in 2014 shows a dist cluster with 400 nodes. I'm
| pretty sure I saw 1000+ node clusters too. I left WhatsApp in
| 2019, and any public presentations from WA are less about raw
| scale, because it's passe.
|
| Really, 1000 dist connections is nothing when you're managing
| 500k client connections. Dist connections weren't even a big deal
| when we went to smaller nodes in FB.
|
| It's good to have a solid backend network, and to try to bias
| towards fewer larger nodes, rather than more smaller nodes. If
| you want to play with large scale dist and spin up 1000
| low-cpu, low-memory VMs, you might have some trouble. It makes
| sense
| to start with small nodes and whatever number makes you
| comfortable for availability, and then when you run into limits,
| reach for bigger nodes until you get to the point where adding
| nodes is more cost effective: WA ran dual xeon 2690 servers
| before the move to FB infra; facebook had better economics with
| smaller single Xeon D nodes; I dunno what makes sense today,
| maybe a single socket Epyc?
| sausagefeet wrote:
| > That's 999,000 packets every second, or 33k pps across your
| whole cluster. For reference, GigE line rate with full 1500 mtu
| packets is 80k pps. If you actually have 1000 nodes worth of
| work, the heartbeats are not at all a big deal.
|
| Using up almost half of your pps every 30 seconds for cluster
| maintenance certainly seems like it's more than "not a big
| deal", no?
| desdenova wrote:
| If you're at the point where you decided the 1000 nodes are
| required, you should probably have already considered the
| sensible alternatives first, and concluded this was somehow
| better.
|
| The network saturation is just a necessary cost of running
| such a massive cluster.
|
| I really have no idea what kind of system would require 1000
| nodes, that couldn't be replaced by 100, 10x larger, nodes
| instead. And at that point, you should probably be thinking
| of ways to scale the network itself as well.
| sausagefeet wrote:
| I don't really understand what this comment is responding
| to. The comment I responded to hand waved away consuming
| almost 50% of your pps on heartbeats every 30 seconds as
| "no big deal".
| ricketycricket wrote:
| > The comment I responded to hand waved away consuming
| almost 50% of your pps on heartbeats every 30 seconds as
| "no big deal".
|
| > The network saturation is just a necessary cost of
| running such a massive cluster.
|
| I think this actually answers it perfectly.
|
| 1. If you are running 1K distributed nodes, you have to
| understand that means you have some overhead for running
| such a large cluster. No one is hand waving this away,
| it's just being acknowledged that this level of
| complexity has a cost.
|
| 2. If heartbeats are almost 50% of your pps, you are
| trying to use 1Gbe to run a 1K-node cluster. No one would
| do this in production and no one is claiming you should.
|
| 3. If your system can tolerate it, change the heartbeat
| interval to whatever you want.
|
| 4. Don't use distributed Erlang if you don't have to.
| Erlang/Elixir/Gleam work perfectly fine for non-
| distributed workloads as do most languages that can't
| distribute in the first place. But if you do need a
| distributed system, you are unlikely to find a better way
| to do it than the BEAM.
|
| Basically, it seems you are taking issue with something
| that 1) is that way because that's how things work, and
| 2) is not how anyone would actually use it.
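| On point 3: the dist heartbeat timing comes from the kernel
| application's net_ticktime parameter (a node is presumed down
| after net_ticktime seconds of silence, with ticks sent roughly
| every quarter of that). A sys.config sketch:

```erlang
%% sys.config: double the tolerance window from the 60s default.
[{kernel, [{net_ticktime, 120}]}].
```

| It can also be adjusted at runtime with
| net_kernel:set_net_ticktime(120), which transitions gradually so
| connected nodes stay in agreement.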
| throwawaymaths wrote:
| IIRC from a preso by them, WA runs an eye-popping number of
| nodes in distribution and they are indeed fully connected,
| and it's fine (though they do use a modified BEAM)
| anonymousDan wrote:
| Interesting, do you have a link by any chance?
| toast0 wrote:
| From 2014 there's a video link and slides here:
| https://www.erlang-factory.com/sfbay2014/rick-reed FYI:
| I've seen comments from after I left that wandist isn't
| used anymore; I _think_ a lot of what we gained with that
| was working around issues in pg2 that stem from global
| locks not scaling... but the new pg doesn't need global
| locks at all. There was also some things in wandist to
| work around difficulties communicating between SoftLayer
| nodes and Facebook nodes, but that was a transitory need.
| See the 2024 presentation, 40k nodes!
|
| Fairly similar, but smaller numbers in 2012
| http://www.erlang-factory.com/conference/SFBay2012/speakers/...
|
| The 2013 presentation is focused on MMS which I don't
| remember if it was as impressive:
| http://www.erlang-factory.com/conference/SFBay2013/speakers/...
| (note that
| server side transcoding is from before end to end
| encryption)
|
| I don't think there were similar presentations on Erlang
| in the large at WhatsApp after that. Big changes between
| 2014 and 2019 (when I left) were
|
| a) chat servers started doing a lot more, and clients per
| server went down on the big SoftLayer boxes
|
| b) hosting moved from SoftLayer to Facebook and much
| smaller nodes --- also chat servers at SoftLayer were
| individually addressable, using (augmented) round robin
| DNS to select, at Facebook the chat servers did not have
| public addresses, instead everything comes in through
| load balancers
|
| c) MMS was pretty much offloaded into a Facebook storage
| service (c++); not because the Erlang wasn't sufficient,
| but because MMS was loosely coupled with the rest of the
| service, Facebook had a nice enough storage service, a
| lot of storage, an awful lot of bandwidth, and it wasn't
| a lot of work for that team to also support WhatsApp's
| needs; also our Erlang MMS (and the PHP version before
| it) was built around storing files on specific,
| addressable nodes, but nodes at Facebook are much more
| ephemeral and not easy to directly address by clients
|
| d) some amount of data storage moved off of mnesia into
| other Facebook data storage technology; again, not
| because mnesia wasn't sufficient, but more ephemeral
| nodes makes it cumbersome (addressable) and the available
| hardware nodes at FB didn't really match --- there's a
| very firm bias at FB towards using standard node sizes
| and the available standard nodes were like a web machine
| with not much ram or a big database machine with more ram
| and tons of fast storage; WA mnesia wants lots of ram but
| doesn't need a lot of storage (all data is in ram, and
| dumped + logged to disk) so there was a mismatch there
| --- things that stayed in mnesia needed much larger
| clusters to manage data size
|
| Presentations became less common because of more layers
| to get approval, and also because it's less fun to share
| how we built something on top of proprietary layers that
| others don't really have access to. Anybody could have
| gotten dual 2690 servers at SoftLayer and run a nice
| Erlang cluster. Only a few people could run an even
| bigger chat cluster in a Facebook like hosting
| environment.
| throwawaymaths wrote:
| I remember seeing this when it hit the internet:
|
| https://www.youtube.com/watch?v=A5bLRH-PoMY
|
| 40,000 erlang nodes in a cluster
|
| lots of specifics about what tweaks they use. Rewatching,
| it seems like you don't have to really use the modified
| BEAM except in a few small soft-code (replace OTP/stdlib
| functionality) where they provide enough information that
| you could probably write it yourself if you had to, and a
| lot of the optimizations have been upstreamed into core
| BEAM -- WA is a pretty good citizen of the ecosystem.
| toast0 wrote:
| > I really have no idea what kind of system would require
| 1000 nodes, that couldn't be replaced by 100, 10x larger,
| nodes instead.
|
| Nodes only get so big. Way back when, a quad xeon 4650 v2
| was definitely not 2x the throughput of a dual xeon 2690
| v2; so you end up with two dual socket systems instead of
| one quad socket. A quad socket server often costs
| significantly more than two dual socket servers, and is
| likely to take longer between order and delivery. There's
| usually a point where you _can_ still scale up, but scaling
| out is a better use of resources.
| worthless-trash wrote:
| It is possible to have a heartbeat network separate from the
| data network if so required.
| anoother wrote:
| On a gigabit connection. It's hard to imagine clusters
| running on anything below 10Gb as an absolute minimum.
| immibis wrote:
| That's for 1000 nodes all sharing a single 1Gbps connection
| and sending 1500-byte heartbeats for some reason. Make them
| 150 bytes (more realistic) and now it's almost half the
| throughput of the 100Mbps router that for some reason you are
| sending every single packet through, for 1000 nodes.
| gatnoodle wrote:
| I see, it makes more sense now.
| davisp wrote:
| > If you actually have 1000 nodes worth of work, the
| heartbeats are not at all a big deal.
|
| I think you're missing the fact that the heart beats will be
| combined with existing packets. Hence the quoted bit. If
| you've got 1000 nodes, they should be doing something with
| that network such that an extra 50 bytes (or so) every 30s
| would not be an issue.
| gpderetta wrote:
| They would be combined if each node was sending messages
| each second to every other node. Is that realistic?
| sgarland wrote:
| If you're running 1Gbe in prod at any kind of scale,
| something has gone horribly awry.
| crabbone wrote:
| I doubt these are even sent if there's actual traffic going
| through the network. I mean, why do you need to ping nodes,
| if they are already talking to you?
| toast0 wrote:
| (Oops, I meant to say 999k packets every 30 seconds. Thanks
| everyone for running with the pps number)
|
| If your switching fabric can only deal with 1Gbps, yes,
| you've used it halfway up with heartbeats. But if your
| network is 1x 48 port 1G switch and 44x 24 port 1G switches,
| you won't bottleneck on heartbeats, because that spine switch
| should be able to simultaneously send and receive at line
| rate on all ports which is plenty of bandwidth. You might
| well bottleneck on other transmissions, but the nice thing
| about dist heartbeats is on a connection, each node is
| sending heartbeats on a timer and will close the connection
| if it doesn't see a heartbeat in some timeframe; it's a
| requirement for progress, it's not a requirement for a timely
| response, so you can end up with epic round trip times for
| net_adm:ping ... I've seen on the order of an hour once over
| a long distance dist connection with an unexpected bandwidth
| constraint.
|
| It would probably be a lot more comfortable if your spine
| switch was 10g and your node switches had a 10g uplink, and
| you may want to consider LACP and double up all the
| connections. You might also want to consider other
| topologies, but this is just an illustration.
| fwip wrote:
| Wouldn't you have a lot more than one gigabit connection? I'm
| struggling to imagine the 1000-node cluster layout that sends
| all intra-cluster traffic through a single cable.
| vereis wrote:
| ahoy!! thanks for spotting the issue I wrote the post in a
| stream of consciousness after a long day! I'll make that edit
| and call it out!
|
| the statement about what historically constitutes a large
| erlang cluster was an anecdote told to me by Francesco Cesarini
| during lunch a few years ago -- I'm not actually sure of the
| time frame (or my memory!)
|
| likewise I'll update the post to reflect that! thanks ( * _ * )
| anacrolix wrote:
| awful font.
|
| distributed Erlang is like saying ATM machine
| hsavit1 wrote:
| yeah, isolated + distributed processes is THE THING about
| Erlang, OTP
| throwawaymaths wrote:
| I mean you can certainly run one off scripts in elixir and
| Erlang.
| vereis wrote:
| my personal reality is that the majority of projects I've
| consulted on have seldom actually leveraged distributed
| erlang for anything. the concurrency part yes, clustering for
| the sake of availability or spreading load yes, but actually
| doing anything more complex than that has been the exception!
| ymmv tho!
| vereis wrote:
| honest q: how do you distinguish between single node erlang
| applications vs clustered erlang applications?
| toast0 wrote:
| This returns true if you're on a single node erlang
| application :P
|
|     erlang:node() == nonode@nohost andalso
|         erlang:nodes(known) == [nonode@nohost]
|
| A single node erlang application would be one that doesn't
| use dist at all. Although, if it includes anything gen_event
| or similar, and it happens to be dist connected, unless it
| specifically checks, it will happily reply to remote Erlang
| processes.
___________________________________________________________________
(page generated 2024-12-03 23:01 UTC)