[HN Gopher] How to do distributed locking (2016)
       ___________________________________________________________________
        
       How to do distributed locking (2016)
        
       Author : yusufaytas
       Score  : 160 points
       Date   : 2024-10-20 10:38 UTC (12 hours ago)
        
 (HTM) web link (martin.kleppmann.com)
 (TXT) w3m dump (martin.kleppmann.com)
        
       | galeaspablo wrote:
        | Many engineers don't truly care about the correctness issue
        | until it's too late. Similar to security.
       | 
       | Or they care but don't bother checking whether what they're doing
       | is correct.
       | 
       | For example, in my field, where microservices/actors/processes
       | pass messages between each other over a network, I dare say >95%
       | of implementations I see have edge cases where messages might be
       | lost or processed out of order.
       | 
       | But there isn't an alignment of incentives that fixes this
       | problem. Ie the payment structures for executives and engineers
       | aren't aligned with the best outcome for customers and
       | shareholders.
        
         | mrkeen wrote:
         | I think there's a bit of an alignment of incentives: the edge
         | cases are tricky enough that your programmers probably need to
         | handle a lot of support tickets, which isn't good for anyone.
         | 
          | But I don't see any way to convince yesterday's managers to
          | give us time to build it right.
        
         | sethammons wrote:
         | The path to fixing this requires first measuring and monitoring
         | it, then establishing service level objectives that represent
         | customer experience. Product and engineering teams have to
          | agree on them. If the SLOs are violated, focus shifts
         | towards system stability.
         | 
          | Getting everyone on board is hard, and that is why good
          | leadership is needed. When customers start to churn because
          | bugs pop up and new features are slow or non-existent, it
          | becomes very easy to make the case for quality being part of
          | the process. Mature leaders get ahead of that as early as
          | possible.
        
           | galeaspablo wrote:
            | Good leadership is spot on! Agreed. The cynical part of me
            | sees incentives that discourage mature leadership styles.
            | 
            | Leaders tend to be impatient and think of this quarter's OKRs
            | as opposed to the business' long-term financial health. In
            | other words, the leaders of leaders use standard MBA-
            | prescribed incentive structures.
        
         | secondcoming wrote:
         | > 95% of implementations I see have edge cases where messages
         | might be lost or processed out of order.
         | 
         | Eek. This sort of thing can end up with innocent people in
         | jail, or dead.
         | 
         | [0] https://en.wikipedia.org/wiki/British_Post_Office_scandal
        
           | noprocrasted wrote:
           | The problem (or the solution, depending on which side you're
           | on) is that _innocent_ people are in jail or dead. The people
           | that _knowingly_ allowed this to happen are still free and
           | wealthy.
           | 
           | So I'm not particularly sure this is a good example - if
           | anything, it sets the opposite incentives, that even jailing
           | people or driving them to suicide won't actually have any
           | consequences for you.
        
         | noprocrasted wrote:
         | > there isn't an alignment of incentives that fixes this
         | problem
         | 
         | "Microservices" itself is often a symptom of this problem.
         | 
         | Everyone and their dog wants to introduce a network boundary in
         | between function calls for no good reason just so they can
         | subsequently have endless busywork writing HTTP (or gRPC if
         | you're lucky) servers, clients & JSON (de?)serializers for said
          | function calls, try to reimplement things like distributed
          | transactions across said network boundary, and deal with the
          | inevitable "spooky action at a distance" that this will yield.
        
           | sethammons wrote:
           | I've worked with microservices at scale and it was fantastic.
           | We couldn't break backwards compatibility with our API
           | without a lot of coordination. Outside of that, you could
           | deploy as frequently as needed and other services could
           | update as needed to make use of new features.
           | 
            | The monoliths I have worked in, by contrast, have had issues
            | coordinating changes within the codebase: code crosses
            | boundaries it should not, and datastores get shared and
            | coupled to (what should be) different domains, leading to
            | slow, inefficient code and ossified options for product
            | changes.
        
           | shepherdjerred wrote:
           | If you're hand-writing clients/servers/serializers instead of
           | generating them from schema definitions then you have more
           | fundamental issues than using microservices.
        
       | jmull wrote:
       | This overcomplicates things...
       | 
       | * If you have something like what the article calls a fencing
       | token, you don't need any locks.
       | 
       | * The token doesn't need to be monotonically increasing, just a
       | passive unique value that both the client and storage have.
       | 
       | Let's call it a version token. It could be monotonically
       | increasing, but a generated UUID, which is typically easier,
       | would work too. (Technically, it could even be a hash of all the
       | data in the store, though that's probably not practical.) The
       | logic becomes:
       | 
       | (1) client retrieves the current version token from storage,
       | along with any data it may want to modify. There's no external
       | lock, though the storage needs to retrieve the data and version
       | token atomically, ensuring the token is specifically for the
       | version of the data retrieved.
       | 
       | (2) client sends the version token back along with any changes.
       | 
       | (3) Storage accepts the changes if the current token matches the
       | one passed with the changes and creates a new version token
       | (atomically, but still no external locks).
       | 
        | Now, you can introduce locks for other reasons (hopefully good
        | ones... they seem to be misused a lot). Just pointing out they
       | are/should be independent of storage integrity in a distributed
       | system.
       | 
        | (I don't even like the term lock, because these locks are
        | temporary/unguaranteed. Lease or reservation might be a term that
        | better conveys the meaning.)
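A minimal sketch of the version-token flow in steps (1)-(3), with an in-memory class standing in for the storage service (names are illustrative; the internal lock is the storage's own non-distributed synchronization):

```python
import threading
import uuid

class VersionedStore:
    """In-memory stand-in for a storage service with atomic reads/writes."""

    def __init__(self, data):
        self._lock = threading.Lock()  # storage-internal, non-distributed
        self._data = data
        self._version = uuid.uuid4().hex

    def read(self):
        # (1) return the data and its version token atomically
        with self._lock:
            return self._data, self._version

    def write(self, new_data, token):
        # (3) accept only if the caller's token is still current
        with self._lock:
            if token != self._version:
                return False  # someone else wrote first; caller must retry
            self._data = new_data
            self._version = uuid.uuid4().hex  # mint a fresh token
            return True

store = VersionedStore({"count": 0})

# (2) read-modify-write loop: retry until our write wins
while True:
    data, token = store.read()
    if store.write({"count": data["count"] + 1}, token):
        break

print(store.read()[0])  # {'count': 1}
```

A writer that loses the race simply re-reads and retries, which is why the work has to be safe to redo.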
        
         | wh0knows wrote:
         | This neglects the first reason listed in the article for why
         | you would use a lock.
         | 
         | > Efficiency: Taking a lock saves you from unnecessarily doing
         | the same work twice (e.g. some expensive computation). If the
         | lock fails and two nodes end up doing the same piece of work,
         | the result is a minor increase in cost (you end up paying 5
         | cents more to AWS than you otherwise would have) or a minor
         | inconvenience (e.g. a user ends up getting the same email
         | notification twice).
         | 
         | I think multiple nodes doing the same work is actually much
         | worse than what's listed, as it would inhibit you from having
         | any kind of scalable distributed processing.
        
           | karmakaze wrote:
           | As mentioned in the article, a non-100%-correct lock can be
           | used for efficiency purposes. So basically use an imperfect
           | locking mechanism for efficiency and a reliable one for
           | correctness.
        
             | jmull wrote:
             | > and a reliable one for correctness
             | 
             | To be clear, my point is don't use distributed locking for
             | correctness. There are much better options.
             | 
             | Now, the atomicity I mention implies some kind of internal
             | synchronization mechanism for multiple requests, which
             | could be based on locks, but those would be real, non-
             | distributed ones.
        
           | jmull wrote:
           | Sure, that's why I said you might introduce "locks"
           | (reservations is a much better term) for other reasons.
           | 
           | Efficiency is one, as you say.
           | 
           | The other main one that comes to mind is to implement other
           | "business rules" (hate that term, but that's what people
           | use), like for a online shopping app, the stock to fulfill an
           | order might be reserved for a time when the user starts the
           | checkout process.
        
         | karmakaze wrote:
         | This is known as 'optimistic locking'. But I wouldn't call it a
         | distributed locking mechanism.
        
           | jameshart wrote:
           | Optimistic locks are absolutely a distributed locking
           | mechanism, in that they are for coordinating activity among
           | distributed nodes - but they do require the storage node to
           | have strong guarantees about serialization and atomicity of
           | writes. That means it isn't a distributed storage solution,
           | but it is something you can build over the top of a
           | distributed storage solution that has strong read after write
           | guarantees.
        
             | karmakaze wrote:
              | I normally see it as a version column in a database, where
              | living alongside the data makes it non-distributed.
             | 
             | I'm not even sure how it could be used for exclusive update
             | to a resource elsewhere--all clients will think they 'have'
             | the lock and change the resource, then find out they didn't
             | when they update the lock. Or if they bump the lock first,
             | another client could immediately 'have' the lock too.
        
             | zeroxfe wrote:
             | This is unconventional use of the term "distributed
             | locking". This alternative just punts the hard part of
             | locking to the storage system.
        
         | bootsmann wrote:
         | Won't this lead to inconsistent states if you don't do
         | monotonically increasing tokens?
         | 
         | I.e. your storage system has two nodes and there are two read-
         | modify-write processes running. Process 1 acquires the first
         | token "abc" and process two also acquires the token "abc". Now
         | process 1 commits, the token is changed to "cde" and the change
         | streamed to node 2. Due to network delay, the change to node 2
         | is delayed. Meanwhile process 2 commits to node 2 with token
         | "abc". Node 2 accepts the change because it has not received
         | the message from node 1 and your system is now in an
         | inconsistent state.
         | 
         | Note that this cannot happen in a scenario where we have
         | monotonically increasing fencing tokens because that
         | requirement forces the nodes to agree on a total order of
         | operations before they can supply the fencing token.
        
           | computerfan494 wrote:
           | In the above description of optimistic locking, it is assumed
           | that it is impossible to issue the same token to multiple
           | clients. Nodes can agree that a given token has also never
           | been issued before just like a monotonically increasing
            | value. The nice property of non-monotonically-increasing
           | tokens is that nodes may generate them without coordinating
           | if you can make other assumptions about that system. A good
           | example is when nodes use an ID they were assigned beforehand
           | as part of the token generation, guaranteeing that the
           | leasing tokens they mint will not conflict with other nodes'
           | as long as node IDs are not reused.
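The coordination-free minting described here can be sketched as follows (illustrative Python; it assumes node IDs are assigned beforehand and never reused):

```python
import itertools

def token_generator(node_id):
    """Mint tokens that are unique across nodes without any coordination,
    assuming node IDs are assigned beforehand and never reused."""
    counter = itertools.count()
    return lambda: f"{node_id}-{next(counter)}"

mint_a = token_generator("node-a")
mint_b = token_generator("node-b")

tokens = {mint_a(), mint_a(), mint_b()}
# Tokens from different nodes can never collide: the node-id prefix differs.
print(sorted(tokens))  # ['node-a-0', 'node-a-1', 'node-b-0']
```

Note that such tokens carry no global order, so the storage must still atomically track which token it issued last, as discussed above.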
        
             | bootsmann wrote:
             | I have a hard time wrapping my head around what you are
             | proposing here. Say client A requests data, they get the
             | token a-abc. Then client B requests data, they get the
             | token b-cde. Client A commits their write, does the storage
             | reject it because they already issued another token (the
             | one from client B) or does it accept it?
        
               | computerfan494 wrote:
               | My understanding of what the OP was discussing is an
               | optimistic locking system where the nodes only accept
               | commits if the last issued token matches the token
               | included in the commit. While agreeing on the last token
               | requested requires coordination, unlike monotonically
               | increasing tokens you could have well-behaved clients
               | generate token content themselves without coordination.
               | That may or may not be useful as a property.
        
               | bootsmann wrote:
               | Got it, thank you for clarifying this.
        
           | jmull wrote:
           | "node1", "node2", and "storage" are three separate things in
           | the distributed environment. Only storage accepts changes,
           | and it's what verifies the incoming token matches the current
           | token.
           | 
           | So node2 doesn't get to accept changes. It can only send
           | changes to storage, which may or may not be accepted by it.
        
             | bootsmann wrote:
             | If the storage is a singular entity then this is not a
             | distributed systems problem at all, no?
        
         | cnlwsu wrote:
         | You're describing compare and swap which is a good solution.
         | You're pushing complexity down to the database, and remember
         | this is distributed locking. When you have a single database
         | it's simple until the database crashes leaving you in state of
         | not knowing which of your CAS writes took effect. In major
         | systems that demand high availability and multi datacenter
          | backups, this becomes pretty complicated, with scenarios around
          | node failure that break this as well. Usually some form of
         | paxos transaction log is used. Never assume there is an easy
         | solution in distributed systems... it just always sucks
        
         | eru wrote:
         | Git push's `--force-with-lease` option does essentially this.
         | 
         | (Honestly, they should rename `--force-with-lease` to just
         | `--force`, and rename the old `--force` behaviour to `--force-
         | with-extreme-prejudice` or something like that. Basically make
         | the new behaviour the default `--force` behaviour.)
        
           | pyrolistical wrote:
           | `--force-unsafe`
        
         | zeroxfe wrote:
         | > This overcomplicates things...
         | 
         | You're misinterpreting the problem described, and proposing a
         | solution for a different problem.
        
       | antirez wrote:
       | I suggest reading the comment I left back then in this blog post
       | comments section, and the reply I wrote in my blog.
       | 
       | Btw, things to note in random order:
       | 
        | 1. Check my comment under this blog post. The author had missed a
        | _fundamental_ point in how the algorithm works, then based his
        | rejection of the algorithm on the remaining weaker points.
        | 
        | 2. It is not true that you can't wait an approximately correct
        | amount of time with modern computers and APIs. GC pauses are
        | bounded and monotonic clocks work. These are acceptable
        | assumptions.
        | 
        | 3. To critique the auto-release mechanism per se, because you
        | don't want to expose yourself to the fact that there is a
        | potential race, is one thing. To critique the algorithm against
        | its goals and its system model is another thing.
        | 
        | 4. Over the years Redlock was used in a huge number of use cases
        | with success, because if you pick a timeout much larger than (A)
        | the time to complete the task and (B) the random pauses you can
        | have in normal operating systems, race conditions are very hard
        | to trigger, and the other failures in the article were, AFAIK,
        | never observed. Of course if you have a super small timeout to
        | auto-release the lock, and the task may easily take this amount
        | of time, you just committed a design error, but that's not about
        | Redlock.
        
         | bluepizza wrote:
         | Could you provide links?
        
           | saikatsg wrote:
           | http://antirez.com/news/101
        
         | computerfan494 wrote:
         | To be honest I've long been puzzled by your response blog post.
         | Maybe the following question can help achieve common ground:
         | 
         | Would you use RedLock in a situation where the timeout is
         | fairly short (1-2 seconds maybe), the work done usually takes
         | ~90% of that timeout, and the work you do while holding a
         | RedLock lock MUST NOT be done concurrently with another lock
         | holder?
         | 
         | I think the correct answer here is always "No" because the risk
         | of the lease sometimes expiring before the client has finished
         | its work is very high. You must alter your work to be
         | idempotent because RedLock cannot guarantee mutual exclusion
         | under all circumstances. Optimistic locking is a good way to
         | implement this type of thing while the work done is idempotent.
        
           | kgeist wrote:
           | >because the risk of the lease sometimes expiring before the
           | client has finished its work is very high
           | 
            | We had corrupted data because of this.
        
           | antirez wrote:
           | The timeout must be much larger than the time required to do
           | the work. The point is that distributed locks without a
           | release mechanism are in practical terms very problematic.
        
             | computerfan494 wrote:
              | Locking without a timeout is indeed a non-starter in the
              | majority of use cases; we are agreed there.
             | 
             | The critical point that users must understand is that it is
             | _impossible_ to guarantee that the RedLock client never
             | holds its lease longer than the timeout. Compounding this
              | problem is that the longer you make your timeout to
              | minimize the likelihood of this happening accidentally, the
              | less responsive your system becomes during
             | genuine client misbehaviour.
        
               | antirez wrote:
                | In most real-world scenarios, the tradeoffs are a bit
                | softer than what people in the formal world dictate (and
                | in doing so they forced certain systems to become
                | suboptimal for everything except during failures, kicking
                | them out of business...). A few examples:
               | 
               | 1. E-commerce system where there are a limited amount of
               | items of the same kind, you don't want to oversell.
               | 
               | 2. Hotel booking system where we don't want to reserve
               | the same dates/rooms multiple times.
               | 
               | 3. Online medical appointments system.
               | 
                | In all those systems, it's OK to re-open the
                | item/date/... after some time, even after one day. And if
                | the lock hold time is not too big but a very strict
                | compromise (also a reasonable choice in the spectrum), it
                | could happen during edge-case failures that three items
                | are sold when there are only two; those orders can be
                | cancelled.
               | 
                | So yes, there is a tension between timeout, race
                | conditions and recovery time, but in many systems using
                | something like RedLock both the development and end-user
                | experience can be improved with a high rate of success,
                | and the rare unhappy event can be handled. The algorithm
                | is now very old, still used by many implementations, and
                | as we speak problems are solved in a straightforward way
                | with very good performance. Of course, the developers of
                | the solution should be aware that there are tradeoffs
                | between certain values: but when are distributed systems
                | easy?
               | 
                | P.S. Why do 10 years of strong usage count, in the face
                | of a blog post saying that you can't trust a system like
                | that? Because even if DS issues emerge randomly and
                | sporadically, in the long run systems that create real-
                | world issues, if they reach mass usage, become known. A
                | big enough user base is a continuous integration test big
                | enough to detect when a solution has serious real-world
                | issues. So of course RedLock users picking short timeouts
                | with tasks that take a very hard-to-predict amount of
                | time will indeed run into _known_ issues. But the other
                | systemic failure modes described in the blog post are
                | never mentioned by users, AFAIK.
        
               | computerfan494 wrote:
               | I feel like you're dancing around admitting the core
               | issue that Martin points out - RedLock is _not_ suitable
               | for systems where correctness is paramount. It can get
               | close, but it is not robust in all cases.
               | 
               | If you want to say "RedLock is correct a very high
               | percentage of the time when lease timeouts are tuned for
               | the workload", I would agree with you actually. I even
               | possibly agree with the statements "most systems can
               | tolerate unlikely correctness failures due to RedLock
               | lease violations. Manual intervention is fine in those
               | cases. RedLock may allow fast iteration times and is
               | worth this cost". I just think it's important to be
               | crystal clear on the guarantees RedLock provides.
               | 
               | I first read Martin's blog post and your response years
               | ago when I worked at a company that was using RedLock
               | despite it not being an appropriate tool. We had an
               | outage caused by overlapping leases because the original
               | implementor of the system didn't understand what Martin
               | has pointed out from the RedLock documentation alone.
               | 
               | I've been a happy Redis user and fan of your work outside
               | of this poor experience with RedLock, by the way. I
               | greatly appreciate the hard work that has gone into
               | making it a fantastic database.
        
       | eknkc wrote:
       | I tend to use postgresql for distributed locking. As in, even if
       | the job is not db related, I start a transaction and obtain an
       | advisory lock which stays locked until the transaction is
       | released. Either by the app itself or due to a crash or
       | something.
       | 
        | Felt pretty safe about it so far, but I just realised I never
        | check whether the db connection is still OK. If this is a db-
        | related job and I need to touch the db, fine: some query will
        | fail on the connection and my job will fail anyway. Otherwise I
        | might have already lost the lock without being aware of it.
       | 
        | Without fencing tokens, atomic ops and such, I guess one needs
        | two-phase commit on everything for absolute correctness?
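The transaction-scoped flavour of this pattern looks roughly like the following (a sketch, not the poster's exact code; the key 42 is an arbitrary application-chosen bigint):

```sql
BEGIN;
-- Blocks until the lock is free. Released automatically at COMMIT or
-- ROLLBACK, and also when the backing connection dies.
SELECT pg_advisory_xact_lock(42);

-- ... do the job while holding the lock ...

COMMIT;
```

`pg_try_advisory_xact_lock` returns a boolean immediately instead of blocking, which is handy when another worker holding the lock means the job can simply be skipped.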
        
         | candiddevmike wrote:
         | One gotcha maybe with locks is they are connection specific
         | AFAIK, and in most libraries you're using a pool typically. So
         | you need to have a specific connection for locks, and ensure
         | you're using that connection when doing periodic lock tests.
        
       | jroseattle wrote:
       | We reviewed Redis back in 2018 as a potential solution for our
       | use case. In the end, we opted for a less sexy solution (not
       | Redis) that never failed us, no joke.
       | 
       | Our use case: handing out a ticket (something with an identifier)
       | from a finite set of tickets from a campaign. It's something akin
       | to Ticketmaster allocating seats in a venue for a concert. Our
       | operation was as you might expect: provide a ticket to a request
       | if one is available, assign some metadata from the request to the
       | allocated ticket, and remove it from consideration for future
       | client requests.
       | 
       | We had failed campaigns in the past (over-allocation, under-
       | allocation, duplicate allocation, etc.) so our concern was
       | accuracy. Clients would connect and request a ticket; we wanted
       | to exclusively distribute only the set of tickets available from
       | the pool. If the number of client requests exceeded the number of
       | tickets, the system should protect for that.
       | 
       | We tried Redis, including the naive implementation of getting the
       | lock, checking the lock, doing our thing, releasing the lock. It
       | was ok, but administrative overhead was a lot for us at the time.
       | I'm glad we didn't go that route, though.
       | 
       | We ultimately settled on...Postgres. Our "distributed lock" was
       | just a composite UPDATE statement using some Postgres-specific
       | features. We effectively turned requests into a SET operation,
       | where the database would return either a record that indicated
       | the request was successful, or something that indicated it
       | failed. ACID transactions for the win!
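The comment doesn't show the statement itself; one plausible shape for an atomic "claim one available ticket" operation, assuming a hypothetical tickets table with a nullable claimed_by column, is:

```sql
UPDATE tickets
   SET claimed_by = $1, claimed_at = now()
 WHERE id = (
         SELECT id
           FROM tickets
          WHERE campaign_id = $2
            AND claimed_by IS NULL
          ORDER BY id
          LIMIT 1
            FOR UPDATE SKIP LOCKED  -- Postgres-specific: skip rows claimed by concurrent requests
       )
 RETURNING id;  -- zero rows back means the campaign is sold out
```

SKIP LOCKED keeps concurrent claimers from queuing on the same row, and the RETURNING clause hands the caller its ticket (or nothing) in a single round trip.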
       | 
       | With accuracy solved, we next looked at scale/performance. We
       | didn't need to support millions of requests/sec, but we did have
       | some spikiness thresholds. We were able to optimize read/write db
       | instances within our cluster, and strategically load
       | larger/higher-demand campaigns to allocated systems. We continued
       | to improve on optimization over two years, but not once did we
       | ever have a campaign with ticket distribution failures.
       | 
       | Note: I am not an expert of any kind in distributed-lock
       | technology. I'm just someone who did their homework, focused on
       | the problem to be solved, and found a solution after trying a few
       | things.
        
         | wwarner wrote:
         | This is the best way, and actually the only sensible way to
         | approach the problem. I first read about it here
         | https://code.flickr.net/2010/02/08/ticket-servers-distribute...
        
           | hansvm wrote:
           | > only sensible way
           | 
           | That's a bit strong. Like most of engineering, it depends.
           | Postgres is a good solution if you only have maybe 100k QPS,
           | the locks are logically (if not necessarily fully physically)
           | partially independent, and they aren't held for long. Break
           | any of those constraints, or add anything weird (inefficient
           | postgres clients, high DB load, ...), and you start having to
           | explore either removing those seeming constraints or using
           | other solutions.
        
             | wwarner wrote:
             | Ok fair; I'm not really talking about postgres (the link i
             | shared uses mysql). I'm saying that creating a ticket
             | server that just issues and persists unique tokens, is a
             | way to provide coordination between loosely coupled
             | applications.
        
               | zbobet2012 wrote:
               | Yeah that's cookies. They are great.
        
         | nh2 wrote:
         | You are right that anything that needs up to 50000 atomic,
         | short-lived transactions per second can just use Postgres.
         | 
         | Your UPDATE transaction lasts just a few microseconds, so you
         | can just centralise the problem and that's good because it's
         | simpler, faster and safer.
         | 
         | But this is not a _distributed_ problem, as the article
         | explains:
         | 
         | > remember that a lock in a distributed system is not like a
         | mutex in a multi-threaded application. It's a more complicated
         | beast, due to the problem that different nodes and the network
         | can all fail independently in various ways
         | 
         | You need distributed locking if the transactions can take
         | seconds or hours, and the machines involved can fail while they
         | hold the lock.
        
           | fny wrote:
           | You could just have multiple clients attempt to update a row
            | that defines the lock. Postgres transactions have no time
            | limit and will unwind on client failure. Since connections
            | are persistent, there's no need to play a game to determine
            | the state of a client.
        
             | nh2 wrote:
             | Your scenario still uses a centralised single postgres
             | server. Failure of that server takes down the whole locking
             | functionality. That's not what people usually mean by
             | "distributed".
             | 
             | "the machines involved can fail" must also include the
             | postgres machines.
             | 
             | To get that, you need to coordinate multiple postgres
             | servers, e.g. using ... distributed locking. Postgres does
             | not provide that out of the box -- neither multi-master
             | setups, nor master-standby synchronous replication with
             | automatic failover. Wrapper software that provides that,
             | such as Stolon and Patroni, use distributed KV stores /
             | lock managers such as etcd and Consul to provide it.
        
         | stickfigure wrote:
          | I think this illustrates something important, which is that:
          | You don't need _locking_. You need _<some high-level business
          | constraint that might or might not require some form of
          | locking>_.
         | 
         | In your case, the constraint is "don't sell more than N
         | tickets". For most realistic traffic volumes for that kind of
         | problem, you can solve it with traditional rdbms transactional
         | behavior and let it manage whatever locking it uses internally.
         | 
         | I wish developers were a lot slower to reach for "I'll build
         | distributed locks". There's almost always a better answer, but
         | it's specific to each application.
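stickfigure's point can be made concrete: the "don't sell more than N tickets" constraint fits in one conditional UPDATE, letting the database's own transactional machinery do whatever locking it needs. A sketch with sqlite3 standing in for a real RDBMS (table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concerts (id INTEGER PRIMARY KEY, sold INTEGER, capacity INTEGER)"
)
conn.execute("INSERT INTO concerts VALUES (1, 0, 2)")


def sell_ticket(conn, concert_id):
    # The WHERE clause enforces the business constraint atomically;
    # rowcount tells us whether the sale actually went through.
    cur = conn.execute(
        "UPDATE concerts SET sold = sold + 1 "
        "WHERE id = ? AND sold < capacity",
        (concert_id,),
    )
    conn.commit()
    return cur.rowcount == 1


print([sell_ticket(conn, 1) for _ in range(3)])  # -> [True, True, False]
```

No explicit lock, no distributed anything: the conditional UPDATE either sells a ticket or it doesn't, regardless of how many clients race.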
        
         | nasretdinov wrote:
         | So basically your answer (and the correct answer most of the
         | time) was that you don't really need distributed locks even if
         | you think you do :)
        
           | tonyarkles wrote:
           | Heh, in my local developer community I have a bit of a
           | reputation for being "the guy" to talk to about distributed
           | systems. I'd done a bunch of work in the early days of the
           | horizontal-scaling movement (vs just buying bigger servers)
           | and did an M.Sc focused on distributed systems performance.
           | 
           | Whenever anyone would come and ask for help with a planned
           | distributed system the first question I would always ask is:
           | does this system actually need to be distributed?! In my 15
           | years of consulting I think the answer was only actually
           | "yes" 2 or 3 times. Much more often than was helping them
           | solve the performance problems in their single server system;
           | without doing that they would usually just have ended up with
           | a slow complex distributed system.
           | 
           | Edit: lol this paper was not popular in the Distributed
           | Systems Group at my school: https://www.usenix.org/system/fil
           | es/conference/hotos15/hotos...
           | 
           | "You can have a second computer once you've shown you know
           | how to use the first one."
        
             | Agingcoder wrote:
             | I wanted to post the same paper. With Adrian Colyer's
             | explanations:
             | https://blog.acolyer.org/2015/06/05/scalability-but-at-
             | what-...
        
         | apwell23 wrote:
         | Classic tech interview question
        
         | etcd wrote:
          | I guess this is embarrassingly parallelizable in that you can
          | shard by concert to different instances. Might even be a job
          | for that newfangled Cloudflare SQLite thing.
        
         | OnlyMortal wrote:
         | Interesting. We went through a similar process and ended up
         | with Yugabyte to deal with the locks (cluster).
         | 
         | It's based on Postgres but performance was not good enough.
         | 
         | We're now moving to RDMA.
        
       | hoppp wrote:
       | I did distributed locking with Deno, and Deno KV hosted by Deno
       | Deploy.
       | 
        | It's using FoundationDB, a distributed db. The Deno instances
        | running on local devices all connect to the same Deno KV to
        | acquire the lock.
        | 
        | But with Postgres, a SELECT FOR UPDATE also works, though the
        | database is not distributed.
        
       | egcodes wrote:
        | I once wrote a distributed-locking blog post using this
        | resource. Here it is:
       | https://medium.com/sahibinden-technology/an-easy-integration...
        
       | anonzzzies wrote:
        | I am updating my low-level and algorithms knowledge; what are
        | good books about this? (I have the one written by the author.)
        | I am looking to build something for fun, but everything I find
        | is either a toy or very complicated.
        
         | cosmicradiance wrote:
         | System Design Interview I and II - Alex Xu. Take one of the
         | topics and do it practically.
        
       | dataflow wrote:
       | > The lock has a timeout (i.e. it is a lease), which is always a
       | good idea (otherwise a crashed client could end up holding a lock
       | forever and never releasing it). However, if the GC pause lasts
       | longer than the lease expiry period, and the client doesn't
       | realise that it has expired, it may go ahead and make some unsafe
       | change.
       | 
       | Hold on, this sounds absurd to me:
       | 
        | First, if your client _crashes_, then you don't need a timed
       | lease on the lock to detect this in the first place. The lock
       | would get released by the OS or supervisor, whether there are any
       | timeouts or not. If both of _those_ crash too, then the
       | connection would eventually break, and the network system should
       | then detect that (via network resets or timeouts, lack of
       | heartbeats, etc.) and then invalidate all your connections before
       | releasing any locks.
       | 
        | Second, if the problem becomes that your client is _buggy_ and
        | thus holds the lock _too long_ without crashing, then shouldn't
       | some kind of supervisor detect that and then kill the client
       | (e.g., by the OS terminating the process) before releasing the
       | lock for everybody else?
       | 
        | Third, if you _are_ going to have locks with timeouts to deal
        | with corner cases you can't handle like the above, shouldn't
       | they notify the actual program somehow (e.g., by throwing an
       | exception, raising a signal, terminating it, etc.) instead of
       | letting it happily continue execution? And shouldn't those cases
       | wait for some kind of verification that the program was notified
       | before releasing the lock?
       | 
       | The whole notion that timeouts should somehow permit the program
       | execution to continue ordinary control flow sounds like the root
       | cause of the problem, and nobody is even batting an eye at it? Is
       | there an obvious reason why this makes sense? I feel I must be
       | missing something here... what am I missing?
        
         | neonbrain wrote:
          | The assumption that your server will always receive RST or FIN
          | from your client is incorrect. There are cases where these
          | packets are dropped, and your server will stay with an open
          | connection while the client on the remote machine is already
          | dead. P.S. BTW, it's not me who downvoted you
        
           | dataflow wrote:
           | I made no such assumption this will always happen though?
           | That's why the comment was so much longer than just "isn't
           | TCP RST enough?"... I listed a ton of ways to deal with this
           | that didn't involve letting the program continue happily on
           | its path.
        
             | neonbrain wrote:
              | Sorry, didn't see your message. What I mean is that if you
              | are not getting RST/FIN or any other indication that your
              | communication channel is closed, you are left only with
              | the mechanism of timeouts to recognize a
              | partitioned/dead/slow worker client. Basically, you've
              | mentioned them yourself ("timeouts, lack of heartbeats,
              | etc" in your post are all forms of timeouts). So you can
              | piggyback on these timeouts or use a smaller timeout
              | configured in the lease, whatever suits your purpose, I
              | guess. This is what I believe Kleppmann is referring to
              | here. He's just being generic in his description.
        
               | dataflow wrote:
                | > What I mean is that if you are not getting RST/FIN or
                | any other indication that your communication channel is
                | closed, you are left only with the mechanism of timeouts
                | to recognize a partitioned/dead/slow worker client.
               | 
               | Timeouts were a red herring in my comment. My problem
               | wasn't with the mere existence of timeouts in corner
               | cases, it was the fact that the worker is assumed to keep
               | working merrily on, despite the timeouts. That's what I
               | don't understand the justification for. If the worker is
               | dead, then it's a non-issue, and the lease can be broken.
               | If the system is alive, the host can discover (via RST,
               | heartbeats, or other timeouts) that the storage system is
               | unreachable, and thus prevent the program from continuing
               | execution -- and at that point the storage service can
               | still break the lease (via a timeout), but it would
               | actually come with a timing-based guarantee that the
               | program will no longer continue execution.
        
         | winwang wrote:
         | This isn't a mutex, but the distributed equivalent of one. The
         | storage service is the one who invalidates the lock on their
         | side. The client won't detect its own issues without additional
         | guarantees not given (supposedly) by Redlock.
        
           | neonbrain wrote:
           | My understanding is that the dataflow user was talking about
           | a notification which the server is supposed to receive from
           | the OS in the case of a broken client connection. This
           | notification is usually received, but cannot be guaranteed in
           | a distributed environment.
        
           | dataflow wrote:
           | I understand that. What I'm hung up on is, why does the
           | storage system feel it is at liberty to just invalidate a
           | lock and thus let someone else reacquire it without _any_
           | sort of acknowledgment (either from the owner or from the
           | communication systems connecting the owner to the outside
           | world) that the owner will no longer rely on it? It just
            | seems fundamentally wrong. The lock service just... doesn't
            | have that liberty, as I see it.
        
             | winwang wrote:
             | What if the rack goes down? But I think the author is
             | saying a similar thing to you. The fenced token is
             | essentially asserting that the client will no longer rely
             | on the lock, even if it tries to. The difference is the
             | service doesn't need any acknowledgement, no permission
              | needed to simply deny the client later.
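winwang's point can be sketched in a few lines: the lock service hands out a monotonically increasing token with each lease grant, and the storage service refuses any write carrying a token older than the newest it has seen, so no acknowledgement from a paused client is ever needed. All names here are illustrative, not from any real system:

```python
class FencedStorage:
    """Toy storage service enforcing fencing tokens: a write with a
    token older than the newest one seen must be from a holder whose
    lease has already been reissued, so it is rejected outright."""

    def __init__(self):
        self.max_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.max_token:
            return False            # stale holder: deny, no handshake needed
        self.max_token = token
        self.data[key] = value
        return True


store = FencedStorage()
store.write(1, "file", "from holder 1")        # accepted
store.write(2, "file", "from holder 2")        # accepted, token advances
late = store.write(1, "file", "1 wakes up")    # rejected: holder 1 is stale
print(late, store.data["file"])                # -> False from holder 2
```

The client that paused never has to learn its lease expired; the storage side simply denies it later, which is exactly the asymmetry being discussed.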
        
               | dataflow wrote:
               | To be clear, my objection is to the premise, not to the
               | offered solution.
               | 
               | To your question, could you clarify what exactly you mean
               | by the rack "going down"? This encompasses a lot of
               | different scenarios, I'm not sure which one you're asking
               | about. The obvious interpretation would break all the
               | connections the program has to the outside world, thus
               | preventing the problem by construction.
        
               | winwang wrote:
               | The rack could go down from the point of view of the
               | storage service, but the machine/VM itself could be
               | perfectly fine.
        
               | dataflow wrote:
               | In that scenario the machine would become aware that it
               | can't reach the storage service either, no? In which case
               | the host can terminate the program, or the network can
               | break all the connections between them, or whatever. By
               | default I would think that the lease shouldn't be broken
               | until the network partition gets resolved, but I think
                | the storage system _could_ have a timeout for breaking
                | the lease in that scenario if you really want, but then
                | it would come with a time-based guarantee that the
                | program isn't running anymore, no?
        
               | winwang wrote:
               | Everything you're saying is plausibly possible in the
               | absurdly large search space of all possible scenarios.
               | The author's premise, however, is rooted in the specific
               | scenario they lay out, with historical supporting
               | examples which you can look into. Even then, the premise
               | before all that was essentially: Redlock does not do what
               | people might expect of a distributed lock. Btw I do have
                | responses to your questions, but oftentimes in these
               | sorts of discussions, I find that there can always be an
               | objection to an objection to ... etc. The "sense" (or
               | flavor) in this case is that "we are taking a complex
               | topic too lightly". In fact, I should probably continue
               | reading the author's book (DDIA) at some point...
        
               | wbl wrote:
               | The process that owns the lock is never heard from again.
        
       | jojolatulipe wrote:
       | At work we use Temporal and ended up using a dedicated workflow
       | and signals to do distributed locking. Working well so far and
       | the implementation is rather simple, relying on Temporal's
       | facilities to do the distributed parts of the lock.
        
         | robertlagrant wrote:
         | I'm keen to use Temporal, but I've heard it can be flaky. In
         | your experience has it worked well?
        
           | calmoo wrote:
           | Rock solid in my experience and kind of a game changer. I'm
           | surprised it's not more widespread in large orgs.
        
             | Icathian wrote:
             | We use it a ton at my shop for internal things like release
             | rollouts. Fairly big tech company, and same experience.
             | It's an excellent product.
        
       ___________________________________________________________________
       (page generated 2024-10-20 23:00 UTC)