[HN Gopher] How to do distributed locking (2016)
___________________________________________________________________
How to do distributed locking (2016)
Author : yusufaytas
Score : 160 points
Date : 2024-10-20 10:38 UTC (12 hours ago)
(HTM) web link (martin.kleppmann.com)
(TXT) w3m dump (martin.kleppmann.com)
| galeaspablo wrote:
| Many engineers don't truly care about the correctness issue
| until it's too late. Similar to security.
|
| Or they care but don't bother checking whether what they're doing
| is correct.
|
| For example, in my field, where microservices/actors/processes
| pass messages between each other over a network, I dare say >95%
| of implementations I see have edge cases where messages might be
| lost or processed out of order.
|
| But there isn't an alignment of incentives that fixes this
| problem. I.e. the payment structures for executives and
| engineers aren't aligned with the best outcome for customers
| and shareholders.
| mrkeen wrote:
| I think there's a bit of an alignment of incentives: the edge
| cases are tricky enough that your programmers probably need to
| handle a lot of support tickets, which isn't good for anyone.
|
| But I don't see any way to convince yesterday's managers to
| give us time to build it right.
| sethammons wrote:
| The path to fixing this requires first measuring and monitoring
| it, then establishing service level objectives that represent
| customer experience. Product and engineering teams have to
| agree on them. If the SLOs are violated, focus shifts
| towards system stability.
|
| Getting everyone onboard is hard and that is why good
| leadership is needed. When customers start to churn because
| bugs pop up and new features are slow or nonexistent, the
| case for making quality part of the process is easy. Mature
| leaders get ahead of that as early as possible.
| galeaspablo wrote:
| Good leadership is spot on! Agreed. The cynical part of me
| sees incentives that discourage mature leadership styles.
|
| Leaders tend to be impatient and think of this quarter's OKRs
| as opposed to the business's long-term financial health. In
| other words, the leaders of leaders use standard MBA-prescribed
| incentive structures.
| secondcoming wrote:
| > 95% of implementations I see have edge cases where messages
| might be lost or processed out of order.
|
| Eek. This sort of thing can end up with innocent people in
| jail, or dead.
|
| [0] https://en.wikipedia.org/wiki/British_Post_Office_scandal
| noprocrasted wrote:
| The problem (or the solution, depending on which side you're
| on) is that _innocent_ people are in jail or dead. The people
| that _knowingly_ allowed this to happen are still free and
| wealthy.
|
| So I'm not particularly sure this is a good example - if
| anything, it sets the opposite incentives, that even jailing
| people or driving them to suicide won't actually have any
| consequences for you.
| noprocrasted wrote:
| > there isn't an alignment of incentives that fixes this
| problem
|
| "Microservices" itself is often a symptom of this problem.
|
| Everyone and their dog wants to introduce a network boundary in
| between function calls for no good reason just so they can
| subsequently have endless busywork writing HTTP (or gRPC if
| you're lucky) servers, clients & JSON (de?)serializers for said
| function calls and try to reimplement things like distributed
| transactions across said network boundary, and deal with the
| inevitable "spooky action at a distance" that this will yield.
| sethammons wrote:
| I've worked with microservices at scale and it was fantastic.
| We couldn't break backwards compatibility with our API
| without a lot of coordination. Outside of that, you could
| deploy as frequently as needed and other services could
| update as needed to make use of new features.
|
| The monoliths I have worked in, by contrast, have had
| issues coordinating changes within the codebase: code
| crosses boundaries it should not, and datastores get shared
| and coupled to (what should be) different domains, leading
| to slow, inefficient code and ossified options for product
| changes.
| shepherdjerred wrote:
| If you're hand-writing clients/servers/serializers instead of
| generating them from schema definitions then you have more
| fundamental issues than using microservices.
| jmull wrote:
| This overcomplicates things...
|
| * If you have something like what the article calls a fencing
| token, you don't need any locks.
|
| * The token doesn't need to be monotonically increasing, just a
| passive unique value that both the client and storage have.
|
| Let's call it a version token. It could be monotonically
| increasing, but a generated UUID, which is typically easier,
| would work too. (Technically, it could even be a hash of all the
| data in the store, though that's probably not practical.) The
| logic becomes:
|
| (1) client retrieves the current version token from storage,
| along with any data it may want to modify. There's no external
| lock, though the storage needs to retrieve the data and version
| token atomically, ensuring the token is specifically for the
| version of the data retrieved.
|
| (2) client sends the version token back along with any changes.
|
| (3) Storage accepts the changes if the current token matches the
| one passed with the changes and creates a new version token
| (atomically, but still no external locks).
|
| Now, you can introduce locks for other reasons (hopefully good
| ones... they seem to be misused a lot). Just pointing out they
| are/should be independent of storage integrity in a distributed
| system.
|
| (I don't even like the term lock, because they are
| temporary/unguaranteed. Lease or reservation might be a term that
| better conveys the meaning.)
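The version-token scheme described above can be sketched in a few lines. This is an illustrative in-memory model (the class and method names are invented for the example), not a real storage client; in a real system steps (1) and (3) would be performed atomically on the storage side:

```python
import uuid


class VersionedStore:
    """In-memory sketch of the version-token scheme: reads return the
    token and data together, and a write is accepted only if the
    caller's token still matches the current one."""

    def __init__(self, data=None):
        self._token = uuid.uuid4().hex  # passive unique value, not monotonic
        self._data = data

    def read(self):
        # step (1): token and data retrieved as one atomic snapshot
        return self._token, self._data

    def write(self, token, new_data):
        # step (3): accept only if the caller saw the current version,
        # then mint a fresh token for the new version
        if token != self._token:
            return False  # stale token: another writer committed first
        self._data = new_data
        self._token = uuid.uuid4().hex
        return True


store = VersionedStore(data=0)
t1, v1 = store.read()
t2, v2 = store.read()                # a second client reads the same version
assert store.write(t1, v1 + 1)       # first committer wins
assert not store.write(t2, v2 + 1)   # second is rejected as stale
```

In a real store the token comparison and the token refresh must happen inside the storage node's own synchronization (a transaction, a CAS instruction, etc.); the point is that no external lock is needed.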
| wh0knows wrote:
| This neglects the first reason listed in the article for why
| you would use a lock.
|
| > Efficiency: Taking a lock saves you from unnecessarily doing
| the same work twice (e.g. some expensive computation). If the
| lock fails and two nodes end up doing the same piece of work,
| the result is a minor increase in cost (you end up paying 5
| cents more to AWS than you otherwise would have) or a minor
| inconvenience (e.g. a user ends up getting the same email
| notification twice).
|
| I think multiple nodes doing the same work is actually much
| worse than what's listed, as it would inhibit you from having
| any kind of scalable distributed processing.
| karmakaze wrote:
| As mentioned in the article, a non-100%-correct lock can be
| used for efficiency purposes. So basically use an imperfect
| locking mechanism for efficiency and a reliable one for
| correctness.
| jmull wrote:
| > and a reliable one for correctness
|
| To be clear, my point is don't use distributed locking for
| correctness. There are much better options.
|
| Now, the atomicity I mention implies some kind of internal
| synchronization mechanism for multiple requests, which
| could be based on locks, but those would be real, non-
| distributed ones.
| jmull wrote:
| Sure, that's why I said you might introduce "locks"
| (reservations is a much better term) for other reasons.
|
| Efficiency is one, as you say.
|
| The other main one that comes to mind is to implement other
| "business rules" (hate that term, but that's what people
| use), like for a online shopping app, the stock to fulfill an
| order might be reserved for a time when the user starts the
| checkout process.
| karmakaze wrote:
| This is known as 'optimistic locking'. But I wouldn't call it a
| distributed locking mechanism.
| jameshart wrote:
| Optimistic locks are absolutely a distributed locking
| mechanism, in that they are for coordinating activity among
| distributed nodes - but they do require the storage node to
| have strong guarantees about serialization and atomicity of
| writes. That means it isn't a distributed storage solution,
| but it is something you can build over the top of a
| distributed storage solution that has strong read after write
| guarantees.
| karmakaze wrote:
| I normally see it as a version column in a database, where
| living alongside the data makes it non-distributed.
|
| I'm not even sure how it could be used for exclusive update
| to a resource elsewhere--all clients will think they 'have'
| the lock and change the resource, then find out they didn't
| when they update the lock. Or if they bump the lock first,
| another client could immediately 'have' the lock too.
| zeroxfe wrote:
| This is unconventional use of the term "distributed
| locking". This alternative just punts the hard part of
| locking to the storage system.
| bootsmann wrote:
| Won't this lead to inconsistent states if you don't do
| monotonically increasing tokens?
|
| I.e. your storage system has two nodes and there are two read-
| modify-write processes running. Process 1 acquires the first
| token "abc" and process two also acquires the token "abc". Now
| process 1 commits, the token is changed to "cde" and the change
| streamed to node 2. Due to network delay, the change to node 2
| is delayed. Meanwhile process 2 commits to node 2 with token
| "abc". Node 2 accepts the change because it has not received
| the message from node 1 and your system is now in an
| inconsistent state.
|
| Note that this cannot happen in a scenario where we have
| monotonically increasing fencing tokens because that
| requirement forces the nodes to agree on a total order of
| operations before they can supply the fencing token.
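The storage-side check that monotonic fencing tokens enable can be sketched like this (hypothetical names; a single storage node that remembers the highest token it has accepted and rejects anything older):

```python
class FencedStorage:
    """Sketch of storage-side fencing: because the lock service issues
    tokens in monotonically increasing order, the storage node can
    reject a write from a client whose lease has been superseded."""

    def __init__(self):
        self.highest_token = 0
        self.data = None

    def write(self, fencing_token, value):
        if fencing_token <= self.highest_token:
            return False  # stale holder (e.g. it was paused past its lease)
        self.highest_token = fencing_token
        self.data = value
        return True


storage = FencedStorage()
assert storage.write(33, "client 1")        # lock service issued token 33
assert storage.write(34, "client 2")        # client 2 got a later lease
assert not storage.write(33, "late write")  # client 1 resumes after a pause
```

This is exactly the total order the comment above refers to: the check works only because no two leases can carry the same or out-of-order token numbers.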
| computerfan494 wrote:
| In the above description of optimistic locking, it is assumed
| that it is impossible to issue the same token to multiple
| clients. Nodes can agree that a given token has never been
| issued before, just like with a monotonically increasing
| value. The nice property of non-monotonically-increasing
| tokens is that nodes may generate them without coordinating
| if you can make other assumptions about that system. A good
| example is when nodes use an ID they were assigned beforehand
| as part of the token generation, guaranteeing that the
| leasing tokens they mint will not conflict with other nodes'
| as long as node IDs are not reused.
| bootsmann wrote:
| I have a hard time wrapping my head around what you are
| proposing here. Say client A requests data, they get the
| token a-abc. Then client B requests data, they get the
| token b-cde. Client A commits their write, does the storage
| reject it because they already issued another token (the
| one from client B) or does it accept it?
| computerfan494 wrote:
| My understanding of what the OP was discussing is an
| optimistic locking system where the nodes only accept
| commits if the last issued token matches the token
| included in the commit. While agreeing on the last token
| requested requires coordination, unlike monotonically
| increasing tokens you could have well-behaved clients
| generate token content themselves without coordination.
| That may or may not be useful as a property.
| bootsmann wrote:
| Got it, thank you for clarifying this.
| jmull wrote:
| "node1", "node2", and "storage" are three separate things in
| the distributed environment. Only storage accepts changes,
| and it's what verifies the incoming token matches the current
| token.
|
| So node2 doesn't get to accept changes. It can only send
| changes to storage, which may or may not be accepted by it.
| bootsmann wrote:
| If the storage is a singular entity then this is not a
| distributed systems problem at all, no?
| cnlwsu wrote:
| You're describing compare-and-swap, which is a good solution.
| You're pushing complexity down to the database, and remember
| this is distributed locking. When you have a single database
| it's simple, until the database crashes and leaves you in a
| state of not knowing which of your CAS writes took effect. In
| major systems that demand high availability and multi-
| datacenter backups, this becomes pretty complicated, with
| node-failure scenarios that break it as well. Usually some
| form of Paxos transaction log is used. Never assume there is
| an easy solution in distributed systems... it just always sucks
| eru wrote:
| Git push's `--force-with-lease` option does essentially this.
|
| (Honestly, they should rename `--force-with-lease` to just
| `--force`, and rename the old `--force` behaviour to `--force-
| with-extreme-prejudice` or something like that. Basically make
| the new behaviour the default `--force` behaviour.)
| pyrolistical wrote:
| `--force-unsafe`
| zeroxfe wrote:
| > This overcomplicates things...
|
| You're misinterpreting the problem described, and proposing a
| solution for a different problem.
| antirez wrote:
| I suggest reading the comment I left back then in this blog post
| comments section, and the reply I wrote in my blog.
|
| Btw, things to note in random order:
|
| 1. Check my comment under this blog post. The author had missed a
| _fundamental_ point in how the algorithm works. Then he based his
| rejection of the algorithm on the remaining, weaker points.
|
| 2. It is not true that you can't wait an approximately correct
| amount of time with modern computers and APIs. GC pauses are
| bounded and monotonic clocks work. These are acceptable
| assumptions.
|
| 3. To critique the auto-release mechanism per se, because you
| don't want to expose yourself to the fact that there is a
| potential race, is one thing. To critique the algorithm against
| its goals and its system model is another thing.
|
| 4. Over the years Redlock was used in a huge number of use cases
| with success, because if you pick a timeout which is much larger
| than A) the time to complete the task and B) the random pauses
| you can have in normal operating systems, race conditions are
| very hard to trigger, and the other failures in the article have,
| AFAIK, never been observed. Of course if you have a super small
| timeout to auto-release the lock, and the task may easily take
| this amount of time, you just committed a design error, but
| that's not about Redlock.
| bluepizza wrote:
| Could you provide links?
| saikatsg wrote:
| http://antirez.com/news/101
| computerfan494 wrote:
| To be honest I've long been puzzled by your response blog post.
| Maybe the following question can help achieve common ground:
|
| Would you use RedLock in a situation where the timeout is
| fairly short (1-2 seconds maybe), the work done usually takes
| ~90% of that timeout, and the work you do while holding a
| RedLock lock MUST NOT be done concurrently with another lock
| holder?
|
| I think the correct answer here is always "No" because the risk
| of the lease sometimes expiring before the client has finished
| its work is very high. You must alter your work to be
| idempotent because RedLock cannot guarantee mutual exclusion
| under all circumstances. Optimistic locking is a good way to
| implement this type of thing while the work done is idempotent.
| kgeist wrote:
| >because the risk of the lease sometimes expiring before the
| client has finished its work is very high
|
| We had corrupted data because of this.
| antirez wrote:
| The timeout must be much larger than the time required to do
| the work. The point is that distributed locks without a
| release mechanism are in practical terms very problematic.
|
| Btw, things to note in random order:
|
| 1. Check my comment under this blog post. The author had
| missed a _fundamental_ point in how the algorithm works. Then
| he based his rejection of the algorithm on the remaining,
| weaker points.
|
| 2. It is not true that you can't wait an approximately
| correct amount of time with modern computers and APIs. GC
| pauses are bounded and monotonic clocks work. These are
| acceptable assumptions.
|
| 3. To critique the auto-release mechanism per se, because you
| don't want to expose yourself to the fact that there is a
| potential race, is one thing. To critique the algorithm
| against its goals and its system model is another thing.
|
| 4. Over the years Redlock was used in a huge number of use
| cases with success, because if you pick a timeout which is
| much larger than A) the time to complete the task and B) the
| random pauses you can have in normal operating systems, race
| conditions are very hard to trigger, and the other failures
| in the article have, AFAIK, never been observed. Of course if
| you have a super small timeout to auto-release the lock, and
| the task may easily take this amount of time, you just
| committed a design error, but that's not about Redlock.
| computerfan494 wrote:
| Locking without a timeout is indeed a non-starter in the
| majority of use cases; we are agreed there.
|
| The critical point that users must understand is that it is
| _impossible_ to guarantee that the RedLock client never
| holds its lease longer than the timeout. Compounding this
| problem is that the longer you make your timeout to
| minimize the likelihood of this accidentally happening,
| the less responsive your system becomes during genuine
| client misbehaviour.
| antirez wrote:
| In most real-world scenarios, the tradeoffs are a bit
| softer than what people in the formal world dictate (and
| by dictating they have forced certain systems to become
| suboptimal for everything except during failures, kicking
| them out of business...). A few examples:
|
| 1. E-commerce system where there are a limited amount of
| items of the same kind, you don't want to oversell.
|
| 2. Hotel booking system where we don't want to reserve
| the same dates/rooms multiple times.
|
| 3. Online medical appointments system.
|
| In all those systems, it's ok to re-open the item/date/...
| after some time, even after one day. And if the lock hold
| time is not too big but a very strict compromise (also a
| reasonable choice in the spectrum), it could happen that
| during edge-case failures three items are sold when only
| two exist; such orders can be cancelled.
|
| So yes, there is a tension between timeout, race
| conditions, and recovery time, but in many systems using
| something like RedLock the development and end-user
| experience can both be improved with a high rate of
| success, and the rare unhappy event can be handled. Now
| the algorithm is very old, still used by many
| implementations, and as we speak problems are being
| solved in a straightforward way with very good
| performance. Of course, the developers of the solution
| should be aware that there are tradeoffs between certain
| values: but when are distributed systems easy?
|
| P.S. Why do 10 years of strong usage count, in the face
| of a blog post saying that you can't trust a system like
| that? Because even if DS issues emerge randomly and
| sporadically, in the long run systems that create real-
| world issues, if they reach mass usage, become known. A
| big enough user base is a continuous integration test big
| enough to detect when a solution has serious real-world
| issues. So of course RedLock users picking short timeouts
| for tasks that take a very hard to predict amount of
| time will indeed run into _known_ issues. But the other
| systemic failure modes described in the blog post are
| never mentioned by users AFAIK.
| computerfan494 wrote:
| I feel like you're dancing around admitting the core
| issue that Martin points out - RedLock is _not_ suitable
| for systems where correctness is paramount. It can get
| close, but it is not robust in all cases.
|
| If you want to say "RedLock is correct a very high
| percentage of the time when lease timeouts are tuned for
| the workload", I would agree with you actually. I even
| possibly agree with the statements "most systems can
| tolerate unlikely correctness failures due to RedLock
| lease violations. Manual intervention is fine in those
| cases. RedLock may allow fast iteration times and is
| worth this cost". I just think it's important to be
| crystal clear on the guarantees RedLock provides.
|
| I first read Martin's blog post and your response years
| ago when I worked at a company that was using RedLock
| despite it not being an appropriate tool. We had an
| outage caused by overlapping leases because the original
| implementor of the system didn't understand what Martin
| has pointed out from the RedLock documentation alone.
|
| I've been a happy Redis user and fan of your work outside
| of this poor experience with RedLock, by the way. I
| greatly appreciate the hard work that has gone into
| making it a fantastic database.
| eknkc wrote:
| I tend to use postgresql for distributed locking. As in, even if
| the job is not db related, I start a transaction and obtain an
| advisory lock which stays held until the transaction ends,
| either because the app releases it or due to a crash or
| something.
|
| Felt pretty safe about it so far, but I just realised I never
| check if the db connection is still ok. If this is a db-related
| job and I need to touch the db, fine. Some query will fail on the
| connection and my job will fail anyway. Otherwise I might have
| already lost the lock and not be aware of it.
|
| Without fencing tokens, atomic ops and such, I guess one needs a
| two stage commit on everything for absolute correctness?
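One low-tech way to narrow the gap described above is to probe the lock-holding connection before each chunk of non-db work; if the probe fails, the session (and with it the advisory lock) is gone. A sketch against the generic Python DB-API (the helper name is invented; it would work the same with psycopg2 against Postgres, but sqlite3 is used here so the example is self-contained):

```python
import sqlite3


def still_holds_lock(conn):
    """Probe the connection that holds the advisory lock. If the probe
    fails, the session is dead and the server has already released the
    lock, so the caller must stop assuming mutual exclusion.

    Note: a successful probe only says the connection was alive a
    moment ago; it is not a guarantee for the rest of the job."""
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        cur.fetchone()
        return True
    except Exception:
        return False


# demo, with sqlite3 standing in for a Postgres connection
conn = sqlite3.connect(":memory:")
assert still_holds_lock(conn)
conn.close()                     # simulates the session dying
assert not still_holds_lock(conn)
```

This is exactly why it is weaker than a fencing token: there is still a window between the probe and the work in which the lock can be lost.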
| candiddevmike wrote:
| One gotcha with advisory locks is that they are connection-
| specific AFAIK, and in most libraries you're typically using
| a pool. So you need a dedicated connection for locks, and you
| must ensure you're using that connection when doing periodic
| lock tests.
| jroseattle wrote:
| We reviewed Redis back in 2018 as a potential solution for our
| use case. In the end, we opted for a less sexy solution (not
| Redis) that never failed us, no joke.
|
| Our use case: handing out a ticket (something with an identifier)
| from a finite set of tickets from a campaign. It's something akin
| to Ticketmaster allocating seats in a venue for a concert. Our
| operation was as you might expect: provide a ticket to a request
| if one is available, assign some metadata from the request to the
| allocated ticket, and remove it from consideration for future
| client requests.
|
| We had failed campaigns in the past (over-allocation, under-
| allocation, duplicate allocation, etc.) so our concern was
| accuracy. Clients would connect and request a ticket; we wanted
| to exclusively distribute only the set of tickets available from
| the pool. If the number of client requests exceeded the number of
| tickets, the system should protect for that.
|
| We tried Redis, including the naive implementation of getting the
| lock, checking the lock, doing our thing, releasing the lock. It
| was ok, but administrative overhead was a lot for us at the time.
| I'm glad we didn't go that route, though.
|
| We ultimately settled on...Postgres. Our "distributed lock" was
| just a composite UPDATE statement using some Postgres-specific
| features. We effectively turned requests into a SET operation,
| where the database would return either a record that indicated
| the request was successful, or something that indicated it
| failed. ACID transactions for the win!
|
| With accuracy solved, we next looked at scale/performance. We
| didn't need to support millions of requests/sec, but we did have
| some spikiness thresholds. We were able to optimize read/write db
| instances within our cluster, and strategically load
| larger/higher-demand campaigns to allocated systems. We continued
| to improve on optimization over two years, but not once did we
| ever have a campaign with ticket distribution failures.
|
| Note: I am not an expert of any kind in distributed-lock
| technology. I'm just someone who did their homework, focused on
| the problem to be solved, and found a solution after trying a few
| things.
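The "composite UPDATE" idea above can be illustrated with a single conditional UPDATE: the claim either flips one unowned row or touches nothing, and the database's own atomicity, not an external lock, prevents over-allocation. A sketch under assumptions (table layout and function name invented; SQLite stands in for Postgres, which would add conveniences like RETURNING or FOR UPDATE SKIP LOCKED):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO tickets (id) VALUES (?)",
                 [(i,) for i in range(1, 4)])  # a campaign with 3 tickets
conn.commit()


def claim_ticket(conn, requester):
    """Atomically claim one unowned ticket; True on success, False if
    the pool is exhausted. The WHERE clause makes the claim a no-op
    when every ticket is already owned."""
    cur = conn.execute(
        "UPDATE tickets SET owner = ? "
        "WHERE id = (SELECT min(id) FROM tickets WHERE owner IS NULL)",
        (requester,),
    )
    conn.commit()
    return cur.rowcount == 1


assert claim_ticket(conn, "alice")
assert claim_ticket(conn, "bob")
assert claim_ticket(conn, "carol")
assert not claim_ticket(conn, "dave")  # only 3 tickets existed
```

In Postgres one would typically also return the claimed id with RETURNING, and under heavy concurrency use FOR UPDATE SKIP LOCKED so competing requests pick different rows instead of blocking on the same one.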
| wwarner wrote:
| This is the best way, and actually the only sensible way to
| approach the problem. I first read about it here
| https://code.flickr.net/2010/02/08/ticket-servers-distribute...
| hansvm wrote:
| > only sensible way
|
| That's a bit strong. Like most of engineering, it depends.
| Postgres is a good solution if you only have maybe 100k QPS,
| the locks are logically (if not necessarily fully physically)
| partially independent, and they aren't held for long. Break
| any of those constraints, or add anything weird (inefficient
| postgres clients, high DB load, ...), and you start having to
| explore either removing those seeming constraints or using
| other solutions.
| wwarner wrote:
| Ok fair; I'm not really talking about postgres (the link i
| shared uses mysql). I'm saying that creating a ticket
| server that just issues and persists unique tokens is a
| way to provide coordination between loosely coupled
| applications.
| zbobet2012 wrote:
| Yeah that's cookies. They are great.
| nh2 wrote:
| You are right that anything that needs up to 50000 atomic,
| short-lived transactions per second can just use Postgres.
|
| Your UPDATE transaction lasts just a few microseconds, so you
| can just centralise the problem and that's good because it's
| simpler, faster and safer.
|
| But this is not a _distributed_ problem, as the article
| explains:
|
| > remember that a lock in a distributed system is not like a
| mutex in a multi-threaded application. It's a more complicated
| beast, due to the problem that different nodes and the network
| can all fail independently in various ways
|
| You need distributed locking if the transactions can take
| seconds or hours, and the machines involved can fail while they
| hold the lock.
| fny wrote:
| You could just have multiple clients attempt to update a row
| that defines the lock. Postgres transactions have no limit
| and will unwind on client failure. Since connections are
| persistent, there's no need to play a game to determine the
| state of a client.
| nh2 wrote:
| Your scenario still uses a centralised single postgres
| server. Failure of that server takes down the whole locking
| functionality. That's not what people usually mean by
| "distributed".
|
| "the machines involved can fail" must also include the
| postgres machines.
|
| To get that, you need to coordinate multiple postgres
| servers, e.g. using ... distributed locking. Postgres does
| not provide that out of the box -- neither multi-master
| setups, nor master-standby synchronous replication with
| automatic failover. Wrapper software that provides that,
| such as Stolon and Patroni, use distributed KV stores /
| lock managers such as etcd and Consul to provide it.
| stickfigure wrote:
| I think this illustrates something important, which is that:
| You don't need _locking_. You need _<some high-level business
| constraint that might or might not require some form of
| locking>_.
|
| In your case, the constraint is "don't sell more than N
| tickets". For most realistic traffic volumes for that kind of
| problem, you can solve it with traditional rdbms transactional
| behavior and let it manage whatever locking it uses internally.
|
| I wish developers were a lot slower to reach for "I'll build
| distributed locks". There's almost always a better answer, but
| it's specific to each application.
| nasretdinov wrote:
| So basically your answer (and the correct answer most of the
| time) was that you don't really need distributed locks even if
| you think you do :)
| tonyarkles wrote:
| Heh, in my local developer community I have a bit of a
| reputation for being "the guy" to talk to about distributed
| systems. I'd done a bunch of work in the early days of the
| horizontal-scaling movement (vs just buying bigger servers)
| and did an M.Sc focused on distributed systems performance.
|
| Whenever anyone would come and ask for help with a planned
| distributed system the first question I would always ask is:
| does this system actually need to be distributed?! In my 15
| years of consulting I think the answer was only actually
| "yes" 2 or 3 times. Much more often than was helping them
| solve the performance problems in their single server system;
| without doing that they would usually just have ended up with
| a slow complex distributed system.
|
| Edit: lol this paper was not popular in the Distributed
| Systems Group at my school: https://www.usenix.org/system/fil
| es/conference/hotos15/hotos...
|
| "You can have a second computer once you've shown you know
| how to use the first one."
| Agingcoder wrote:
| I wanted to post the same paper. With Adrian Colyer's
| explanations:
| https://blog.acolyer.org/2015/06/05/scalability-but-at-
| what-...
| apwell23 wrote:
| Classic tech interview question
| etcd wrote:
| I guess this is embarrassingly parallelizable in that you can
| shard by concert to different instances. Might even be a job
| for that newfangled cloudflare sqlite thing.
| OnlyMortal wrote:
| Interesting. We went through a similar process and ended up
| with Yugabyte to deal with the locks (cluster).
|
| It's based on Postgres but performance was not good enough.
|
| We're now moving to RDMA.
| hoppp wrote:
| I did distributed locking with Deno, and Deno KV hosted by Deno
| Deploy.
|
| It's using FoundationDB, a distributed db. The Deno instances
| running on local devices all connect to the same Deno KV to
| acquire the lock.
|
| But with Postgres, a SELECT FOR UPDATE also works; the
| database is not distributed tho.
| egcodes wrote:
| I once wrote a distributed-locking blog post using this
| resource. Here it is:
| https://medium.com/sahibinden-technology/an-easy-integration...
| anonzzzies wrote:
| I am updating my low level and algo knowledge; what are good
| books about this (I have the one written by the author). I am
| looking to build something for fun, but everything is either a
| toy or very complicated.
| cosmicradiance wrote:
| System Design Interview I and II - Alex Xu. Take one of the
| topics and do it practically.
| dataflow wrote:
| > The lock has a timeout (i.e. it is a lease), which is always a
| good idea (otherwise a crashed client could end up holding a lock
| forever and never releasing it). However, if the GC pause lasts
| longer than the lease expiry period, and the client doesn't
| realise that it has expired, it may go ahead and make some unsafe
| change.
|
| Hold on, this sounds absurd to me:
|
| First, if your client _crashes_, then you don't need a timed
| lease on the lock to detect this in the first place. The lock
| would get released by the OS or supervisor, whether there are any
| timeouts or not. If both of _those_ crash too, then the
| connection would eventually break, and the network system should
| then detect that (via network resets or timeouts, lack of
| heartbeats, etc.) and then invalidate all your connections before
| releasing any locks.
|
| Second, if the problem becomes that your client is _buggy_ and
| thus holds the lock _too long_ without crashing, then shouldn't
| some kind of supervisor detect that and then kill the client
| (e.g., by the OS terminating the process) before releasing the
| lock for everybody else?
|
| Third, if you _are_ going to have locks with timeouts to deal
| with corner cases you can't handle like the above, shouldn't
| they notify the actual program somehow (e.g., by throwing an
| exception, raising a signal, terminating it, etc.) instead of
| letting it happily continue execution? And shouldn't those cases
| wait for some kind of verification that the program was notified
| before releasing the lock?
|
| The whole notion that timeouts should somehow permit the program
| execution to continue ordinary control flow sounds like the root
| cause of the problem, and nobody is even batting an eye at it? Is
| there an obvious reason why this makes sense? I feel I must be
| missing something here... what am I missing?
| neonbrain wrote:
| The assumption that your server will always receive RST or FIN
| from your client is incorrect. There are cases where these
| packets are dropped, and your server will stay with an
| open connection while the client on the remote machine is
| already dead. P.S. BTW, it's not me who downvoted you
| dataflow wrote:
| I made no such assumption that this will always happen,
| though? That's why the comment was so much longer than just
| "isn't TCP RST enough?"... I listed a ton of ways to deal
| with this that didn't involve letting the program continue
| happily on its path.
| neonbrain wrote:
| Sorry, didn't see your message. What I mean is that if you
| are not getting RST/FIN or any other indication of a closed
| communication channel, you are left with only the mechanism
| of timeouts to recognize a partitioned/dead/slow worker
| client. Basically, you've mentioned them yourself
| ("timeouts, lack of heartbeats, etc." in your post are all
| forms of timeouts). So you can piggyback on these timeouts
| or use a smaller timeout configured in the lease, whatever
| suits your purpose, I guess. This is what I believe
| Kleppmann is referring to here. He's just being generic in
| his description.
| dataflow wrote:
| > What I mean is that if you are not getting RST/FIN or
| any other indication of a closed communication channel,
| you are left with only the mechanism of timeouts to
| recognize a partitioned/dead/slow worker client.
|
| Timeouts were a red herring in my comment. My problem
| wasn't with the mere existence of timeouts in corner
| cases; it was the fact that the worker is assumed to keep
| working merrily on, despite the timeouts. That's what I
| don't understand the justification for. If the worker is
| dead, then it's a non-issue, and the lease can be broken.
| If the system is alive, the host can discover (via RST,
| heartbeats, or other timeouts) that the storage system is
| unreachable, and thus prevent the program from continuing
| execution -- and at that point the storage service can
| still break the lease (via a timeout), but it would
| actually come with a timing-based guarantee that the
| program will no longer continue execution.
| winwang wrote:
| This isn't a mutex, but the distributed equivalent of one. The
| storage service is the one who invalidates the lock on their
| side. The client won't detect its own issues without additional
| guarantees not given (supposedly) by Redlock.
| neonbrain wrote:
| My understanding is that the dataflow user was talking about
| a notification which the server is supposed to receive from
| the OS in the case of a broken client connection. This
| notification is usually received, but cannot be guaranteed in
| a distributed environment.
| dataflow wrote:
| I understand that. What I'm hung up on is, why does the
| storage system feel it is at liberty to just invalidate a
| lock and thus let someone else reacquire it without _any_
| sort of acknowledgment (either from the owner or from the
| communication systems connecting the owner to the outside
| world) that the owner will no longer rely on it? It just
| seems fundamentally wrong. The lock service just... doesn't
| have that liberty, as I see it.
| winwang wrote:
| What if the rack goes down? But I think the author is
| saying a similar thing to you. The fenced token is
| essentially asserting that the client will no longer rely
| on the lock, even if it tries to. The difference is the
| service doesn't need any acknowledgement, no permission
| needed to simlly deny the client later.
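A minimal sketch of the fencing-token idea described above, assuming a hypothetical lock service that hands out monotonically increasing tokens and a storage service that rejects writes carrying a token older than the newest one it has seen (all class and method names are invented for illustration):

```python
# Hypothetical fencing-token sketch: the lock service never needs an
# acknowledgement from a stale holder; the storage service simply
# denies the stale holder's later writes.

class LockService:
    def __init__(self):
        self.token = 0

    def grant(self):
        self.token += 1          # strictly increasing fencing token
        return self.token

class Storage:
    def __init__(self):
        self.max_seen = 0
        self.value = None

    def write(self, token, value):
        if token < self.max_seen:
            return False         # stale holder: write denied (fenced)
        self.max_seen = token
        self.value = value
        return True

lock, store = LockService(), Storage()
t_old = lock.grant()                # client A acquires, then stalls
t_new = lock.grant()                # lease expires; client B acquires
assert store.write(t_new, "B")      # B writes with the newer token
assert not store.write(t_old, "A")  # A's late write is fenced off
assert store.value == "B"
```

Note the design choice this reflects: correctness is enforced at the resource being protected, not by any agreement with the old lock holder.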
| dataflow wrote:
| To be clear, my objection is to the premise, not to the
| offered solution.
|
| To your question, could you clarify what exactly you mean
| by the rack "going down"? This encompasses a lot of
| different scenarios; I'm not sure which one you're asking
| about. The obvious interpretation would break all the
| connections the program has to the outside world, thus
| preventing the problem by construction.
| winwang wrote:
| The rack could go down from the point of view of the
| storage service, but the machine/VM itself could be
| perfectly fine.
| dataflow wrote:
| In that scenario the machine would become aware that it
| can't reach the storage service either, no? In which case
| the host can terminate the program, or the network can
| break all the connections between them, or whatever. By
| default I would think that the lease shouldn't be broken
| until the network partition gets resolved, though the
| storage system _could_ have a timeout for breaking the
| lease in that scenario if you really want; but then it
| would come with a time-based guarantee that the program
| isn't running anymore, no?
| winwang wrote:
| Everything you're saying is plausibly possible in the
| absurdly large search space of all possible scenarios.
| The author's premise, however, is rooted in the specific
| scenario they lay out, with historical supporting
| examples which you can look into. Even then, the premise
| before all that was essentially: Redlock does not do what
| people might expect of a distributed lock. Btw, I do have
| responses to your questions, but oftentimes in these
| sorts of discussions I find that there can always be an
| objection to an objection to... etc. The "sense" (or
| flavor) in this case is that "we are taking a complex
| topic too lightly". In fact, I should probably continue
| reading the author's book (DDIA) at some point...
| wbl wrote:
| The process that owns the lock is never heard from again.
| jojolatulipe wrote:
| At work we use Temporal and ended up using a dedicated workflow
| and signals to do distributed locking. It's working well so
| far, and the implementation is rather simple, relying on
| Temporal's facilities to do the distributed parts of the lock.
| robertlagrant wrote:
| I'm keen to use Temporal, but I've heard it can be flaky. In
| your experience has it worked well?
| calmoo wrote:
| Rock solid in my experience and kind of a game changer. I'm
| surprised it's not more widespread in large orgs.
| Icathian wrote:
| We use it a ton at my shop for internal things like release
| rollouts. Fairly big tech company, and same experience.
| It's an excellent product.
___________________________________________________________________
(page generated 2024-10-20 23:00 UTC)