[HN Gopher] A Heisenbug lurking in async Python
       ___________________________________________________________________
        
       A Heisenbug lurking in async Python
        
       Author : willm
       Score  : 336 points
       Date   : 2023-02-11 17:25 UTC (5 hours ago)
        
 (HTM) web link (textual.textualize.io)
 (TXT) w3m dump (textual.textualize.io)
        
       | dataflow wrote:
       | The notion of fire-and-forget is itself the problem. Even with
       | threads, you should have them join the main thread before the
       | program exits. Which implies you should hold strong references to
       | them until then. Most people don't go out of their way to do this
       | even when they're able to, but that's what you're supposed to do.
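
The thread-joining discipline described in the comment above can be sketched as follows. This is a minimal illustration, not code from the article; `background_job` and the `results` list are hypothetical names chosen for the example.

```python
import threading
import time

results = []

def background_job(n):
    # Stand-in for real work; the sleep is only there to simulate a delay.
    time.sleep(0.01)
    results.append(n)

# Keep a strong reference to every thread so it can be joined later.
threads = [threading.Thread(target=background_job, args=(n,)) for n in range(3)]
for t in threads:
    t.start()

# Join before the program exits: the main thread waits for the work to finish.
for t in threads:
    t.join()
```

Holding the list is what makes the final join possible; without it there would be nothing to wait on.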
        
       | bornfreddy wrote:
       | Wow. What a strange design decision, as evidenced by the sheer
       | number of developers who don't / didn't know about this (myself
       | included). I hope this gets _fixed_ instead of just documented.
        
         | jcheng wrote:
         | Agreed, I'm really surprised at all the comments defending this
         | behavior. I suspect there is a non-obvious reason why it's this
         | way, but "you should've read the docs" and "but why _wouldn't_
         | you hold your own strong reference" are weird takes IMHO.
        
       | boomskats wrote:
       | As someone who happens to be eternally grateful to the author for
       | his contribution to the Python ecosystem [0], I kinda feel like
       | this comment thread is overreacting to his overreaction. When I
       | look at this post all I see is a useful, well explained, byte-
       | size writeup that a search engine might recommend to someone
       | looking for help in writing async Python.
       | 
       | Maybe it's because a bunch of my friends are Scottish and I get
       | their sense of humour.
       | 
       | [0]: https://rich.readthedocs.io/ (yes I'm talking about the
       | fancy new progress bar that pip got recently)
        
       | rlpb wrote:
       | This issue doesn't exist with Trio's structured concurrency
       | model. In other words, the problem is already solved.
        
         | nbadg wrote:
         | I'll +1 the Trio shoutout [1], but it's worth emphasizing that
         | the core concept of Trio (nurseries) now exists in the stdlib
         | in the form of task groups [2]. The article mentions this very
         | briefly, but it's easy to miss, and I wouldn't describe it as a
         | solution to this bug, anyways. Rather, it's more of a different
         | way of writing multitasking code, which happens to make this
         | class of bug impossible.
         | 
         | [1] https://github.com/python-trio/trio
         | 
         | [2] https://docs.python.org/3/library/asyncio-task.html#task-
         | gro...
        
           | Tanjreeve wrote:
           | Oh good so now we can all move to this year's Async flavour in
           | Python.
        
       | edfletcher_t137 wrote:
       | This is a great blog post. Concise, lacking fluff or extraneous
       | prose, it gets right to the point, presents the primary-source
       | reference and then gets right to the solution. A bit of
       | editorializing in the middle but that's completely allowed when
       | writing this tightly. Well damn done, OP.
       | 
       | And also it's _great_ information that I - like I'm sure many of
       | you - also never noticed. THANK YOU!
        
         | mgsk wrote:
         | What does this add that isn't already right there in the
         | documentation?
        
           | nkrisc wrote:
           | If there was nothing to add then there wouldn't be loads of
           | projects on GitHub making exactly this mistake.
        
           | Jtsummers wrote:
           | It draws attention to a problem that a lot of people have
           | created for themselves by not reading the documentation (or
           | not recalling it if they read it). I guess the author could
           | have just linked the documentation but then they couldn't
           | have added the additional context of the github search
           | demonstrating how common it is.
        
             | newaccount74 wrote:
             | I must have looked through the docs for create_task a dozen
             | times while trying to figure out how async/await works in
             | Python but still managed to overlook this part.
        
               | edflsafoiewq wrote:
               | That is unsurprising. It was first added as a brief note
               | only in 3.9, and expanded to its present length only in
               | 3.10.
        
           | klyrs wrote:
           | The author doesn't go into much detail on that point: this
           | warning should be present in documentation of many Python
           | libraries that use create_task and return the result to the
           | user unless that library stores those tasks in a collection
           | as is recommended -- at which point the library author had
           | better roll their own garbage collection!
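
The "store tasks in a collection" recommendation, together with the cleanup the comment above alludes to, might look like the sketch below. It follows the pattern shown in the `asyncio.create_task` documentation; the helper name `spawn` and the `work` coroutine are hypothetical.

```python
import asyncio

background_tasks = set()

def spawn(coro):
    """Create a task and keep a strong reference until it finishes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done so the set does not grow forever.
    task.add_done_callback(background_tasks.discard)
    return task

async def work(n, out):
    await asyncio.sleep(0.01)  # stand-in for real IO
    out.append(n)

async def main():
    out = []
    for n in range(3):
        spawn(work(n, out))  # fire-and-forget from the caller's point of view
    pending_count = len(background_tasks)
    await asyncio.sleep(0.05)  # give the tasks time to complete
    return out, pending_count, len(background_tasks)

out, pending_count, remaining = asyncio.run(main())
```

The done callback is the "garbage collection" part: the set holds each task only while it is still running.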
        
         | isoprophlex wrote:
         | Well, I don't know, I kinda miss the human angle. I'd have
         | loved to first read six paragraphs about how the author's
         | grandmother raised them on home grown threads and greenlets :^)
        
           | nickjj wrote:
           | > I'd have loved to first read six paragraphs about how the
           | author's grandmother raised them on home grown threads and
           | greenlets.
           | 
           | With recipes, often times your problem is you want to learn
           | how to make something where having the steps listed out is
           | the most important thing. The story behind the recipe isn't
           | important to solve your problem but for tech the story around
           | the choice is important. Often times the "why" is really
           | important and I really like hearing about what led someone to
           | use something first. Often times that's more important or
           | equally as important as the implementation details.
           | 
           | It wouldn't make sense for this post given its title but if
           | someone were making a post about why they chose to use async
           | in Python I'd expect and hope that half of the post goes into
           | the gory details of how they tried alternatives and what
           | their shortcomings were for their specific use cases. That
           | would help me as the reader generalize their post to my
           | specific use cases and see if it applies.
        
             | bialpio wrote:
             | Off-topic but the life story is there to make them eligible
             | to be protected by copyright. IANAL.
             | 
             | Source: https://copyrightalliance.org/are-recipes-
             | cookbooks-protecte...
        
               | flandish wrote:
               | Interesting. I always thought it was search engine
               | optimization.
        
               | aidenn0 wrote:
               | SEO is definitely a big part of it; Google penalized
               | pages where people closed or navigated away quickly.
        
               | fbdab103 wrote:
               | I immediately bounce from those Stackoverflow clones that
               | keep appearing up at the top of searches. So, I am
               | wondering how much this is still weighted in the scores.
        
               | gdprrrr wrote:
               | https://github.com/quenhus/uBlock-Origin-dev-filter
        
               | jonas21 wrote:
               | You might. But many people don't. They just want an
               | answer and don't care if it's a clone or not.
        
               | chucksmash wrote:
               | Had this driven home recently, watching a younger dev
               | happily clicking links I've long ago blocked via browser
               | extension (w3schools AND geeksforgeeks _in one session_).
        
               | rmbyrro wrote:
               | SEO makes total sense. I always add grandma keywords when
               | I'm searching for Python stuff on Google.
               | 
               | Like: "grandma, how the hell have I still not memorized
               | the API and keep needing to resort to the same doc pages
               | again and again?"
               | 
               | Now I trained ChatGPT with grandma letters from when I
               | was young, so it will answer just like if it was my
               | grandma.
        
               | water-your-self wrote:
               | It's engagement optimization. Adsense pays more if you
               | spend more time on the page.
        
               | yunohn wrote:
               | When is the last time you heard of online recipe blogs
               | enforcing copyright claims on other blogspam? Ridiculous.
               | 
               | The real reason is simple: people who write recipes
               | aren't robots - they're expressing their stories and
               | emotions, while explaining how to make food that's dear
               | to them.
        
       | throwaway81523 wrote:
       | There's a similar thing in tkinter but I guess users discover it
       | faster, since the failure if you don't save the reference shows
       | up fairly quickly.
        
       | Lammy wrote:
       | I experienced a heisenbug exactly like this in Ruby when trying
       | to `while case Ractor::receive`:
       | https://github.com/okeeblow/DistorteD/blob/dd2a99285072982d3...
        
       | zzzeek wrote:
       | I think asyncio is kind of neat for what it's good at, but
       | beginner programmers who have never written code before are going
       | directly to using Python asyncio (I know this because they are
       | telling me so when they post SQLAlchemy discussions). This is
       | just wrong.
        
       | samwillis wrote:
       | This is one of many reasons I'm sceptical of the current trend in
       | Python to "async all the things". The nuance to how it operates
       | is often opaque to the developer, particularly those less
       | experienced.
       | 
       | GUI toolkits (like Textual) however are a really good use case
       | for Asyncio. Human interaction with a program is inherently
       | asynchronous, using async/await so that you can more cleanly
       | specify your control flow is so much better than complicated
       | callbacks. Using async/await in front end JS code for example is
       | a delight.
       | 
       | Where I'm particularly unconvinced of their use is in server side
       | view and api end point processing. The majority of the time you
       | have maybe a couple of IO ops that depend on each other. There
       | is often little that can be parallelised (within a request) and
       | so there are few performance gains to be made. Traditional
       | synchronous imperative code run with a multithreaded server is
       | proven, scalable and much easier to debug.
       | 
       | There are always places where it's useful though, things such as
       | long running requests (websockets, long polling), or those very
       | rare occurrences where you do have many easily parallelizable IO
       | ops within one short request.
        
         | heavyset_go wrote:
         | > _Where I'm particularly unconvinced of their use is in
         | server side view and api end point processing. The majority of
         | the time you have maybe a couple of IO ops that depend on each
         | other. There is often little that can be parallelised (within a
         | request) and so there are few performance gains to be made.
         | Traditional synchronous imperative code run with a
         | multithreaded server is proven, scalable and much easier to
         | debug._
         | 
         | Python doesn't have multithreading that scales or supports real
         | parallelism. asyncio has very measurable performance benefits
         | for exactly that use case you've mentioned versus threaded
         | servers.
        
           | zzzeek wrote:
           | Sorry that's not accurate. Asyncio and threading offer the
           | same variety of "parallelism" , which is that both can wait
           | on multiple io streams at once (the gil is released waiting
           | on io). Neither offer CPU parallelism, unless lots of your
           | CPU work is in native extensions that release the gil. In
           | that unusual case, threading would offer parallelism where
           | asyncio wouldn't.
           | 
           | Asyncio's single advantage is you can wait on _lots_ of io
           | streams, like many thousands, very cheaply without having to
           | roll non blocking IO queueing code directly.
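
A rough illustration of waiting on thousands of IO operations at once. Here `asyncio.sleep` is an assumption standing in for real network waits, and `fake_io` is a hypothetical name; actual timings will vary by machine.

```python
import asyncio
import time

async def fake_io(n):
    # Each "request" waits 0.1 s; sleep stands in for a socket read.
    await asyncio.sleep(0.1)
    return n

async def main(count):
    # gather schedules every coroutine as a task and waits for all of them.
    return await asyncio.gather(*(fake_io(n) for n in range(count)))

start = time.perf_counter()
results = asyncio.run(main(5_000))
elapsed = time.perf_counter() - start
```

Run one after another, 5,000 waits of 0.1 s would take over eight minutes; run concurrently on one thread, they should finish in a small multiple of a single wait.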
        
             | heavyset_go wrote:
             | I didn't say that asyncio offered parallelism, I'm pointing
             | out that normal assumptions about multithreading you'd make
             | with other languages don't always apply to Python. You'd
             | typically assume that threads offer parallelism, a property
             | you might choose to use them for over something like
             | single-threaded asyncio.
             | 
             | I've found that even for IO bound workloads, throughput
             | plateaus at a relatively small number of threads, despite
             | the GIL being released on IO.
        
         | Topgamer7 wrote:
         | These days with GraphQL, or complex microservices
         | architectures, you could have multiple hops to fulfill the
         | original request.
         | 
         | Flask sync will hold that thread hostage until the request is
         | done. Where async with properly used async libs will allow
         | other requests to process.
         | 
         | We often have medium sized reports take seconds. That is a lot
         | of time to wait. And would just end up bloating your service
         | scaling to handle more connections.
         | 
         | Any service with decently long lived network requests will
         | benefit from event loop handled scheduling.
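
The "properly used async libs" caveat can be made concrete with a small sketch. The handler names are hypothetical, and the sleeps stand in for a slow downstream call; this is not Flask code.

```python
import asyncio
import time

async def blocking_handler():
    # A synchronous call inside a coroutine holds the event loop hostage.
    time.sleep(0.1)

async def cooperative_handler():
    # An awaitable call hands control back to the loop while waiting.
    await asyncio.sleep(0.1)

async def serve(handler, requests=3):
    """Run several simulated requests concurrently and time them."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(requests)))
    return time.perf_counter() - start

blocking_elapsed = asyncio.run(serve(blocking_handler))
cooperative_elapsed = asyncio.run(serve(cooperative_handler))
```

With the blocking handler the three requests run back to back; with the cooperative one their waits overlap, which is the benefit being described.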
        
         | traverseda wrote:
         | >Where I'm particularly unconvinced of their use is in server
         | side view and api end point processing.
         | 
         | Sure, performance isn't going to get better, but for websockets
         | and server sent events the occasional long-lived async task can
         | be great. Especially when you need to poll something, or check
         | in on a subprocess.
        
         | nbadg wrote:
         | The thing is, there's a lot more nuance to it than this.
         | Async/await is part of the language syntax in python, but
         | asyncio is only one particular implementation of an event loop
         | framework to power it. But really what async/await provides is
         | a general-purpose cooperative multitasking syntax. This allows
         | other libraries to implement their own event loop frameworks,
         | each with their own different semantics and considerations (the
         | two best-known alternatives being Curio and Trio). At a
         | language level, there's nothing even forcing you to use
         | async/await for async IO -- you could, if you really wanted,
         | probably write a library that used it to start threads and
         | await their completion.
         | 
         | So you have, from highest-level to lowest-level: application
         | code, async/await language syntax, the event loop framework,
         | and then the implementation of the event loop itself. The OP
         | article concerns a peculiar implementation detail in the lowest
         | level that makes it very easy to write bugs at the highest
         | level.
         | 
         | But that means that even if you do "async all the things",
         | you'll only encounter this situation if you write your
         | application code in a particular way. It just so happens that
         | "in a particular way" is, in this case, the overwhelming
         | majority of how people write it, which is, of course, why the
         | OP article is relevant.
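
To illustrate the point that async/await is only syntax and the event loop is a separate layer, here is a toy round-robin scheduler that drives coroutines by hand, with no asyncio involved. All names (`pause`, `worker`, `run_round_robin`) are hypothetical; real frameworks are far more elaborate.

```python
import types

@types.coroutine
def pause():
    """Yield control back to whoever is driving the coroutine."""
    yield

async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        await pause()  # cooperative hand-off point
    return name

def run_round_robin(coros):
    """Resume each coroutine in turn until all of them have finished."""
    queue = list(coros)
    finished = []
    while queue:
        coro = queue.pop(0)
        try:
            coro.send(None)  # run until the next pause()
        except StopIteration as stop:
            finished.append(stop.value)  # the coroutine returned
        else:
            queue.append(coro)  # not done yet, schedule it again
    return finished

log = []
finished = run_round_robin([worker("a", 2, log), worker("b", 3, log)])
```

Note that this scheduler keeps strong references to its coroutines in `queue`, so the question of who owns a running task is a choice made by each event loop framework.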
        
           | heavyset_go wrote:
           | > _The OP article concerns a peculiar implementation detail
           | in the lowest level that makes it very easy to write bugs at
           | the highest level._
           | 
           | Are other async implementations using the asyncio.Task
           | abstraction? I haven't looked into it, but I assumed that
           | asyncio.Task was tied to the asyncio implementation and event
           | loop.
        
         | pdonis wrote:
         | _> GUI toolkits (like Textual) however are a really good use
         | case for Asyncio._
         | 
         | Only if the GUI toolkit is explicitly written to be asyncio-
         | aware and use asyncio's event loop. Textual appears to be
         | written specifically to do that.
         | 
         | However, other GUI toolkits that I'm aware of that have Python
         | bindings aren't written that way. Qt, for example, uses its own
         | event loop, and if you want anything other than a GUI event to
         | be fed into Qt's event loop so your event-driven code can
         | process it, you have to do that by hand and make sure it works.
         | There is no point in even trying to use another event loop,
         | such as Python's asyncio event loop, since that loop will never
         | run while Qt's event loop is running.
        
         | samsquire wrote:
         | I am a huge fan of parallel and async code. I spend a lot of
         | time researching it and trying to design software that is
         | easily parallelisable.
         | 
         | Many GUIs use the event/message pump pattern, such as the
         | Win32 API. Qt does something similar with its event loop
         | (QEventLoop).
         | 
         | Threads are a rather low level instrument to get background
         | tasks going because the interface between the main thread and
         | the worker threads is largely left unspecified.
         | 
         | In Java you could use a ConcurrentLinkedQueue. And in Python
         | you can use JoinableQueue.
         | 
         | I am heavily interested in this space because I want to write
         | understandable software that anybody can pick up and work with.
         | I worked on a JMS log viewer that used threads but would crash
         | with ConcurrentModificationException due to not being thread
         | safe. I changed it to be thread safe but its performance
         | dropped through the floor. In my learnings since then I should
         | have sharded each JMS connection topic to its own thread or
         | multiplexed multiple JMS topics per thread and looped over them.
         | The main thread can interrogate the thread with a lock, that
         | should be faster than every thread trying to acquire the lock.
         | It would be driven by the main thread but the work is done in
         | the background. The threads can keep the fetched messages in
         | memory until the main thread is ready for them.
         | 
         | I think with the right abstraction, thread safety can be
         | achieved and concurrency shouldn't be something to be afraid
         | of. It is very difficult and challenging working at the low
         | levels of concurrency such as a concurrent browser engine.
         | (I've not done that though.)
         | 
         | This is why languages such as Pony lang, Inko, Cyber and
         | Erlang, Elixir are so promising. We can build high performance
         | systems that parallelise.
         | 
         | Writing an async/await pipeline that looks synchronous is far
         | easier to understand and maintain than nested callbacks. So I
         | can see where async is useful. I just hope we can design async
         | software to be simpler to maintain and extend.
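
A small sketch of the queue-based hand-off between a main thread and a background thread. `JoinableQueue` lives in `multiprocessing`; for threads, the standard library's `queue.Queue` offers the same `task_done()` / `join()` protocol, which is what this example assumes. The `worker` function and the `None` sentinel are illustrative choices.

```python
import queue
import threading

inbox = queue.Queue()
processed = []

def worker():
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work
            inbox.task_done()
            break
        processed.append(item.upper())
        inbox.task_done()  # mark this item as handled

t = threading.Thread(target=worker)
t.start()

for msg in ["a", "b", "c"]:
    inbox.put(msg)
inbox.put(None)

inbox.join()  # block until every queued item has been marked done
t.join()
```

The queue does the locking internally, so neither side touches shared state directly except through `put` and `get`.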
        
         | whoopdeepoo wrote:
         | I don't write any colored function code in python, I'd much
         | rather work with process/thread pools
        
           | Animats wrote:
           | Me too, but threading is botched in Python. Not just the
           | Global Interpreter Lock. Some Python packages are not thread-
           | safe, and it's not documented which ones are not. Years ago I
           | discovered that cPickle was not thread safe, and that wasn't
           | considered a problem.
        
         | michael_j_x wrote:
         | I am not sure I agree that the GUI is a good use case for
         | async. A human interaction with the program must almost always
         | pre-empt whatever the program was running, so I can not see how
         | a cooperative multi-threading runtime like async Python can
         | work in such a scenario.
        
       | kodablah wrote:
       | It is for this reason in Temporal Python[0], where we wrote a
       | custom durable asyncio event loop, that we maintain strong
       | references to tasks that are created in workflows. This wouldn't
       | be hard for other event loop implementations to do too.
       | 
       | 0 - https://github.com/temporalio/sdk-python
        
         | make3 wrote:
         | he never said it was hard, his point is that it's unintuitive &
         | a lot of people don't know or don't remember
        
           | kodablah wrote:
           | I mean the default asyncio event loop can be
           | replaced/extended where you won't have to know/remember on
           | each create_task. But yes, it is an unintuitive default.
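
One way the "extend the default loop" idea could be approached is a task factory installed with `loop.set_task_factory`. This is a hedged sketch, not Temporal's implementation; `keep_alive_factory` and `job` are hypothetical names, and the `**kwargs` pass-through is an assumption made to cope with differences between Python versions.

```python
import asyncio

_background_tasks = set()

def keep_alive_factory(loop, coro, **kwargs):
    """Task factory that holds a strong reference to every task it creates."""
    task = asyncio.Task(coro, loop=loop, **kwargs)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task

async def job(results):
    await asyncio.sleep(0.01)  # stand-in for real work
    results.append("done")

async def main():
    results = []
    asyncio.get_running_loop().set_task_factory(keep_alive_factory)
    asyncio.create_task(job(results))  # reference deliberately dropped
    held_while_running = len(_background_tasks)
    await asyncio.sleep(0.05)
    held_after = len(_background_tasks)
    return results, held_while_running, held_after

results, held_while_running, held_after = asyncio.run(main())
```

With the factory in place, callers of `create_task` no longer need to remember to store the result themselves.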
        
       | NelsonMinar wrote:
       | Does anyone understand why the event loop only keeps weak
       | references to tasks? It'd seem wise to do something to stop it
       | from being garbage collected while running, maybe also while
       | waiting to run.
        
         | coopsmoss wrote:
         | I agree, I think this is very unpythonic behavior
        
         | masklinn wrote:
         | Only guess I'd have is to protect the system against infinite-
         | loop tasks, but I don't remember any other runtime caring and
         | a task which never terminates seems easier to diagnose than
         | one which disappears on you.
        
         | kortex wrote:
         | Because it's almost always the case that the consumer is going
         | to keep a reference to the task in some way, so that is the
         | logical choice for the "primary owner" of the task. Python
         | doesn't have ownership per se like rust, but if you keep more
         | than one hard reference to an object around, it'll prevent
         | collection, so in cases such as this it makes sense to
         | designate one primary owner and have all other references be
         | weakref.
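
The weak-reference semantics under discussion can be seen in isolation with `weakref.WeakSet`. This is a general illustration of the mechanism, not asyncio's internal code; `Job` and `registry` are hypothetical names, and the behaviour shown assumes CPython.

```python
import gc
import weakref

class Job:
    """Placeholder object standing in for a task."""

registry = weakref.WeakSet()  # holds its members only weakly

job = Job()
registry.add(job)
before = len(registry)  # the strong reference "job" keeps it alive

del job       # drop the only strong reference
gc.collect()  # make collection explicit rather than relying on timing
after = len(registry)
```

A registry like this can list live objects but cannot keep any of them alive, which is why some other owner has to hold the strong reference.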
        
           | skitter wrote:
           | > if you keep more than one hard reference to an object
           | around, it'll prevent collection
           | 
           | Which is the behavior the parent comment asks for.
        
       | anthomtb wrote:
       | Well, looks like I know what I am doing first thing on Monday. I
       | converted a bunch of code to asyncio a while back. I have yet to
       | run into any heisenbug in that code and want to keep it that way.
        
       | cpburns2009 wrote:
       | I've been working on a PySide6 application recently using
       | asyncio. I read the docs but totally overlooked the requirement
       | to hold references to tasks created with `create_task()`.
        
       | dehrmann wrote:
       | Eww. What's especially nasty is this is the opposite behavior of
       | threads.
        
       | aeturnum wrote:
       | I really think this writer doth protest too much.
       | 
       | Yes, the base async interface is confusing and overly complex.
       | It's a downside! As they note lots of people have stepped in to
       | provide better helpers (like TaskGroups) - but these are the docs
       | for the base library!
       | 
       | > _But who reads all the docs? And who has perfect recall if they
       | do?_
       | 
       | Everyone reads the docs? That is why you don't need perfect
       | recall because you can read them whenever you want.
       | 
       | Python has lots of confusing corner cases ("" is truthy, you need
       | to remember to call copy [or maybe deepcopy!] sometimes, all the
       | other situations where you confuse weak vs. strong references).
       | They cause really common bugs. It's just a hazard of the language
       | in general and the choices it makes (much like tasks being
       | objects is a hazard). I do understand why people think they can
       | throw away task references (based on other languages) - but this
       | is Python! The garbage collector exists and you gotta check if
       | you own the object or something else does.
       | 
       | Edit: this feels like an experienced Python developer, who has
       | already internalized all the older, non-async Python weirdness,
       | being taken aback by weirdness they didn't expect. Like, I feel
       | you, it does suck - but it's not a bug that values you don't
       | retain may get garbage collected.
        
         | No1 wrote:
         | He didn't even have to read "all the docs" - just the ones that
         | pertain to the function that he is using. And then not ignore
         | the section marked "Important" _and_ the highlighted "Note".
        
           | richbell wrote:
           | What if he read the docs for that function prior to the
           | "important" note being added?
        
         | Karunamon wrote:
         | > _Everyone reads the docs?_
         | 
         | The author goes on to say they found this pattern lurking in
         | various projects on github. So, no. The problem is that this
         | behavior is subtle, not intuitive, and unless you are reading
         | the actual documentation top to bottom (and not just the
         | function signature and first paragraph from the pop up in your
         | IDE) you will likely get bitten by this.
         | 
         | What is the point of your comment? The author _shouldn't_ have
         | called out the upturned rake in the darkened shed?
        
           | rollcat wrote:
           | > The author goes on to say they found this pattern lurking
           | in various projects on github.
           | 
           | I'd call it an anti-pattern. If you spawn a process/thread,
           | and never wait/join it, it means you don't actually care what
           | it does, if it crashes, etc. I don't see a problem with
           | Python's behavior here.
        
           | aeturnum wrote:
           | I wouldn't say _shouldn't_ - they are free to do what they
           | want. But this is a blog post about something that can trip
           | you up that the docs highlight - which the author calls a
           | "heisenbug". The author doesn't even have a suggestion for
           | the docs, which already call out the problem they
           | encountered; they just note that there are helpers for this
           | problem (which is true).
           | 
           | The point of my comment is that subtle, non intuitive things
           | like this are all over Python and, while this one is
           | _particularly bad_, this blog post makes it seem like more
           | of an aberration than it is.
        
         | IshKebab wrote:
         | > Everyone reads the docs?
         | 
         | Wow I've heard people say that everyone _should_ read all of
         | the docs (which isn't really true) but I've never heard anyone
         | claim that everyone _does_ read all of the docs! Wild.
        
         | raverbashing wrote:
         | > "" is truthy
         | 
         | Humm, no? Unless you mean ("",)
         | 
         |     >>> not ""
         |     True
        
           | aeturnum wrote:
           | Oh, sorry, you are right - "" is false-y, even though it's a
           | valid empty value. So it's hard to tell the difference
           | between a value not being filled and a value being filled
           | with an empty value.
           | 
           | ex:
           | 
           |     answers = {}
           |     answers["I exist"] = ""
           |     if answers["I exist"]:
           |         print("a")
           | 
           | does not print.
        
             | fbdab103 wrote:
             | I guess I am too deeply in the Python ecosystem to see a
             | problem here. Unless you want to check for the existence of
             | "I exist"? In which case, the Python Way would be
             | 
             |     answers = {}
             |     answers["I exist"] = ""
             |     if "I exist" in answers:
             |         print("a")
        
               | pacaro wrote:
               | Maybe
               | 
               |     ...
               |     if answers.get('I exist'):
               |         print('a')
               | 
               | Which is why you should always explicitly check for
               | _None_ if that is your intent.
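
A short sketch of the explicit `None` check suggested above, separating "key missing" from "key present but empty". The `describe` helper is a hypothetical name, and it assumes `None` is never stored as a real value in the dictionary.

```python
answers = {"I exist": ""}

def describe(key):
    value = answers.get(key)  # returns None when the key is absent
    if value is None:
        return "missing"
    if not value:  # empty string, empty list, 0, ...
        return "present but empty"
    return "present"

status_empty = describe("I exist")
status_missing = describe("I don't exist")
```

If `None` can be a legitimate stored value, a membership test with `in` is the safer way to detect a missing key.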
        
               | aeturnum wrote:
               | It's not a problem? The async interface isn't a problem
               | either. It's just a thing you have to remember about
               | python: "most input is truthy except for the input that
               | isn't"
               | 
               | "Most of the time you don't disrupt your program by not
               | keeping the returned reference in scope except for when
               | you do"
               | 
               | It's just a thing that trips people up.
        
               | dwattttt wrote:
               | > It's just a thing you have to remember ...
               | 
               | The more of these things there are, the more brainpower
               | you devote to remembering the right way to do things; if
               | you don't you introduce bugs, a subtle, painful one here.
        
               | heavyset_go wrote:
               | "Empty containers are falsy" is a Python fundamental,
               | this isn't a subtle bug, but an obvious one.
        
               | fbdab103 wrote:
               | Truthy is a Pythonic core principle of the language. It
               | is not an edge case phenomenon in the language which I
               | would expect a regular practitioner to confuse.
               | 
               | https://docs.python.org/3/library/stdtypes.html#truth-
               | value-...
        
               | aeturnum wrote:
               | I mean, I've seen bugs around that in code I've worked on
               | and I've created bugs where it's a factor.
               | 
               | Weakrefs are also a core part of the language:
               | https://docs.python.org/3/library/weakref.html . You
               | can't use python without using them.
        
               | fiddlerwoaroof wrote:
               | What I learned when I wrote Python professionally was
               | "never rely on truthiness": explicitly writing out a
               | boolean expression that does what you want is more
               | explicit ("explicit is better than implicit", PEP 8) and
               | prevents a whole class of bugs down the line.
        
               | nemetroid wrote:
               | PEP 8, which you mention, explicitly recommends relying
               | on truthiness:
               | 
               | > For sequences, (strings, lists, tuples), use the fact
               | that empty sequences are false:
               | 
               |     # Correct:
               |     if not seq:
               |     if seq:
               | 
               |     # Wrong:
               |     if len(seq):
               |     if not len(seq):
        
               | AeroNotix wrote:
               | PEP8 is touted a lot as if it is a perfectly correct tome
               | of ... correctness. I've worked in Python long enough to
               | know that it both doesn't cover everything and the advice
               | is sometimes actively bad.
        
             | heavyset_go wrote:
             | > _if answers["I exist"]:_
             | 
             |     if "I exist" in answers:
             |         ...
        
             | wizzwizz4 wrote:
             | > _So it's hard to tell the difference between a value not
             | being filled and a value being filled with an empty value._
             | 
             |     >>> answers = {}
             |     >>> if answers["I don't exist"]:
             |     ...     print("a")
             | 
             |     Traceback (most recent call last):
             |       File "<pyshell#3>", line 1, in <module>
             |         if answers["I don't exist"]:
             |     KeyError: "I don't exist"
             | 
             | The method you're trying to use doesn't work _anyway_: it
             | doesn't matter that it's confusing. You'd have the same
             | problem with the value False.
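             | 
             | A small sketch of telling the three cases apart; the
             | answers dict and the describe helper are made up for
             | illustration and are not from the article:

```python
# Hypothetical form data: one empty string, one zero, one real value.
answers = {"name": "", "age": 0, "city": "Leith"}


def describe(key):
    """Classify a key as missing, present but falsy, or present and truthy."""
    if key not in answers:          # membership test never raises KeyError
        return "missing"
    if not answers[key]:            # "", 0, [], {} and False all land here
        return "present but falsy"
    return "present and truthy"


# dict.get() with a unique sentinel also separates "missing" from "falsy".
_MISSING = object()
email = answers.get("email", _MISSING)
email_is_missing = email is _MISSING
```

             | Checking membership first keeps "never filled in" and
             | "filled in with an empty value" as separate outcomes.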
        
         | Etheryte wrote:
         | I think you may be too bold with the assumption here,
         | personally I would wager that the majority of people who write
         | Python don't even know Python has official docs outside of a
         | site called Stack Overflow.
        
         | leni536 wrote:
         | Considering how many times I need to add site:python.org to my
         | python search queries to actually get to the docs, I assume
         | that a surprisingly low number of python developers actually
         | read the docs.
        
           | 0x008 wrote:
           | If you use DuckDuckGo you can prefix your search with "!py3"
        
         | iforgotpassword wrote:
         | > Everyone reads the docs?
         | 
         | For Python? The language where everyone just cobbles together
         | random code from the internet and other repos? I can totally
         | see how this mistake happens left and right. The bar of entry
         | for this language is way too low to assume only rigorous senior
         | devs use it.
        
       | bandyaboot wrote:
       | He doesn't really get into what makes this a Heisenbug, only that
       | it's indeterminate in nature. Would attaching a debugger/stepping
       | through the code make it less likely that your task would get
       | garbage collected out from under you?
        
         | Izkata wrote:
         | You're probably going to need a reference to the task in order
         | to inspect it in the debugger. Creating that reference prevents
         | the bug.
        
         | foobarbecue wrote:
         | Yeah, he seems to be re-defining the term to mean "a bug that
         | occurs occasionally depending on system state" as opposed to "a
         | bug that changes behavior when you observe it closely e.g. in a
         | debugger."
        
           | macintux wrote:
           | The first is a common way of using the term Heisenbug. I
           | first heard it used that way 10 years ago when discussing
           | Erlang's error handling model.
        
         | throwaway81523 wrote:
         | CPython does most of its memory management by reference
         | counting, which fails to reclaim circular structure. So to make
         | sure it gets everything, it occasionally runs a conventional
         | tracing GC. If the GC happens to run just after you create that
         | async task, the task itself can get collected, it sounds like.
         | It's good to know about this and is (my own editorializing) yet
         | another reason Python3 should have used Erlang-style
         | concurrency instead of this async stuff.
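         | 
         | A rough illustration of the two mechanisms, assuming CPython
         | (other implementations manage memory differently). Node and
         | make_cycle are hypothetical names used only for the demo;
         | it shows a reference cycle surviving until the tracing
         | collector runs, not the asyncio case itself:

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a      # a <-> b keep each other's refcount above zero
    return weakref.ref(a)        # a weak reference does not keep `a` alive


gc.disable()                     # suppress automatic cycle collection for the demo
try:
    probe = make_cycle()
    alive_before = probe() is not None   # refcounting alone cannot free the cycle
    gc.collect()                         # run the tracing collector explicitly
    alive_after = probe() is not None    # the cycle is gone, so the weakref is dead
finally:
    gc.enable()
```

         | In a real program the collector runs on its own schedule,
         | which is why the timing of such a collection looks random.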
        
       | No1 wrote:
       | His argument hinges on "I can't be bothered to read the docs on
       | the stuff I'm using." So instead of reading the docs on
       | coroutines and tasks before using them, he writes a rant about how
       | it's all wrong because he didn't understand how it works.
       | 
       | On a more fundamental level, why would anyone assume that a
       | coroutine is guaranteed to complete if it is never awaited? There
       | is no reason a scheduler could not be totally lazy and only
       | execute the coroutine once awaited.
       | 
       | At least he bothered to make note of TaskGroups, also clearly
       | shown in his documentation screenshot, immediately above the
       | section marked _Important_ that went ignored, and finishes with
       | "As long as all the tasks you spin up are in TaskGroups, you
       | should be fine." Yep, that's all there was to it.
        
         | ptx wrote:
         | > _There is no reason a scheduler could not be totally lazy and
         | only execute the coroutine once awaited._
         | 
         | Isn't the point of create_task (which is what the article is
         | about) to launch concurrent tasks without immediately awaiting
         | them? The example in the docs [1] wouldn't work (in the stated
         | manner) if the task didn't start until it was awaited.
         | 
         | > _At least he bothered to make note of TaskGroups [...] Yep,
         | that's all there was to it._
         | 
         | That only works on Python 3.11, which was released just a few
         | months ago. Debian still uses 3.9, for example, so the
         | TaskGroups solution can't be used everywhere yet.
         | 
         | [1] https://docs.python.org/3/library/asyncio-task.html#coroutin...
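         | 
         | A hedged sketch of code that works on both sides of that
         | version boundary: it uses asyncio.TaskGroup where it exists
         | (3.11+) and falls back to asyncio.gather otherwise. The work
         | coroutine and run_all helper are invented for illustration:

```python
import asyncio


async def work(n):
    await asyncio.sleep(0.01)    # stand-in for real I/O
    return n * 2


async def run_all(values):
    if hasattr(asyncio, "TaskGroup"):              # Python 3.11 and later
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(work(v)) for v in values]
        # Leaving the block means every task in the group has finished.
        return [t.result() for t in tasks]
    # Older interpreters: gather() keeps the tasks referenced until they complete.
    return list(await asyncio.gather(*(work(v) for v in values)))


results = asyncio.run(run_all([1, 2, 3]))
```

         | Either branch keeps a reference to every task for as long as
         | it is running, so nothing is left for the collector to take.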
        
         | zackees wrote:
         | [dead]
        
       | [deleted]
        
       | [deleted]
        
       | m3047 wrote:
       | Hrmmmm.
       | 
       | > But who reads all the docs?
       | 
       | asyncio.create_task() doesn't exist in 3.6, and I can't find the
       | string "to avoid a task disappearing" in the doc, so I'll go out
       | on a limb: there is no such doc. However I see the reference to
       | weakref.WeakSet.
        
         | Jtsummers wrote:
         | The world didn't end in 2016. Welcome to seven years in the
         | future where this documentation does, in fact, exist:
         | 
         | https://docs.python.org/3/library/asyncio-task.html#asyncio....
        
       | cutler wrote:
       | Maybe grafting async onto a single threaded dynamic language just
       | isn't such a good idea in the first place.
        
         | murphy214 wrote:
         | bingo
        
       | qxmat wrote:
       | Python has a few weird issues like this. The last one I
       | encountered was with a class inheriting Thread, join and the SQL
       | Server ODBC driver on Linux. Fairly sure I hit page faults thanks
       | to a shallow copy on driver allocated string data but didn't have
       | the time to investigate like the hero of this blog post.
        
       | whoopdeepoo wrote:
       | > But who reads all the docs
       | 
       | Why is this so common? Do people seriously not read a
       | language/library documentation? That's the absolute first thing I
       | do when evaluating a technology.
        
         | adamckay wrote:
         | Because people have deadlines and need to get things working.
         | You read enough to figure out how to do what you need to do and
         | then mostly move on.
         | 
         | This function was added in 3.7 with no note on the importance
         | of saving a reference. In 3.9 a note was added "Save a
         | reference to the result of this function, to avoid a task
         | disappearing mid execution." which was then expanded with the
         | explanation of a weak reference in 3.10.
        
         | skitter wrote:
         | It absolutely is common. People see there is a len function
         | that takes one argument, they call len(some_collection), see
         | that it indeed returns the number of items in the collection
         | like they expect and move on. They don't expect len to return a
         | negative number instead on Thursdays, and of course it doesn't
         | because that would be a pretty big footgun. People also see
         | that there is a create_task function that takes a coroutine,
         | they call create_task(some_coroutine), see that the coroutine
         | indeed runs like they expect, and move on. Sure, you're
         | _supposed_ to await the result, but maybe they don't need the
         | awaited value anymore, only the side effects, and see that it
         | still works.
        
         | throwaway81523 wrote:
         | I had a manager who actually told me not to read docs. I was a
         | bad report and read them anyway.
        
       | winter_blue wrote:
       | This article just makes me feel like Python, while a language
       | with nice-ish syntax, is a language that was poorly hacked and
       | put together with little concern/thought about the real-world
       | implications of poor design decisions like this async design
       | decision (and also dynamic typing - _a terrible thing in any_
       | language).
        
         | crdrost wrote:
         | Most languages have something like this, usually around async.
         | 
         | For instance NodeJS has had a bit of this around promises, and
         | eventually needed to institute the rule "if a promise rejects
         | with an error, and nobody is around to hear it, we will crash
         | your program on the assumption that you probably needed to
         | clean up some resources but didn't and now they're going to
         | leak. Listen to the error with a handler that does nothing, if
         | we are wrong about that."
        
           | macintux wrote:
           | One of many reasons I like Erlang: _everything_ is async, so
           | you have plenty of tooling/libraries/core language features
           | to support you.
        
         | photochemsyn wrote:
         | 'async footguns' returns 20,000+ hits on Google. Top one
         | happens to be:
         | 
         | https://news.ycombinator.com/item?id=32086973
         | 
         | > "Async seems to be the first big "footgun" of Rust. It's
         | widespread enough that you can't really avoid interacting with
         | it, yet it's bad enough that it makes..."
        
       | deschutes wrote:
       | Fun stuff. Why aren't unfinished tasks gc roots?
        
       | [deleted]
        
       | [deleted]
        
       | dehrmann wrote:
       | Another common async footgun I see is unthrottled gathering, and
       | no throttling mechanism in the standard library. Once you gather
       | an unspecified number of awaitables, bad things start to happen,
       | either with CPU starvation, local IO starvation, or hammering an
       | external service.
       | 
       | What I like about threads is they make dangerous things like this
       | harder, and you have to put more thought into how much concurrent
       | work you want outstanding. They also handle CPU starvation better
       | for things that are latency-sensitive. I've seen degenerate
       | requests tie up the event loop with 500 ms of processing time.
        
         | rednafi wrote:
         | Huh! Unless you're using semaphores, you can also recreate
         | similar situation with threads. Spin up a whole bunch of
         | threads and send all of them towards some shared object or make
         | 100s of requests with them.
         | 
         | There's not much difference between spinning up threads
         | explicitly and creating async task with asyncio.create_task. In
         | either case, you can throttle them with semaphores.
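         | 
         | A minimal sketch of the asyncio side of that, assuming a limit
         | of three concurrent jobs; fetch, main and the stats dict are
         | hypothetical names, and the sleep stands in for a request:

```python
import asyncio


async def fetch(i, limiter, stats):
    async with limiter:                      # at most `limit` bodies run at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)            # stand-in for real I/O
        stats["active"] -= 1
    return i


async def main(count=20, limit=3):
    limiter = asyncio.Semaphore(limit)       # created inside the running loop
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fetch(i, limiter, stats) for i in range(count))
    )
    return list(results), stats["peak"]


results, peak = asyncio.run(main())
```

         | All twenty tasks are still created up front; the semaphore
         | only bounds how many are doing work at any moment.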
        
           | dehrmann wrote:
           | I don't have a source or affected versions, but semaphores
           | can scale poorly. I vaguely remember each blocked acquire
           | getting checked on every event loop iteration, or something
           | silly like that.
        
       | acjohnson55 wrote:
       | Something linters can help with, I would think?
        
         | ryanianian wrote:
         | C++ has nodiscard which is super useful for scenarios like this
         | where ownership can be tricky.
        
       | smetj wrote:
       | Start a thread/greenthread/fiber/process/task without holding a
       | reference to at least tie all loose ends at exit? Hmm dunno.
        
         | tgv wrote:
         | You can do that in go. You don't even get a reference to the
         | thread/goroutine.
        
         | nixpulvis wrote:
         | Fire and forget.
        
       | crabbone wrote:
       | In many years since asyncio has been added, I have never used it
       | willingly, outside of the cases where a third-party library
       | required it. There has never been a practical benefit for any of
       | that stuff when compared to select. It always worked poorly and
       | never justified the effort one has to put into writing code that
       | uses the library. The behavior OP describes is just one of the
       | many bad design decisions that are so characteristic of this
       | library.
        
       | pyuser583 wrote:
       | I don't find this behavior odd at all. Dereferencing unassigned
       | values is normal Python garbage collector behavior. Threads are
       | an exception (no pun intended), but they're an exception in lots
       | of ways - just try pickling them.
        
       | samsquire wrote:
       | Thank you for this. This is really useful information.
       | 
       | I recently adapted some garbage collection code to add register
       | scanning.
       | 
       | I can imagine all sorts of subtle bugs where things go away
       | randomly. One problem I have with my multithreaded code is that
       | sometimes a thread crashes and the logs are so long I don't
       | notice. From my perspective the thread is just not doing
       | anything.
       | 
       | Sometimes the absence of behaviour can be really tricky to debug!
        
       | sgt wrote:
       | Is this something go developers also have to be careful with when
       | using goroutines?
        
         | gerad wrote:
         | No. But sometimes goroutines have the opposite problem, where
         | they don't terminate and get cleaned up.
         | 
         | https://betterprogramming.pub/common-goroutine-leaks-that-yo...
        
           | [deleted]
        
           | candiddevmike wrote:
           | Is there an (easy?) test for checking goroutine leaks?
        
             | Snawoot wrote:
             | Yes, it's visible in the goroutine profile provided by the
             | built-in profiler pprof. E.g.:
             | https://github.com/mysteriumnetwork/node/issues/5311#issueco...
        
         | Jtsummers wrote:
         | No. Goroutines don't generate a reference to hold onto, either.
         | They just run until they or the program terminate.
        
         | [deleted]
        
       | makomk wrote:
       | Well, this explains that one really annoying intermittent bug
       | that I was having in some asyncio-based code.
        
       | aardvark179 wrote:
       | The same problem or something similar exists in many languages.
       | Threads are GC roots because the OS knows about them, but this
       | may not be true for lightweight threads or async callbacks.
       | 
       | It is hard to fix because you don't want to introduce references
       | from an old object (such as a list of callbacks) to many new
       | objects as that will introduce GC issues, and many other
       | potential leaks.
        
       | jarboot wrote:
       | If I want to create a task that runs even after the function
       | returns, ie "async def f():
       | asyncio.create_task(coro=10_second_coro.run()); return;" is there
       | any way to mitigate this? Function-scoped set of tasks?
        
         | nhumrich wrote:
         | Yes, read the last part of the included documentation and hold
         | onto background tasks.
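         | 
         | A minimal sketch of that pattern, assuming a module-level set
         | named background_tasks. ten_second_coro is a hypothetical
         | stand-in for the coroutine in the question (renamed because a
         | Python identifier cannot start with a digit), and its sleep is
         | shortened so the example finishes quickly:

```python
import asyncio

# Strong references to in-flight tasks; the event loop itself keeps only weak ones.
background_tasks = set()


async def ten_second_coro(log):
    await asyncio.sleep(0.01)    # shortened stand-in for the long-running work
    log.append("done")


async def f(log):
    task = asyncio.create_task(ten_second_coro(log))
    background_tasks.add(task)                         # keeps the task alive after f() returns
    task.add_done_callback(background_tasks.discard)   # drops the reference once it finishes


async def main():
    log = []
    await f(log)                                 # returns at once; the task is still pending
    pending_after_f = len(background_tasks)
    await asyncio.gather(*background_tasks)      # at shutdown, wait for any stragglers
    return log, pending_after_f, len(background_tasks)


log, pending_after_f, remaining = asyncio.run(main())
```

         | The done callback keeps the set from growing without bound,
         | and the final gather ties up loose ends before the loop exits.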
        
         | jmholla wrote:
         | Your task is implicitly not function-scoped as you want it to
         | survive exiting the function. What you're doing here would be
         | better architecturally done with threads. async is not a direct
         | replacement for threading.
         | 
         | But, you could also return the task object to the caller and
         | have them manage it. There's also nothing async about your
         | function, so you don't need the async or to await it.
        
       | cmstodd wrote:
       | Thanks for posting.
        
       | nixpulvis wrote:
       | Hey, at least it's documented... good developers actually RTFM.
       | 
       | I can't comment on the design of this API, because I don't feel
       | like learning the library, but in some performance critical
       | applications these sorts of contracts aren't all that uncommon.
       | Granted, this is python, I guess it's a bit more suspicious, IDK.
        
         | vbernat wrote:
         | The documentation update is quite recent (Python 3.11). It was
         | added after this ticket: https://bugs.python.org/issue44665
         | (not the first ticket around this problem).
        
       | [deleted]
        
       | osigurdson wrote:
       | A little pedantic but HUP concerns the fundamental limits of
       | simultaneously knowing a particle's position and momentum, not
       | about observation impacting outcomes.
        
       | notatoad wrote:
       | wow. yeah, this absolutely explains a heisenbug that i've been
       | chasing for a while. and i can't count the number of times i've
       | had that exact doc page open on my screen in the last few months,
       | and never bothered to read that block of text that starts with
       | "important"...
       | 
       | thanks
        
       | aldenpage wrote:
       | That's extremely insidious. I suppose I never encountered this
       | issue because I almost always call asyncio.gather(*), which makes
       | having a collection of tasks natural.
        
         | kortex wrote:
         | This is good form. It makes top-level control flow easier to
         | follow, and keeps the concurrency scoped.
        
       | BiteCode_dev wrote:
       | And this is why trio got it right, and why I think the task
       | groups (nurseries from trio) can't arrive soon enough in the
       | stdlib.
       | 
       | Because not only must you maintain a reference to any task, but
       | you should also explicitly await it somewhere, using something
       | like asyncio.wait() or asyncio.gather().
       | 
       | Most people don't know this, and it makes asyncio very difficult
       | to use for them.
        
       ___________________________________________________________________
       (page generated 2023-02-11 23:00 UTC)