[HN Gopher] Python stands to lose its GIL, and gain a lot of speed
___________________________________________________________________
Python stands to lose its GIL, and gain a lot of speed
Author : bobajeff
Score : 161 points
Date : 2021-10-17 13:41 UTC (9 hours ago)
(HTM) web link (www.infoworld.com)
(TXT) w3m dump (www.infoworld.com)
| ahmedfromtunis wrote:
 | In my opinion, there's a time in a language's life when it
 | should slow down the pace of "innovation". Code bases are
 | complex things, and updating and upgrading them constantly
 | just to keep up with the language may be counterproductive.
|
 | Python is there now, if you ask me. It should slow down and
 | focus more on "maintenance" work with little to no impact
 | on its interface. And maybe work on big projects like
 | multithreading or stronger typing in the background, and
 | ship them when they're fully ready.
| pmlnr wrote:
| > stronger typing
|
 | Please, don't. There are wonderful strongly typed languages
 | out there, so if one wants or needs a strongly typed
 | language, use that, and not Python.
| smitty1e wrote:
| The gradual typing available now seems suitable. I have
| written plenty of code in typed contexts and plenty without.
| Python's "consenting adults" approach seems a win.
|
 | Perhaps, without the GIL and with typing information
 | included, additional performance gains will be on offer.
|
| But the "have it your way" nature of Python is a bigger win
| than either end of the data typing spectrum.
| pmlnr wrote:
| > "have it your way" nature of Python
|
| TMTOWTDI - There's more than one way to do it - is Perl's
| motto :)
|
| Zen of Python suggests there's just one correct way.
| JacobHenner wrote:
| Static typing, not strong typing
| gigatexal wrote:
| These are the kinds of moves Python needs to stay relevant with
| amazing languages like Go out there that are just so simple yet
| powerful.
| sys_64738 wrote:
| Knowledge of the GIL forces a lot of python developers to not try
| to write multithreaded scripts. Doing away with it will make life
| harder for many folks, IMO. Have you tried explaining
| multithreading to some python scripters?
| vram22 wrote:
| pinch.take(salt)
|
| Friend: I saw this interesting tech article ...
|
| Me: Site?
|
| Friend: mumble corp IT site mumble
|
| Me: Bye
| di wrote:
| Previous discussion:
| https://news.ycombinator.com/item?id=28880782
| Animats wrote:
| Yes. This is the discussion we had yesterday, but with more
| hype.
| np_tedious wrote:
| > These changes are major enough that a fair number of existing
| Python libraries that work directly with Python's internals
| (e.g., Cython) would need to be rewritten. But the cadence of
| Python's release schedule just means such breaking changes would
| need to be made in a major point release instead of a minor one.
|
| Maybe time to rethink?
| https://www.techrepublic.com/article/programming-languages-w...
|
| If this is as promising as it sounds, it seems Python 4 now has
| its "thing" and is on the horizon. Or at least may become a
| serious thing to talk about
| [deleted]
| gwking wrote:
| I began using python during the python3.0 betas, and I watched
| the 2 vs 3 saga from the (unusual?) perspective of a v3
| hobbyist with no back-compat requirements.
|
| What struck me as most significant was the opportunistic
| breakage of things not related to the unicode transition. In
| the many years it took to win people over to v3, they could
| have marched over all the breaking changes a year at a time.
| Given that side-by-side installs of python3.x point versions
| are very functional, with or without venvs, this would have
| been much more palatable. Perhaps harder than it sounds though.
|
| I attempted a couple of 2to3 translations of open source
| libraries over the years, with varying degrees of success.
| Every time I found that most of the changes were easy, but
| debugging the broken bits was hard due to the sheer volume of
| source changes. If instead I could have done conversions where
| there was only a single major semantic change at a time, it
| would be so much easier to figure out what was going wrong at
| any given step. Furthermore, I imagine that a single-breaking-
| change mentality would lead to better documentation on how to
| transition for each version.
|
| For this reason, I have become rather suspicious of yearly
| release schedules. Swift is even more frustrating: the version
| changes are really just dictated by Apple's yearly PR calendar.
| Some big things get rushed out for WWDC before they are ready,
| and smaller fixes can get held back until the next year. I
| would much rather that the language teams just prioritize one
| thing at a time, release it when it is ready, and foster a
| community where staying up-to-date on the latest version is
| easy and desirable (a more complicated story for Apple than for
| Python I think, due to ABI, OS version, etc).
|
| From past discussions on HN I've gathered that there is such a
| thing as release fatigue, where developers get irritated when
| libraries release breaking changes too often. Nevertheless I
| often wonder if languages and libraries could improve faster by
| making more breaking changes, one at a time, with robust side-
| by-side installs to facilitate testing across versions. I wish
| side-by-side library versions were possible in Python, just to
| facilitate regression testing.
|
| Bringing this all back to the post, I sincerely hope that if
| Python 4 is a breaking change to the GIL, that it will be only
| that.
|
| I'm curious what others think about all this. Thoughts?
| kevin_thibedeau wrote:
| > If instead I could have done conversions where there was
| only a single major semantic change at a time,
|
| That was the point of the "from __future__" imports. You
| could get most of the way toward Python 3 so that 2to3 would
| be easier to work with and the new semantics could be
| gradually baked into the code prior to migration.
|
| Python 3 had 25 years of cruft to clean up. They won't have
| to do that again.
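[Editor's note] As a sketch of that migration path, a Python 2.7 module could opt into several Python 3 semantics one feature at a time; on Python 3 these imports are accepted as no-ops, so the same file runs on both.

```python
# Opt into Python 3 behaviors ahead of migration, one at a time.
from __future__ import print_function      # print becomes a function
from __future__ import division            # 1 / 2 == 0.5, not 0
from __future__ import unicode_literals    # "abc" is a unicode string

print(1 / 2)  # 0.5 even on Python 2.7
```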
| fractalb wrote:
| If every release has a single breaking change, then that
| language is said to be unstable/not-production ready. IMHO,
| that's not at all an acceptable way of doing point releases.
| People will just be scared of new releases. No one will adopt
| a new language version as soon as it releases. Java never
| breaks backwards compatibility and still there are people
 | running Java 8. Imagine what would happen if every point
 | release carried breaking changes. It would make you feel
 | that the language is not mature and the library ecosystem
 | broken, since you'd have to keep track of version
 | compatibility for each library that you use. It's a
 | nightmare for both library developers and end-users. Few
 | people would like to use such a language.
| nuerow wrote:
| > _If every release has a single breaking change, then that
| language is said to be unstable /not-production ready.
| IMHO, that's not at all an acceptable way of doing point
| releases._
|
| This.
|
| It makes absolutely no sense to claim that having to deal
| with a single non-backwards compatible release is somehow
| worse than having to deal with a sequence of non-backwards
| compatible releases.
|
 | Even though the migration from Python2 to Python3 faced
 | some resistance, if anything the decision was totally
 | vindicated.
| ikerdanzel wrote:
 | Python has now made it to #1 over Java. But Python breaks
 | things frequently: a lot of libraries need to be tweaked
 | to keep working even across incremental changes within
 | 3.x, let alone the big rift from 2 to 3. Although
 | logically few would like a language that breaks, the
 | current market for Python adoption bucks this trend.
| xvland wrote:
| Idealists and free developers (who have created the majority
| of the Python interpreter) agree with you.
|
| Corporate developers, who have taken over Python and other
| people's work, like unnecessary changes, because they get
| many billable hours of seemingly complex work that can be
| done on autopilot.
|
| Corporations might even take over more C extensions whose
| developers are no longer willing to put up with the churn and
| who have moved to C++ or Java.
|
| In the long run, this is bad for Python. But many developers
| want to milk the snake until their retirement and don't care
| what happens afterwards.
| simonw wrote:
| "Corporate developers, who have taken over Python and other
| people's work, like unnecessary changes, because they get
| many billable hours of seemingly complex work that can be
| done on autopilot."
|
| In my 20+ year career I have never worked with a programmer
| that matches this description.
|
| Maybe I've got lucky.
| isoprophlex wrote:
| O boy, angry rant incoming. I'll say something petulant and
| overly dramatic, but I don't like the direction in which
| python is going, and I'm glad there's finally some news about
| focus on actual innovation instead of tacking on syntactic
| cruft.
|
| I want the python that Guido promised me, with 2021
| performance. I don't want some abhorrent committee-designed
| piece of middle-of-the-road shitware glue language that I
| must use because everyone uses it.
|
 | I want a language that doesn't spin its single-threaded
 | wheel in a sea of CPU cores, and I want a language that has
 | one obvious way of doing things without needing to grok and
 | parse dumb """clever""" hacks that will only be abused by
 | midlevel programmers to show off how they saved typing a few
 | lines of additional code.
|
| To me, speed + simplicity = ergonomy = joy. I want a new
| python 4 to focus exclusively and intensely on performance
| improvements and ergonomy.
| DangitBobby wrote:
| What recent changes to the language do you specifically
| dislike?
| dataflow wrote:
| Not the parent but := assignment expressions and match
| expressions are abominations.
| DangitBobby wrote:
 | I'll point out that the walrus operator was actually
 | accepted while Guido was still BDFL (and the vitriol
 | surrounding the decision to include it led directly to
 | him stepping down from the position [1]), so even
 | accepting that it's a poor addition to the language, that
 | does not support the claim that "design by committee" has
 | led to poor language design decisions.
|
| 1. https://pythonsimplified.com/the-most-controversial-
| python-w...
| dataflow wrote:
| You'll want to post that as a reply to the parent you
| originally replied to, since I'm not the one who said
| anything about design-by-committee.
| DangitBobby wrote:
| They gave no specific criticisms. This thread was born of
| a request for specific criticisms. When that happens, I
| try to operate as though the assumptions laid out in the
| parents hold for the children. I think this makes sense
| to do, especially when you appeared to step in as a proxy
| expanding on the parent's opinion. Even if that wasn't
| your intention, this is a public thread, and the most
| relevant place to post things as a response to a
| sentiment in a thread may not be directly to a person who
| holds that exact sentiment. If you don't take issue with
| "design by committee" then you need not be concerned. I
| don't think you think that, and I think no less of you
| regardless.
| dataflow wrote:
 | I meant it more that the person the reply is actually
 | relevant to might not see it otherwise, but whatever,
 | it's fine with me.
| asah wrote:
| Disagree: the recent changes are things I put to work
| immediately and in a large fraction of the code. They're
| not niche and "should have" been added years ago. If
| anything, I'm thrilled with the work of the "committee,"
| whose judgments are better than the result of any
| individual. Postgres is the same.
|
| Gone are the days when you invest in a platform like
| python, and they make crazy decisions that kill the
| platform's future (e.g. perl5). Ignore small syntax stuff
| like := and focus on the big stuff.
| dataflow wrote:
| > Disagree: the recent changes are things I put to work
| immediately and in a large fraction of the code.
|
| That says nothing about their quality. It just says you
| like them. If you gave me unhealthy food I'd probably eat
| it immediately too. Doesn't mean I think it's good for
| me.
|
| > Ignore small syntax stuff like := and focus on the big
| stuff.
|
| They're not "small" when you immediately start using them
| in a "large fraction of your code". And a simple syntax
| that's easy to understand is practically Python's raison
| d'etre. They added constructs with some pretty darn
| unexpected meanings into what was supposed to be an
| accessible language, and you want people to ignore them?
| I would ignore them in a language like C++ (heck, I would
| ignore syntax complications in C++ to a large degree),
| but ignoring features that make _Python_ harder to read?
 | To me that's like putting performance-killing features
 | in C++ and asking people to ignore them. It's not that I
 | _can't_ ignore them--it's that that's not the point.
| DangitBobby wrote:
 | I simply do not understand how the walrus operator is
 | harder to read. Maybe an example?
 |
 |     my_match = regex.match(foo)
 |     if my_match:
 |         return my_match.groups()
 |     # continues with the now useless my_match in scope
 |
 | Versus
 |
 |     if my_match := regex.match(foo):
 |         return my_match.groups()
 |     # continues without useless my_match in scope
|
| How is the second one less readable? Have you ever heard
| of a real world example of a beginner or literally anyone
| ever actually expressing confusion over this?
| dataflow wrote:
| The problem isn't that simple use case. Although even in
| that case, they already had '=' as an assignment
| operator, and they could've easily kept it like the
| majority of other languages do instead of introducing an
| inconsistency.
|
 | The more major problem with the walrus operator is the
 | more complicated expressions it made legal. Like, could
 | you explain to me why making _these_ legal was a _good_
 | thing?
 |
 |     def foo():
 |         return ...
 |     def bar():
 |         yield ...
 |
 |     while foo() or (w := bar()) < 10:
 |         # w is in-scope here, but possibly nonexistent!
 |         # Even in C++ it would at least *exist*!
 |         print(w)
 |
 |     # The variable is still in-scope here, and still
 |     # *nonexistent*. Ditto as above, but even worse
 |     # outside the loop:
 |     print(w := w + 1)
|
| If they just wanted your use case, they could've made
| only expressions of the form 'if var := val' legal, and
| _maybe_ the same with 'while', not full-blown
| assignments in arbitrary expressions, which they had
| (very wisely) prohibited for decades for the sake of
| readability. And they would've scoped the variable to the
| 'if', not made it accessible after the conditional. But
| nope, they went ahead and just did what '=' does in any
| language, and to add insult to injury, they didn't even
| keep the existing syntax when it has exactly the same
| meaning. And it's not like they even added += and -= and
| all those along with it (or +:= and -:= because
| apparently that's their taste) to make it more useful in
| that direction, if they really felt in-expression
| assignments were useful, so it's not like you get those
| benefits either.
| DangitBobby wrote:
| In your example, if you leave out the parentheses around
| w := bar(), you get "SyntaxError: cannot use assignment
| expressions with operator" which makes me think it's a
| bug in the interpreter and not intentionally designed to
| allow it.
|
| I am baffled to learn that it's kept in scope outside of
| the statement it's assigned, and I agree it would have a
| negative impact on readability if used outside of the if
| statement.
| dataflow wrote:
| > if you leave out the parentheses around w := bar(), you
| get "SyntaxError: cannot use assignment expressions with
| operator" which makes me think it's a bug in the
| interpreter and not intentionally designed to allow it.
|
| No, I'm pretty sure that's intentional. You want the
| left-hand side of an assignment to be crystal clear,
| which "foo() or w := bar()" is not. It looks like it's
| assigning to (foo() or w).
| DangitBobby wrote:
 | To be clear:
 |
 |     def thing():
 |         return True
 |
 |     if thing() or w:= "ok":   # SyntaxError: cannot use
 |                               # assignment expressions with operator
 |         pass
 |     print(w)
 |
 |     ...
 |
 |     if thing() or (w := "ok"):
 |         pass
 |     print(w)  # NameError: name 'w' is not defined
|
| The first error makes me think your concern (that w is
| conditionally undefined) was anticipated and supposed to
| be guarded against with the SyntaxError. I believe the
| fact you can bypass it with parentheses is a bug and not
| an intentional design decision.
| dataflow wrote:
| Oh I see, you're looking at it from that angle. But no,
| it's intentional. Check out PEP 572 [1]:
|
 | > _The motivation for this special case is twofold.
 | First, it allows us to conveniently capture a "witness"
 | for an any() expression, or a counterexample for all(),
 | for example:_
 |
 |     if any((comment := line).startswith('#') for line in lines):
 |         print("First comment:", comment)
 |     else:
 |         print("There are no comments")
|
| I have a hard time believing even the authors (let alone
| you) could tell me with a straight face that that's easy
| to read. If they really believe that, I... have questions
| about their experiences.
|
| The beauty of Python...
|
| [1] https://www.python.org/dev/peps/pep-0572/
| DangitBobby wrote:
| Your new example makes me wonder: if I can intentionally
| conditionally bring variables into existence with the
| walrus operator, what's the motivation behind the
| SyntaxError in my statement above? I maintain my belief
| that the real issue here is, readability aside, if blocks
| do not implement a new scope, which has always been a
| problem in the language. The walrus operator just gives
| you new ways to trip over that problem.
|
| From the PEP:
|
| > An assignment expression does not introduce a new
| scope. In most cases the scope in which the target will
| be bound is self-explanatory: it is the current scope. If
| this scope contains a nonlocal or global declaration for
| the target, the assignment expression honors that. A
| lambda (being an explicit, if anonymous, function
| definition) counts as a scope for this purpose.
|
 | I find this particularly strange and inconsistent:
 |
 |     lines = ["1"]
 |     [(comment := line).startswith('#') for line in lines]
 |     print(comment)  # 1
 |
 |     [x for x in range(3)]
 |     print(x)  # NameError: name 'x' is not defined
| dataflow wrote:
| > what's the motivation behind the SyntaxError in my
| statement above?
|
| I'm pretty sure it's what I explained here:
| https://news.ycombinator.com/item?id=28899404
| DangitBobby wrote:
| I did not understand what you meant.
| dataflow wrote:
| I'm saying it's the same reason why (x + y = z) should be
| illegal even if (x + (y = z)) is legal in any language.
| It's not specific to Python by any means. The target of
| an assignment needs to be obvious and not confusing. You
| don't want x + y to look like it's being assigned to.
| DangitBobby wrote:
 | I see. It has low precedence in the operator hierarchy
 | [1], so
 |
 |     False or w := 1
 |
 | is grouped like so:
 |
 |     (False or w) := 1
 |
 | which is a SyntaxError. That's... not a smart place for
 | it to be in the operator hierarchy. I expected it to be
 | near the very top, like await.
|
| 1. https://docs.python.org/3/reference/expressions.html#o
| perato...
|
| Edit: 20 minutes later, can't respond.
|
| There are two aspects I have been thinking about while
| looking at this: Introduction of non-obvious behavior
| (foot-guns) and readability. Readability is important,
| but I have been thinking primarily about the foot-gun
| bits, and you have been emphasizing the readability bits.
| I can't really accurately assess readability of something
| until I encounter it in the wild.
| dataflow wrote:
 | If the precedence was higher then you'd get a situation
 | like
 |
 |     x := 1 if cond else 2
 |
 | never resulting in x := 2, which is pretty unintuitive.
|
| And you have to realize, even if the precedence works
| out, nobody is going to remember the full ordering for
| every language they use. People mostly remember a partial
| order that they're comfortable with, and the rest they
| either avoid or look up as needed. Like in C++, I
| couldn't tell you exactly how (a << b = x ? c : d) groups
| (though I could make an educated guess), and I don't have
| any interest in remembering it either.
|
| Ultimately, this isn't about the actual precedence. Even
| if the precedence was magically "right", it's about
| readability. It's just not readable to assign to a
| compound expression, even if the language has perfect
| precedence.
| [deleted]
| eesmith wrote:
| While the walrus operator gives a way to see this sort of
| non-C++ behavior, it's more showing that Python isn't C++
| than something special about the operator.
|
 | Here's another way to trigger the same NameError, via
 | "global":
 |
 |     import random
 |
 |     def foo():
 |         return random.randrange(2)
 |
 |     def bar():
 |         global w
 |         w = random.randrange(20)
 |         return w
 |
 |     while foo() or (bar() < 10):
 |         print(w)
 |
 | For even more Python-is-not-C++ fun:
 |
 |     import re
 |
 |     def parse_str(s):
 |         def m(pattern):  # I <3 Perl!
 |             nonlocal _
 |             _ = re.match(pattern, s)
 |             return _ is not None
 |         if m("Name: (.*)$"):
 |             return ("name", _[1])
 |         if m("State: (..) City: (.*)$"):
 |             return ("city", (_[2], _[1]))
 |         if m(r"ZIP: (\d{5})(-(\d{4}))?$"):
 |             return ("zip", _[1] + (_[2] if _[2] else ""))
 |         return ("Unknown", s)
 |         del _  # Remove this line and the function isn't valid Python(!)
 |
 |     for line in (
 |         "Name: Ernest Hemingway",
 |         "State: FL City: Key West",
 |         "ZIP: 33040",
 |     ):
 |         print(parse_str(line))
| dataflow wrote:
 | Right, I'm _quite_ well aware of that, but I'm saying
 | this change has made the situation even worse. If they
 | had ensured the variables were scoped and actually
 | initialized, it would actually have been an improvement.
| pansa2 wrote:
| The walrus operator provides no benefit here. `my_match`
| is still in scope in both cases.
|
| Python's `if` statements do not introduce a new scope.
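[Editor's note] The scoping behavior in question is easy to demonstrate (a minimal sketch, Python 3.8+):

```python
import re

# An `if` does not open a new scope, so a walrus target leaks out of it,
# exactly like an ordinary assignment would.
if (m := re.match(r"(\d+)", "42abc")):
    print(m.group(1))  # 42

print(m is not None)   # True: m is still bound after the if block
```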
| DangitBobby wrote:
| I know they don't, normally. I really thought that was
| basically the point of the walrus operator to begin with,
| that the variable was only in scope for the lifetime of
| the if statement where it's needed. Huge bummer to find
| out that's not true.
| dataflow wrote:
| Looks like now you're seeing why it's an abomination ;)
| DangitBobby wrote:
| IMO the real abomination was already present in the
| language, which is that if blocks do not introduce new
| scope. My IDE protects me from the bugs this could easily
| introduce when I try to use a variable that may not yet
| be in scope, but it should be detected before runtime.
|
| I will readily admit that the walrus operator doesn't do
| what I thought it did and I have no interest in whatever
| utility it provides as it exists today.
| dataflow wrote:
| > IMO the real abomination was already present in the
| language, which is that if blocks do not introduce new
| scope.
|
| Definitely. You would think if they're going to undermine
| decades of their own philosophy, they would instead
| introduce variable declarations and actually help
| mitigate some bugs in the process.
| DangitBobby wrote:
| I have now heard specific concerns with := that make
| sense to me, but what about match expressions? What about
| them do you not like?
| pansa2 wrote:
| IMO the match statement has some very unintuitive
| behaviour: match status:
| case 404: return "Not found"
| not_found = 404 match status: case
| not_found: return "Not found"
|
| The first checks for equality (`status == 404`) and the
| second performs an assignment (`not_found = status`).
|
| `not_found` behaving differently from the literal `404`
| breaks an important principle: "if you see an
| undocumented constant, you can always name it without
| changing the code's meaning" [0].
|
| [0] https://twitter.com/brandon_rhodes/status/13602261083
| 9909990...
| dataflow wrote:
| Aw dang you spoiled it :-) I was hoping my example would
| be more fun to work through haha.
| DangitBobby wrote:
| I see. Is it fair to say your issue with the feature is
| less about not wanting the feature and more about the
| implementation details?
| dataflow wrote:
 | What's the output of this program?
 |
 |     class C(object):
 |         A = 1
 |
 |     B = 2
 |     x = 3
 |     y = 10
 |     print(x - B)
 |     match y:
 |         case C.A:
 |             print('A')
 |         case B:
 |             print(y - B)
| DangitBobby wrote:
| Do you not like the idea of pattern matching as a feature
| or do you not like the implementation details? This kind
| of seems like another clumsy scoping problem, no?
| dataflow wrote:
| I would love a good pattern matching feature, but this is
| not it. And this is a seriously broken design at a
| fundamental level, not an "implementation detail". I
 | actually have no clue how it's _implemented_ and
 | couldn't care less, honestly. I just know it's incredibly
| dangerous for the user to actually use, and incredibly
| unintuitive on its face. It's as front-and-center as a
| design decision could possibly be, I think.
|
| And no, this is not really a scoping issue. Match is
| literally writing to a variable in one pattern but not
| the other. A conditional write is just a plain
| inconsistency.
|
| The sad part is both of these features are stumbling over
| the fact that Python doesn't have variable
| declarations/initialization. If they'd only introduced a
| different syntax for initializations, both of these could
| have been much clearer.
| DangitBobby wrote:
| > I actually have no clue how it's implemented and
| couldn't care less honestly.
|
| I guess I'm not sure where "design" ends and
| "implementation" begins? To me, how to handle matching on
| variables that already exists is both, because "pattern
| matching and destructuring" are the features and how that
| must work in the context of the actual language is
| "implementation". It being written in a design doc and
| having real world consequences in the resulting code
| doesn't make it not part of the implementation.
|
| Instead of quibbling over terms, I was much more
| interested in whether you like the idea of pattern
| matching.
|
| I think not liking the final form a feature takes in the
| language is fundamentally different from wholesale
| disliking the direction the language design is going.
| dataflow wrote:
| Design is the thing the client sees, implementation is
| the stuff they don't see. In this case the user is the
| one using match expressions. And they're seeing variables
| mutate inconsistently. It's practically impossible for a
| user _not_ to see this, even if they wanted to. Calling
 | that an implementation detail is like calling your car's
 | steering wheel an implementation detail.
|
| But I mean, you can call it that if you prefer. It's just
| as terrible and inexcusable regardless of its name. And
| yes, as I mentioned, I would have loved to have a good
| pattern matching system, but so far the "direction"
| they're going is actively damaging the language by
| introducing more pitfalls instead of fixing the existing
| ones (scopes, declarations, etc.). Just because pattern
| matching in the abstract could be a feature, that doesn't
| mean they're going in a good direction by implementing it
| in a broken way.
|
| I guess like they say, the road to hell is paved with
| good intentions.
| isoprophlex wrote:
 | The walrus operator is a tired old trope to hate on, but
 | I don't see the point personally. Same goes for the
 | structural pattern matching thing. The tacking on of
 | typing features feels superfluous in a language that's
 | not compiled or even strongly typed.
|
 | But for the sake of maximum pedantry, let me paste some
 | nitpicky little detail from a somewhat recent syntactic
 | addition:
 |
 |     >>> def f(a, b, /, **kwargs):
 |     ...     print(a, b, kwargs)
 |     ...
 |     >>> f(10, 20, a=1, b=2, c=3)
 |     10 20 {'a': 1, 'b': 2, 'c': 3}
 |
 | a and b are used in two ways. Since the parameters to
 | the left of / are not exposed as possible keywords, the
 | parameter names remain available for use in **kwargs.
|
 | Jesus fucking hell on a tricycle, so now I have *'s and
 | /'s showing up in function signatures so someone can
 | prematurely optimize the re-use of variable names without
 | breaking backwards compatibility?!
|
| Python is becoming a mockery, dying a death through a
| thousand little cuts to its ergonomics.
| ptx wrote:
| I'm sure you're already aware of this example since it's
| the canonical one, but to me personally the point is very
| clear: I use regular expressions all the time and always
| have to write that little bit of boilerplate, which the
| walrus operator now lets me get rid of.
|
| Avoiding tedious boilerplate by adding nice features like
| the walrus operator is precisely what lets us avoid
| "death through a thousand little cuts to its ergonomics",
| in my view.
|
 | Sure, maybe writing
 |
 |     m = re.match("^foo", s)
 |     if m != None:
 |         ...
 |
 | isn't so bad, but in that case maybe writing
 |
 |     i = 0
 |     while i < len(stuff):
 |         element = stuff[i]
 |         ...
 |         i += 1
 |
 | wouldn't be so bad, and we could get rid of Python's
 | iterator protocol?
| mixmastamyk wrote:
| It should have been "if ... as y" and reused existing
| syntax. I've never seen anyone use the extended variant
| (multiple assignment) that walrus allows. The extra
 | colons from this and from typing make it look like a
 | standard punctuation-heavy language we sought to avoid
 | in the first place.
| dataflow wrote:
| I think regex matches might be literally the only use
| case for := that I come across with any kind of
| nontrivial frequency, and it's only a minor nuisance at
| that. Certainly nothing to warrant an entirely new yet
| different syntax for something we already have.
|
| The iterator protocol is _way_ more general than what you
| have; it 's not remotely comparable.
| dragonwriter wrote:
 | AFAIK, the _purpose_ of "/" is so that Python-
 | implemented functions can be fully signature- (and,
 | therefore, also type-) compatible with builtins and
 | C-implemented functions that require positional
 | arguments but do not accept those arguments being passed
 | as keyword arguments.
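[Editor's note] PEP 570's `/` marker can be sketched as follows; `pow_like` is an invented example mirroring the calling convention of the built-in `pow()`:

```python
def pow_like(base, exp, /, mod=None):
    # base and exp are positional-only, like the C-implemented pow();
    # mod may still be passed by keyword.
    result = base ** exp
    return result % mod if mod is not None else result

print(pow_like(2, 10))           # 1024
print(pow_like(2, 10, mod=10))   # 4
# pow_like(base=2, exp=10)       # TypeError: positional-only arguments
```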
| wenc wrote:
| > The tacking on of typing features feels superfluous in
| a language thats not compiled or even strongly typed.
|
| Python has always been strongly typed (Python has strong
| dynamic typing). Adding typechecks moves it towards being
| gradually/statically-typed.
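[Editor's note] Strong-but-dynamic can be illustrated in a few lines: values carry their types and are never silently coerced across unrelated types, but names can rebind freely.

```python
# Strong: mixing unrelated types raises instead of coercing.
try:
    "1" + 1
except TypeError as exc:
    print("TypeError:", exc)

# Dynamic: the same name may rebind to a different type at runtime.
x = "1"
x = 1
print(x + 1)  # 2
```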
| ahmedfromtunis wrote:
 | Not the OP, but some of the recent additions to Python
 | were, *in MY very humble opinion*, not very Pythonic;
 | just syntactic sugar that means there is now more than
 | one way of doing things.
 |
 | On that list: the walrus operator and the new switch
 | thing. If I understand them fully and correctly, those
 | two things don't enable developers to do things that were
 | impossible before; instead they add new ways to do things
 | that were already possible.
|
| That's the Python I know and love.
|
| Of course, this doesn't mean I'll love Python any less,
 | just that I wish there were more focus on stuff that
 | matters, like the topic of this article. Or maybe on
 | getting type hinting better.
|
| Again, this is just my opinion.
| DangitBobby wrote:
| IMO, the "one obvious way to do things" has always been a
| comforting fiction. There are numerous ways to do
| everything, the worst offenders forcing people to make
 | tradeoffs between debuggability and readability (i.e.,
 | for loops versus list comprehensions). Many of them are
| purely about readability (ternary expression versus if
| blocks) and many of them are about style (ternary
| expression versus use of or/and short-circuiting). Even
| so, before the walrus operator, there was never a way to
| define a variable that only existed in the scope of a
| particular if statement.
|
| After using pattern matching in Rust and switch
| statements in JavaScript, I personally am very excited
| for that addition to Python, but I understand the feature
| is divisive and will concede it as a matter of opinion.
|
| Edit: turns out the walrus operator does not cause the
| variable to move out of scope after the if block, which
| is disappointing. IMO the worse anti-pattern has already
| been part of the language, which is not creating new
| scopes for if statements.
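| The scoping behaviour mentioned in the edit is easy to
| check: a name bound by the walrus operator lives on in the
| enclosing scope (Python has no if-local scoping):

```python
# A walrus-bound name remains visible after the if block ends.
data = [1, 2, 3]
if (n := len(data)) > 2:
    print("long enough")
print(n)  # n is still bound here: prints 3
```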
| klyrs wrote:
| Cython itself should be a relatively simple fix (relative to
| the difficulty that Cython devs are accustomed to). Libraries
| that use Cython in a pure way (that is, not fussing with
| refcounts in hand-written C code) should "just work" after
| Cython gets updated. It's the poor folk who have done straight
| C extensions without the benefit of Cython that I'm concerned
| about.
| dangerbird2 wrote:
| I'd wonder if it would be easier to introduce a totally new API
| along the lines of ruby's ractor API[1] that enables thread
| parallelism while keeping existing Thread behavior identical as
| with the GIL. Tons of python code relies on threaded code that
| is thread-safe under the GIL, but would completely blow up if
| the GIL was naively replaced.
|
| [1] https://docs.ruby-lang.org/en/master/doc/ractor_md.html
| arthurcolle wrote:
| Ractors don't offer very good performance yet. Better to have
| it be awesome right off the bat
| dangerbird2 wrote:
| Yeah, That's what I thought. I think the greatest barrier
| now is that most multithreaded python code right now is
| _just barely_ thread-safe, even with the GIL. I
| occasionally have to remind colleagues that even though the
| GIL guarantees instructions are atomic, you need to use
| mutexes and other synchronization primitives to ensure
| there is no race condition between multiple instructions.
| I'd imagine this change would be an optional interpreter
| feature initially, since removing the GIL would break the
| vast majority of code out in the wild, and it would be much
| more difficult to create an automated conversion tool like
| they did with the syntactic changes between 2.7 and 3
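| The point about synchronization primitives can be sketched
| in a few lines: `counter += 1` spans several bytecode
| instructions, so threads must lock around it even with the
| GIL:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:        # without this, increments can be lost
            counter += 1  # load, add, store: not atomic across threads

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 with the lock; possibly less without it
```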
| lpapez wrote:
| A simple solution would be to introduce two new types:
| ConcurrentThread and ParallelThread. Alias the old Thread to
| the ConcurrentThread and keep the behaviour. No breaking
| changes, easy to explain the difference. People who need it
| can use the new truly parallel version.
| pvg wrote:
| The other day:
|
| https://news.ycombinator.com/item?id=28880782
| kjeetgill wrote:
| Anyone know how far along graalpython is? I'd imagine it should
| be a suitable, if not superior, replacement with about as much
| effort right?
| aasasd wrote:
| I mean, it stands to evolutionary reason that Python shouldn't
| have a GIL.
| aeturnum wrote:
| I worked in Python for years and while I suppose I'm glad for any
| improvement, I have never understood the obsession with true
| multi-threading. Languages are about trade-offs and Python, again
| and again, chooses flexibility over performance. It's a good
| choice! You can see it in how widely Python is used and the
| diversity of its applications.
|
| Performance doesn't come from any one quality, but from the
| holistic goals at each level of the language. I think some of the
| most frustrating aspects from the history of Python have been
| when the team lost focus on why and how people used the language
| (i.e. the 2 -> 3 transition, though I have always loved 3). I
| hope that this is a sensible optimization and not an over-
| extension.
| darthrupert wrote:
| Lack of multithreading can easily be a win for a language. A
| tiny subset of problems really needs it these days and for
| everything else it's a potential way to either screw things up
| or make them way more complicated than they need to be.
| cm2187 wrote:
| > _A tiny subset of problems_
|
| like processing web requests?
| emrah wrote:
| So if you don't like it or want it, don't use it then? Why
| does it have to be missing altogether for you to be happy?
| cm2187 wrote:
| Popularity probably has as much to do, if not more, with
| ease of access (or lack of alternatives) as with good
| design of the language. PHP is equally popular as Python,
| if not more so.
| aeturnum wrote:
| I'm not a PHP expert, but I did not know it was also used in
| data science, game programming, embedded programming and
| machine learning as Python is. Of course they are both used
| for web services.
| emerged wrote:
| From my perspective as a huge Python fan, efficient
| multithreading is simply the only major thing missing from the
| language. I would still use C/C++/assembly for bleeding edge
| performance needs, but efficient multithreading in Python would
| have me reaching for alternatives far less often.
|
| Basically I love peanut butter ice cream (Python) I'd just like
| it even more with sprinkles.
| dec0dedab0de wrote:
| I agree, but I don't do anything that can be split up, and
| would benefit from sharing memory. That is really the only
| benefit of removing the GIL. Multiprocessing can do true
| concurrency, and so can Celery, which even allows you to use
| multiple computers. The only time that is a pain is when you
| need to share memory, or I guess maybe if you're low on
| resources and can't spare the overhead from multiple processes.
|
| I think a JIT would be the best possible improvement for
| CPython as far as speed is concerned. Though I can imagine
| there are plenty of people doing processor heavy stuff with c
| extensions that would benefit from sharing memory. So from
| their perspective removing the GIL would be a better
| improvement.
|
| So basically a JIT would help every Python program, and
| removing the GIL would only help a small subset of Python
| programs. Though I'm just happy I get to make a living using
| Python.
| klyrs wrote:
| Losing the GIL makes the language strictly more flexible.
| Previous GILectomies tanked performance to an
| unacceptable degree. In single-threaded code, this one is a
| moderate performance improvement in some benchmarks, and a
| small detriment in others -- which is about as close to perfect
| as one could expect from such a change. That's why people are
| excited about it.
|
| At a higher level, Python is getting serious about performance.
| But this gives both flexibility _and_ performance.
| aeturnum wrote:
| Yah, that's definitely the future I'm hoping for. What I am
| worried about are the kind of transition issues I mentioned.
| Python 2 -> 3 strictly made the language more flexible too -
| but the Python ecosystem is about existing code almost more
| than the language and I worry that we could find similar
| problems here. Potential for plenty of growing pains while
| chasing relatively small gains.
| ynik wrote:
| In the company I'm working for, we had to spend more
| engineer time on GIL workarounds (dealing with the extra
| complexity caused by multiprocessing, e.g. patching C++
| libraries to put all their state into shared memory) than
| we needed for the Python 2 -> 3 migration. And we've only
| managed to parallelize less than half of our workload so
| far.
|
| Even if this will be a major breaking change to Python,
| it'll be worth it for us.
| m0zg wrote:
| One does not preclude the other: the language can be
| flexible and offer higher concurrency than it does now. My
| workstation has 64 hyperthreads; Python can use one at a
| time. That's messed up, since I use it as a general-purpose
| language.
| didip wrote:
| This is because Python, by luck, ended up dominating the data
| science market.
|
| In this market you really want to shuffle tons of data quickly,
| and that's usually achieved through parallelism.
|
| Python's multiprocessing library does a poor job at that.
| anthk wrote:
| That's calling C and Fortran in the background, actually.
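| For cases where multiprocessing's copy overhead is the
| problem, the stdlib's multiprocessing.shared_memory module
| (Python 3.8+) is one workaround; a single-process sketch of
| the API:

```python
from multiprocessing import shared_memory

# Create a named block of shared memory; another process could
# attach to it with SharedMemory(name=shm.name) without copying.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"
print(bytes(shm.buf[:5]))  # b'hello'
shm.close()
shm.unlink()  # free the block once every process is done with it
```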
| amelius wrote:
| > Performance doesn't come from any one quality, but from the
| holistic goals at each level of the language.
|
| It starts to become an issue when you have built a few well-
| performing subsystems and now want them to run together and
| interact. With the GIL, your subsystems are suddenly _not_
| performing as well anymore. Without the GIL, you can still get
| good performance (within limits of course).
|
| Performance referring here to throughput and/or latency
| (responsiveness).
| tester756 wrote:
| >again and again, chooses flexibility over performance. It's a
| good choice! You can see it in how widely Python is used and
| the diversity of its applications.
|
| What does it mean? how is python different here than Java/C#?
| aeturnum wrote:
| I mean, you can modify Python code at runtime if you like.
| This has a good overview of all the nonsense happening under
| the hood:
| http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
| cma wrote:
| There are 64-core, 128-thread prosumer CPUs now and it is only
| going to go higher. At some point it just becomes necessary.
| turminal wrote:
| What does a 128 thread python app do better than 128 single
| threaded ones?
| gypsyharlot wrote:
| Shared L3 cache.
| jhoechtl wrote:
| OS overhead of 128 processes is higher than scheduling 128
| tasks. Varies from os to os, but it's especially bad on
| Windows.
| turminal wrote:
| Yeah, I know about that argument but it just doesn't make
| sense to me. Removing the GIL means that 1) you make your
| language runtime more complex and 2) you make your app
| more complex.
|
| Is it truly worth it just to avoid some memory overhead?
| Or is there some other windows specific thing that I'm
| missing here?
| dragonwriter wrote:
| > Yeah, I know about that argument but it just doesn't
| make sense to me. Removing the GIL means that 1) you make
| your language runtime more complex and 2) you make your
| app more complex.
|
| #2 need not be true; e.g., the approach proposed here is
| transparent to most Python code and even minimized impact
| on C extensions, still exposing the same GIL hook
| functions which C code would use in the same
| circumstances, though it has slightly different effect.
| Redoubts wrote:
| marshal data
| turminal wrote:
| Care to elaborate? What does that change for an average
| webapp?
| yuliyp wrote:
| Say your webapp talks to a database or a cache. It'd be
| really nice if you could use a single connection to that
| database instead of 64 connections. Or if you wanted to
| cache some things on the web server, it would be nice if
| you could have 1 copy easily accessible vs needing 64
| copies and needing to fill those caches 64x as much.
| semiquaver wrote:
| Unfortunately using a single db/RPC connection for many
| active threads is not done in any multithreaded system
| I'm aware of for good reasons. Sharing this type of
| resource across threads is not safe without expensive and
| performance-destroying mutexes. In practice each thread
| needs exclusive access to its own database connection
| while it is active. This is normally achieved using
| connection pooling which can save a few connections when
| some threads are idle, but 1 connection for 64 active web
| worker threads is not a recipe for a performant web app.
| If you can point to a multithreaded web app server that
| works this way I'd be very interested to hear about it.
|
| The idea of a process-local cache (or other data) shared
| among all worker threads is a different story. I see this
| as one of the bigger advantages of threaded app servers.
| However, preforking multiprocess servers can always use
| shmget(2) to share memory directly with a bit more work.
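| A minimal sketch of that per-thread checkout pattern, with
| connect() as a stand-in for a real driver call (not any
| particular library's API):

```python
import queue

def connect():
    # stand-in for an expensive driver call, e.g. a DB handshake
    return object()

class ConnectionPool:
    """Each thread checks out a connection for exclusive use."""
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self):
        return self._pool.get()  # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(4)
conn = pool.acquire()
# ... use conn exclusively in this thread ...
pool.release(conn)
```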
| cma wrote:
| That's slower than just doing it single threaded for many
| use cases.
| [deleted]
| heinrichhartman wrote:
| No shared memory. To communicate between processes you
| usually use sockets, to communicate between threads you
| mutate variables. This is a huge performance difference.
| aeturnum wrote:
| Yes higher core counts are more and more common, but the
| language has thirty years of single-threaded path-dependence.
| Lots of elements of it work the way they do because there was
| a GIL. I could be wrong, but I am skeptical that Python will
| ever be the best choice for high performance code. It's
| always worth improving the speed of code when you can, but
| more often than not you "get" something for going slower. I
| hope my worries are wrong and this is actually a free win!
| randtrain34 wrote:
| Design doc the proposer linked:
| https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsD...
| lvass wrote:
| I have seen and written Python code that spawns various threads
| with shared mutable state. Is it possible that some day the same
| code would run in parallel? That could be a terrible (very)
| breaking change. I'm not against allowing in-process parallel
| execution but please let it require a new API.
| ajkjk wrote:
| Isn't that.. Already the case? The gil doesn't prevent thread
| switching in the middle of python code.
| lvass wrote:
| It's concurrent, not parallel. The switch won't happen
| inside the execution of one opcode (including some
| dictionary update operations), so it's safe in many cases
| where parallel execution isn't.
| ajkjk wrote:
| Yes, single-instruction operations would be fine, but if
| you're writing multithreaded code you are probably doing
| things that the GIL doesn't protect all the time. Like
| dict-updates on classes that implement __set__, or `if not
| a[x]: a[x] = y` sorts of two-phased checks, or just like,
| anything else. You can't get very far with global state
| without reckoning with concurrency, GIL or not.
|
| I assume that a change to relax the GIL will both allow you
| to opt-out of it, and allow you to use locking versions of
| primitive data-structures, anyway; it's not like it's going
| to just vanish overnight with no guardrails.
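| The two-phase check above is the classic check-then-act
| race; in CPython, dict.setdefault does a check-and-insert of
| the same shape as one dict operation:

```python
a = {}
x, y = "key", "value"

# Racy under preemption: another thread can run between the
# membership test and the assignment.
if x not in a:
    a[x] = y

# Same intent, expressed as a single dict operation:
a.setdefault(x, y)
```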
| colinmhayes wrote:
| Seems like checking the docs for atomicity of every
| operation is a huge pain in the ass.
| ajkjk wrote:
| It really is, they don't make it clear at all. Every time
| I have to ask the question of "is this atomic under the
| GIL" I struggle to find the right answer.
| kzrdude wrote:
| Best to avoid sharing data, and never mutating shared
| data!
|
| Rust has a great rule: Sharing XOR Mutation.
|
| Python is higher level, so message passing and passing
| "owned" values between threads is all the more feasible
| and sensible.
| ynik wrote:
| Currently, dictionary updates are atomic only if the keys
| are primitive types.
|
| `dict.update()` will call methods like `__eq__`, and those
| methods (if implemented in Python) may temporarily release
| the GIL.
| kzrdude wrote:
| Unfortunately the Python docs say that dict.update() is
| atomic. It's being fixed though... it came out during
| these discussions.
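| That callback into Python during an update can be seen with
| a toy key class (illustrative: the constant hash forces a
| collision so __eq__ runs during probing):

```python
class Key:
    eq_calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return 0  # force every key into the same hash bucket

    def __eq__(self, other):
        Key.eq_calls += 1  # Python-level code runs mid-update here
        return self.v == other.v

d = {Key(1): "a"}
d.update({Key(2): "b"})  # probing the collision calls Key.__eq__
print(Key.eq_calls > 0)  # True
```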
| zelphirkalt wrote:
| It is probably a bad practice to not acquire a mutex for
| that concurrent dictionary update. The code should be
| improved in that regard, with or without any potential
| Python language change.
| lvass wrote:
| If needed, I'll probably just change it to asyncio,
| probably saner than inspecting everything for new
| parallelism bugs which can be incredibly subtle.
| ajkjk wrote:
| If performance isn't hugely important you could make
| blanket-locking wrappers around common data structures
| and swap them in-place for all of your global state.
|
| .. but, as I said, removing the GIL will almost certainly
| be opt-in.
| pkulak wrote:
| That was my first thought as well. I use Python to whip up
| quick scripts and enjoy not having to worry about shared
| memory, even when I'm using concurrency. I'd hate to lose that.
| Spivak wrote:
| Don't you still have to worry about this since OS threads can
| be arbitrarily preempted?
| pkulak wrote:
| That's fine. You don't have to worry about pausing, you
| have to worry about multiple threads getting at some memory
| at the same time.
| juanbyrge wrote:
| Translation: I have written buggy, racy software that has
| specific dependencies on thread timing. Please do not make
| significant improvements to Python because it will reveal these
| bugs in my software, and I will be forced to fix the bugs and
| use proper synchronization.
| [deleted]
| lvass wrote:
| Is that how you call things that have been working flawlessly
| and solving people's problems for over 10 years? Is
| needlessly breaking things that work an improvement to you?
| zelphirkalt wrote:
| You could probably simply lock the Python version you use
| for such code. No breakage there. If you must upgrade to a
| newer Python version, then you will have to repair broken
| code.
| doubled112 wrote:
| This went really well for the Python 2 -> 3 upgrade
| lazide wrote:
| It did buy a decade or so (or more really) - not like the
| python2 distribution you downloaded and distribute with
| your program back then is going to get tracked down and
| shot in the head by Guido anytime soon.
|
| If you're relying on whatever python version is
| distributed with whatever machine it happens to be on,
| there are a huge number of problems you're already going
| to have.
| [deleted]
| lazide wrote:
| If you make something that works because of an explicit
| memory and concurrency model (and not like there are other
| options at the time), it is indeed legit to worry about a
| major shift to those models that would cause problems.
|
| Even if those changes are better for other ways of solving
| problems.
| JacobHenner wrote:
| Source: https://mail.python.org/archives/list/python-dev@python.org/...
| kzrdude wrote:
| The story of losing GIL is very popular in the news, and I like
| it too!
|
| .. but. Let's not count our chickens before they hatch. I'm
| wondering if the Python dev community will take on this
| challenge. I hope so, Sam seems to really have put in a lot of
| effort!
| TekMol wrote:
| Why is multithreaded performance important, what are the usecases
| where you cannot run multiple processes to spread your
| numbercrunching across CPUs?
|
| I am _praying_ for CPython to become faster. But I need faster
| singlethreaded performance, so web applications benefit from it.
| chrisseaton wrote:
| Communicating between processes is more expensive than
| communicating between threads.
| TekMol wrote:
| Ok, but what is the use case where you need high bandwidth
| inter-thread/process communication?
| moron4hire wrote:
| Games. Simulations. Backend servers for shared editor
| experiences. Teleconferencing.
| chrisseaton wrote:
| Except for embarrassingly parallel problems, the trade-off
| of generating more parallelism is usually needing finer-
| grained communication. Canonical examples in the literature
| are matrix multiplication, triangulation, and refinement.
| Spivak wrote:
| Large shared state is basically always the answer. You can
| cop-out and say use a database or Redis if that's fast
| enough but that's just making someone else use many threads
| with shared memory.
| adgjlsfhk1 wrote:
| matrix multiply is an obvious one. partial differential
| equations are another. Sorting is one if you don't care
| about math.
| VWWHFSfQ wrote:
| multiple processes use a lot more memory than threads
| TekMol wrote:
| Can you quantify that?
| SkittyDog wrote:
| Yes, you can
| lazide wrote:
| Even with copy-on-write? Have any actual numbers?
| SkittyDog wrote:
| Actual work is left as an exercise for the reader ;-)
| lazide wrote:
| Last I did this, when the processes were fork()'s of the
| parent (the typical way this was done), memory overhead
| was minimal compared to threads. A couple %. That was
| somewhat workload dependent however, if there is a lot of
| memory churn or data marshaling/unmarshalling happening
| as part of the workload, they'll quickly diverge and
| you'll burn a ton of CPU doing so.
|
| Typical ways around that include mmap'ng things or
| various types of shared memory IPC, but that is a lot of
| work.
| kaba0 wrote:
| What about context switches as well, much slower IPC, and
| basically "no native support".
| adgjlsfhk1 wrote:
| each python process requires somewhere around 200mb of
| memory and .1s to do nothing. if you want libraries, it
| scales from there.
| TekMol wrote:
| Really? That sounds like an awful lot.
|
| When I execute this:
|     python3 -c 'import time; time.sleep(60)'
| and then pmap the process id of that process, I get
| 26144K. That is 26MB.
|
| As for timing, when I execute this:
|     time python3 -c ''
| I get 0.02s.
| lazide wrote:
| Also generally no one spins up distinct, new processes
| for the 'co-ordinated distinct process work queue' when
| they can just fork(), which should be way faster and
| pretty much every platform uses copy-on-write for this,
| so also has minimal memory overhead (at least initially)
| jesboat wrote:
| The problem is (perhaps amusingly) with refcounting. As
| the processes run, they'll each be doing refcount
| operations on the same module/class/function/etc objects
| which causes the memory to be unshared.
| lazide wrote:
| Only where there is memory churn. If you're in a tight
| processing loop (checksumming a file? Reading data in and
| computing something from it?) then the majority of
| objects are never referenced or dereferenced from the
| baseline.
|
| Also, since the copy on write generally is memory page by
| memory page, even if you were doing a lot of that, if
| most of those ref counts are in a small number of pages,
| it's not likely to really change much.
|
| It would be good to get real numbers here of course. I
| couldn't find anyone obviously complaining about obvious
| issues with it in Python after a cursory search though.
| byroot wrote:
| That was solved years ago by moving the refcounts into
| different pages:
| https://instagram-engineering.com/copy-on-write-friendly-pyt...
| [deleted]
| kzrdude wrote:
| Interactive use of Python: plotting, working with data, also
| would benefit from better multithreading. It's interactive, so
| it's (a bit) frustrating to wait for it to compute and see that
| it uses just one thread (the statistics ops are usually well
| threaded already, but plotting is not).
| lostdog wrote:
| Here's a use case: I was training a neural net, and wanted to
| do some preprocessing (similar to image resizing, but without
| an existing C function). Inputs are batched, so the
| preprocessing is trivially parallelizable. I tried to
| multithread it in python, and got no speedup at all.
|
| That was a really sad moment, and I've never felt good about
| python since.
| isoprophlex wrote:
| Yeah, I had something similar.
|
| I wanted "as you wait for the GPU to churn through this
| batch, start reading the next batch from disk & preprocessing
| it on the CPU"
|
| Getting this to work turned out so ass backwards it made me
| sad
|
| Also I pity the fool who tries to connect a debugger to code
| using multiprocessing.Pool()...
| rbjorklin wrote:
| One thing that comes to mind is for shared connection pools
| when you have thousands of workers connecting to the same
| stateful service.
| TekMol wrote:
| Which use case requires such a setup?
| Spivak wrote:
| MySQL comes to mind. Unlike Postgres where connections are
| expensive MySQL encourages loading up the server with
| hundreds of simultaneous connections per server.
|
| Like anything, it's possible to split this work out to a
| separate process but the IPC overhead is a lot.
| lazide wrote:
| Most databases (unfortunately), and since a ton of web apps
| use databases for state..
|
| That said, many such databases are (or already have) rolled
| out connection pool proxies for reasons like this, so meh.
| lanstin wrote:
| Web front end backed by data base with expensive login, so
| you cache connections. Two tier architecture it would be
| called.
| mhh__ wrote:
| With all the effort that's been put into this, how many
| people just jumped ship to a native/more-native language?
|
| I'm probably biased because I think Python is a hacked together
| mess but I just don't see what the point is in dragging it
| around.
| ajkjk wrote:
| The difference is that it is so much easier, by orders of
| magnitude, to write code that gets shit done in python than any
| native language I'm aware of.
| Zababa wrote:
| > With all the effort that's been put into this how many people
| just jumped ship to a native/more-native language?
|
| I think Go benefited a lot from that.
| jgb1984 wrote:
| Very useful contribution, thanks! Mainly to demonstrate your
| own ignorance on the topic at hand.
| mhh__ wrote:
| I did ask a question. The second part is me being belligerent
| but the question was sincere.
|
| I think Python is shit but so is most everything else, I'm
| interested in whether people jump ship or just work around
| its issues at scale. I work on a programming language
| designed to avoid messy python scripts internally, so I am
| sincerely interested in these decisions.
| mixmastamyk wrote:
| The question is unanswerable. Python started as a scripting
| and prototyping language and that hasn't and won't
| completely change. It's fantastic at what it does from that
| perspective, late additions of complexity notwithstanding.
| Mikeb85 wrote:
| I use Ruby and not Python, but I think both have a lot of the
| same benefits and weaknesses.
|
| IMO, removing the GIL is a major mistake. The GIL is what allows
| you easy concurrency and to keep the language's 'magic' while
| ensuring correctness. If you need parallelism, there's processes
| and probably other tactics (I'm not super up to date on Python
| things). If you simply remove the GIL you have a bunch of race
| conditions, so you need a bunch of new language constructs, and
| it just adds a bunch of complexity to solve problems that don't
| really need solving.
|
| IMO they should just do what Ruby did with Ractors; basically a
| cheap alternative to spawning more processes. Rewriting
| absolutely everything that uses threads to be thread-safe is a
| waste of time.
| klyrs wrote:
| It's already easy to write race conditions in Python.
|     if x in d:
|         del d[x]
|     else:
|         d[x] = True
|
| is a classic example -- if two threads execute that, you
| can't predict the outcome (but a KeyError is quite likely).
|
| The GIL only protects the CPython virtual machine; it doesn't
| protect user code. Concurrent code with shared mutable state
| already needs explicit mutexes.
| intrepidhero wrote:
| What thread safe code can I write with the GIL that will have a
| race without it?
|
| I already have to be careful to only write to a shared object
| from one thread, since I have no guarantees on order of
| execution.
|
| The main benefit of the GIL, from my recent reading is that it
| makes ref counting fast _and_ thread safe. The meat of the
| proposal is changing ref counting so that it's almost as fast
| _and_ atomic without the GIL.
| ptx wrote:
| What about setting a simple boolean flag, e.g. setting
| "cancelled = True" in the UI thread to cancel an operation in
| a background thread?
|
| In Java you would have to worry about _safe publication_ to
| make the change visible to the other thread, but thanks to
| the GIL changes in Python are always (I think?) made visible
| to other threads.
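| For that cancellation-flag pattern, threading.Event is the
| stdlib's explicit primitive, and it doesn't rely on the GIL
| for cross-thread visibility:

```python
import threading

cancelled = threading.Event()

def worker():
    while not cancelled.is_set():  # safe to read from any thread
        pass                       # ... do one unit of work ...

t = threading.Thread(target=worker)
t.start()
cancelled.set()  # e.g. from the UI thread
t.join()
```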
| dleslie wrote:
| I wonder how many folks think they're writing thread safe
| software with ease, and are unaware that they are leaning on the
| GIL?
|
| Could be that the impact of this change is far broader than just
| a few key libraries.
| [deleted]
| avianlyric wrote:
| I don't see how the GIL makes writing thread safe software any
| easier. The GIL might prevent two Python threads executing
| simultaneously, but it doesn't change the fact that a Python
| thread can be preempted, meaning your global state can change
| at any point during execution without warning.
|
| Most of the issues with multi-threading come from concurrency,
| not parallelism. The GIL allows concurrency, you just don't get
| any of the advantages of parallelism, which is normally the
| reason for putting up with the complexity concurrency creates.
| dehrmann wrote:
| It's more cpython than the GIL, but it lets you get away with
| using += and certain dict operations without locks.
| hexane360 wrote:
| Is this true? It looks like += compiles to four bytecode
| instructions: two loads, an increment, and a store. It
| should be possible for a thread to get paused after the
| load but before the store, resulting in a stale read and
| lost write.
|
| Some more discussion here:
| https://stackoverflow.com/questions/1717393/is-the-
| operator-...
| dehrmann wrote:
| Maybe it's just certain collection operations, then.
| pansa2 wrote:
| With the GIL, for an int i, `i += 1` is not thread-safe,
| but IIRC for a list l, `l += [1]` (i.e. extend) is.
|
| Presumably this patch changes the list implementation in
| some way so that the extend operation remains thread-safe
| without the GIL.
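| That kind of list-op atomicity is easy to observe in
| current CPython (a sketch that deliberately leans on the
| GIL's per-opcode atomicity; it is not guaranteed by the
| language spec):

```python
import threading

items = []

def fill(n):
    for i in range(n):
        items.append(i)  # one list operation: no appends are torn or lost

threads = [threading.Thread(target=fill, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(items))  # 40000 in current CPython
```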
| stefan_ wrote:
| From running the same software on two moderately powerful
| embedded systems, one single-core and one multi-core, the
| latter is a lot more reliable in immediately exposing races
| and concurrency issues.
| dleslie wrote:
| > The GIL might prevent two Python threads executing
| simultaneously, but it doesn't change the fact that a Python
| thread can be preempted, meaning your global state can change
| at any point during execution without warning.
|
| That thread behavior is enough to reduce the likelihood of
| races and collisions; particularly if the critical sections
| are narrow.
| laserlight wrote:
| I wouldn't call it thread-safe when race conditions are
| possible.
| jldugger wrote:
| Then we need a term for when code race conditions are
| possible but rare enough that nobody using the software
| notices. thread-timebomb?
| nuerow wrote:
| > _Then we need a term for when code race conditions are
| possible but rare enough that nobody using the software
| notices. thread-timebomb?_
|
| There's already a term for that: not thread-safe.
|
| The definition of thread safety does not include
| theoretical or practical assessments regarding how
| frequent a problem can occurr. It only assesses whether a
| specific class of problems is eliminated or not.
| jldugger wrote:
| >The definition of thread safety does not include
| theoretical or practical assessments regarding how
| frequent a problem can occur.
|
| Well, _obviously_.
|
| The challenge I am putting forth on HN is to meaningfully
| describe _usable_ thread-unsafe software. If you've spent
| enough time outside university, you'll be aware that
| there are all kinds of theoretical race conditions that
| are not triggered in practical use.
| klyrs wrote:
| If you've worked at industrial scale, you'll be aware
| that even the most theoretical-seeming race condition
| will be triggered frequently.
| The_Colonel wrote:
| That reminds me how I was called to fix some Java
| service, which was successfully in production for 10
| years with hardly any incident, but it suddenly started
| crashing hard, all the time. It was of course a thread
| safety issue (concurrent non-synchronized access to
| hashmap) which lay dormant for 10 years only to wreak
| havoc later.
|
| Nothing obvious changed (it was still running a decade
| old JRE), perhaps it was a kernel security patch, perhaps
| a RAM was replaced or even just the runtime data
| increased/changed in some way which woke up this monster.
| formerly_proven wrote:
| Heisenbug.
| dkersten wrote:
| That's not useful. If you have a race condition, you will
| eventually hit it and when you do, you may get incorrect
| results or corrupt data. Thread unsafe is thread unsafe,
| regardless how rare it appears to be.
|
| Also, rare on one computer (or today's computer) might
| not be rare on another (tomorrows faster one for
| example).
|
| These types of bugs are also very hard to detect. You
| might not know your data is corrupted. Reminds me of how
| bad calculations in Excel have cost companies billions of
| dollars, except now, the calculations could be "correct"
| and the error sitting dormant, just waiting for the right
| timings to happen. Much better to not make assumptions
| about the safety and think about it up front: if you are
| using multiple threads, you need to carefully consider
| your thread safety.
| Brian_K_White wrote:
| There is no such thing as "likely". A thing is either
| possible or not possible.
| kingofpandora wrote:
| Likelihood refers to probability not possibility.
| Brian_K_White wrote:
| There is no such thing as probability. All there is is
| possible and not possible.
|
| I don't know how the point of the comment could be
| missed, but what I am saying is, it is a mistake, a
| rookie baby not-a-programmer not even any kind of
| engineer in any field, to even think in those sorts of
| terms at all. At least not in the platonic ideal worlds
| of math or code or protocol or systems design or legal
| documents, etc.
|
| Physical events have probability that is unavoidable. How
| fast does the gas burn? "Probably this fast"
|
| There is no excuse for any coder to even utter the word
| "likely".
|
| The ONLY answers to "Is this operation atomic?" or "Is
| this function correct?" or "Does this CPU perform
| division correctly?" is either yes or no. There is no
| freaking "most of the time."
|
| "Likely" only exists in the realm of user data and where
| it is explicitly _created_ as part of an algorithm.
| avianlyric wrote:
| That just means the GIL is good at hiding concurrency bugs.
| It doesn't make writing correct code any easier. Arguably
| you could say it makes writing correct concurrent code
| harder, because it'll take significantly longer for
| concurrency bugs to cause errors.
| chacham15 wrote:
| There are certain classes of errors that it prevents. E.g:
|
| Thread1: a = 0xFFFFFFFF00000000
|
| Thread2: a = 0x00000000FFFFFFFF
|
| One might think that the two possible values of a if those
| are run concurrently are 0xFFFFFFFF00000000 and
| 0x00000000FFFFFFFF. But actually 0x0000000000000000 and
| 0xFFFFFFFFFFFFFFFF are also possible because the store
| itself isn't atomic.
|
| The GIL (AFAICT) will prevent the latter two possibilities.
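| A minimal Python-level sketch of the same idea (all names
| here are illustrative, not from this thread): a two-field
| update spans several bytecode steps, so a reader can observe
| the pair mid-update unless both sides take a lock.

```python
import threading

# Illustrative sketch: a two-field update is not atomic under the
# GIL, because each assignment is a separate bytecode step. A reader
# can therefore see x != y unless writer and reader share a lock.
class Pair:
    def __init__(self):
        self.x = 0
        self.y = 0
        self.lock = threading.Lock()

    def update_racy(self, v):
        self.x = v  # a thread switch can happen between
        self.y = v  # these two stores, exposing x != y

    def update_locked(self, v):
        with self.lock:
            self.x = v
            self.y = v

def count_inconsistencies(method_name, n=50_000):
    """Run a writer thread while the main thread checks the invariant."""
    pair = Pair()
    update = getattr(pair, method_name)
    done = threading.Event()

    def writer():
        for i in range(n):
            update(i)
        done.set()

    t = threading.Thread(target=writer)
    t.start()
    seen = 0
    while not done.is_set():
        with pair.lock:  # the reader always locks; only update_locked matches it
            if pair.x != pair.y:
                seen += 1
    t.join()
    return seen

print(count_inconsistencies("update_locked"))  # 0: invariant always holds
```

| On a typical CPython build, count_inconsistencies("update_racy")
| usually reports a nonzero count, while the locked variant never
| does.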
| adrian_b wrote:
| Most CPUs guarantee that aligned loads and stores up to the
| register size, i.e. now usually up to 64-bit, are atomic.
|
| The compilers also take care to align most variables.
|
| So while your scenario is not impossible, it would take
| some effort to force "a" to be unaligned, e.g. by making
| it a member of a structure with an inefficient layout.
|
| Normally in a multithreaded program all shared variables
| should be aligned, which would guarantee atomic loads and
| stores.
| knorker wrote:
| Well, thread safety is exactly about these cases of
| "well, it's hardly ever a problem".
|
| Real life bugs have come from misapplication of correct
| parameters for memory barriers, even on x86. Python GIL
| removes a whole class of potential errors.
|
| Not that I'm against getting rid of the GIL, but I'm more
| sceptical that it won't trigger bugs.
|
| Though in my opinion python just isn't a good language
| for large programs for other reasons. But it'd be nice to
| be able to multithread some 50 line scripts.
| intrepidhero wrote:
| But if you're writing to the same object from two different
| threads you're going to have undefined behavior regardless
| of the GIL, yes?
| NovemberWhiskey wrote:
| Not really. If you're doing an atomic write to the same
| object from two different threads, you're going to have
| one win the race and the other lose. That may be a bug in
| your code, but it's not undefined behavior at the
| language level.
| ajkjk wrote:
| It prevents classes of errors, such as, as the parent
| mentioned, non-atomic writes to individual variables.
| fulafel wrote:
| No. The L in GIL stands for lock. So only the thread that
| holds it can write or read from the object, and the
| behavior is well defined at the C level, because C lock
| acquire and release operations are defined to be memory
| barriers.
| dkersten wrote:
| But when each thread reads the variable, you have no
| control over which value you see, since you don't control
| when each thread gets to run. So it's undefined in the
| sense that you don't know which values you will get: a
| thread might get the value it wrote, or the value the
| other thread wrote. The threads might not get the same
| value either.
|
| The GIL exists to protect the interpreter's internal data,
| not your application's data. If you access mutable data
| from more than one thread, you still need to do your own
| synchronisation.
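| A minimal sketch of that point (the names and numbers are
| illustrative): the GIL does not make this compound
| read-modify-write atomic, so the invariant on the two
| balances only survives because the application code takes
| its own lock.

```python
import threading

# Illustrative sketch: the GIL protects the dict's internals, but not
# the application-level invariant balance["a"] + balance["b"] == 1000.
# That invariant is preserved only by the application's own lock.
balance = {"a": 500, "b": 500}
lock = threading.Lock()

def transfer(src, dst, amount):
    with lock:                  # without this lock, concurrent updates
        balance[src] -= amount  # can lose writes and corrupt the total
        balance[dst] += amount

def worker():
    for _ in range(10_000):
        transfer("a", "b", 1)
        transfer("b", "a", 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance["a"] + balance["b"])  # 1000: total preserved by the lock
```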
| hexane360 wrote:
| It depends what you mean by 'undefined behavior'. The GIL
| makes operations atomic on the bytecode instruction
| level. Critically, this includes loading and storing of
| objects, meaning that refcounting is atomic. However,
| this doesn't extend to most other operations, which
| generally need to pull an object onto the stack,
| manipulate it, and store it back in separate opcodes.
|
| So with Python concurrency, you can get unpredictable
| behavior (such as two threads losing values when
| incrementing a counter), but not undefined behavior in
| the C sense, such as use-after-free.
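| A small sketch of the lost-update case described above
| (names are illustrative; sys.setswitchinterval just widens
| the race window to make it easy to observe):

```python
import sys
import threading

# Illustrative sketch: "counter += 1" is a load, an add, and a store
# in separate opcodes, so two threads can interleave between them and
# lose increments even with the GIL. A tiny switch interval makes the
# window easy to hit.
sys.setswitchinterval(1e-6)

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1  # read-modify-write: not atomic under the GIL

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # usually well below 400000: updates were lost
```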
| snek_case wrote:
| AFAIK CPUs implement atomic load and store instructions and
| the performance overhead of these is very small compared to
| something like a software busy lock. So I think it's quite
| possible to take away the GIL while still making it
| impossible to load only half of a value.
| ignoramous wrote:
| Related: _Symmetric Multi-Processor primer for Android_ ,
| https://developer.android.com/training/articles/smp
| (although for Android/ARM, it makes for a pretty good read
| on the topic).
| rectang wrote:
| I have read many anti-GIL arguments over the years that
| approach soundness as optional. Is this change going to make a
| bunch of previously sound code unsound?
| hyperbovine wrote:
| Conversely, I have found that the GIL makes it unexpectedly
| easy to write thread-safe software in Python. Compare (in
| Cython) writing:
|     with gil:
|         call_a_method()
|         print(some_debugging_info)
|
| with all the sit-ups you'd have to do in a "real" concurrent
| language.
| ynik wrote:
| The GIL doesn't really help Python code though, because the
| interpreter may switch threads between any two opcodes.
|
| It only protects the state of the Python interpreter and that
| of C/Cython extension modules. Though even there, you can
| have unexpected thread switches, e.g. in Cython `self.obj =
| None` can result in a thread switch if the value previously
| stored in `self.obj` had a `__del__` method implemented in
| Python.
|
| And AFAIK pretty much any Python object allocation can
| trigger the cycle collector which can trigger `__del__` on
| (completely unrelated) objects in reference cycles, so it's
| pretty much impossible to rely on the GIL to keep any non-
| trivial code block atomic.
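| A quick way to see this (illustrative sketch): disassemble a
| one-line augmented assignment and count the opcodes the
| interpreter can switch between.

```python
import dis

# Illustrative sketch: even a one-line augmented assignment on an
# attribute compiles to several bytecode instructions, and CPython may
# switch threads (or run an unrelated __del__ via the cycle collector)
# between any two of them, so the GIL alone never makes it atomic.
def bump(ns):
    ns.count += 1

instructions = list(dis.Bytecode(bump))
for ins in instructions:
    print(ins.opname)

print(len(instructions) > 1)  # True: several separate opcodes
```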
| ngrilly wrote:
| Impressive proposal. That would remove a major limitation of
| CPython.
___________________________________________________________________
(page generated 2021-10-17 23:00 UTC)