[HN Gopher] Prevent DoS by large int-str conversions
___________________________________________________________________
Prevent DoS by large int-str conversions
Author : genericlemon24
Score : 78 points
Date : 2022-09-07 17:03 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| wmichelin wrote:
| Can anyone TL;DR why? Why wouldn't it just return that long
| integer of all 1s?
| sp332 wrote:
| Yeah it's right at the top of the linked page?
| schoen wrote:
| It's stated to be CVE-2020-10735, which is apparently about a
| denial of service by forcing Python to inefficiently convert a
| very large string to an integer, using a potentially ridiculous
| amount of CPU time.
|
| The CVE hasn't been published, but for example there's an
| explanation at
|
| https://bugzilla.redhat.com/show_bug.cgi?id=1834423
| klyrs wrote:
| Looks to me like the actual problem is in string.__mul__ --
| that one's got arbitrary memory usage. Better limit those
| arguments...
| masklinn wrote:
| str.__mul__ is just a conveniently short way to demonstrate
| the issue, the target is pretty much any parsing routine
| exposed to outside users e.g. any JSON API.
| klyrs wrote:
| Apologies, my comment is snark. The algorithm in question
| is soft-linear, faster implementations exist, this seems
| like an incredibly myopic fix. Just make a bigger JSON
| blob and it will take longer to parse.
| [deleted]
| adgjlsfhk1 wrote:
| this seems like a dumb fix to the cve to me. why not just use
| a faster algorithm?
| lifthrasiir wrote:
| Because there is no linear-time algorithm for decimal-to-
| binary conversion. If we are to expose the bignum-aware
| `int` function to untrusted input there should be some
| limit anyway. I do think the current limit of 4301 digits
| seems too low though---if it were something like 1 million
| digits I would be okay.
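For scale, the digit limit maps to a bit size via a one-line computation (a sketch; the 4300-digit default is from the linked change, the variable name is mine):

```python
import math

# An n-digit decimal number occupies about n * log2(10) ≈ 3.32 * n bits,
# so the default limit of 4300 digits caps int<->str conversions at
# integers of roughly 14,284 bits.
bits_at_limit = 4300 * math.log2(10)
print(round(bits_at_limit))
```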
| schoen wrote:
| It looks like there is some discussion of the algorithmic
| options at
|
| https://github.com/python/cpython/issues/95778
|
| https://github.com/python/cpython/issues/90716
|
| Is there something bad going on with Python's internal
| representation of big integers, too? I thought I might
| have understood Tim Peters to be saying that in the
| latter thread.
|
| It does look like gmpy2.mpz() is like 100 times faster
| than int() or something. Is this just because it's doing
| it all in assembly rather than in Python bytecodes, or
| are the Python data structures here also not so hot?
| thehappypm wrote:
| One of the comments showed the incredibly naive approach
| of just building the integer digit-by-digit:
|
| '1234' => 1x1000 + 2x100 + 3x10 + 4x1
|
| It's faster and has room to improve.
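The digit-by-digit idea can be written as Horner's rule (a sketch; the function name is mine, not from the thread):

```python
def parse_decimal(s):
    # Horner's rule: fold digits left to right. Each step multiplies the
    # running bignum by 10 and adds one small digit, so with n digits the
    # k-th step touches O(k) machine words -- O(n^2) work overall.
    result = 0
    for ch in s:
        result = result * 10 + (ord(ch) - ord('0'))
    return result
```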
| tylerhou wrote:
| This takes (worse than) quadratic time.
| thehappypm wrote:
| I'm not sure it does, in the best case.
|
| There are d additions, so the addition is linear time.
|
| Each multiplication is potentially quadratic, but it
| seems optimizable since it's never multiplication of two
| large numbers--always one large and one small number.
| singron wrote:
| Each addition is linear in d, but there are d additions,
| so it's already quadratic before you even consider the
| multiplications.
|
| In a power-of-2 base, the result of the multiplication is
| a constant number of digits (because the multiplication
| is just a shift of a single digit), so the additions
| could each be constant time in that case.
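That power-of-two shortcut is why the hex path is unaffected; a sketch (helper name mine):

```python
def parse_hex(s):
    # In base 16 every digit owns a fixed 4-bit slot, so the digits can
    # be packed into a byte string and handed to int.from_bytes in one
    # linear pass -- no bignum multiplications at all.
    if len(s) % 2:          # bytes.fromhex needs an even digit count
        s = '0' + s
    return int.from_bytes(bytes.fromhex(s), 'big')
```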
| klodolph wrote:
| > It does look like gmpy2.mpz() is like 100 times faster
| than int() or something. Is this just because it's doing
| it all in assembly rather than in Python bytecodes, or
| are the Python data structures here also not so hot?
|
| It's not the data structures. The data structures are
| really more or less the same: you have some array of
| words, with a length and a sign. The only real
| differences are in the particular length of word that you
| choose, which is not a very interesting difference.
|
| Assembly language optimizations do tend to matter here,
| because you're working with the carry bit for lots of
| these operations, and each architecture also has some
| different way of multiplying numbers. Multiplying numbers
| is "funny" because it produces two words of output for
| one word of input.
|
| There are also sometimes some different algorithms in
| use, and GMP uses some different algorithms depending on
| the size. Here's a page describing the algorithms used by
| GMP:
|
| https://gmplib.org/manual/Multiplication-Algorithms
|
| Here's a description of how carries are propagated:
|
| https://gmplib.org/manual/Assembly-Carry-Propagation
|
| IMO, I wouldn't expect my language's built-in bigint type
| to use the best, most cutting-edge algorithms and lots of
| hand-tuned assembly. GMP is a specialized library for
| doing special things.
| tylerhou wrote:
| There is no practical linear time algorithm for
| multiplication; should Python disable multiplication for
| numbers greater than 10^4301?
|
| Even a naive divide and conquer decimal to binary
| algorithm is only logarithmically slower than
| multiplication.
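A naive divide-and-conquer version fits in a few lines (a sketch, not CPython's implementation; with a subquadratic bignum multiply it runs in roughly O(M(n) log n)):

```python
def dec_to_int(s):
    # Split the digit string in half, convert each half recursively,
    # then recombine with one big multiplication: hi * 10^len(lo) + lo.
    if len(s) <= 16:
        return int(s)   # small base case: direct conversion is cheap
    mid = len(s) // 2
    hi, lo = s[:mid], s[mid:]
    return dec_to_int(hi) * 10 ** len(lo) + dec_to_int(lo)
```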
| adgjlsfhk1 wrote:
| there isn't a linear time algorithm, but there is an
| algorithm in O(n*log(n)^2)
| http://maths-people.anu.edu.au/~brent/pd/rpb032.pdf
| which is pretty close. it also seems weird to have a CVE for "some
| algorithms don't run in linear time". should there be a
| 4000 element maximum for the size of list passed to sort?
| lifthrasiir wrote:
| > should there be a 4000 element maximum for the size of
| list passed to sort?
|
| Technically speaking, yes, there should be some limit if
| you are accepting an untrusted input. But there is a good
| argument for making this limit built-in for integers but
| not lists: integers are expected to be atomic while lists
| are widely understood as aggregates, therefore large
| integers can more easily propagate throughout
| unsuspecting code base than large lists.
|
| (Or, if you are just saying that once you have sub-
| quadratic algorithms you don't need language-imposed
| limits anymore, maybe you are right.)
| bjourne wrote:
| But why convert it to binary? If you store the number as
| an array of digits the parsing process should be O(n).
| lifthrasiir wrote:
| That means every limb operation should be done modulo
| 10^k, which would be pretty expensive and only makes
| sense if you don't do much computation with them so the
| base conversion will dominate the computation.
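A sketch of that representation (little-endian base-10^9 limbs; names mine): parsing becomes linear because each limb is just a nine-digit slice, but all later arithmetic must carry modulo 10^9.

```python
BASE_DIGITS = 9                 # each limb holds nine decimal digits

def parse_to_limbs(s):
    # Little-endian limbs in base 10^9: parsing is one linear pass with
    # no bignum arithmetic; the cost is deferred to arithmetic on limbs.
    limbs = []
    for end in range(len(s), 0, -BASE_DIGITS):
        limbs.append(int(s[max(0, end - BASE_DIGITS):end]))
    return limbs
```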
| wyldfire wrote:
| But the multiplier is unbounded, though. Faster wouldn't help
| in that case.
| klyrs wrote:
| Maybe we should limit the lengths of strings altogether.
| 512k should be enough for anybody.
| eugenekolo wrote:
| Could they not have modified the `int` function to `int(thingy,
| i_really_want_to_do_this=false)`?
|
| Edit: Looks like they added a python argument to increase the
| limit. So if you really need this, I suppose you can search
| around until you figure out why it's not working and pass the
| correct argument to the python bin.
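Besides the interpreter flag, the limit can be raised from inside the program; a sketch, guarded so it also runs on interpreters that predate the limit:

```python
import sys

# sys.set_int_max_str_digits arrived together with the limit itself
# (3.11, backported to 3.10.7); older interpreters have no limit to
# raise, hence the feature check.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(1_000_000)  # opt back in to big conversions

n = int('1' * 5000)          # would raise ValueError at the default 4300
assert len(str(n)) == 5000   # str() is covered by the same limit
```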
| qbane wrote:
| Yeah, we must prevent DoS at all costs. It seems that Python
| should not have integers at arbitrary size for "performance"
| reasons in the beginning. Aren't int32/int64/int128 nice? The
| number of operations is always bounded. We should stick to them.
| kragen wrote:
| This was Python's behavior until Python 2; `long`, the
| arbitrary-precision integer, was a separate type, and `int`
| arithmetic overflow caused a ValueError. One of the big changes
| in Python 2 was to imitate the behavior of Smalltalk and (most)
| Lisp by transparently overflowing `int` arithmetic to `long`
| instead of requiring an explicit `long()` cast. Python 3
| eliminated the separate `long` type altogether.
|
| Having been bitten by the Smalltalk behavior, I am skeptical
| that the Python 2 change was a good idea.
| justinsaccount wrote:
| From the linked bug..
|
| > It takes about 50ms to parse an int string with 100,000 digits
| and about 5sec for 1,000,000 digits. The float type, decimal
| type, int.from_bytes(), and int() for binary bases 2, 4, 8, 16,
| and 32 are not affected.
|
| Sure seems strange to set the limit to 4300. 50ms is not a DoS.
| xani_ wrote:
| ballooning a 2ms request to 50ms is absolutely a DoS
|
| that's only 20 req/sec to fill a core of execution
| schoen wrote:
| If you need to make integers this big from decimal
| representations, I guess you could still use gmpy2.mpz(), and
| then either leave the result as an mpz object (which is generally
| drop-in compatible with Python's int type, with the addition of
| some optimized assembly implementations of arithmetic operations
| and some additional methods), or convert it to a Python int by
| calling int() on it.
| blibble wrote:
| new interpreter argument:
|
|     -X int_max_str_digits=number
|         limit the size of int<->str conversions. This helps
|         avoid denial of service attacks when parsing untrusted
|         data. The default is
|         sys.int_info.default_max_str_digits. 0 disables.
|
| this should not be a runtime configuration setting, fix the
| sodding algorithm to not be quadratic
|
| will we be getting PHP style magic quotes soon? that also
| protects developers against untrusted input (bonus! this could be
| configured too!)
|
| or an inability to pass strings into the regular expression
| module? that can also cause DoS
|
| (what happened to Python?)
| simonw wrote:
| My understanding is that there is no algorithm for this that
| isn't quadratic.
|
| Update: I may have understood incorrectly, see
| https://github.com/python/cpython/issues/90716
| blibble wrote:
| > My understanding is that there is no algorithm for this
| that isn't quadratic.
|
| > If you know of one, the Python core development team would
| love to hear about it!
|
| it's mentioned on the issue page that makes up the article...
|
| (before they closed it due to the "code of conduct")
| [deleted]
| jwilk wrote:
| https://github.com/python/cpython/issues/95778 has more
| information.
| dang wrote:
| Ok, we'll change to that from
| https://pythoninsider.blogspot.com/2022/09/python-releases-3....
| Thanks!
|
| All: submitted title was "`int('1' * 4301)` will raise
| ValueError starting with Python 3.10.7" and comments reference
| that, so you might want to take a look at both URLs.
| svet_0 wrote:
| So now an unreasonable user input will crash my server instead of
| slowing it down by 50ms. Great DoS mitigation!
| Ukv wrote:
| In addition to omnicognate's point, calling `int` on user input
| would generally already expect a possible ValueError.
| omnicognate wrote:
| Your server crashes if a request fails?
| xani_ wrote:
| it does with this change where it didn't before. At the very
| best you're still restarting the whole process instead of
| just wasting a bit of time
| fuckstick wrote:
| Who uses a process per request for serving Python apps?
| That must be very uncommon. Even if you use a worker pool
| that isn't going to restart a whole process just because of
| an errant exception in a request handler.
|
| Also as noted if your whole process crashes because of
| errant input to int() you are beyond fucked in other ways.
| aYsY4dDQ2NrcNzA wrote:
| Then don't upgrade Python in your container?
| progval wrote:
| You should always catch ValueError when using int() on user
| input, because that input may not be a valid number.
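A minimal version of that defensive pattern (helper name mine): the new digit limit raises the same ValueError as plain garbage input, so one existing handler covers both.

```python
def parse_user_int(raw, default=None):
    # int() on untrusted text raises ValueError for non-numeric input,
    # and on 3.10.7+ also for strings over the digit limit -- a single
    # except clause handles both cases.
    try:
        return int(raw)
    except ValueError:
        return default
```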
| [deleted]
| ridiculous_fish wrote:
| Why is base 10 string -> int a quadratic algorithm? Are there no
| faster ones that could be implemented?
| blahedo wrote:
| The naive algorithm is quadratic because 10 is not a power of 2,
| so any digit in the source (base 10) can affect any digit in the
| result (base 2). Converting from e.g. base 16 to base 2 is
| linear, because 16 is a power of 2.
| saghm wrote:
| I was surprised to see this in a bugfix release since it seems
| like a breaking change, but from reading, it seems that this was
| considered a security vulnerability (specifically a DOS
| opportunity) given the CVE status, so I imagine that
| compatibility concerns were secondary here. This seems in line
| with how other languages do things, from what I've seen;
| semver is important, but in a sense not every change is equally
| "breaking" to users, and breaking code that's unlikely to be
| common and potentially is not behaving correctly in the first
| place is not going to cause as much friction as most other types
| of breaking changes. Put another way, if there's a valid security
| concern, breaking things loudly for users forces them to double
| check their usage of this sort of code and ensure that nothing
| risky is going on. (I don't personally have enough domain
| knowledge here to know if the security concern is actually valid
| or not, but the decision to make this change in a patch release
| seems like a reasonable conclusion to come to for people who
| determine that it is a security concern).
| bo1024 wrote:
| From the link:
|
| > Everyone auditing all existing code for this, adding length
| guards, and maintaining that practice everywhere is not feasible
| nor is it what we deem the vast majority of our users want to do.
|
| It's hard not to read this as "we want to use untrusted input
| everywhere with no consequences". Seems like we'll be kicking as
| many issues under the rug as we're fixing with this change,
| right?
| bostik wrote:
| I read it the other way round - untrusted input is used in
| various places where doing such inline checks is prohibitively
| tricky. The examples given are quite telling: json, xmlrpc,
| logging. First two are everywhere in APIs. The third is just
| ... everywhere.
|
| Are you really going to use a JSON or XML stream parser _first_
| before feeding it to the stdlib module? And one that does not
| try to expand the read values to native types? As for logging,
| that is certainly the place where you are not only expected,
| but often required to use untrusted input.
|
| The fix feels like a heuristic and a compromise. None of the
| [easily available] solutions are robust, solid or performant,
| so someone picked an arbitrary threshold that should never be
| hit in sane code.
|
| The linked issue mentions that GMP remains fast even in the face
| of absurdly big numbers. No surprise, the library is _literally_
| designed for it: MP stands for multi-precision (ie. big int and
| friends).
| adgjlsfhk1 wrote:
| this would all make more sense if python was using a
| reasonably fast string to int routine, but the one they are
| using is asymptotically bad, and the limit they chose is
| roughly a million times lower than it should have been.
| rwmj wrote:
| Did they consider doing tainting (like Perl)? Input strings are
| marked as tainted, as is anything derived from them, except for
| some specific operations that untaint strings. If you use a
| tainted string for a security-sensitive operation then it
| fails. http://perlmeme.org/howtos/secure_code/taint.html
| Dylan16807 wrote:
| It's easy for me not to read it that way! Converting to an
| integer is a very good start for validating many kinds of
| input.
| machina_ex_deus wrote:
| This is way too low; I've used RSA keys in base 10 at half the
| size of this string. The limit corresponds to only ~14,000-bit
| numbers, and there are 8192-bit keys. I'm pretty sure this will
| break some CTF challenges. The limit should be in the millions
| at the very least.
| munch117 wrote:
| It does seem very low.
|
| However, you shouldn't be passing million-digit numbers around
| as (decimal) text. Even if you're not at risk of DOS attacks,
| there's still the issue that it's very, very slow:
|     $ python3 -m timeit -s "s='1'*1000000" "i=int(s)"
|     1 loop, best of 5: 5.77 sec per loop
|
| A ValueError alerting you to that fact could be considered a
| service.
|
| Contrast and compare:
|
|     $ python3 -m timeit -s "s='1'*1000000" "i=int(s,16)"
|     200 loops, best of 5: 1.45 msec per loop
| adgjlsfhk1 wrote:
| python being slow isn't news. that's not a reason for an
| error.
| nomel wrote:
| > However, you shouldn't be passing million-digit numbers
| around as (decimal) text
|
| This is about numbers that are thousands of digits, not
| millions. Regardless, why not? What's the alternative that
| supports easy exchange? If you stick it in some hexified
| representation, you still have to parse text, and put it into
| some non-machine-native number container. It's going to be
| slow no matter what.
| blibble wrote:
| you can convert hex into binary directly without any
| multiplications
| munch117 wrote:
| No, it's not going to be slow no matter what. Didn't you
| see my example? The hexadecimal non-machine-native textual
| representation was 4000 times faster than the decimal
| ditto. On a number that was much larger, I might add.
|
| Hex number parsing is linear time.
| schoen wrote:
| I could imagine people overlooking that little "m" in
| your example's output!
| nomel wrote:
| Indeed I did!
| im3w1l wrote:
| This will break correct code for a fairly small benefit. I don't
| think they should do this in a patch release.
| [deleted]
| [deleted]
| gfd wrote:
| Why did they close the discussion due to code of conduct? I
| didn't see anything wrong with the previous comments before that
| point.
| klodolph wrote:
| > As a reminder to everybody the Python Community Code Of
| Conduct applies here.
|
| > Closing. This is fixed. We'll open new issues for any follow
| up work necessary.
|
| The issue was marked closed, because the associated work was
| completed and the PR was merged. The same comment happened to
| mention the code of conduct, but the code of conduct wasn't why
| the issue was closed--it was just because the work was done.
|
| I think the comment mentioned the CoC because the previous
| comment, "This is appalling" was a bit rude.
| Delk wrote:
| > I think the comment mentioned the CoC because the previous
| comment, "This is appalling" was a bit rude.
|
| The previous comment was indeed a bit rude. I personally
| wouldn't think it was rude enough to invoke a code of
| conduct.
|
| Even just referring to a code of conduct has, IMO, a rather
| strong vibe of policing and perhaps even an implication of
| wrongdoing, more so than merely a suggestion to keep it calm.
|
| I don't know the culture or context of Python development
| (either the language or CPython), but I'm inclined to agree
| with gdf that it's a bit weird to start reminding people of a
| CoC because of a slightly rude sentence or two, especially
| since the rest of the comment was reasonable technical
| argumentation even if unapologetic.
|
| Even if closing the issue were entirely because of other
| reasons and benign (someone did still reference the issue in
| a commit later, though), it's all too easy to see the issue-
| closing comment as shutting out dissenting opinions, either
| because of a somewhat unpleasantly expressed argument or
| simply because "this is fixed, no further discussion needed".
|
| The "this is appalling" comment may have been a bit rude but
| the closing one wasn't exactly a triumph in communication
| either.
| Guthur wrote:
| "This is appalling" is not even remotely rude, honestly are
| we all children now?
| blibble wrote:
| your new comment violates the PSF "code of conduct" too!
|
| this particular wording could be used to ban any
| criticism of contributions (regardless of the criticism's
| correctness):
|
| > Being respectful. We're respectful of others, their
| positions, their skills, their commitments, and their
| efforts.
|
| in this sort of environment I guess it's far from
| surprising that the technical decisions are suffering (to
| put it politely)
| klodolph wrote:
| > Even just referring to a code of conduct has, IMO, a
| rather strong vibe of policing and perhaps even an
| implication of wrongdoing, more so than merely a suggestion
| to keep it calm.
|
| I'd say the opposite. A suggestion to "keep it calm" is
| inappropriate, because it carries the implication that
| someone is not calm. This is inappropriate because it is a
| comment on a person's emotional state rather than on what
| they say or how they say it.
|
| In fact, if someone on my team said to "keep it calm", I'd
| take that person aside and explain, in private, the reasons
| why not to say that.
|
| > Even if closing the issue were entirely because of other
| reasons and benign (someone did still reference the issue
| in a commit later, though), it's all too easy to see the
| issue-closing comment as shutting out dissenting opinions,
| [...]
|
| If somebody thought that closing the issue shut out
| dissenting opinions, then that person has forgotten how
| GitHub issues work or how bug trackers work in general.
| Closing an issue just means that someone thinks that the
| work on it is done; it does not stop discussion on the
| issue. I can see why someone might forget and not realize
| that the issue was closed and _not_ the discussion, but I
don't think that it's a problem that someone visiting the
| bug from HN would forget how GitHub issues work for a
| minute.
|
| With any online community above a certain size, there's a
| certain amount of policing not just of what is said, but
| where people have discussions. Anyone who regularly uses a
| forum, Subreddit, Discord server, IRC, Slack, etc. will see
| this pattern of behavior everywhere. For example--the
| discussion about whether this is the right way to fix a bug
| is a discussion which should be held elsewhere, where
| people can see the context and interested parties can
| respond to it.
|
| Which is why there is a comment at the bottom,
|
| > Please redirect further discussion to discuss.python.org.
|
| It's crystal clear to me that this is not about shutting
| out dissenting voices, but just saying that this GitHub
| issue is the wrong place for this discussion.
|
| You can see that there is a related issue which was closed,
| but there was a lot of discussion afterwards--but because
| the discussion was on-topic, the issue was not locked.
|
| https://github.com/python/cpython/issues/90716
| Delk wrote:
| > I'd say the opposite. A suggestion to "keep it calm" is
| inappropriate, because it carries the implication that
| someone is not calm.
|
| Perhaps a suggestion to "keep it calm" wouldn't be the
| best. English isn't my first language and my verbal
| expression isn't always the greatest. But referring to a
| code of conduct does also carry the implication that
| someone isn't minding that code, and I don't see how that
| would necessarily be better.
|
| In my view, suggesting that someone isn't calm is less of
| a reprimand than suggesting they might be in breach of a
| code of conduct which, among other things, includes rules
| against outright harassment and other clearly
| reprehensible behaviour. It's normal to not be calm at
| times; it's another thing if someone needs to be reminded
| of the rules of a community. Perhaps it's a cultural
| thing but to me the latter is stronger judgement.
|
| There may well be reasons for not saying to keep it calm
| (it sometimes simply doesn't work), but I can equally
| well see how people might see a reference to a CoC as
| strong-armed.
|
| > If somebody thought that closing the issue shut out
| dissenting opinions, then that person has forgotten how
| GitHub issues work or how bug trackers work in general.
| Closing an issue just means that someone thinks that the
| work on it is done; it does not stop discussion on the
| issue.
|
| That's fair enough. Perhaps the intention is clear enough
| within the community that it would indeed be deemed as
| simply closing that rather specific GitHub issue without
| implying that the matter is closed.
|
| Human communication isn't always quite that simple,
| though. People get impressions from the way things are
| expressed. "This is fixed." makes it feel that there is
| nothing to be discussed about that particular change and
| that it is final.
|
| I don't know the particular community well enough to know
| how it would be interpreted, though.
|
| > Which is why there is a comment at the bottom,
|
| >> Please redirect further discussion to
| discuss.python.org.
|
| That's after the comment that closed the issue. Had it
| been in the issue-closing comment, that would have left a
| different taste to the closing.
| googlryas wrote:
| For anyone wondering, '1' * 4301 creates a string of '11111....'
| 4301 characters long. It doesn't result in an integer value of
| 4301 like in some other languages.
|
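To make the repetition-vs-conversion distinction concrete (a quick sketch):

```python
s = '1' * 4301        # string repetition: 4301 copies of '1'
assert len(s) == 4301
assert set(s) == {'1'}
# int(s) -- the conversion, not the repetition -- is the step that
# Python 3.10.7 limits to 4300 digits by default.
```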
| I find this a strange modification to the language, though
| probably not a particularly painful one. Has python saved you
| from yourself when dealing with non-linear built-in algorithms
| before? IIRC it is also possible to have the regex engine take an
| inordinate amount of time for certain matching concepts(I think
| stackoverflow was affected by this?), but the engine wasn't
| hobbled to throw in those cases, it is merely up to the user to
| write efficient regex that aren't subject to those problems.
| ffhhj wrote:
| They should have made the analogous inverse operation: '1234' /
| 2 = ['12', '34']
| bsdz wrote:
| I was more expecting '1111' / 4 = '1'. This would be the
| inverse operation. However, it opens up even more questions
| like what to do if your string has mixed values etc
| ffhhj wrote:
| The string multiplication is about _joining_ strings, the
| inverse is about _splitting_ them in several parts. It's
| only confusing because the * appends the string to itself,
| the / is actually very clear.
| dekhn wrote:
| Disagree. The inverse "string" * value is logically
| splitting, _and then collapsing the repeated values_. The
| logical split can be omitted, but the collapsing cannot.
| [deleted]
| tremon wrote:
| That's not the inverse of the multiplication though. The
| inverse would be '33' / 2 = '3', and '1234'/2 should then
| probably raise a ValueError.
| hyperpape wrote:
| Backtracking regular expressions as an intentional or
| accidental DOS vector are a moderately well-known issue, and
| while I prefer that a standard library implementation be robust
| against them, I can see the POV that it's buyer beware.
|
| Converting a string to an integer is somewhat less well known
| as a DOS vector, more painful to avoid as an application
| creator, and easier to fix in code.
|
| So there's a cost-benefit argument that you should just do this
| before you rewrite your regex engine.
| masklinn wrote:
| > I can see the POV that it's buyer beware.
|
| On the other hands, lots of buyers are not aware that it's an
| issue, and more frustratingly there are regex engines which
| are very resilient to it... but are not widely used.
|
| Python's stdlib will fall over on any exponential
| backtracking pattern, but last time I tried to make postgres
| fall over I didn't succeed. Even though it does have
| lookahead, lookbehind, and backrefs, so it should be susceptible
| to the issue (aka it's not a pure DFA).
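The regex analogue discussed here is catastrophic backtracking; a small sketch against Python's stdlib engine (the input is kept short so it finishes quickly; growing it makes the failing match take exponentially longer):

```python
import re

# Nested quantifiers plus a tail that forces failure: the backtracking
# engine retries every way of splitting the run of a's between the two
# '+' quantifiers before giving up.
evil = re.compile(r'^(a+)+$')

assert evil.match('a' * 16) is not None    # matching input: fast
assert evil.match('a' * 16 + 'b') is None  # failing input: ~2^15 retries
```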
| bo1024 wrote:
| This does seem like a strange level of handholding, even if the
| motivation makes lots of sense. If you start going down the
| road of protecting people who don't sanitize user input, you
| may have quite a long journey ahead...
| mjevans wrote:
| Operator overloading sure seems to increase the prevalence of
| foot-guns, security issues, and other gotchas.
|
|     str.ccClone(4301)  # ConCatenate Clones of the source string N times.
|
| Would even an abbreviated, named, function not be more self
| documenting and better for human and machine reviews?
| proto_lambda wrote:
| Other than that being a terrible name (it's almost impossible
| to be sure what it does without consulting documentation), I
| personally do prefer fewer implicit/overloaded operations.
| mjevans wrote:
| What name would you suggest? That was my 5 min of thought
| version.
|
| cc prefix for concatenate because that word is very long
| and it seemed likely that strings may have a large number
| of different concatenation focused functions that could all
| share the prefix.
|
| Clone as the type of concatenation operation to perform.
| proto_lambda wrote:
| Rust uses `repeat()`, which sounds much more descriptive
| to me. The types in the function signature make the
| "clone" part of the name redundant.
| mjevans wrote:
| Offhand, is repeat(0) an empty string, repeat(1) the
| input string, etc? If so that's a great name for the
| function.
| pezezin wrote:
| There's also std::iter::repeat, which works on any clonable
| value, not just strings; you can chain it with other iterators
| or collect it into some data structure. And yes, "x".repeat(0)
| returns an empty string.
|
| https://doc.rust-lang.org/std/iter/fn.repeat.html
| slaymaker1907 wrote:
| I think how Rust does it is fine, but I agree operators are
| often a mess. Yesterday I was looking at a memory dump where
| there was a problem in a destructor (a double free was
| detected) and it was an absolute mess trying to figure out
| the exact execution location in source code since it was
| setting the value of a smart pointer which triggered a
| decrement of a reference counted value in turn triggering a
| free. It's junk like that which starts to convince me that
| Linus was right to avoid C++. Rust obviously also has
| destructors, but it doesn't have the nightmare that is
| inheritance+function overloading+implicit casting.
| cma wrote:
| > and it was an absolute mess trying to figure out the
| exact execution location in source code since it was
| setting the value of a smart pointer which triggered a
| decrement of a reference counted value in turn triggering a
| free.
|
| Isn't all that context there in the stack trace?
| jlarocco wrote:
| Yes, probably. Depends on the compiler settings. Stuff
| can get optimized out and stripped.
|
| When writing the code in the first place, though, it's
| difficult to see problems like that because it's all
| hidden behind magic calls to copy constructors, move
| semantics, and destructor calls. Out of sight, out of
| mind.
| DSMan195276 wrote:
| I think it's separate from his point but some of those
| things could potentially be tail calls, meaning the
| functions actually leading to the free/delete might not
| be in the stacktrace even if they were called.
| UncleEntity wrote:
| It is really useful sugar for:
|
|     for _ in range(4301):
|         llama.append('1')
|
| (there's probably an easier way to do that but you get the
| point)
|
| where python can see both sides of the operation and optimize
| it on the C side of things.
|
| The issue really has nothing to do with that though, it is
| converting a string to an int which is the whole point of the
| security update.
| Gordonjcp wrote:
| > Operator overloading sure seems to increase the prevalence
| of foot-guns, security issues, and other gotchas.
|
| How exactly? What would you expect an expression like ('1' *
| 4301) to give you, and why would you think it would be
| different from ('caterpillar' * 4301)?
| qayxc wrote:
| Well, let's assume that the "expected" behaviour holds,
| shall we? Let's open up a python REPL and try
| >>> 'caterpillar' * 2 'caterpillarcaterpillar'
|
| OK, now for something different: >>> [1, 2,
| 3] * 2 [1, 2, 3, 1, 2, 3]
|
| Marvellous! How about this then: >>> True *
| 2 2
|
| Wait, what? Hm. >>> False * 2 0
|
| Whoops! Implicit type conversion takes place... Even worse:
| >>> 'abc' + 'efg' 'acbefg' >>> 'efg' + 'abc'
| 'efgabc'
|
| Now I'm stumped. Isn't addition supposed to be commutative?
|
| So yeah, without contracts in place, operator overloading
| is BAD. You can never know what an operator does, or what
| its properties are, just by looking at how it's used.
| There are simply no enforced rules, so no-one's stopping
| you from doing:
|                 >>> class Complex:
|                 ...     def __init__(self, real, imag):
|                 ...         self.real = real
|                 ...         self.imag = imag
|                 ...     def __add__(self, other):
|                 ...         return Complex(self.real - other.real,
|                 ...                        self.imag - other.imag)
|                 ...     def __repr__(self):
|                 ...         return f'Complex({self.real}+{self.imag}j)'
|                 ...
|                 >>> x = Complex(1, 2)
|                 >>> y = Complex(1, 2)
|                 >>> x + y
|                 Complex(0+0j)
|
| Now this is intentionally malicious, of course, but
| plenty of libraries overload operators in non-intuitive
| ways, so that an operator's properties and behaviour aren't
| obvious. This is especially true if commutative operators
| are implemented as non-commutative ones (e.g. abusing '+'
| for concatenation instead of using another symbol like '&'),
| or if the behaviour changes depending on the order of
| operands.
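The commutativity point above can be verified directly; a quick sketch:

```python
# Numeric '+' is commutative; str.__add__ (concatenation) is not,
# even though both are spelled with the same operator symbol.
a, b = "abc", "efg"

assert 2 + 3 == 3 + 2          # integer addition commutes
assert a + b == "abcefg"
assert b + a == "efgabc"
assert a + b != b + a          # same symbol, different algebraic properties
```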
| samatman wrote:
| In Lua, the first is 4301 and the second is a runtime
| error. ('1' .. 4301) is the string "14301"; the equivalent
| of the weird thing Python is fixing would be spelled
| `tonumber(('1'):rep(4301))`, which is obviously wrong.
|
| To my taste operator overloading is fine, but concatenation
| isn't addition, so they shouldn't share an operator
| because... [gestures vaguely at a half dozen languages]
| im3w1l wrote:
| Succinct string operations are honestly about half of what I
| use Python for, and the great numeric support, with bignums by
| default and powerful operator-overloading libraries like numpy
| and tensorflow, is the other half.
| jejones3141 wrote:
| In Algol 68, you can do that; it's part of the standard
| prelude. I think that some people who'd worked on Algol 68 in
| the Netherlands also worked on the ABC language, where it's "1"
| ^^ 4301, and Guido worked on ABC before Python.
| gsliepen wrote:
| Well, in C++ int('1' * 4301) is a perfectly valid expression,
| but it evaluates to 210749, not 4301.
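In Python terms, the C++ expression multiplies the character code of '1' rather than repeating a string; the arithmetic behind 210749 can be sketched as:

```python
# In C++, the char '1' is promoted to its code point (49) before the
# multiplication, so int('1' * 4301) there is just 49 * 4301.
assert ord('1') == 49
assert ord('1') * 4301 == 210749
```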
| oldgradstudent wrote:
| Or some other value.
|
| If sizeof(int)=2, the result is undefined.
| gsliepen wrote:
| Not if CHAR_BIT is 10 or more!
| oldgradstudent wrote:
| I wonder how much software will fail on platforms where
| CHAR_BIT is not 8.
| eMSF wrote:
| Whether evaluating that expression results in undefined
| behaviour also depends on the basic execution character set
| and the bit width of the machine byte.
| dark-star wrote:
| it doesn't evaluate to 4301 in Python either ;-)
| Phil_Latio wrote:
| What's next? A default socket timeout of X seconds for security
| reasons? What a joke, and rather scary that apparently everyone,
| or at least the majority, on the internals side agrees with this
| change.
| linspace wrote:
| I find it completely unpythonic. Python has become too
| important to do the right thing; there is money on the table.
| LtWorf wrote:
| I think python is now completely owned by a couple big
| companies that decide everything.
|
| By this logic they should also block me from running benchmarks
| on too-big lists, because I'm DoSing myself.
| krick wrote:
| This. I don't really understand the CPython decision-making
| process, but it just seems like common sense that anybody who
| finds this a good idea surely must be a very junior developer
| who shouldn't be allowed to commit directly to the master
| branch of your local corporate project just yet... But
| basically breaking a perfectly logical behaviour just like
| that, in a language used by millions of people... To me it's
| absolutely shocking.
| loeg wrote:
| Will Python's relentless campaign to break backwards
| compatibility never end? (80% sarcastic.)
| klyrs wrote:
| Don't worry, it's a minor release. (110% sarcastic)
| tremon wrote:
| It's a patch release, not even minor (100% serious).
| mywittyname wrote:
| What should you use instead if you want the original
| functionality?
| Veedrac wrote:
| https://docs.python.org/3/library/stdtypes.html#configuring-...
| mywittyname wrote:
| If I'm understanding this correctly: the only way to convert
| an extremely large base-10 string to an integer using the
| standard library is to muck with global interpreter settings?
|
| It seems short-sighted not to provide some function that
| mimics the legacy functionality exactly. Even if it is
| something like int.parse_string_unlimited(). Especially since
| a random library can just set the cap to 0 and side-step the
| problem entirely.
| Someone wrote:
| > Especially since a random library can just set the cap to
| 0 and side-step the problem entirely.
|
| Until another random library sets it to its preferred value
| (see https://news.ycombinator.com/item?id=32738206 for a
| similar issue with a CPU flag for supporting IEEE
| subnormals)
|
| We might end up with libraries that keep setting that
| global to the value they need on every call into them.
| mywittyname wrote:
| Oh fun. Just what Python needs more of, this...
|                 try:
|                     value = int(value_to_parse)
|                 except ValueError:
|                     import sys
|                     __old_int_max_str_digits = sys.get_int_max_str_digits()
|                     sys.set_int_max_str_digits(0)
|                     value = int(value_to_parse)
|                     sys.set_int_max_str_digits(__old_int_max_str_digits)
|
| Or maybe just this:
|                 import sys
|
|                 class UnboundedIntParsing:
|                     def __enter__(self):
|                         self.__old_int_max_str_digits = sys.get_int_max_str_digits()
|                         sys.set_int_max_str_digits(0)
|                         return self
|                     def __exit__(self, *args):
|                         sys.set_int_max_str_digits(self.__old_int_max_str_digits)
|
|                 with UnboundedIntParsing() as uip:
|                     value = int(str_value)
| dmurray wrote:
| Needs to be made thread safe!
| js2 wrote:
| 4300 digits?
|
| > Chosen such that this isn't wildly slow on modern hardware and
| so that everyone's existing deployed numpy test suite passes
| before https://github.com/numpy/numpy/issues/22098 is widely
| available.
|
| https://github.com/python/cpython/blob/511ca9452033ef95bc7d7...
___________________________________________________________________
(page generated 2022-09-07 23:01 UTC)