[HN Gopher] We fixed f-string typos in popular Python repos
___________________________________________________________________
We fixed f-string typos in popular Python repos
Author : rikatee
Score : 94 points
Date : 2022-04-29 18:06 UTC (4 hours ago)
(HTM) web link (highertier.com)
(TXT) w3m dump (highertier.com)
| safwan wrote:
| f-string does not work with GNU/gettext! It is also a common
| mistake that people make!
| hn_saver wrote:
| nice
| mhils wrote:
| We've been one of 666 repos, and I'm not too happy of having our
| repo used as advertising space. Some thoughts:
|
| - I'm happy to receive fix-a-typo PRs from human users. In this
| case the other side demonstrated that they care by putting in a
| bit of manual effort, and a small PR often paves the way towards
| larger contributions. I also know that open source beginners get
| really excited about their first small contributions, and I'm
| honestly happy to support that.
|
| - In contrast, the marginal effort for bot PRs is ~0. It's very
| easy to generate a small amount of work for a lot of people, and
| the nice side effect is that the bot's platform is advertised
| everywhere. As a maintainer, I have never given consent to this
| and I have no choice to opt out.
|
| We are very happy users of some GitHub bots, but I feel it needs
| to be an active adoption decision by the maintainer. If you want
| to pitch me your service you may send me an unsolicited email,
| but don't use our public space to advertise your product without
| asking.
|
| Edit: I don't want to be too harsh to OP here - at least they
| pointed out a small but valid issue in our case. I very much
| appreciate their apology at
| https://news.ycombinator.com/item?id=31210245
| IshKebab wrote:
| I feel the same way about those bots that tell you about
| insignificant security vulnerabilities in some project you
| abandoned. It's basically spam.
|
| That said, this does seem like it is a bit more useful. As long
| as they actually read the changes and make sure they aren't
| false positives. Which I'm guessing they didn't do for 666
| repos.
| b6dybuyv wrote:
| > As long as they actually read the changes and make sure
| they aren't false positives. Which I'm guessing they didn't
| do for 666 repos.
|
| In the article they say that "really a bot found the problem
| and made the PR, but really a human developer at Code Review
| Doctor did triage the issue before the PR was raised)".
| TameAntelope wrote:
| I just... think you should reconsider your stance on this. If
| you made a mistake in a public repo and someone else caught it
| (via scan of your repo or otherwise), it's a pretty bad look to
| be anything but grateful at that point, PR benefits for the bot
| aside.
| omegalulw wrote:
| I understand the sentiment but you should be judging the PR,
| not the source. Ask yourself: would you have happily accepted
| the same PR that the bot sent if it came from a human?
|
| By all means, I am not against having bots identify themselves
| properly, my point is that "effort from bot PR is ~0", "it
| advertises their platform" are simply not the right reasons to
| judge this situation by.
| memco wrote:
| The article links to some docs for the logging module here:
| https://docs.python.org/3/howto/logging.html#optimization
| asserting that f-strings are less optimal but the docs do not say
| that they do not optimize our the expression evaluation of
| f-strings: only that the logging module tried to perform
| evaluation as late as possible: where is the f-string described
| as suboptimal?
|
| Relatedly the logging optimization suggests setting:
| raiseExceptions to false for production logging code: where is
| that set? On the logger, handler or something else?
| bobbiechen wrote:
| I was also confused by the expression evaluation thing. Reading
| between the lines, it seems like
| logger.debug("hello %s", foo)
|
| may be better than logger.debug(f"hello
| {foo}")
|
| in the case when loglevel is higher than debug. In the first
| version, the final string does not have to be computed, while
| in the second version, we might construct the string and then
| do nothing since the loglevel is excluded.
| jrootabega wrote:
| You're also able to add additional log-specific processors to
| the log record in the first case.
| masklinn wrote:
| That's exactly it.
|
| Although this becomes more complicated because printf-style
| string formatting is not free (though it's the cheapest of
| all methods save fstrings if I remember correctly), and
| because python does not support lazy parameters if `foo` is a
| non-trivial expression odds are good it will far outcost
| either formatting.
| nimish wrote:
| The "Use logging's interpolation" warning has always annoyed
| me. If logging is in your hot path, that might be an issue. Me
| f-string interpolating some info logs that run once for
| convenience is not.
|
| Low value junk like this is not helpful.
| aib wrote:
| > where is the f-string described as suboptimal?
|
| I guess it's implicit in that f-strings, as arguments, will be
| evaluated before the logging function can even run whereas
| `debug("...", heavy_obj)` will avoid a potentially expensive
| `str(heavy_obj)` (or whatever the string conversion warrants.)
|
| As for raiseExceptions, I'm not sure it's for optimization. It
| looks like an old sanity check for bad logging configurations.
| dewey wrote:
| > For science you can see the reactions here.
|
| That link seems to be broken:
| https://github.com/issues?q=is%3Aissue+author%3Acode-review-...
|
| I was actually surprised to read that people would ignore or be
| annoyed by a bot raising a valid PR that can be easily merged
| after a quick glance. What would be the reason for that?
| VWWHFSfQ wrote:
| Because this is basically just PR spam
| dekhn wrote:
| to me, well-intentioned systems wiht a high true positive
| rate and low false positive rate are welcome so long as they
| follow reasonable etiquette and norms, which this group seems
| to do.
| mhils wrote:
| In our case OPs bot did not open a PR which could have been
| merged quickly, but filed an issue instead.
| Waterluvian wrote:
| I expect to see the entire gamut of possible reactions with a
| sufficient number of bot PRs. But in looking at 10 of them at
| random, I didn't find a single "negative response."
|
| (I don't think ignoring it is invalid or wrong by any means,
| given there's so many reasons one might not engage in a timely
| manner, or at all, in the issues section or PRs. I don't
| monitor my repos issues because I just don't feel interested in
| supporting my code. Feel free to fork or ignore!)
| vitus wrote:
| Some negative reactions:
|
| https://github.com/mitmproxy/mitmproxy/issues/5285
|
| https://github.com/Qiskit/qiskit-terra/issues/7981
|
| https://github.com/beetbox/beets/issues/4340
|
| I do think those concerns are legitimate. (I also think more
| tooling is a good thing!)
| Waterluvian wrote:
| Thank you for sharing these links!
| dekhn wrote:
| I looked through all three. The first isn't really a
| complaint because the bot acted in good faith and found an
| error. In the second one they complained abiout a missing
| unsubscribe link (reasonable) and in the third one, the
| author should update their code so it doesn't create a
| variable named path, then a non-f-string that includes
| "{path}". I had to stare at the author's comment that it
| was a false positive for quite a bit to convince myself
| they were right.
| vitus wrote:
| I will point out that in the first two issues, the repo
| owners also edited the initial report with something
| along the lines of "removed ad".
|
| I disagree that the first isn't a complaint -- the owner
| stated that this behavior isn't appreciated but decided
| to let it slide because the issue was valid.
|
| In the third issue, the owner also explicitly stated: "I
| don't think bots posting unsolicited static analysis
| results are a good idea." I have no opinions on whether
| the code should be clearer, but that doesn't change the
| validity of the reaction.
| klyrs wrote:
| The bot-account's (apparently human-written) reply of
| "you're very welcome" to the complaint in the third issue
| is downright dismissive of the problem and kinda passive
| aggressive. While it seems that the bot did good work
| overall, the human(s) handling edge cases need work.
| mhils wrote:
| We've blocked the bot after their script malfunctioned
| and they opened a second issue with exactly the same text
| (https://github.com/mitmproxy/mitmproxy/issues/5286).
| rikatee wrote:
| klyrs was right about the reply from me (a dev behdind
| Code Review Doctor) being dismissive in the issue. I
| apologise for that.
|
| FWIW my reaction was classic "expectations not meeting
| reality": weeks of work to do (what I thought) was a
| mutually beneficial helpful thing. I was naively not
| expecting non-positive responses and was ill prepared
| when you raised valid concerns I had not considered.
|
| Again, I am working on that and sorry I was passive
| aggressive to you.
| mhils wrote:
| Thank you - that explanation makes sense, apology
| accepted. :)
| klyrs wrote:
| Nice response, thanks.
| dekhn wrote:
| I do think the question of "how should bots that do
| static analysis work" is an important one, but in the
| meantime, people are gonna bot and repo managers are
| gonna complain.
| Forge36 wrote:
| What I've found from doing similar types of changes.
|
| 1. It's hard to explain the impact to the application of the
| current problem. Thus it looks like a theoretical issue
|
| 2. Sometimes people rely on the bug for their code to work
|
| 3. Surprise work can be poorly received (ie: not the current
| priority)
| TrickardRixx wrote:
| Automated checking of potential bugs in f-strings is hard.
| There are lots of false positives. You can see some discussion
| around this kind of rule in pylint [0]. At the end of the day,
| the choice to run automated linting tools on a repo is up to
| the maintainers. Autogenerating PRs like this is incredibly
| noisy and comes off to me as a blatant advertisement for their
| "code review doctor" product.
|
| [0] https://github.com/PyCQA/pylint/issues/5039
| zamadatix wrote:
| In reactions they conveniently left out "false positives we
| still hadn't weeded out". On top of that it can be annoying to
| have bots making trivial PRs in their own format when you've
| got a well defined process for it. Lastly it was basically
| spamming an ad link for the service at the end of the PR
| comment - even if the other issues didn't come up it's not
| always well received to do that.
|
| Looking at 1 bot it doesn't sound bad, when you have everyones
| bot doing this kind of stuff it can quickly become more of a
| nuisance than a help.
| cinntaile wrote:
| https://github.com/Qiskit/qiskit-terra/pull/7982
|
| That guy was not happy. I do agree that it's basically
| advertising and that's annoying.
| jjoonathan wrote:
| You're assuming that the PR is valid, but a maintainer can't
| make that assumption. They have to do the thankless work to
| figure out the context and handle the fallout if they get it
| wrong. Let's look at who wins: * Small
| benefit to bot creator * Tiny benefit to project
| * Modest cost to maintainer
|
| Waves of low-effort resume-padding commits are already a thing.
| Not a big problem, but bots clearly have the potential to
| multiply the small problem into a big problem.
|
| I'm still open to the idea that bots could be a net win,
| because most projects really do have heaps of small simple
| mistakes lying around. I'm sympathetic to the maintainers
| though. They always seem to get the short end of the stick.
| malcolmgreaves wrote:
| You can also use flake8 to find this, and even more, errors in
| Python code.
| rikatee wrote:
| flake8 does not currently support this check, as they are
| concerned about the false positives from "what if the string it
| later used in .format(...)"
|
| However, Code Review Doctor is more of a "this MIGHT be a
| problem. have you considered..." rather than "it wrong"
| bvinc wrote:
| So they checked 666 python repositories and fixed bugs in 69 of
| them. Interesting choice of numbers.
| defterGoose wrote:
| ...doing the devil's work.
| racl101 wrote:
| Nice!
| InfiniteRand wrote:
| Fixing F-strings
| readthenotes1 wrote:
| I find it ironic that the article points out that relying on
| error from humans to find errors is something of a hit or miss
| proposition and suggests that automating error finding is an
| appropriate course instead of making it less likely to make the
| error in the first place.
|
| For example, I wonder how many errors would have been found if
| the definition of a format string was the default? That is, how
| many times would people have written something like "hello
| {previously-defined-variable}" and not meant to substitute the
| value of that previously defined variable at runtime?
| rikatee wrote:
| what's also ironic is I left an easter egg in the code sample
| for how we downloaded the list of repositories and no one has
| noticed it yet.
| swatcoder wrote:
| To be fair, your suggestion might make for a more resilient
| default, but it's also a great way to leak data and add
| overhead for the default case. There are tradeoffs.
| Someone wrote:
| Not much overhead, I would think. We're talking about literal
| strings in source code, not strings in general. It's not much
| work to check those.
|
| One thing that it would break is that strings read from files
| would be treated differently from those in source code, even
| those read from files that logically "belong" to the
| application (say config file)
|
| I don't think that's an issue, though.
|
| Also, in Swift _" \\(foo)"_ does string interpolation. I
| haven't seen people complain it leaks data or makes Swift
| slow (but then, it's not fast at compiling at all because of
| its rather complicated type inference)
| JadeNB wrote:
| > Also, in Swift "\\(foo)" does string interpolation. I
| haven't seen people complain it leaks data or makes Swift
| slow (but then, it's not fast at compiling at all because
| of its rather complicated type inference)
|
| I think that the claim is not that this leaks data in an
| absolute sense, but rather that changing the behaviour
| after people have come to rely on it will leak data from
| currently well behaving applications.
| boxed wrote:
| This is how strings work in swift. It's a much superior system
| imo.
| MauranKilom wrote:
| I don't think this makes sense. Plain strings and format
| strings are not interchangeable, and using one where the other
| was meant is probably a bug.
|
| Would you expect that a user input like "{secret} please" is
| interpolated? If so, we hopefully agree that this would blow
| major security holes into any python script processing
| untrusted user input. And if not... Why not?
| kgeist wrote:
| >Would you expect that a user input like "{secret} please" is
| interpolated?
|
| That's basically what the recent log4j security vulnerability
| was all about. "Helpfully" interpolating logs by default.
| noobermin wrote:
| The assumption I'm thinking they mean is to make formatting
| default and unformatted not default, for example, how "raw"
| strings were treated, escaped characters are replaced with
| the ascii code by default unless the string is raw, signified
| by an 'r' prefixed in front.
| Sohcahtoa82 wrote:
| Adding that behavior would break existing code that uses
| str.format, and Python tries to avoid breaking code between
| minor releases.
| asvitkine wrote:
| If you only make it work with string literals (e.g. generate
| the underlying formatting logic at parse time), it wouldn't
| allow arbitrary inputs to be treated as f strings.
| boxed wrote:
| Look up how this works in Swift. They only have one string.
| No raw strings or f strings. Yet they have all the power of
| all three python string types and less syntax. It's very
| nice.
| HL33tibCe7 wrote:
| That's not really a feasible solution in Python because that
| change would break a load of existing code.
| mschuster91 wrote:
| So what? Raise a deprecation notice, treat it as a fatal
| error in two or three years and that's it. PHP has been doing
| this for years now.
| krisoft wrote:
| > treat it as a fatal error
|
| Did you think this through? What would you treat as a fatal
| error? How would the compiler know if a particular string
| is old style code wanting to print some characters between
| curly braces or new style code wanting to string
| interpolate a variable?
| noobermin wrote:
| We've done this too many times and we've had enough pain,
| let's please proceed at a pace where we can worry about
| delivering our product and not updating formatted strings,
| thank you.
| bornfreddy wrote:
| Also see: Python 2 => 3 hell. Nobody wants to repeat that.
| HL33tibCe7 wrote:
| We're not talking about deprecating a feature here, we're
| talking about the addition of behaviour that will break
| existing code, potentially in non-trivial and hard to debug
| ways, and in ways that could easily introduce security
| vulnerabilities.
| capitalsigma wrote:
| Just bump the major version number from 3 to 4, right? How
| long could that migration take?
| dekhn wrote:
| Python does not do this. A change like that would require a
| major version number increment and the community would
| revolt.
|
| Too bad we can't go back in time to 1996 or so.
| swatcoder wrote:
| As someone who would like to be working on new, interesting
| things in 2-3 years rather than bringing old code into
| conformance with breaking changes, this attitude captures a
| worrisome trend in development.
|
| On the one hand, it's great that we have platforms that
| innovate and improve and harden over time, but we're also
| facing a development culture where more and more time is
| spent servicing package/platform/language/OS changes that
| have no material impact on our own otherwise-mature
| projects.
|
| It's worth being judicious about where breaking changes are
| applied, right?
| mulmen wrote:
| One person's "new interesting thing" is another person's
| "breaking change".
| macspoofing wrote:
| >...and suggests that automating error finding is an
| appropriate course instead of making it less likely to make the
| error in the first place.
|
| You can't fix the syntax and standard lib of the language. It
| is what it is. Similarly, how many bugs would you prevent if
| Python had compiler support to catch those types of syntax (and
| type) errors.
| ggm wrote:
| Maybe catenation of an fString and a string _should_ yield an
| fString by type promotion? String is morally "any" so it feels
| to me like a contextual narrowing of type.
| zikohh wrote:
| Can I use code review doctor with gitlab? If not what options do
| I have?
| mjs7231 wrote:
| This was posted on Reddit earlier this week with similar negative
| responses:
| https://www.reddit.com/r/Python/comments/ubkvrd/10_of_the_66...
| rikatee wrote:
| FWIW, HN is much more positive (while also raising valid points
| that will be taken into account going forward)
| snapetom wrote:
| I'd like to add better technical discussion, too.
| fareesh wrote:
| I like python although I don't use it too often. Would it be
| unfairly critical of me to say that this is the outcome of a bad
| design choice? Ideally languages should be designed in a way that
| a bug like this which is so widespread and easy to create, should
| be caught via some mechanism, either linting or some part of the
| process.
| noobermin wrote:
| The f-strings are a recent (may be not so recent now) addition
| to the language, so all the errors stem from it being "new"
| where people's reflexes / carefulness hasn't adjusted to them
| yet.
|
| I think in addition to the suggestion for linters, updating
| IDE/editors to incorporate them would help. Syntax highlighting
| is the primary reason not terminating strings isn't that common
| of an error anymore, coloring it differently than a normal
| string might help (or may be it would make things ugly, I don't
| know).
| polio wrote:
| Most developers will require an arsenal of static analysis
| tools to achieve maximum productivity. Linters are an example
| of such a tool, but they don't exist as part of the language
| spec itself, AFAIK.
| f7fg_u-_h wrote:
| dattboii wrote:
| HL33tibCe7 wrote:
| > > We may be looking too deep into this but it seems like many
| developers think when string concatenation occurs it's enough to
| declare the first string as an f-string and the other strings are
| turned into f-strings by osmosis. It doesn't. We're not
| suggesting this is the case for all developers that accidentally
| did this error, but interesting nonetheless.
|
| I highly doubt that people believed that f-strings worked this
| way. Far more likely is that, for example, the expression started
| as one line, then got split onto two, or some such similar
| scenario.
| badrabbit wrote:
| You'd be surprised, people who expect python to be "smart" and
| "figure it out" might think that way.
| groestl wrote:
| Well, it's not completely unlogical.
|
| 'a' is str
|
| b'a' is bytes
|
| f'something' might be a separate f-str-type too?
|
| 1 is an int
|
| 1.2 is a float
|
| (1.2 + 1) is a float
| GeorgeTirebiter wrote:
| Indeed, if "{value} is bad" can be automatically f-stringed
| by an external program automatically --- then why can't
| Python do this automatically -- so we can get rid of the
| f-string type as a required explicit declaration? After
| all, we don't specifically add a type to a number like 42
| or 3.14159 --- those are implicitly 'int' and 'float'
| types.
|
| I would use such a feature, as I always use f-strings when
| formatting.
| bagels wrote:
| Backwards compatibility for one. Code existed before
| fstrings that may use curly braces, and you can currently
| use curly braces in non fstrings without escaping them.
|
| Might also be a performance penalty for always having to
| run the fstring parser.
| dragonwriter wrote:
| If all strings with what looks like format specs are
| implicitly f-strings, how do you use reusable template
| strings, which use the same basic internal syntax, for
| formatting?
|
| f-strings, like r-strings, have uses, but, like
| r-strings, I wouldn't want to replace plain strings with
| them.
| masklinn wrote:
| > Indeed, if "{value} is bad" can be automatically
| f-stringed by an external program automatically --- then
| why can't Python do this automatically -- so we can get
| rid of the f-string type as a required explicit
| declaration?
|
| Because it would break existing strings containing
| braces, such as those used with `str.format`, or
| string.Template, or literal Jinja templates, ...
| GeorgeTirebiter wrote:
| Yes, my use case was "I always use F-strings" so these
| other breakages could not occur, by definition.
|
| I suppose it's not that much of a problem to run a
| program to preprocess source and add in the F
|
| But.. Python has a history of introducing new features
| that break old ones. That seems to me a balance between
| backward compatibility and future goodness.
|
| the "from future import auto-fstring" construct could do
| it...
|
| And, as for having to run the f-string parser: yes, but
| only once on static strings, which are most of them.
| chrisshroba wrote:
| It's not clear to me that folks would _want_ this
| behavior globally - I personally would not, because
| sometimes I want to be able to include curly braces in my
| strings without it being interpolated, and I prefer the
| current syntax of having that just work. I 'm very
| comfortable with having to add the 'f' to the string
| syntax to declare that this _particular_ string should be
| interpolated.
|
| There are other issues you'd have to deal with as well.
| Would you interpolate non-literal strings (e.g. in the
| code `print(input())`, if the user inputs the string
| "{1+1}", what would be printed?)? How would you propose
| dealing with jinja and other strings that typically
| contain curly braces that should not be interpolated?
| This could also be an issue with (potentially long)
| strings containing lots of random characters: if a '}'
| showed up in a string somewhere after a '{', even if
| separated by tens or hundreds of characters, then you'll
| run into errors. This could be a problem if you're
| dealing with pseudorandom or base-92 encoded strings, or
| even just strings representing code (imagine a python
| library that generates C++ or Java code which has lots of
| hard coded strings with braces).
|
| I think overall, having to specify that a particular
| string should be interpolated is a better solution than
| having to specify that a string should _not_ be
| interpolated.
| GeorgeTirebiter wrote:
| Well, you could put a N" your string with{} in it" to
| type-specify that you are not F-string. In fact, WHY are
| strings not fully specified every time? There is nothing
| wrong with F"This is also an f-string". After all,
| strings are just number sequences with special meanings
| assigned to some values in some circumstances.
|
| Seems to me the problem is similar to wanting 1 be
| interpreted as an int --- if you want float, you change
| the syntax (and therefore semantics) to 1. or 1.0
|
| It's always possible to conjure corner-case failure
| modes; but shouldn't the 'common case' be catered to,
| more than some base-92 encoded strings?
|
| And, by the by, more 'smarts' can be applied to automatic
| f-string determination. If "{variable-that-exists}
| foobar" is seen it could plausibly be converted to an
| f-string.
|
| This leads into a much longer discussion of how our
| compilers/interpreters are too stupid today, and need to
| up their game. But probably not here, not now.... and
| also, thank you for your observations & comments.
| dragonwriter wrote:
| > It's always possible to conjure corner-case failure
| modes; but shouldn't the 'common case' be catered to
|
| It is: normal strings are the common case.
| ArchOversight wrote:
| Because in some cases it is not desired, if for example
| the string is used in something that expands it further
| down the line (think SQL injection potential).
| HL33tibCe7 wrote:
| As a beginner, maybe. But this is code in some of the biggest
| open-source Python repos in existence. Probably written by
| someone with a reasonable level of Python expertise.
| groestl wrote:
| I think what's happening is that people assume there's type
| coercion going on here.
| jldugger wrote:
| I wonder if an autoformatter like black is at play here.
| saila wrote:
| Black doesn't split strings, and I doubt they'd choose to use
| concatenation if they did.
| johnny_castaway wrote:
| This may be also caused by confusing syntax highlighting in
| some editors, for example in VSCode [1]
|
| The variable in the second string gets highlighted (with
| slightly different color, but still) because it would still
| work with `str.format()`. GitHub doesn't seem to do this.
|
| [1] https://imgur.com/a/9KGWVG0
| totony wrote:
| Yeah I got bitten by the same-color syntax highlighting a few
| times.
| baisq wrote:
| The comments on this submission are overwhelmingly negative. Why
| are the comments on PVS-Studio submissions, on the other hand,
| generally positive?
| bayareabadboy wrote:
___________________________________________________________________
(page generated 2022-04-29 23:00 UTC)