[HN Gopher] The /o in Ruby regex stands for "oh the humanity"
___________________________________________________________________
The /o in Ruby regex stands for "oh the humanity"
Author : todsacerdoti
Score : 114 points
Date : 2025-08-02 14:37 UTC (8 hours ago)
(HTM) web link (jpcamara.com)
(TXT) w3m dump (jpcamara.com)
| rco8786 wrote:
| Love these sorts of deep dives, thanks!
| cbsmith wrote:
| As an old Perl programmer, I knew immediately what the /o would
| do. ;-)
| Amorymeltzer wrote:
| I've always loved the recent[1] summary from `perlre`:
|
| >o - pretend to optimize your code, but actually introduce bugs
|
| 1: I still think of it as a relatively new change, but it's
| from 2013: <https://github.com/Perl/perl5/commit/7cf040c1f64979
| 0a4040aec...>
| kstrauser wrote:
| It's older than that. The article links to this conversation
| about it in 2003: https://www.perlmonks.org/?node_id=256053
| riffraff wrote:
| Unsurprisingly, `END {}` is also inherited from perl, tho I think
| it originally comes from awk.
| mdaniel wrote:
| Similarly unsurprisingly, with its BEGIN friend
| https://docs.ruby-lang.org/en/3.3/syntax/miscellaneous_rdoc....
|
| In the spirit of "what's old is new again," PowerShell also has
| the same idea, and is done per Function with "begin",
| "process", "end", and "clean" stanzas that allow setup,
| teardown, for-each-item, and "finally" behavior:
| https://learn.microsoft.com/en-us/powershell/module/microsof...
| mananaysiempre wrote:
| Oh, that's an interesting take. I've long been looking for
| newer developments on Awk's clause structure, and this seems
| like an interesting take (though I'm unclear on whether I can
| have multiple begin/end clauses, which are the best thing
| about Awk's version). It also finally connects this idea to
| something else in my mind--specifically advice[1] and CLOS's
| :before/:after/:around methods[2]. (I guess Go's defer also
| counts?)
|
| [1] https://en.wikipedia.org/wiki/Advice_(programming)
|
| [2] https://gigamonkeys.com/book/object-reorientation-
| generic-fu...
| mdaniel wrote:
| It seems not:
|
| Given:
|
|     function Fred {
|         begin {
|             echo "hello from begin1"
|         }
|         begin {
|             echo "hello from begin2"
|         }
|         process {
|             echo "does the magic"
|         }
|     }
|     $bob = @("alpha" "beta")
|     $bob | Fred
|
| Then
|
|     $ pwsh fred.ps1
|     ParserError: /Users/mdaniel/fred.ps1:5
|     Line |
|        5 |      begin {
|          |      ~~~~~~~
|          | Script command clause 'begin' has already been defined.
| phoronixrly wrote:
| It's kind of a cool feature. I like it.
| thayne wrote:
| Is it? I can't think of a non-contrived case where this would
| actually be useful.
|
| And in any case where it _would_ be useful, it seems like a
| better way to optimize would just be to refactor the regex out
| into a constant.
| kayodelycaon wrote:
| Actually, I have a way this would work well. If you're
| interpolating a value that comes from configuration and
| wouldn't change.
|
| Example: /admin@#{Rails.config.x.domain}/io
|
| But you're right that a constant would be a lot more clear.
| "o" is a footgun.
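|
| A minimal sketch of that footgun next to the constant alternative, in
| plain Ruby. The method name admin_address?, the constant names, and
| the example domains are hypothetical, and a plain string stands in
| for the Rails configuration value:

```ruby
# Hypothetical helper: with /o, the interpolation runs only on the first call.
def admin_address?(text, domain)
  text.match?(/admin@#{Regexp.escape(domain)}/io)
end

first = admin_address?("admin@example.com", "example.com")
# The literal was already built for "example.com", so "other.org" is ignored here.
second = admin_address?("admin@other.org", "other.org")

# The clearer alternative: build the pattern once and give it a name.
ADMIN_DOMAIN  = "example.com" # assumed stand-in for a configuration value
ADMIN_ADDRESS = /admin@#{Regexp.escape(ADMIN_DOMAIN)}/i

puts first
puts second
puts ADMIN_ADDRESS.match?("Admin@Example.com")
```

| The /o version only behaves if the domain really never changes for
| the life of the process; the constant says so explicitly.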
| naniwaduni wrote:
| The context is that this is a feature cribbed straight from
| perl, where it's passed down from perl 4/pre-5.6, where
| _compiled regexen weren't first-class values_. Pretty much
| every use of it this _century_ is a mistake.
| lupire wrote:
| This is the same problem people have with closures, where it's
| unclear to the user whether the argument is captured by name or
| by value.
| layer8 wrote:
| This isn't the same problem, because this is about whether the
| regex is instantiated each time the code _around_ the regex is
| executed, or only the first time and cached for subsequent
| executions. The same could in theory happen with closures, but
| I haven't ever seen programming-language semantics where, for
| example, a function containing the definition of a closure that
| depends on an argument of that outer function, would use the
| argument value of the first invocation of the function for all
| subsequent invocations of the function.
|
| For example, when you have
|
|     fn f x = (y -> x + y)
|
| then a sequence of invocations of _f_
|
|     f 1 3
|     f 2 6
|
| will yield 4 and 8 respectively, but never will the second
| invocation yield 7 due to reusing the value of _x_ from the
| first invocation. However, that is precisely what happens in
| the article's regex example, because the equivalent is for the
| closure value (y -> x + y) to be cached between invocations, so
| that the _x_ retains the value of the first invocation of _f_
| -- regardless of whether _x_ is a reference by name or by
| value.
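|
| A rough Ruby rendering of that contrast. The names make_adder and
| matcher are hypothetical; the /o behaviour sketched is the one the
| article describes, where the Regexp built on the first call is kept:

```ruby
# A closure captures the argument of the call that created it.
def make_adder(x)
  ->(y) { x + y }
end

sums = [make_adder(1).call(3), make_adder(2).call(6)]

# With /o, the Regexp built on the first call is cached and reused,
# so later arguments never reach the pattern.
def matcher(x)
  /\A#{Regexp.escape(x)}\z/o
end

first_pattern  = matcher("a")
second_pattern = matcher("b")

p sums
p second_pattern.source
p first_pattern.equal?(second_pattern)
```

| The closure calls stay independent of each other, while the second
| matcher call hands back the pattern that was built for "a".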
| zer00eyz wrote:
| I'm sorry, but the classics never go out of style:
|
| "Some people, when confronted with a problem, think 'I know, I'll
| use regular expressions.' Now they have two problems."
| stavros wrote:
| Yeah but it's kind of tired when it's being used every time
| someone makes a mistake with regex. I've used them extensively
| in my career and never once regretted it.
| apgwoz wrote:
| The problem with regexps is that "Sometimes a smart person,
| who has done the work, and knows how to leverage regular
| expressions correctly, decides they are appropriate for
| solving a problem where there is shared maintenance. Now, you
| have people who haven't put in the work, and have been told
| repeatedly through 'witty quips' to not bother."
| jodrellblank wrote:
| The second problem being how to deal with all the extra time
| they just freed up?
| fanf2 wrote:
| This is one of the features that Ruby cribbed directly from Perl.
| The Ruby documentation seems really bad, in particular
| "interpolation mode" is grievously misleading.
|
| Perl's documentation is far more clear about the consequences:
|
| (https://perldoc.perl.org/perlop#Regexp-Quote-Like-Operators)
|     o   Compile pattern only once.
|     [...]
|     PATTERN may contain variables, which will be interpolated
|     every time the pattern search is evaluated, except for when
|     the delimiter is a single quote. [...] Perl will not
|     recompile the pattern unless an interpolated variable that
|     it contains changes. You can force Perl to skip the test
|     and never recompile by adding a /o (which stands for
|     "once") after the trailing delimiter. Once upon a time,
|     Perl would recompile regular expressions unnecessarily, and
|     this modifier was useful to tell it not to do so, in
|     the interests of speed. But now, the only reasons to use /o
|     are one of: [reasons]
|     The bottom line is that using /o is almost never a good idea.
|
| In the days before Perl automatically memoized the compilation of
| regexes with interpolation, even back in the 1990s, it said,
|
|     However, mentioning /o constitutes a promise that you won't
|     change the variables in the pattern. If you change them,
|     Perl won't even notice.
|
| Perl 4's documentation is briefer. It says,
|
| (https://github.com/Perl/perl5/blob/perl-4.0.00/perl.man#L272...)
|     PATTERN may contain references to scalar variables, which
|     will be interpolated (and the pattern recompiled) every
|     time the pattern search is evaluated. [...] If you want
|     such a pattern to be compiled only once, add an "o" after
|     the trailing delimiter. This avoids expensive run-time
|     recompilations, and is useful when the value you are
|     interpolating won't change over the life of the script.
| Johnny555 wrote:
| https://perldoc.perl.org/perlre
|
|     o - pretend to optimize your code, but actually introduce bugs
| Joker_vD wrote:
| > I didn't recognize /o. It didn't seem critically important to
| lookup yet.
|
| > With nothing else to investigate, I finally looked up the docs
| for what the /o regex modifier does.
|
| I'll probably never understand this mode of thinking. But then
| again, Ruby programmers are, after all, people who chose to write
| Ruby.
|
| > /o is referred to as "Interpolation mode", which sounded pretty
| harmless.
|
| _Really_? Those words sound quite alarming to me, due to
| personal reminiscences of eval.
|
| Also, this whole "/o" feature seems insane. If I have an
| interpolation in my regex, _obviously_ I have to re-interpolate
| it every time a new value is submitted, or I'd hit this very
| bug. And if the value _is_ expected to be the same every time, then
| I can just compile it once and save the result myself, right? In
| which case, I probably could even do without interpolation in the
| first place.
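|
| "Save the result myself" could look something like this sketch. The
| DomainMatcher class and its methods are hypothetical, not from the
| article; the Regexp is built once per object rather than once per
| literal:

```ruby
# Hypothetical wrapper that compiles its pattern lazily and keeps it.
class DomainMatcher
  def initialize(domain)
    @domain = domain
  end

  # Built on first use, then reused for this object only.
  def pattern
    @pattern ||= /admin@#{Regexp.escape(@domain)}/i
  end

  def match?(text)
    pattern.match?(text)
  end
end

example = DomainMatcher.new("example.com")
other   = DomainMatcher.new("other.org")

puts example.match?("admin@example.com")
puts other.match?("admin@other.org")
```

| Each matcher keeps its own compiled pattern, so a second domain gets
| a second Regexp instead of silently reusing the first.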
| apgwoz wrote:
| "Compilation", I think, is exactly right. This feature is less
| about interpolation than it is about compilation of a single
| regexp to be used many times. It's just shrouded in confusing
| documentation that should say: "/o tells ruby to rewrite this
| code such that it refers to a new statically allocated regexp
| object." And when you write it that way, you see how insane it
| is for a function call to be hoisted automatically like this,
| without an explicit, obvious, syntactic annotation.
| gpvos wrote:
| The implications of "statically allocated" are less clear
| than if you'd just write "compiled only once".
| gpvos wrote:
| It's a feature dating from the 1990s, when Perl (and I guess
| Ruby?) didn't have a way for the user to store a compiled
| regex, and this was a useful shortcut for a very specific
| optimization, which Ruby documented badly. Perl (and I guess
| Ruby?) later evolved in a way that made /o unnecessary, but the
| (now mis)feature remained.
| kazinator wrote:
| > Modifier o means that the first time a literal regexp with
| interpolations is encountered, the generated Regexp object is
| saved and used for all future evaluations of that literal regexp.
|
| That is crystal clear to me. It means that on the next execution,
| the new values of the interpolation will be ignored; the regexp
| is now "baked" with the first ones.
|
| Like this in C++:
|
|     void fun(int arg) {
|         static int once = arg;  // initialized on the first call only
|     }
|
| If we call this as fun(42) the first time, once gets initialized to
| 42. If we then call it as fun(73), once stays 42.
|
| There is a function in POSIX for once-only initializations:
| pthread_once. C++ compilers for multithreaded environments emit
| thread-safe code to do something similar to pthread_once to
| ensure that even if there are several concurrent first
| invocations of the function, the initialization happens once.
| IshKebab wrote:
| Seems par for the course for Ruby.
| jononor wrote:
| It looks like an emoji for someone getting bashed in the head
| with a long stick. So that makes sense?
| tialaramex wrote:
| This is a footgun. A language should strive not to add footguns.
| Every footgun you provide, somebody is going to blow their foot
| off with it, so that's a high price. If your language is popular
| it might be a _lot_ of somebodies.
|
| The opposite behaviour (we have a constant regular expression, we
| re-use it often but the tooling doesn't realise and so it's
| created each time we mention it) is not a footgun, it results in
| poor performance, and so you might want (especially in some
| managed languages) to just magically optimise this case, but if
| not you won't cause mysterious bugs. An expert, asked "Why is
| this slow?" can just fix it - you have to supply basic tools for
| that, but this flag is not a sensible tool.
___________________________________________________________________
(page generated 2025-08-02 23:00 UTC)