[HN Gopher] The /o in Ruby regex stands for "oh the humanity"
       ___________________________________________________________________
        
       The /o in Ruby regex stands for "oh the humanity"
        
       Author : todsacerdoti
       Score  : 114 points
       Date   : 2025-08-02 14:37 UTC (8 hours ago)
        
 (HTM) web link (jpcamara.com)
 (TXT) w3m dump (jpcamara.com)
        
       | rco8786 wrote:
       | Love these sorts of deep dives, thanks!
        
       | cbsmith wrote:
       | As an old Perl programmer, I knew immediately what the /o would
       | do. ;-)
        
         | Amorymeltzer wrote:
         | I've always loved the recent[1] summary from `perlre`:
         | 
         | >o - pretend to optimize your code, but actually introduce bugs
         | 
         | 1: I still think of it as a relatively new change, but it's
         | from 2013:
         | <https://github.com/Perl/perl5/commit/7cf040c1f649790a4040aec...>
        
           | kstrauser wrote:
           | It's older than that. The article links to this conversation
           | about it in 2003: https://www.perlmonks.org/?node_id=256053
        
       | riffraff wrote:
       | Unsurprisingly, `END {}` is also inherited from perl, tho I think
       | it originally comes from awk.
        
         | mdaniel wrote:
         | Similarly unsurprisingly, with its BEGIN friend
         | https://docs.ruby-lang.org/en/3.3/syntax/miscellaneous_rdoc....
         | 
         | In the spirit of "what's old is new again," PowerShell also has
         | the same idea, and is done per Function with "begin",
         | "process", "end", and "clean" stanzas that allow setup,
         | teardown, for-each-item, and "finally" behavior:
         | https://learn.microsoft.com/en-us/powershell/module/microsof...
        
           | mananaysiempre wrote:
           | Oh, that's an interesting take. I've long been looking for
           | newer developments on Awk's clause structure, and this seems
           | like an interesting take (though I'm unclear on whether I can
           | have multiple begin/end clauses, which are the best thing
           | about Awk's version). It also finally connects this idea to
           | something else in my mind--specifically advice[1] and CLOS's
           | :before/:after/:around methods[2]. (I guess Go's defer also
           | counts?)
           | 
           | [1] https://en.wikipedia.org/wiki/Advice_(programming)
           | 
           | [2] https://gigamonkeys.com/book/object-reorientation-generic-fu...
        
             | mdaniel wrote:
             | It seems not:
             | 
             | Given:
             | 
             |     function Fred {
             |         begin {
             |             echo "hello from begin1"
             |         }
             |         begin {
             |             echo "hello from begin2"
             |         }
             |         process {
             |             echo "does the magic"
             |         }
             |     }
             |     $bob = @("alpha"
             |              "beta")
             |     $bob | Fred
             | 
             | Then
             | 
             |     $ pwsh fred.ps1
             |     ParserError: /Users/mdaniel/fred.ps1:5
             |     Line |
             |        5 |  begin {
             |          |  ~~~~~~~
             |          | Script command clause 'begin' has already been defined.
        
       | phoronixrly wrote:
       | It's kind of a cool feature. I like it.
        
         | thayne wrote:
         | Is it? I can't think of a non-contrived case where this would
         | actually be useful.
         | 
         | And in any case where it _would_ be useful, it seems like a
         | better way to optimize would just be to refactor the regex out
         | into a constant.
        
           | kayodelycaon wrote:
           | Actually, I have a way this would work well. If you're
           | interpolating a value that comes from configuration and
           | wouldn't change.
           | 
           | Example: /admin@#{Rails.config.x.domain}/io
           | 
           | But you're right that a constant would be a lot more clear.
           | "o" is a footgun.
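           | 
           | A minimal sketch of the two options, for comparison. It
           | assumes a hypothetical CONFIG_DOMAIN constant standing in
           | for the configuration value in the example above (no Rails
           | involved); the method names are illustrative only.

```ruby
# Hypothetical stand-in for a configuration value that never changes.
CONFIG_DOMAIN = "example.com"

# Option 1: build the regexp once and keep it in a constant.
# Regexp.escape keeps the "." in the domain from matching any character.
ADMIN_ADDRESS = /admin@#{Regexp.escape(CONFIG_DOMAIN)}/i

def admin_address?(text)
  text.match?(ADMIN_ADDRESS)
end

# Option 2: the /o flag. The interpolation runs the first time this
# literal is evaluated, and the resulting Regexp is reused afterwards.
def admin_address_with_o?(text)
  text.match?(/admin@#{Regexp.escape(CONFIG_DOMAIN)}/io)
end

p admin_address?("Contact ADMIN@example.com")         # => true
p admin_address_with_o?("Contact ADMIN@example.com")  # => true
p admin_address?("admin@exampleXcom")                 # => false
```

           | Both behave the same while the value really is fixed. The
           | constant just makes the "compiled once" part visible at
           | the definition site.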
        
           | naniwaduni wrote:
           | The context is that this is a feature cribbed straight from
           | perl, where it's passed down from perl 4/pre-5.6, where
           | _compiled regexen weren't first-class values_. Pretty much
           | every use of it this _century_ is a mistake.
        
       | lupire wrote:
       | This is the same problem people have with closures, where it's
       | unclear to the user whether the argument is captured by name or
       | by value.
        
         | layer8 wrote:
         | This isn't the same problem, because this is about whether the
         | regex is instantiated each time the code _around_ the regex is
         | executed, or only the first time and cached for subsequent
         | executions. The same could in theory happen with closures, but
         | I haven't ever seen programming-language semantics where, for
         | example, a function containing the definition of a closure that
         | depends on an argument of that outer function, would use the
         | argument value of the first invocation of the function for all
         | subsequent invocations of the function.
         | 
         | For example, when you have
         | 
         |     fn f x = (y -> x + y)
         | 
         | then a sequence of invocations of _f_
         | 
         |     f 1 3
         |     f 2 6
         | 
         | will yield 4 and 8 respectively, but never will the second
         | invocation yield 7 due to reusing the value of _x_ from the
         | first invocation. However, that is precisely what happens in
         | the article's regex example, because the equivalent is for the
         | closure value (y -> x + y) to be cached between invocations, so
         | that the _x_ retains the value of the first invocation of _f_
         | -- regardless of whether _x_ is a reference by name or by
         | value.
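         | 
         | The same distinction as a rough Ruby sketch. f_once and
         | CLOSURE_CACHE are hypothetical names, used only to imitate
         | what /o does to a regexp literal; no language builds
         | closures this way by default.

```ruby
# f returns a new closure on every call, capturing that call's x.
def f(x)
  ->(y) { x + y }
end

# Hypothetical analogue of /o: the first closure built is cached and
# reused, so later values of x are silently ignored.
CLOSURE_CACHE = {}

def f_once(x)
  CLOSURE_CACHE[:closure] ||= ->(y) { x + y }
end

p [f(1).call(3), f(2).call(6)]            # => [4, 8]
p [f_once(1).call(3), f_once(2).call(6)]  # => [4, 7]
```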
        
       | zer00eyz wrote:
       | I'm sorry, but the classics never go out of style:
       | 
       | "Some people, when confronted with a problem, think 'I know, I'll
       | use regular expressions.' Now they have two problems."
        
         | stavros wrote:
         | Yeah but it's kind of tired when it's being used every time
         | someone makes a mistake with regex. I've used them extensively
         | in my career and never once regretted it.
        
           | apgwoz wrote:
           | The problem with regexps is that "Sometimes a smart person,
           | who has done the work, and knows how to leverage regular
           | expressions correctly, decides they are appropriate for
           | solving a problem where there is shared maintenance. Now, you
           | have people who haven't put in the work, and have been told
           | repeatedly through 'witty quips' to not bother."
        
         | jodrellblank wrote:
         | The second problem being how to deal with all the extra time
         | they just freed up?
        
       | fanf2 wrote:
       | This is one of the features that Ruby cribbed directly from Perl.
       | The Ruby documentation seems really bad, in particular
       | "interpolation mode" is grievously misleading.
       | 
       | Perl's documentation is far more clear about the consequences:
       | 
       | (https://perldoc.perl.org/perlop#Regexp-Quote-Like-Operators)
       | 
       |     o   Compile pattern only once.
       | 
       |     [...]
       | 
       |     PATTERN may contain variables, which will be interpolated
       |     every time the pattern search is evaluated, except for when
       |     the delimiter is a single quote. [...] Perl will not
       |     recompile the pattern unless an interpolated variable that
       |     it contains changes. You can force Perl to skip the test
       |     and never recompile by adding a /o (which stands for
       |     "once") after the trailing delimiter. Once upon a time,
       |     Perl would recompile regular expressions unnecessarily, and
       |     this modifier was useful to tell it not to do so, in the
       |     interests of speed. But now, the only reasons to use /o
       |     are one of:
       | 
       |     [reasons]
       | 
       |     The bottom line is that using /o is almost never a good
       |     idea.
       | 
       | In the days before Perl automatically memoized the compilation of
       | regexes with interpolation, even back in the 1990s, it said,
       | 
       |     However, mentioning /o constitutes a promise that you won't
       |     change the variables in the pattern. If you change them,
       |     Perl won't even notice.
       | 
       | Perl 4's documentation is briefer. It says,
       | 
       | (https://github.com/Perl/perl5/blob/perl-4.0.00/perl.man#L272...)
       | 
       |     PATTERN may contain references to scalar variables, which
       |     will be interpolated (and the pattern recompiled) every
       |     time the pattern search is evaluated. [...] If you want
       |     such a pattern to be compiled only once, add an "o" after
       |     the trailing delimiter. This avoids expensive run-time
       |     recompilations, and is useful when the value you are
       |     interpolating won't change over the life of the script.
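       | 
       | For comparison, a rough Ruby sketch of the "don't recompile
       | unless the interpolated value changes" idea from the modern
       | Perl docs. This is only an illustration: PATTERN_CACHE and
       | word_pattern are hypothetical names, and it is not a claim
       | about how Perl or Ruby implement this internally.

```ruby
# Hypothetical cache: compiled patterns keyed by the interpolated string.
PATTERN_CACHE = {}

def word_pattern(word)
  # Compile only when this value has not been seen before.
  PATTERN_CACHE[word] ||= Regexp.new("\\b#{Regexp.escape(word)}\\b")
end

first  = word_pattern("perl")
second = word_pattern("perl")  # same value: the cached Regexp is reused
third  = word_pattern("ruby")  # new value: a new Regexp is compiled

p first.equal?(second)          # => true
p first.equal?(third)           # => false
p third.match?("I like ruby")   # => true
```

       | Unlike /o, a changed value is noticed here, so the pattern
       | never goes stale.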
        
         | Johnny555 wrote:
         | https://perldoc.perl.org/perlre
         | 
         |     o  - pretend to optimize your code, but actually introduce
         |     bugs
        
       | Joker_vD wrote:
       | > I didn't recognize /o. It didn't seem critically important to
       | lookup yet.
       | 
       | > With nothing else to investigate, I finally looked up the docs
       | for what the /o regex modifier does.
       | 
       | I'll probably never understand this mode of thinking. But then
       | again, Ruby programmers are, after all, people who chose to write
       | Ruby.
       | 
       | > /o is referred to as "Interpolation mode", which sounded pretty
       | harmless.
       | 
       | _Really_? Those words sound quite alarming to me, due to
       | personal reminiscences of eval.
       | 
       | Also, this whole "/o" feature seems insane. If I have an
       | interpolation in my regex, _obviously_ I have to re-interpolate
       | it every time a new value is submitted, or I'd hit this very
       | bug. And if the value _is_ expected to be the same every time, then
       | I can just compile it once and save the result myself, right? In
       | which case, I probably could even do without interpolation in the
       | first place.
        
         | apgwoz wrote:
         | "Compilation", I think, is exactly right. This feature is less
         | about interpolation than it is about compilation of a single
         | regexp to be used many times. It's just shrouded in confusing
         | documentation that should say: "/o tells ruby to rewrite this
         | code such that it refers to a new statically allocated regexp
         | object." And when you write it that way, you see how insane it
         | is for a function call to be hoisted automatically like this,
         | without an explicit, obvious, syntactic annotation.
        
           | gpvos wrote:
           | The implications of "statically allocated" are less clear
           | than if you'd just write "compiled only once".
        
         | gpvos wrote:
         | It's a feature dating from the 1990s, when Perl (and I guess
         | Ruby?) didn't have a way for the user to store a compiled
         | regex, and this was a useful shortcut for a very specific
         | optimization, which Ruby documented badly. Perl (and I guess
         | Ruby?) later evolved in a way that made /o unnecessary, but the
         | (now mis)feature remained.
        
       | kazinator wrote:
       | > Modifier o means that the first time a literal regexp with
       | interpolations is encountered, the generated Regexp object is
       | saved and used for all future evaluations of that literal regexp.
       | 
       | That is crystal clear to me. It means that on the next execution,
       | the new values of the interpolation will be ignored; the regexp
       | is now "baked" with the first ones.
       | 
       | Like this in C++:
       | 
       |     void fun(int arg)
       |     {
       |         static int once = arg;
       |     }
       | 
       | If we call this as fun(42) the first time, once gets initialized
       | to 42. If we then call it as fun(73), once stays 42.
       | 
       | There is a function in POSIX for once-only initializations:
       | pthread_once. C++ compilers for multithreaded environments emit
       | thread-safe code to do something similar to pthread_once to
       | ensure that even if there are several concurrent first
       | invocations of the function, the initialization happens once.
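       | 
       | The Ruby side of that, as a small sketch. The method names
       | are made up for illustration; the two methods differ only in
       | the /o flag.

```ruby
# With /o: the first interpolated value is baked into the Regexp.
def contains_o?(word, text)
  text.match?(/\b#{Regexp.escape(word)}\b/o)
end

# Without /o: the literal is re-interpolated on every call.
def contains?(word, text)
  text.match?(/\b#{Regexp.escape(word)}\b/)
end

p contains_o?("cat", "a cat")  # => true  (regexp is now fixed on "cat")
p contains_o?("dog", "a dog")  # => false (still looking for "cat")
p contains_o?("dog", "a cat")  # => true  ("dog" is ignored)

p contains?("cat", "a cat")    # => true
p contains?("dog", "a dog")    # => true
p contains?("dog", "a cat")    # => false
```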
        
       | IshKebab wrote:
       | Seems par for the course for Ruby.
        
       | jononor wrote:
       | It looks like an emoji for someone getting bashed in the head
       | with a long stick. So that makes sense?
        
       | tialaramex wrote:
       | This is a footgun. A language should strive not to add footguns.
       | Every footgun you provide, somebody is going to blow their foot
       | off with it, so that's a high price. If your language is popular
       | it might be a _lot_ of somebodies.
       | 
       | The opposite behaviour (we have a constant regular expression, we
       | re-use it often but the tooling doesn't realise and so it's
       | created each time we mention it) is not a footgun, it results in
       | poor performance, and so you might want (especially in some
       | managed languages) to just magically optimise this case, but if
       | not you won't cause mysterious bugs. An expert, asked "Why is
       | this slow?" can just fix it - you have to supply basic tools for
       | that, but this flag is not a sensible tool.
        
       ___________________________________________________________________
       (page generated 2025-08-02 23:00 UTC)