[HN Gopher] Regular Expressions which query an Oracle
       ___________________________________________________________________
        
       Regular Expressions which query an Oracle
        
       Author : agnishom
       Score  : 28 points
       Date   : 2024-11-26 12:28 UTC (1 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | shayanjm wrote:
       | At what point do we drop the term "regular" expressions
       | altogether for stuff like this? This is going to sound pedantic
       | since I know that most popularly-used regex implementations are
       | themselves non-regular, but I feel like we're just piling more
       | and more stuff on top of good-old-regexes and trying to turn the
       | concept into a catch-all for anything that does pattern matching
       | on text.
       | 
       | I guess it just feels icky that "regular expressions" has
       | inherent meaning (i.e. can be represented entirely by a finite
       | automaton) which has become completely diluted at this point.
       | 
       | That rant aside, cool paper. The idea of bridging formal language
       | theory with modern computational tooling feels timely. I think I
       | would've liked to see more exploration of oracle-based costs, for
       | instance:
       | 
       | * What happens when oracle outputs are inconsistent/uncertain?
       | 
       | * What happens as oracle interactions become more computationally
       | expensive?
        
         | judofyr wrote:
         | Your rant is a bit unfounded for this paper as they do actually
         | take completely standard regular expressions (no backtracking
         | or anything like that) and extend it with one more construct.
         | Calling it <<semantic regular expressions>> seems perfectly
         | reasonable to me. What else to call it?
         | 
         | As for outside of the computer science sphere (which I find is
         | quite consistent in their terminology): I do agree that it
         | seems like it's a lost cause and <<regex>> is now synonymous
         | for <<pattern matching using this one specific syntax>> :(
        
           | shayanjm wrote:
           | Nested querying is not something that is standard for regular
           | grammars, amongst other aspects introduced in this paper that
           | implicitly require things like memory (again, not standard
           | for regular grammars)
        
           | tgv wrote:
           | Regular has a meaning, and it isn't this. It's had this
           | meaning since the 1950s. OTOH, these expressions do not
           | generate a regular language. That's a good reason to me.
           | 
           | They call it "semantic regular expression" because it
           | apparently already is a lost cause. "Regular expressions with
           | TMs embedded" doesn't quite have the same ring to it. Nobody
           | would see it as a regexp.
        
             | taeric wrote:
             | To be fair, I could see this basically allowing a form of
             | "back reference" lookup that lets you offload the back
             | reference to other parts. For example, `/(. _); \1 = (._
             | )/.exec("foo; foo = anything")`, but instead of doing a
             | back reference, you could have an oracle lookup.
             | 
             | I haven't looked at the examples in this paper, yet. But
             | I'm having fun imagining ways this could be used.
        
             | judofyr wrote:
             | > OTOH, these expressions do not generate a regular
             | language.
             | 
             | Okay sure you're technically correct here, but only because
             | these expressions generate a _subset_ of a regular
             | language. The LLM can only be invoked on a substring that
             | can expressed as a regular expression, and then it 's only
             | used to _remove_ strings from the language. Their results
             | are based heavily on how regular expressions work. A
             | "semantic context-free grammar" would have different type
             | of characteristics and behavior.
             | 
             | Maybe throwing in the word "extended" or "augmented" would
             | be a bit more clear, but as I reader I definitely would
             | expect "regular expression" to be part of the name.
        
         | krackers wrote:
         | >which has become completely diluted at this point.
         | 
         | Isn't it common in TCS to consider the class X with oracle
         | access to decide L though?
        
       | ashvardanian wrote:
       | There aren't many papers on the subject, and I'm probably not
       | great at spotting them. The linked paper also doesn't mention
       | much recent research, so it seems like an excellent opportunity
       | to ask for the collective wisdom of the HN community - what are
       | your favorite undervalued information retrieval papers after
       | 2015?
        
       | larodi wrote:
       | This article lets an UTF8 symbol be something... so now we have
       | all the existing math symbols + all UTF8 icons. Wonder what the
       | papers be look like in 20 years.
        
       | nickpsecurity wrote:
       | They said they want the cost of invoking the oracle, the DB, to
       | be low. Might be a good use for in-memory, embedded DB's with a
       | good API.
        
       | kevingadd wrote:
       | To me this feels kind of similar to the idea of verifying tokens
       | in a compiler by querying an oracle. i.e. in the script compiler
       | for my game engine, there's an additional pass at the end of
       | compilation that checks things like filename literals or entity
       | ID literals against a database of assets or a list of all the
       | scripts that were compiled, so when I hit F5 in visual studio any
       | typos in those literals are caught before I waste time booting
       | the game up.
       | 
       | It would be interesting to be able to do this eagerly during
       | parsing, so it's neat to see people thinking about doing this at
       | a regex level, though I'm having trouble thinking of specific
       | cases where I would both want to use a regex and want to query an
       | oracle from inside of it...
        
       ___________________________________________________________________
       (page generated 2024-11-27 23:01 UTC)