[HN Gopher] Regular Expressions which query an Oracle
___________________________________________________________________
Regular Expressions which query an Oracle
Author : agnishom
Score : 28 points
Date : 2024-11-26 12:28 UTC (1 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| shayanjm wrote:
| At what point do we drop the term "regular" expressions
| altogether for stuff like this? This is going to sound pedantic
| since I know that most popularly-used regex implementations are
| themselves non-regular, but I feel like we're just piling more
| and more stuff on top of good-old-regexes and trying to turn the
| concept into a catch-all for anything that does pattern matching
| on text.
|
| I guess it just feels icky that "regular expressions" has
| inherent meaning (i.e. can be represented entirely by a finite
| automaton) which has become completely diluted at this point.
|
| That rant aside, cool paper. The idea of bridging formal language
| theory with modern computational tooling feels timely. I think I
| would've liked to see more exploration of oracle-based costs, for
| instance:
|
| * What happens when oracle outputs are inconsistent/uncertain?
|
| * What happens as oracle interactions become more computationally
| expensive?
| judofyr wrote:
| Your rant is a bit unfounded for this paper as they do actually
| take completely standard regular expressions (no backtracking
| or anything like that) and extend it with one more construct.
| Calling it <<semantic regular expressions>> seems perfectly
| reasonable to me. What else to call it?
|
| As for outside of the computer science sphere (which I find is
| quite consistent in their terminology): I do agree that it
| seems like it's a lost cause and <<regex>> is now synonymous
| for <<pattern matching using this one specific syntax>> :(
| shayanjm wrote:
| Nested querying is not something that is standard for regular
| grammars, amongst other aspects introduced in this paper that
| implicitly require things like memory (again, not standard
| for regular grammars)
| tgv wrote:
| Regular has a meaning, and it isn't this. It's had this
| meaning since the 1950s. OTOH, these expressions do not
| generate a regular language. That's a good reason to me.
|
| They call it "semantic regular expression" because it
| apparently already is a lost cause. "Regular expressions with
| TMs embedded" doesn't quite have the same ring to it. Nobody
| would see it as a regexp.
| taeric wrote:
| To be fair, I could see this basically allowing a form of
| "back reference" lookup that lets you offload the back
| reference to other parts. For example, `/(. _); \1 = (._
| )/.exec("foo; foo = anything")`, but instead of doing a
| back reference, you could have an oracle lookup.
|
| I haven't looked at the examples in this paper, yet. But
| I'm having fun imagining ways this could be used.
| judofyr wrote:
| > OTOH, these expressions do not generate a regular
| language.
|
| Okay sure you're technically correct here, but only because
| these expressions generate a _subset_ of a regular
| language. The LLM can only be invoked on a substring that
| can expressed as a regular expression, and then it 's only
| used to _remove_ strings from the language. Their results
| are based heavily on how regular expressions work. A
| "semantic context-free grammar" would have different type
| of characteristics and behavior.
|
| Maybe throwing in the word "extended" or "augmented" would
| be a bit more clear, but as I reader I definitely would
| expect "regular expression" to be part of the name.
| krackers wrote:
| >which has become completely diluted at this point.
|
| Isn't it common in TCS to consider the class X with oracle
| access to decide L though?
| ashvardanian wrote:
| There aren't many papers on the subject, and I'm probably not
| great at spotting them. The linked paper also doesn't mention
| much recent research, so it seems like an excellent opportunity
| to ask for the collective wisdom of the HN community - what are
| your favorite undervalued information retrieval papers after
| 2015?
| larodi wrote:
| This article lets an UTF8 symbol be something... so now we have
| all the existing math symbols + all UTF8 icons. Wonder what the
| papers be look like in 20 years.
| nickpsecurity wrote:
| They said they want the cost of invoking the oracle, the DB, to
| be low. Might be a good use for in-memory, embedded DB's with a
| good API.
| kevingadd wrote:
| To me this feels kind of similar to the idea of verifying tokens
| in a compiler by querying an oracle. i.e. in the script compiler
| for my game engine, there's an additional pass at the end of
| compilation that checks things like filename literals or entity
| ID literals against a database of assets or a list of all the
| scripts that were compiled, so when I hit F5 in visual studio any
| typos in those literals are caught before I waste time booting
| the game up.
|
| It would be interesting to be able to do this eagerly during
| parsing, so it's neat to see people thinking about doing this at
| a regex level, though I'm having trouble thinking of specific
| cases where I would both want to use a regex and want to query an
| oracle from inside of it...
___________________________________________________________________
(page generated 2024-11-27 23:01 UTC)