[HN Gopher] Hashing Modulo Theories
___________________________________________________________________
Hashing Modulo Theories
Author : philzook
Score : 53 points
Date : 2024-05-17 20:51 UTC (2 days ago)
(HTM) web link (www.philipzucker.com)
(TXT) w3m dump (www.philipzucker.com)
| yarg wrote:
| (What the article calls) Canonisation is an interesting issue -
| it seems very hard to get right and to do so efficiently.
|
| I'm trying to wrap my head around a the issue in a regex parser
| that I've knocked up.
|
| I'm currently ending up (expectedly) with multiple nodes
| representing equivalent languages; I want to strip these out when
| I use code generation to convert the constructed network into a
| switch based static automaton.
|
| The most robust way to see whether two languages are the same is
| to xor them and test for interminability - but this means
| comparing each pair of nodes throughout the network, and I'd
| rather avoid the n^2 scaling if there's another option.
|
| That option is to generate a canonical expression for the
| language that the machine represents, somewhat difficult but far
| more efficient when it comes to detecting collisions.
|
| https://en.wikipedia.org/wiki/Canonization
|
| https://en.wikipedia.org/wiki/Canonicalization
| CJefferson wrote:
| Canonizing is a fascinating research topic (which I've done
| quite a bit of work on).
|
| The shortish version is if you have an unordered list, you can
| sort the list.
|
| With graphs, you have to do a backtrack search -- there are
| many ways to speed that search up, and in practice it is very,
| very quick. This algorithm is, I think, the coolest algorithm
| most people have never heard of. There is a fairly good
| description in the paper "Practical Graph Isomorphism II".
|
| For most structures more complicated than graphs, turns out you
| can often turn them into graphs, then use the existing graph
| software.
|
| Unfortunately, I don't think this will help your case, as the
| structure is that you can represent things in quite different
| ways, so this stuff won't be much help :)
| yarg wrote:
| The current plan is not to perform any part of the
| canonisation on a graph, but to calculate a language table
| for the network and canonise the resultant expressions (which
| are trees not graphs).
|
| I've mostly figured out the sorting aspect. The problematic
| part is language de-aliasing which means I'm going to have to
| refactor some things.
|
| The language extraction (it's just iterative expression
| expansion and a language mapping function, example below)
| shouldn't be too hard to write down, I've already implemented
| it in my head and I don't think that there are any corner
| cases that I'm missing.
|
| The language de-aliasing is going to change the way that
| certain expressions get translated into automata.
|
| For example, right now there are generator methods for things
| like & and ^, but in order to de-alias the languages it's
| better if the expression gets expanded, and then the expanded
| form gets converted into a machine.
|
| So ^(A,B) [?] &(:(A,!B),:(!A,B)) [?] !:(!:(A,!B),!:(!A,B)).
|
| Even a simple expression such as !a needs to be de-aliased.
|
| !/a [?] :([.../.],[/b...])
|
| (in my insane non-standard syntax, NOT 'a' is the OR of all
| chars up to and including '.' (the char before 'a') and all
| chars 'b' and above.
|
| Language mapping example: S0{ ^
| Expr0 + S0 Expr1 + S1 Expr2 + S2
| }
|
| Is equivalent to S0{ Expr0*
| Expr0* + Expr1 + S1; Expr0* + Expr2 + S2;
| }
|
| So just run through iterating away the state symbols until
| all that's left are pure expressions, attempt to de-alias the
| languages, then do a recursive sort on the expression trees.
___________________________________________________________________
(page generated 2024-05-19 23:02 UTC)