hngopher.com

       [HN Gopher] Hashing Modulo Theories
       ___________________________________________________________________
        
       Hashing Modulo Theories
        
       Author : philzook
       Score  : 53 points
       Date   : 2024-05-17 20:51 UTC (2 days ago)
        
 (HTM) web link (www.philipzucker.com)
 (TXT) w3m dump (www.philipzucker.com)
        
       | yarg wrote:
       | (What the article calls) Canonisation is an interesting issue -
       | it seems very hard to get right and to do so efficiently.
       | 
       | I'm trying to wrap my head around a the issue in a regex parser
       | that I've knocked up.
       | 
       | I'm currently ending up (expectedly) with multiple nodes
       | representing equivalent languages; I want to strip these out when
       | I use code generation to convert the constructed network into a
       | switch based static automaton.
       | 
       | The most robust way to see whether two languages are the same is
       | to xor them and test for interminability - but this means
       | comparing each pair of nodes throughout the network, and I'd
       | rather avoid the n^2 scaling if there's another option.
       | 
       | That option is to generate a canonical expression for the
       | language that the machine represents, somewhat difficult but far
       | more efficient when it comes to detecting collisions.
       | 
       | https://en.wikipedia.org/wiki/Canonization
       | 
       | https://en.wikipedia.org/wiki/Canonicalization
        
         | CJefferson wrote:
         | Canonizing is a fascinating research topic (which I've done
         | quite a bit of work on).
         | 
         | The shortish version is if you have an unordered list, you can
         | sort the list.
         | 
         | With graphs, you have to do a backtrack search -- there are
         | many ways to speed that search up, and in practice it is very,
         | very quick. This algorithm is, I think, the coolest algorithm
         | most people have never heard of. There is a fairly good
         | description in the paper "Practical Graph Isomorphism II".
         | 
         | For most structures more complicated than graphs, turns out you
         | can often turn them into graphs, then use the existing graph
         | software.
         | 
         | Unfortunately, I don't think this will help your case, as the
         | structure is that you can represent things in quite different
         | ways, so this stuff won't be much help :)
        
           | yarg wrote:
           | The current plan is not to perform any part of the
           | canonisation on a graph, but to calculate a language table
           | for the network and canonise the resultant expressions (which
           | are trees not graphs).
           | 
           | I've mostly figured out the sorting aspect. The problematic
           | part is language de-aliasing which means I'm going to have to
           | refactor some things.
           | 
           | The language extraction (it's just iterative expression
           | expansion and a language mapping function, example below)
           | shouldn't be too hard to write down, I've already implemented
           | it in my head and I don't think that there are any corner
           | cases that I'm missing.
           | 
           | The language de-aliasing is going to change the way that
           | certain expressions get translated into automata.
           | 
           | For example, right now there are generator methods for things
           | like & and ^, but in order to de-alias the languages it's
           | better if the expression gets expanded, and then the expanded
           | form gets converted into a machine.
           | 
           | So ^(A,B) [?] &(:(A,!B),:(!A,B)) [?] !:(!:(A,!B),!:(!A,B)).
           | 
           | Even a simple expression such as !a needs to be de-aliased.
           | 
           | !/a [?] :([.../.],[/b...])
           | 
           | (in my insane non-standard syntax, NOT 'a' is the OR of all
           | chars up to and including '.' (the char before 'a') and all
           | chars 'b' and above.
           | 
           | Language mapping example:                   S0{             ^
           | Expr0 + S0             Expr1 + S1             Expr2 + S2
           | }
           | 
           | Is equivalent to                   S0{             Expr0*
           | Expr0* + Expr1 + S1;             Expr0* + Expr2 + S2;
           | }
           | 
           | So just run through iterating away the state symbols until
           | all that's left are pure expressions, attempt to de-alias the
           | languages, then do a recursive sort on the expression trees.
        
       ___________________________________________________________________
       (page generated 2024-05-19 23:02 UTC)