Post 2613735 by BruceJia@refactorcamp.org
(DIR) More posts by BruceJia@refactorcamp.org
(DIR) Post #2566580 by alcinnz@floss.social
2019-01-03T02:14:59Z
0 likes, 0 repeats
I wonder: who here has heard of the concept of a Perfect Hash Function?Because I'm studying one generator now.
(DIR) Post #2566848 by alcinnz@floss.social
2019-01-03T02:45:39Z
0 likes, 0 repeats
In Memex, next I'll want to map each known CSS property into an offset of an array so I can store it's value at that offset in order to implement CSS cascade. And sure I can use HashMaps for that, but there's more efficient algorithm already implemented as a Rust crate.A "perfect hash function" for a particular set of keys (e.g. the Memex-supported CSS properties) maps each key to a unique number. That is it has no collisions amongst those keys.
(DIR) Post #2568349 by alcinnz@floss.social
2019-01-03T04:14:56Z
0 likes, 0 repeats
So I've finished reading the implementation of Perfect Hash Functions given by the "phf" Rust crate, and I think it's easiest to describe it by starting with the generic form of it's hash function.It splits a normal hash into 3 sections (g, f1, & f2), and uses the first section to lookup additional parameters d1 & d2. From there it's simply "d2 + f1 * d1 + f2 % len".From there making this hash function perfect involves almost bruteforcing good parameters for this hash function.
(DIR) Post #2568578 by alcinnz@floss.social
2019-01-03T04:22:54Z
0 likes, 0 repeats
But finding those parameters really isn't as bad as a bruteforce because for one we know that none of the parameters will be larger than the number of possible keys. That drastically reduces the search space.We can directly compute the number of "buckets" to generate for the parameter table.And we know that the more keys land under one of those buckets, the harder it will be to find good parameters for it. So do those first before there are too many collisions with other bucket's keys.
(DIR) Post #2568988 by alcinnz@floss.social
2019-01-03T04:43:51Z
0 likes, 0 repeats
In contrast when I studied GPerf (which other browser engines tend to use, amongst other projects), I found it used a different technique consisting of refining the perfect hash function in numerous iterations over the keyset.PHF's algorithm to me looks to be a more efficient use of both human and computer effort. Once it was discovered ofcourse.
(DIR) Post #2607328 by BruceJia@refactorcamp.org
2019-01-03T19:49:53Z
0 likes, 0 repeats
@alcinnz What's a hash function?
(DIR) Post #2608140 by alcinnz@floss.social
2019-01-03T20:23:03Z
0 likes, 0 repeats
@BruceJia A hash function converts some data into a single number, such that the same inputs always gives the same output.They're usually used for security (though they have more requirements), and for implementing key-value mappings/dictionaries.In this case what I'm discussing is more relevant to the latter.
(DIR) Post #2608427 by alcinnz@floss.social
2019-01-03T20:32:24Z
0 likes, 0 repeats
@BruceJia To expand on the issue perfect-hash-functions solve:A key-value mapping can be implemented by using a hash function to translate keys into indices into an array or list. The hash needs to brought into the range of those indices, but that's easy enough with a remainder operator.Howeverit's possible for two keys to map to the same index, so there needs to be logic to implement that. Or you can compute a hash function that doesn't have that issue with the keys you use
(DIR) Post #2608455 by alcinnz@floss.social
2019-01-03T20:33:20Z
0 likes, 0 repeats
@BruceJia Does that make sense?
(DIR) Post #2613735 by BruceJia@refactorcamp.org
2019-01-03T23:28:13Z
0 likes, 0 repeats
@alcinnz Yes, that makes sense