[HN Gopher] Space-filling curves, constructively
       ___________________________________________________________________
        
       Space-filling curves, constructively
        
       Author : luu
       Score  : 32 points
       Date   : 2024-07-24 17:20 UTC (5 hours ago)
        
 (HTM) web link (math.andrej.com)
 (TXT) w3m dump (math.andrej.com)
        
       | aguynamedben wrote:
       | An interesting variant of space-filling curves + dimensionality
       | reduction is Geohash (https://en.wikipedia.org/wiki/Geohash,
       | http://geohash.org/) which takes a lon/lat and uses a Z-curve
       | approach to produce a hash such as `u4pruydqqvj` representing the
       | location. This hash value is basically "how far along the space-
       | filling curve is the lon/lat located". You're reducing two
       | dimensions (lon/lat) into one dimension (how far along the space-
       | filling curve).
       | 
       | There's a unique side-effect to Geohashes in that the value
       | (`u4pruydqqvj`) can have it's end "lopped off" (i.e. cut down to
       | `u4pru`) and it still represents a less precise, but generally
       | accurate representation of the original lon/lat in most cases
       | (when the curve isn't near the edge of the 2d map!). This allows
       | you to index locations (lat/lon) using a string ('u4pru') which
       | opens you up to doing b-tree, range queries, etc. in traditional
       | database, with one field.
       | 
       | Just a rad math quirk! I'm not an expert, and it's a very dense
       | book, but if someone really wants to get into this kind of stuff
       | the "Bible" is "Foundations of Multidimensional and Metric Data
       | Structures" by Hanan Samet.
        
         | Sharlin wrote:
         | The thing about SFC addressing is that two places near each
         | other always share a prefix, and the closer they are, the
         | longer the common prefix. It's sort of like Gray coding.
         | Quadtree addressing, for example, also has the property that
         | you mentioned (as do, of course, normal lat/long coordinates!)
         | _but_ two adjacent locations may not have similar addresses at
         | all if they happen to straddle a subdivision boundary. (Again,
         | compare to  "normal" numbers where, say, |2000-1999| = 1 but
         | there's no common prefix at all!)
        
           | tomsmeding wrote:
           | The same happens, in fact, for space-filling curves. Sure, in
           | _most_ of the area covered by the curve, closeness of points
           | means closeness on the curve. But in this Hilbert curve:
           | 
           | https://upload.wikimedia.org/wikipedia/commons/7/7c/Hilbert-.
           | ..
           | 
           | Consider the two points on the curve straddling the middle of
           | the top edge of the square. They are very close together, but
           | their 1D addresses on the curve are very far apart (one
           | close-ish to the beginning, and one close-ish to the end)!
           | 
           | There ain't no free lunch.
        
       | wenc wrote:
       | Space filling curves like Hilbert sound esoteric but they're
       | actually useful in for high performance database indexing.
       | 
       | DuckDB has a Hilbert index.
       | 
       | https://github.com/rustyconover/duckdb-lindel-extension
       | 
       | It's not just DuckDB. SQL server and Postgres support Hilbert
       | too.
       | 
       | The big idea is that you can project tuples e.g (lat, long) or
       | just about any other tuple structure (string or floats) in a
       | single integer, ordered in a way that is locality preserving.
       | 
       | Because most queries tend to lookup data close to each other,
       | this can lead to performance gains. Parquet statistics can be
       | built to be locality sensitive (locality defined broadly, not
       | just numbers).
       | 
       | If you index on Hilbert, you can specific compute a max min of
       | the Hilbert range of your predicate and eliminate looking outside
       | that range, speeding up query performance.
        
         | CurtHagenlocher wrote:
         | "Liquid Clustering" from Databricks also uses Hilbert curves.
        
       | perone wrote:
       | I wrote an article about it and S2 some time ago as well for
       | those interested:
       | https://blog.christianperone.com/2015/08/googles-s2-geometry...
        
       | JKCalhoun wrote:
       | I'm imagining a physical world analog: shoving a bicycle chain
       | into a shallow box until you can't shove any more of the chain
       | in.
        
       | wenc wrote:
       | [delayed]
        
       ___________________________________________________________________
       (page generated 2024-07-24 23:02 UTC)