[HN Gopher] Space-filling curves, constructively
___________________________________________________________________
Space-filling curves, constructively
Author : luu
Score : 32 points
Date : 2024-07-24 17:20 UTC (5 hours ago)
(HTM) web link (math.andrej.com)
(TXT) w3m dump (math.andrej.com)
| aguynamedben wrote:
| An interesting variant of space-filling curves + dimensionality
| reduction is Geohash (https://en.wikipedia.org/wiki/Geohash,
| http://geohash.org/) which takes a lon/lat and uses a Z-curve
| approach to produce a hash such as `u4pruydqqvj` representing the
| location. This hash value is basically "how far along the space-
| filling curve is the lon/lat located". You're reducing two
| dimensions (lon/lat) into one dimension (how far along the space-
| filling curve).
|
| There's a unique side-effect to Geohashes in that the value
| (`u4pruydqqvj`) can have it's end "lopped off" (i.e. cut down to
| `u4pru`) and it still represents a less precise, but generally
| accurate representation of the original lon/lat in most cases
| (when the curve isn't near the edge of the 2d map!). This allows
| you to index locations (lat/lon) using a string ('u4pru') which
| opens you up to doing b-tree, range queries, etc. in traditional
| database, with one field.
|
| Just a rad math quirk! I'm not an expert, and it's a very dense
| book, but if someone really wants to get into this kind of stuff
| the "Bible" is "Foundations of Multidimensional and Metric Data
| Structures" by Hanan Samet.
| Sharlin wrote:
| The thing about SFC addressing is that two places near each
| other always share a prefix, and the closer they are, the
| longer the common prefix. It's sort of like Gray coding.
| Quadtree addressing, for example, also has the property that
| you mentioned (as do, of course, normal lat/long coordinates!)
| _but_ two adjacent locations may not have similar addresses at
| all if they happen to straddle a subdivision boundary. (Again,
| compare to "normal" numbers where, say, |2000-1999| = 1 but
| there's no common prefix at all!)
| tomsmeding wrote:
| The same happens, in fact, for space-filling curves. Sure, in
| _most_ of the area covered by the curve, closeness of points
| means closeness on the curve. But in this Hilbert curve:
|
| https://upload.wikimedia.org/wikipedia/commons/7/7c/Hilbert-.
| ..
|
| Consider the two points on the curve straddling the middle of
| the top edge of the square. They are very close together, but
| their 1D addresses on the curve are very far apart (one
| close-ish to the beginning, and one close-ish to the end)!
|
| There ain't no free lunch.
| wenc wrote:
| Space filling curves like Hilbert sound esoteric but they're
| actually useful in for high performance database indexing.
|
| DuckDB has a Hilbert index.
|
| https://github.com/rustyconover/duckdb-lindel-extension
|
| It's not just DuckDB. SQL server and Postgres support Hilbert
| too.
|
| The big idea is that you can project tuples e.g (lat, long) or
| just about any other tuple structure (string or floats) in a
| single integer, ordered in a way that is locality preserving.
|
| Because most queries tend to lookup data close to each other,
| this can lead to performance gains. Parquet statistics can be
| built to be locality sensitive (locality defined broadly, not
| just numbers).
|
| If you index on Hilbert, you can specific compute a max min of
| the Hilbert range of your predicate and eliminate looking outside
| that range, speeding up query performance.
| CurtHagenlocher wrote:
| "Liquid Clustering" from Databricks also uses Hilbert curves.
| perone wrote:
| I wrote an article about it and S2 some time ago as well for
| those interested:
| https://blog.christianperone.com/2015/08/googles-s2-geometry...
| JKCalhoun wrote:
| I'm imagining a physical world analog: shoving a bicycle chain
| into a shallow box until you can't shove any more of the chain
| in.
| wenc wrote:
| [delayed]
___________________________________________________________________
(page generated 2024-07-24 23:02 UTC)