[HN Gopher] Hexagons and Hilbert curves - The headaches of distr...
___________________________________________________________________
Hexagons and Hilbert curves - The headaches of distributed spatial
indices
Author : max_sendfeld
Score : 57 points
Date : 2024-03-22 08:20 UTC (14 hours ago)
(HTM) web link (hivekit.io)
(TXT) w3m dump (hivekit.io)
| otoburb wrote:
| I knew about hexagons but not about R-trees and Hilbert curves,
| and then combining the two. Even though that was an inbound
| content article, it _was_ at least informative.
| michelpp wrote:
| I experimented with geospatial Hilbert Curves as a Postgres
| extension [0] for PostGIS using the S2 [1] spherical geometry
| library. S2 uses a scale free cell coverage pattern that is
| numbered using six interlocking space filling Hilbert Curves [2].
| This approach is similar to what is used in this article. S2
| doesn't use hexagons but a different cell structure but the idea
| is generally the same.
|
| By having both high level (cell) and low level (cell id)
| geometries it was a very powerful library which allowed
| projection from the hilbert space into a Postgres spatial index
| (spgist) including various trees, like noted in this article. It
| appears to be still quite active in development.
|
| [0] https://github.com/michelp/pgs2
|
| [1] https://s2geometry.io/
|
| [2] https://s2geometry.io/devguide/s2cell_hierarchy
| max_sendfeld wrote:
| Ah - I've used S2 in the past. Great work. It scales really
| well for larger datasets.
|
| Interesting that you mention the use of multiple hilbert curves
| as well. We also experimented with two Hilbert Curves, rotated
| by 90 degrees. This helps to get around what we've dubbed the
| "Hilbert Equator" problem where two objects are quite far on
| the curve because they are placed close to one of the major
| fault lines in the fractal (for lack of a better word)
| michelpp wrote:
| Tackling these boundary problems are the literal "edge cases"
| in geospatial indexing and they exist everywhere, so this i a
| good reason for using an existing library as the authors have
| already solved them.
|
| Hexagons are cool, but they are not necessarily the bestagon
| for a spherical geometry since you cannot break a hexagon
| into smaller hexagons, whereas an S2 cell is a "cube" with
| spherical sides or HEALPix uses a rhombic dodecahedron [0]
| both of which can be split into smaller divisions of
| themselves.
|
| Not to discourage you from your experimentation, it's all
| trade offs and you might find a good one. Good luck!
|
| [0] https://en.wikipedia.org/wiki/HEALPix
| plagiarist wrote:
| Does that end up with specific points that have that same
| difficulty at intersections of equators? I suppose even if
| that is the case, a point is something that's easier to
| position over a dead area than a line would be.
| jandrewrogers wrote:
| There are always caveats with these types of indexing
| methods, especially if you require dynamic high-performance
| indexing and support for rectangles/polygons. You can only
| move the edge cases around, there is no way to eliminate
| them. This allows you to tailor an indexing scheme for
| specific workload assumptions but this obviously breaks down
| if you need an algorithm that generalizes to many workloads
| and data models. There isn't just one type of edge case with
| this type of indexing, there are several which may or may not
| be relevant depending on what you are trying to do.
|
| Some research from the 1980s showed it is only possible to
| mitigate bounded categories of edge case, thereby improving
| generality, by indexing on complex higher-dimensionality
| embeddings. Mitigating more categories requires more
| dimensions and more complexity. However, no one could figure
| out how to construct these embeddings for even basic cases or
| deal with more practical curse of dimensionality issues, so
| that is largely forgotten (the researchers themselves made
| comments to the effect that they didn't think a tractable
| solution was possible). I've never even been able to find
| that literature in electronic form, unfortunately.
|
| Like AI, it is an interesting open-ended problem space. You
| can prove that an elegant optimal solution is not tractable
| so it ends up being a search for asymptotically optimal
| algorithms that become exponentially more complex the closer
| you get to optimal.
| durkie wrote:
| any idea why postgis hasn't formally incorporated S2 cell
| functions? It seems like it'd be right up their alley.
| patches11 wrote:
| It would be interesting to know what other solutions exist in
| this space and why they weren't chosen.
|
| I know of at least one, GeoMesa, which seems like it could at
| least provide the building blocks to achieve what they are trying
| to do.
| joe_the_user wrote:
| There's a long of study on the subject (that I've only
| surveyed). See my comment
| https://news.ycombinator.com/item?id=39792884
| tromp wrote:
| > a vehicle with a Hilbert Curve position of 0.34 is really close
| to one with 0.35 and really far from one with 0.89
|
| But points with a large difference in their single curve
| coordinate can be either far apart or close together. E.g. on
| this 16 point Hilbert curve __. .__
| __| |__ | __ | |__| |__|
|
| the '.' marked points at 1/16th and 15/16th along the curve are
| adjacent.
| wolframhempel wrote:
| It's an important point. We've ended up using two Hilbert
| curves, rotated by 90 degrees to tackle this problem. If it is
| close on either, it is close in 2D space
| tromp wrote:
| Interesting. So one might think that being close together in
| 2D space corresponds to being close on
|
| BOTH cartesian coordinates / EITHER Hilbert coordinate
|
| And that being far apart in 2D space corresponds to being far
| apart on
|
| EITHER cartesian coordinate / BOTH Hilbert coordinates
|
| But if we consider the two commas below which are close
| together in 2D space, we see they are far apart in any
| rotation of this Hilbert curve?! __ __
| __ __ |__| __| |__ |__| __ |__ __|
| __ | |__ __| ;__ __| | |__ __,__ __
| __| __| |__ __| |__ | __ | | __
| | |__| |__| |__| |__|
| sp332 wrote:
| So you converted your 2D space into another 2D space?
| dwallin wrote:
| As mentioned in other replies, this doesn't fully solve a lot
| of the cases. I think z order curves might actually be better
| here, they are slightly less accurate individually, but in a
| way that would seemingly minimize the worst case results when
| taking the minimum of the two. But at that point you are
| probably better off using a single z-order curve and using
| the coordinates to check the manhattan or Chebyshev distance
| using bitwise operations.
| jandrewrogers wrote:
| I would second the use of Z-order curves. Hilbert curves
| were famously known as more optimal on spinning disk in a
| specific context a few decades ago. Inertia has kept them
| top-of-mind even though modern systems do not have the
| problems Hilbert curves were better at solving and those
| curves are more complex to use than e.g. Z-order curves.
| zX41ZdbW wrote:
| I've recently implemented a map of the Internet using a similar
| technique: https://reversedns.space/
|
| In ClickHouse, we have almost every mentioned technique: H3 and
| S2, geohashes, and indexing by space-filling curves.
| esafak wrote:
| For reference:
|
| https://clickhouse.com/docs/en/sql-reference/functions/geo/h...
|
| https://clickhouse.com/docs/en/sql-reference/functions/geo/s...
| joe_the_user wrote:
| The problem of finding the shortest path on a map (a la google
| maps etc) is often solved with approaches like "hub networks" and
| "contraction hierarchies" rather than simple spatial networks.
| Things like dijkstra's algorithm are only efficient if you're
| input is the raw graph and point and you only want one path once.
|
| See just for a start: https://arxiv.org/abs/1504.05140
| spenczar5 wrote:
| Oh boy, this gives me a chance to talk about one of the gems of
| astronomy software which deserves to be better known: HEALPixel
| tesselation!
|
| HEALPixels stand for 'Hierarchical Equal-Area Iso-latitudinal
| Pixels'. It is a scheme that was developed to analyze signals
| that cover the entire sky, but with variable density.
|
| Like HTM or Hilbert curves, this can be used to organize spatial
| data.
|
| The tesselation looks kind of funny but has many good features -
| it doesn't have discontinuities at poles, and is always equal
| area. And with the "nested" healpixel formulation, pixels are
| identified by integers. Pixel IDs are hierarchical based on
| leading bits - so, for example, pixel 106 (=0110 1010) contains
| pixel 1709 (=0110 1010 1101). This lets you do some marvelous
| optimizations in queries if you structure your data
| appropriately. Nearest neighbor searches can be extremely quick
| if things are HEALPix-indexed - and so can radius searches, and
| arbitrary polygon searches.
|
| HEALPixels are used today for more than just their original
| intent. LSST will use them for storing all-sky data and point
| source catalogs, for example.
|
| More here:
|
| - Original NASA/JPL site: https://healpix.jpl.nasa.gov/
|
| - Popular Python implementation:
| https://healpy.readthedocs.io/en/latest/
|
| - Good PDF primer: https://healpix.jpl.nasa.gov/pdf/intro.pdf
|
| And an experimental database being built on healpix for extremely
| large data volumes (certainly many TB, maybe single-digit PB):
| https://github.com/astronomy-commons/hipscat
| aimor wrote:
| Thanks for sharing, I've needed this for years and never
| stumbled across it.
| eh_why_not wrote:
| Gem indeed! Here's its wikipedia page:
|
| https://en.wikipedia.org/wiki/HEALPix
| feverzsj wrote:
| The purpose of using space-filling curves is to convert
| coordinates into 1 dimensional data, thus they can be indexed by
| ordinary database index, no matter distrusted or not. The problem
| is it's much slower than a "real" spatial index like R-tree, and
| the supported queries are rather limited.
|
| On the other hand, spatial data can be easily partitioned,
| rendering the aforementioned method less useful. Also, I've never
| seen anyone needed more than a single instance of postgres to
| index all the spatial data, unless you are dealing with spatial
| temporal data, which can grow infinitely large. Then again,
| spatial temporal data can be easily partitioned.
| Lichtso wrote:
| Here is one of the recent advancements in performing similarity
| (including nearest neighbor) search on sparse high dimensional
| data using learned indices: https://arxiv.org/abs/2204.10028
|
| The paper basically deals with the most adverse case but the
| approach should work in lower dimensions and in data with a non-
| uniform density distribution as well.
___________________________________________________________________
(page generated 2024-03-22 23:01 UTC)