[HN Gopher] H3: For indexing geographies into a hexagonal grid, ...
       ___________________________________________________________________
        
       H3: For indexing geographies into a hexagonal grid, by Uber
        
       Author : wiradikusuma
       Score  : 96 points
       Date   : 2025-03-09 03:32 UTC (18 hours ago)
        
 (HTM) web link (h3geo.org)
 (TXT) w3m dump (h3geo.org)
        
       | Traubenfuchs wrote:
       | Is the big thing here that those hexagons have the advantages of
       | a circles (almost even max distance in all directions from
       | center) with the advantages of squares (no overlap)?
        
         | Epa095 wrote:
         | Sqaures have overlap if you put them on a sphere, no?
         | 
         | There are some nice things with this. It contains cells at
         | different levels (sizes). So you can use very small cells if
         | you care about small areas, or large cells if you care more
         | about larger areas. Those are given here
         | https://h3geo.org/docs/core-library/restable/#average-area-i...
         | 
         | From a coordinate it is fast to get the cell at any level, so
         | it's fast to group coordinates at whatever level you want.
         | 
         | If is less about comparing two coordinates, it is more about
         | bering able to group large numbers of coordinates such that
         | they go into non-overlapping groups, where everything in the
         | same group is 'close' to each other.
        
           | at0mic22 wrote:
           | I would assume you wouldn't care a lot about overlapping
           | squares as the representation on the screen is mostly square,
           | you don't order uber from the globe.
           | 
           | Thus most screen projections are derived from wgs84. The idea
           | behind h3 (as I see it) that when moving the map to lets say
           | top-right, you'd need to fetch 1 leaf comparing to 3 leafs
           | with rtree
        
           | jillesvangurp wrote:
           | Most grid systems don't have overlapping squares because they
           | are quad trees. Each point identifies a single rectangle they
           | are in, and all the parent rectangles that fully contain that
           | rectangle.
           | 
           | H3 doesn't have that property between levels. Cells at a
           | level don't overlap. But the the seven children of a cell
           | might overlap with neighboring parent cells.
           | 
           | Of course rectangles aren't rectangular when projected on a
           | sphere. Because there are no straight lines. Rectangles and
           | hexagons are 2d shapes that are applied to the 2d projections
           | of spheres. They only look like rectangles or hexagons in 2D.
        
         | cpa wrote:
         | Yes, but it's important to note that it's quite a specialized
         | index. - It's an index that doesn't depend on the data, unlike
         | traditional r-tree indices. This means non-uniform data won't
         | be queried as efficiently as with rtree, but it may be faster
         | to build.
         | 
         | - This data independence property is VERY important for
         | distributed or streaming queries. For example, if you want to
         | join datasets using Spark or other big data tools, each team
         | can add a column for h3 cells independently and join somewhat
         | efficiently. For large volumes of data, constructing the rtree
         | is just not feasible, or more precisely, very disconnected from
         | the rest of the "data ecosystem".
         | 
         | - It doesn't work with any coordinate reference system other
         | than EPSG 4326 (which you may want if you only work on specific
         | geographies to get more precision in your floats)
         | 
         | - It's clearly built with points in mind. Polygons, curves, or
         | lines are an afterthought. For example, the polygonToCells
         | function returns a set of cells that are entirely within the
         | polygon. If you want to join, you'd need to also have the set
         | of all cells that entirely contain the polygon. I've never
         | found a reliable way to get that.
         | 
         | That being said, it's not bad at all, but if you don't have so
         | much data that you can't compute rtree indices, just stick with
         | PostGIS.
        
           | riordan wrote:
           | > Polygons, curves, or lines are an afterthought. For
           | example, the polygonToCells function returns a set of cells
           | that are entirely within the polygon. If you want to join,
           | you'd need to also have the set of all cells that entirely
           | contain the polygon. I've never found a reliable way to get
           | that.
           | 
           | With v4 of h3 they (finally) have a clean syntax for this
           | with polygonToCellsExperimental[0].
           | 
           | Now there's options for
           | 
           | - Cell center is contained in the shape (default) - Cell is
           | fully contained in the shape - Overlapping (covering): Cell
           | overlaps the shape at any point - BBOX: Cell bounding box
           | overlaps shape
           | 
           | Makes life a fair bit easier if you've gotta deal with H3
           | polys. And if you're working locally, DuckDB Spatial's r-tree
           | indexing[1] can make for a nice stand-in for PostGIS as a
           | quick point-in-polygon solution without the need to spin up a
           | service.
           | 
           | [0]: https://h3geo.org/docs/api/regions/#polygontocellsexperi
           | ment... [1]: https://duckdb.org/docs/stable/extensions/spatia
           | l/r-tree_ind...
        
             | cpa wrote:
             | Nice, I did not know about polygonToCellsExperimental.
             | 
             | As for DuckDB & rtree, it's alas not a replacement of
             | postgis yet and the indices cannot be used (yet) in joins.
             | In fact, I even have workflows where I iterate over rows in
             | python and run duckdb queries one after the other rather
             | than joining in just one query because of this very issue.
        
         | ethbr1 wrote:
         | Oh, mon frere... there are many reasons.
         | 
         | Hexagons. Are the bestagons.
         | https://m.youtube.com/watch?v=thOifuHs6eY
        
       | ahoka wrote:
       | How is this better than space filling curves? Or does this solve
       | a different problem? It's a bit hard to see what and how it
       | solves. Why hexagons?
        
         | meowtimemania wrote:
         | If you used curves it'd get complicated deciding where one area
         | starts and the other ends. With hexagons it's easier to divide
         | the world such that (mostly) no hexagon overlaps.
         | 
         | The purpose is to be able to predictably map any coordinate to
         | its associated hexagon.
         | 
         | In database applications this makes it easier to query all data
         | associated with a hexagonal area.
        
           | xhkkffbf wrote:
           | Why not just squares or rectangles? Is there a simple
           | explanation?
        
             | jt_b wrote:
             | You want a shape that can represent an approximately equal
             | area, anywhere on a 3 dimensional surface.
        
         | jandrewrogers wrote:
         | It depends on what you are trying to do.
         | 
         | Hexagonal tessellations are optimized for display and
         | visualization because they approximate equal surface areas on a
         | sphere. They have poor properties for scalable and efficient
         | analytical data processing because they are not congruent.
         | 
         | Indexing for scalable analytical processing on a sphere
         | requires congruent equal _volume_ decomposition, even if you
         | only care about the surface. You can trivially project it to
         | the surface later if needed. Binary space decomposition, of
         | which space-filling curves are a subset, are strongly
         | preferable for this type of indexing.
         | 
         | In practice, a lot of data processing systems will render data
         | as an H3 tiles only for visualization as a final step. That
         | conversion is fast and trivial and it makes pretty pictures. It
         | is not as commonly used to index the underlying data model
         | because scalability and performance is prohibitive unless the
         | data is small.
        
       | dplarson wrote:
       | I found this presentation helpful for an intro to H3's
       | design/motivation: "Engineering Sub-City Geos for a Hyper-Local
       | Marketplace with Uber":
       | https://youtu.be/wDuKeUkNLkQ?si=-9JmxZQJ2LZo6Kh4
        
       | naiv wrote:
       | This online tool gives you great idea about what this means on
       | different levels:
       | 
       | https://wolf-h3-viewer.glitch.me/
        
       | kemotep wrote:
       | This is like the advantage of using 6 mile hexes in a tabletop
       | rpg map.[0]
       | 
       | Each hex can be divided into smaller hexes until you get to the
       | level of feet/meters as opposed to miles/kilometers.
       | 
       | [0]: https://steamtunnel.blogspot.com/2009/12/in-praise-
       | of-6-mile...
        
       | crazygringo wrote:
       | I've read all the comments here and still don't understand the
       | rationale for hexagons as opposed to squares.
       | 
       | In every sense, squares seem to be much easier to reason about
       | and easier to hierarchically partition than hexagons are.
       | 
       | There are certain advantages to hexagons in certain contexts,
       | like six degrees of movement instead of four in board games, but
       | I don't see how any of those advantages translate here for
       | geographical indexing.
       | 
       | I'd love to understand why hexagons as opposed to squares in
       | _this_ context are a superior solution rather than unnecessary
       | complexity?
        
         | gleenn wrote:
         | IIRC it has to do with tiling the globe. You can't correctly
         | tild a sphere with swuares. On a local level, it seems easier
         | to reason about squares on a flat surface, but the Earth is
         | round and Uber is international.
        
           | crazygringo wrote:
           | Thanks! I'm still a bit confused because you can't tile the
           | globe with hexagons either, if I understand correctly --
           | everything I can find online says you need a mix of hexagons
           | and pentagons, and you can think of a soccer ball as an
           | example.
           | 
           | Has Uber figured out a way to do it with just hexagons?
        
             | noplacelikehome wrote:
             | Since cars don't need to cross water, maybe this isn't
             | actually relevant. Uber aren't trying to model the planet,
             | just the routes within the geographies they support.
        
             | tiagod wrote:
             | h3 has the required pentagons in the ocean (dymaxion
             | icosahedron vertices)
        
               | Tistron wrote:
               | At least one of the pentagons is on Norway. And another
               | one contains Beijing.
        
               | zamadatix wrote:
               | I think what GP is saying is if you zoom to the max level
               | 15 there remain only 12 cells which come out as pentagons
               | and all 12 of those are in the ocean (albeit sometimes
               | very near shore). If you zoom to the global scale those
               | 12 get inherited into ever larger parent cells which do
               | cover large areas of the Earth though, as you found.
        
             | setr wrote:
             | iirc it's exactly 12 pentagons required to tile the sphere
             | with hexagons, independent of hex-tiling size. So it's
             | _almost_ fully consistent, and uber tries to put most of
             | those pentagons in the ocean
        
         | korkoros wrote:
         | Squares have a contiguity problem: does a square in a grid have
         | 4 neighbors (rook contiguity) or 8 (queen contiguity)? Hexagons
         | do not have this problem. All neighbors of a hexagon share a
         | full edge, not just a vertex. All neighbors of a hex also have
         | their centers equidistant from that hex. A massive number of
         | spatial problems turn on neighborhood definitions, and hexes
         | are almost always a better representation of reality than
         | squares.
        
           | crazygringo wrote:
           | But if you assume rook contiguity then it seems equivalent to
           | hexagons. All neighbors share a full edge, all neighboring
           | centers are equidistant.
           | 
           | I get that you're saying hexes are almost always a better
           | representation. I still don't see a concrete example of why,
           | for geographical indexing specifically.
           | 
           | [Edit: sibling reply explained that at the end of the day,
           | it's not about indexing but rather route planning.]
        
             | ruined wrote:
             | and if you assume cows are spherical and frictionless it's
             | quite convenient
             | 
             | sometimes roads go diagonal
        
             | alexanderchr wrote:
             | Let's say you are looking for the closest gas station. One
             | that is in the close corner of a "diagonal neighbour" would
             | be closer than most points in the "edge neighbours". So if
             | you want to find something nearby, you'd usually want to
             | look at all 8 neighbours. The hexagonal neighbours look
             | more like a circle centered in the original hexagon, thus
             | more convenient for that purpose.
        
         | duncanfwalker wrote:
         | (closer to) equal distance between all the neighbouring cells
         | is really helpful when you're modelling a problem as a network
         | - eg analysing and planning routes and capacity.
         | 
         | Hierarchical partitioning with exact containment is useful when
         | aggregating but that's not always the most critical property.
        
           | crazygringo wrote:
           | Ah OK thank you that is making more sense -- that since Uber
           | is primarily about navigation and route planning, hexagons
           | minimize the number of cells or total land coverage you need
           | to retrieve/analyze between two points? I think that was the
           | missing piece of information I was looking for!
        
         | jandrewrogers wrote:
         | For the purposes of visualization, you want each cell to
         | enclose approximately equal surface area. These are your
         | "pixels". H3 is a way of rendering any subset of a sphere for
         | display.
         | 
         | While squares have superior properties for analytical
         | geospatial data processing that H3 doesn't have, such as
         | congruency, they really only work for Euclidean spaces and the
         | surface of a 2-sphere is non-Euclidean. Any system using
         | squares will be a poor approximation of "equal area" relative
         | to hexagons, which makes them poor for visualization. To use
         | squares for indexing, you need an extra step that allows non-
         | Euclidean space to be addressable from Euclidean space. There
         | are two main ways of doing this.
         | 
         | First, one can project the surface of the sphere onto the
         | surface of a Euclidean cube. Second, one can use an embedding,
         | indexing the 2-sphere in Euclidean 3-space. Both of these can
         | be trivially projected to a hexagonal system like H3 for
         | visualization purposes and commonly are.
         | 
         | If you primarily need visualization and your data is small,
         | using H3 eliminates the step where you need to figure out how
         | to map non-Euclidean data models to Euclidean data models. If
         | you are doing large-scale geospatial processing, it becomes
         | worth the effort for both scalability and performance reasons.
        
         | doktorhladnjak wrote:
         | You can more smoothly interpolate a function for points on a
         | surface using hexagons than with squares. For Uber, think about
         | things like surge pricing. If you compute a surge % for each
         | hexagon, you can interpolate it for each rider location.
        
         | feverzsj wrote:
         | In practice, there is little to no advantage. Any spacing
         | filling index is trying to fit into an ordered index like
         | B-tree, so existing horizontal scaling solution can be directly
         | applied. The problem is it's much harder and inefficient to
         | implement computational geometry algorithms on spacing filling
         | index than a real spatial index like R-tree. On the other hand,
         | scaling can be easily solved by table partition on different
         | area.
        
         | zamadatix wrote:
         | This will seem like a nit at first but it's really a key driver
         | to why you'd look at other shapes: a hexagon is more comparable
         | to a quadrangle, a square is more comparable to what is called
         | "a regular hexagon". Regular meaning the sides and angles which
         | make up the shape are all equal. In the 2D world, such as on
         | board games, equilateral triangles, squares, and regular
         | hexagons can all tile a plane perfectly. This is not the case
         | for the surface of a sphere, there is no tiling regular polygon
         | in that case.
         | 
         | From there you're just trying to optimize uniformity in
         | distance to neighbors, how big the adjustments to the irregular
         | polygons need to be to get them to tile on the surface are, how
         | easy the polygon is to split up into smaller similarly shaped
         | variants of itself as sub tiles, and trying to be somewhat
         | close to a circle in shape as that means the average distance
         | to the center of the area defined by the index is as close to
         | as it can be.
         | 
         | If you chunk through those you'll find quadrangles aren't
         | attractively simple anymore and hexagons tend to optimize the
         | parameters very well. H3 actually uses both hexagons and the
         | occasional pentagon (12 total, no matter the zoom level). It
         | all comes down to "tiling isn't going to be perfect - what is
         | the most optimal answer for the purpose of the tiling".
        
         | hankchinaski wrote:
         | From the docs
         | 
         | >The cell shape of that grid system is an important
         | consideration. For simplicity, it should be a polygon that
         | tiles regularly: the triangle, the square, or the hexagon. Of
         | these, triangles and squares have neighbors with different
         | distances. Triangles have three different distances, and
         | squares have two different distances. For hexagons, all
         | neighbors are equidistant
         | 
         | And
         | 
         | >This property allows for simpler analysis of movement.
         | Hexagons have the property of expanding rings of neighbors
         | approximating circles
         | 
         | >Hexagons are also optimally space-filling. On average, a
         | polygon may be filled with hexagon tiles with a smaller margin
         | of error than would be present with square tiles.
        
         | KaiserPro wrote:
         | I'm not sure either. http://s2geometry.io/ works well and you
         | only have to really worry about 4 points to draw the circle.
        
         | serjester wrote:
         | I created an h3 polars implementation and I think one of the
         | big one's is flow modeling. As other's have mentioned, each
         | hexagon has 6 equidistant neighbors. As a result, it lends
         | itself to analyzing telematics data (hence Uber's backing of
         | h3) - I attached a colab notebook where I do some of this.
         | Trying to do this sort of analysis any other way would be
         | incredibly painful.
         | 
         | [1] https://github.com/Filimoa/polars-h3
         | 
         | [2]
         | https://drive.google.com/file/d/18jIVEbE_1QbwTbHdMqj0AVqguf2...
        
       | tomtomistaken wrote:
       | Hexagons could be a new kind of political entity of the future,
       | where, for example, rewards for climate progress (measured by
       | satellites) could be put into practice.
        
       ___________________________________________________________________
       (page generated 2025-03-09 22:01 UTC)