[HN Gopher] Reverse geocoding is hard
       ___________________________________________________________________
        
       Reverse geocoding is hard
        
       Author : pavel_lishin
       Score  : 154 points
       Date   : 2025-04-27 14:45 UTC (8 hours ago)
        
 (HTM) web link (shkspr.mobi)
 (TXT) w3m dump (shkspr.mobi)
        
       | the_arun wrote:
       | Nicely written article. So simple yet interesting. I wish more
       | people made projects like these.
        
         | edent wrote:
         | Thank you! I appreciate that :-)
        
       | johnlk wrote:
       | It's almost more of a UX challenge than anything. The feedback
       | widget idea at the end could offer a crowd sourced solution the
       | same way Twitch solved translation via crowdsourcing.
        
       | andrewaylett wrote:
       | It's a lot more expensive, but measuring navigation distance
       | rather than straight line distance would avoid the "river" issue.
       | Although depending on the routing engine and dataset it might
       | well introduce more issues where points can be really close on
       | foot but the only known route is a driving route.
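
To illustrate the "river" issue, here is a minimal sketch comparing straight-line distance with distance along a path network. The coordinates, place names and edges are all made up, and a real setup would query a routing engine over real map data rather than use this hand-rolled Dijkstra search.

```python
import heapq
import math

# Hypothetical layout in metres: a river separates y=0 from y=100,
# and the only bridge is 500 m east of the bench.
COORDS = {
    "bench": (0, 0),
    "cafe": (-150, 0),
    "pub": (0, 100),
    "bridge_south": (500, 0),
    "bridge_north": (500, 100),
}
EDGES = [
    ("bench", "cafe"),
    ("bench", "bridge_south"),
    ("bridge_south", "bridge_north"),
    ("bridge_north", "pub"),
]


def build_graph(coords, edges):
    """Build an undirected adjacency list weighted by segment length."""
    graph = {name: [] for name in coords}
    for a, b in edges:
        length = math.dist(coords[a], coords[b])
        graph[a].append((b, length))
        graph[b].append((a, length))
    return graph


def path_distance(graph, start, goal):
    """Shortest distance along the network (Dijkstra); inf if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph[node]:
            candidate = dist + length
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return math.inf


GRAPH = build_graph(COORDS, EDGES)
POIS = ["cafe", "pub"]
nearest_straight = min(POIS, key=lambda p: math.dist(COORDS["bench"], COORDS[p]))
nearest_walking = min(POIS, key=lambda p: path_distance(GRAPH, "bench", p))
print(nearest_straight, nearest_walking)
```

In this toy layout the pub across the river wins on straight-line distance, while the cafe wins once the detour over the bridge is counted.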
        
         | edent wrote:
         | If you know of an API which does navigation distance to POI,
         | I'd love to hear about it!
        
           | nerdralph wrote:
           | I've used OSRM and ArcGIS for addresses in Canada. I think
           | one or both of them have POI support in their APIs.
           | https://route.arcgis.com/arcgis/
        
           | rahimnathwani wrote:
           | Google has Routes API:
           | https://developers.google.com/maps/documentation/routes
        
           | petre wrote:
           | Check out GraphHopper. But if your POIs are from OSM, OSRM
           | might be okay as well.
        
           | mootothemax wrote:
           | You can self-host or run locally Valhalla
           | (https://github.com/valhalla/valhalla), reading in data from
           | OSM as a starting point.
           | 
           | (For my purposes, I went with local running, generating
           | walking-distance isochrones across pretty much the entire
           | UK)
        
       | morkalork wrote:
       | If I were giving directions to another human and not using house
       | addresses I'd say something like "Queen street about half way
       | down the block between Crawford and Shaw"
        
         | edent wrote:
         | That's great for cities with a grid layout, but ignores most of
         | the world.
         | 
         | How would you give directions to something in the middle of a
         | park?
        
         | Propelloni wrote:
         | Fascinating to read. Around here people would do something like
         | "follow the road (hand pointing down the road) at the first
         | t-crossing turn left into a smaller road (hand pointing in the
         | meant direction) continue for, I don't know, a few minutes. On
         | your right (hand shows the meant direction) you'll see the
         | park. A road comes up on the left and directly opposite of it
         | is an entrance into the park. Go into the park and follow the
         | path until you reach the first crossing. Turn left (hand shows
         | meant direction) then follow until you reach the end of the
         | park. The park should make a right turn there. The bench should
         | be to your left."
         | 
         | Rarely if ever do people use road names to direct pedestrians,
         | or car drivers. I guess, the people don't know them. I
         | wouldn't.
        
       | rovr138 wrote:
       | Have you looked at the geonames database?
       | https://www.geonames.org/
       | 
       | Info and schema is here,
       | https://download.geonames.org/export/dump/readme.txt
       | 
       | Could be a good source. Not sure how good it is worldwide, but
       | the countries I've used it for, it's been useful and pretty good.
       | 
       | Try the search too,
       | https://www.geonames.org/search.html?q=R%C3%ADo+grande&count...
       | 
       | Not just roads, but there's rivers, and other things too
        
         | edent wrote:
         | That does look interesting. I _could_ search through it for a
         | lat & long, but it looks like it only gives a name (e.g.
         | "Silicon Oasis") without a corresponding country. Food for
         | thought though.
         | 
         | Thanks!
        
           | rovr138 wrote:
           | Yeah. It's not flat.
           | 
           | You can use admin fields, and it's a recursive query to find.
           | 
           | I have a recursive CTE (thanks to ChatGPT).
           | 
           | Could also be done on save, since they shouldn't change for
           | locations.
           | 
           | The recursiveness though, gives you a benefit if you extract
           | type and save the intermediate steps, it allows you to start
           | grouping things together at different levels which is one of
           | the use cases you mentioned.
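
A minimal sketch of that kind of recursive CTE, using SQLite from Python. The two-table layout is an assumption loosely modelled on a parent/child hierarchy; the table names, column names and row IDs here are made up for illustration and are not the actual GeoNames schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
CREATE TABLE hierarchy (parent_id INTEGER, child_id INTEGER);

-- Toy rows with invented IDs.
INSERT INTO place VALUES
    (1, 'United Arab Emirates', 'country'),
    (2, 'Dubai', 'admin1'),
    (3, 'Silicon Oasis', 'locality');
INSERT INTO hierarchy VALUES (1, 2), (2, 3);
""")

# Start at the requested place and repeatedly follow child -> parent links.
ANCESTORS_SQL = """
WITH RECURSIVE chain(id, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT h.parent_id, chain.depth + 1
    FROM hierarchy AS h
    JOIN chain ON h.child_id = chain.id
)
SELECT p.name, p.kind
FROM chain
JOIN place AS p ON p.id = chain.id
ORDER BY chain.depth
"""


def ancestors(place_id):
    """Return (name, kind) rows from the place itself up to the root."""
    return conn.execute(ANCESTORS_SQL, (place_id,)).fetchall()


print(ancestors(3))
```

Keeping the `kind` of each intermediate step is what allows grouping at different levels later, for example everything in the same `admin1` region.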
        
         | juliansimioni wrote:
         | Geonames is a great dataset, in fact it's one of the "OG" open-
         | source databases of the modern era, dating back to 2005.
         | 
         | It has fairly comprehensive coverage of countries, cities, and
         | major landmarks. It also has stable, simple identifiers that
         | are somewhat of a lingua-franca in the geospatial data world
         | (i.e. Geonames ID 5139572 points to the Statue of Liberty and
         | if you have other data that you need to unambiguously associate
         | with the one Statue of Liberty in New York Harbor, putting a
         | `geonames_id` column in your database with that integer will
         | pretty much solve it, and will allow anyone else you work with
         | to understand the connection clearly too).
         | 
         | However, to be honest, it hasn't really kept pace with modern
         | times. The velocity of changes and updates is pretty low, it
         | doesn't actively grow the community anymore. The data format is
         | simple and rigid and built on old tech that's increasingly hard
         | to work with. You can trust Geonames to have the Statue of
         | Liberty, but not the latest restaurants in NYC.
         | 
         | For a problem like the post author has of finding ways everyday
         | people can easily navigate to something like a park bench that
         | might not have a single address associated with it, or even if
         | it does, needs more granularity to find _that_ specific bench
         | in a park with 100 benches, Geonames probably won't help.
         | 
         | Source: I'm co-founder of Geocode Earth, one of the geocoding
         | companies linked in the blog post. We use Geonames as one
         | source of POI data amongst many others.
        
       | jandrewrogers wrote:
       | Most people don't have an intuitive sense of just how technically
       | difficult mapping from real geospatial coordinates to feature
       | spaces is. This is a great example of a relatively simple case.
       | You are essentially doing inference on a sparse data model with
       | complex local non-linearities throughout. If you add in dynamic
       | relationships, like things that move in space, it becomes another
       | order of magnitude worse. We frequently don't have enough data to
       | make a reliable inference even in theory and you need a way of
       | reliably determining that.
       | 
       | This problem has been the subject of intense interest by the
       | defense research community for decades. It has been conjectured
       | to be an AI-complete type problem for at least ten years, i.e.
       | solving it is equivalent to solving AGI. The current crop of LLM
       | type AI persistently fails at this class of problems, which is
       | one of the arguments for why LLM tech can't lead to true AGI.
        
         | TimTheTinker wrote:
         | Just putting this out there. This is one area where Esri's
         | software really shines. They have _so many_ software offerings
         | and so much is said about different things you can do with
         | ArcGIS (and competing systems), but the capability of their
         | projection engine and geocoding systems - the code that lies at
         | its heart - is unmatched, by far, at least as of 5 years ago
         | when I left for a different company.
         | 
         | I had long conversations with Esri's projection engine lead.
         | Really remarkable guy - he's got graduate degrees in geography
         | and math (including a PhD) and he's an excellent C/C++
         | developer. That kind of expertise trifecta is rare. I'd walk by
         | his office and sometimes see him working out an integral of a
         | massive equation on his whiteboard (not that he didn't also use
         | a CAS). "Oh yeah, I'm adding support for a new projection this
         | week."
        
           | jandrewrogers wrote:
           | Many people don't appreciate the extent that building robust
           | geospatial systems requires seriously hardcore mathematics
           | and physics skills. All of the mapping companies have really
           | smart PhDs wrangling with these problems. I've always enjoyed
           | talking with them about the subtleties of the challenges.
           | There are so many nuances that never occurred to me until
           | they mentioned them.
        
       | AlotOfReading wrote:
       | I haven't found a better way to do this than the Google Maps
       | solution [0]:
       | 
       | You write a query of all the different kinds of addresses you'd
       | like to display. The query result is a list of valid candidate
       | addresses for the point matching at least one format that you can
       | rank based on whatever criteria you like.
       | 
       | [0]
       | https://developers.google.com/maps/documentation/geocoding/r...
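
One way to sketch that query-then-rank idea, assuming the Geocoding API's `latlng`, `result_type` and `key` request parameters and the `types` / `formatted_address` fields of each result. The sample response, the placeholder key and the preference order are made up for illustration, and no network request is made.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_reverse_geocode_url(lat, lon, result_types, api_key):
    """Build a reverse-geocoding URL restricted to the wanted result types."""
    params = {
        "latlng": f"{lat},{lon}",
        "result_type": "|".join(result_types),
        "key": api_key,
    }
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


def pick_address(results, preferred_types):
    """Return the first candidate matching the most-preferred type, if any."""
    for wanted in preferred_types:
        for result in results:
            if wanted in result.get("types", []):
                return result["formatted_address"]
    return None


# Invented response body, shaped like the API's "results" list.
SAMPLE_RESULTS = [
    {"formatted_address": "Exampleton, UK", "types": ["locality", "political"]},
    {"formatted_address": "Park Lane, Exampleton, UK", "types": ["route"]},
]

url = build_reverse_geocode_url(51.5, -0.1, ["route", "locality"], "YOUR_KEY")
chosen = pick_address(SAMPLE_RESULTS, ["street_address", "route", "locality"])
print(url)
print(chosen)
```

The ranking here is a plain preference list; any other criterion (length of the string, distance to the point) could be swapped in.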
        
         | mvdtnz wrote:
         | It sounds like the author is more interested in getting city or
         | town names from a coordinate. Google maps is massively overkill
         | and horrendously expensive for this use case. I mentioned in
         | another comment I do this in a game I wrote and can complete
         | queries in microseconds.
         | 
         | https://news.ycombinator.com/item?id=43814231
        
       | amelius wrote:
       | Why not take the OpenStreetMap address (which is long), chop it
       | into a list of short combinations, then do a lookup for each
       | combination, and see which short address gives you the best
       | (geographically closest) match?
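
A rough sketch of that idea. It is a slight variation: it returns the shortest combination that geocodes to within a tolerance of the original point, preferring the closest match at that length. The `geocode` callable, the fake lookup table and the tolerance are all hypothetical stand-ins for a real forward geocoder, and distance is measured crudely in degrees.

```python
from itertools import combinations
from math import hypot


def shortest_good_address(parts, target, geocode, tolerance_deg):
    """Try ever-longer subsets of the address parts until one resolves nearby.

    parts: address components in display order.
    geocode: callable taking a string, returning (lat, lon) or None.
    """
    for size in range(1, len(parts) + 1):
        best = None
        for combo in combinations(parts, size):  # keeps the original order
            candidate = ", ".join(combo)
            hit = geocode(candidate)
            if hit is None:
                continue
            error = hypot(hit[0] - target[0], hit[1] - target[1])
            if error <= tolerance_deg and (best is None or error < best[0]):
                best = (error, candidate)
        if best is not None:
            return best[1]
    return ", ".join(parts)  # nothing shorter worked; fall back to the full address


# Invented lookup table standing in for a geocoding service.
FAKE_INDEX = {
    "12 High Street": (40.0, -75.0),  # ambiguous on its own
    "Oxford": (51.752, -1.258),
    "England": (52.5, -1.5),
    "12 High Street, Oxford": (51.7501, -1.2501),
}

short = shortest_good_address(
    ["12 High Street", "Oxford", "England"],
    target=(51.75, -1.25),
    geocode=FAKE_INDEX.get,
    tolerance_deg=0.001,
)
print(short)
```

The number of combinations grows quickly with the number of parts, so against a paid or rate-limited geocoder the lookups would need caching or pruning.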
        
       | dpmdpm wrote:
       | I read this as Reverse Genociding Is Hard, thought I was on a
       | Nethack forum, and thought, No, it's pretty easy with a cursed
       | scroll.
        
         | mtmail wrote:
         | At least one spellcheck software likes to correct genocide to
         | geocode. On social media I saw rage posts how Jews and
         | Palestinians are being geocoded.
        
       | Dachande663 wrote:
       | Fun fact that was dredged up because the author mentions
       | Australia: GPS points change. Their example coordinates give 6
       | decimal places, accurate to about 10-15cm. Australia a few years
       | back shifted all locations 1.8m because of continental drift
       | (they're moving north at ~7cm/year). So even storing coordinates
       | as a source of truth can be hazardous. We had to move several
       | thousand points for a client when this happened.
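
A back-of-the-envelope check of those numbers, assuming a spherical Earth with a mean radius of 6,371 km (an approximation; real datums use an ellipsoid) and a constant drift rate.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius, spherical approximation


def metres_per_last_digit(decimals, latitude_deg=0.0):
    """Ground distance covered by one unit in the last decimal place.

    Returns (north_south_m, east_west_m); the east-west figure shrinks
    with the cosine of the latitude.
    """
    step_deg = 10 ** -decimals
    north_south = math.radians(step_deg) * EARTH_RADIUS_M
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def years_to_drift(distance_m, rate_m_per_year):
    """How long a constant drift takes to accumulate a given offset."""
    return distance_m / rate_m_per_year


ns, ew = metres_per_last_digit(6)
years = years_to_drift(1.8, 0.07)
print(f"6 decimals: {ns:.3f} m north-south, {ew:.3f} m east-west at the equator")
print(f"1.8 m at 7 cm/year: {years:.1f} years")
```

Under these assumptions the sixth decimal place is worth roughly 11 cm, and a 1.8 m shift corresponds to a bit over 25 years of drift at 7 cm/year.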
        
         | atoav wrote:
         | In the past year or so I have thought a lot about how to design
         | tables and columns within databases and there is nearly nothing
         | that wouldn't get more robust by adding in a "valid_from" and
         | "valid_till" and make it accept multiple values. Someone's name
         | is _Foo_? What if they change it to _Bar_ at some point and you
         | need to access something from before with the old name?
         | 
         | If you have only a name field that has a single value that is
         | going to be a crazy workaround. If your names are referencing a
         | person with a date that is much easier. But you need to make
         | that decision pretty early.
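
A minimal sketch of the "valid_from" / "valid_till" idea using SQLite from Python. The table name, the other column names and the convention that a NULL `valid_till` means "still current" are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_name (
    person_id  INTEGER NOT NULL,
    name       TEXT    NOT NULL,
    valid_from TEXT    NOT NULL,  -- ISO 8601 date, inclusive
    valid_till TEXT               -- ISO 8601 date, exclusive; NULL = current
);
INSERT INTO person_name VALUES
    (1, 'Foo', '2000-01-01', '2020-06-01'),
    (1, 'Bar', '2020-06-01', NULL);
""")


def name_on(person_id, day):
    """Return the name that was valid for the person on the given ISO date."""
    row = conn.execute(
        """
        SELECT name FROM person_name
        WHERE person_id = ?
          AND valid_from <= ?
          AND (valid_till IS NULL OR ? < valid_till)
        """,
        (person_id, day, day),
    ).fetchone()
    return row[0] if row else None


print(name_on(1, "2010-05-05"), name_on(1, "2024-01-01"))
```

ISO 8601 dates compare correctly as plain strings, which is why the text columns work here; a database with native date and range types could express the same thing more directly.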
        
           | pavel_lishin wrote:
           | If you have an "audit" table, where you write a copy of the
           | data before updating it in the primary table, that's a
           | decision you can make at any point.
           | 
           | Of course, you don't get that historical data, but you do get
           | it going forward from there.
        
             | tough wrote:
             | something like https://www.pgaudit.org/ ?
             | 
             | Basically you keep a history of all changes so you can
             | always roll-back / get that data if needed?
        
               | pavel_lishin wrote:
               | The last time we did this, we basically hand-rolled our
               | own, with a database trigger to insert data into a
               | different table whenever an `UPDATE` statement happened.
               | 
               | But this seems like it's probably a better solution.
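
A small sketch of such a hand-rolled audit trigger, using SQLite from Python as a stand-in for whatever database was actually involved. The `poi` and `poi_audit` tables and their columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE poi (id INTEGER PRIMARY KEY, lat REAL, lon REAL);

CREATE TABLE poi_audit (
    poi_id     INTEGER,
    old_lat    REAL,
    old_lon    REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Copy the previous values into the audit table on every UPDATE.
CREATE TRIGGER poi_audit_on_update AFTER UPDATE ON poi
BEGIN
    INSERT INTO poi_audit (poi_id, old_lat, old_lon)
    VALUES (OLD.id, OLD.lat, OLD.lon);
END;
""")

conn.execute("INSERT INTO poi VALUES (1, -33.8688, 151.2093)")
conn.execute("UPDATE poi SET lat = -33.86879 WHERE id = 1")

history = conn.execute(
    "SELECT poi_id, old_lat, old_lon FROM poi_audit"
).fetchall()
current = conn.execute("SELECT lat, lon FROM poi WHERE id = 1").fetchone()
print(history, current)
```

As noted above, this only captures changes made after the trigger is installed; anything overwritten earlier is gone.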
        
               | tough wrote:
               | I haven't used pgaudit yet, so I can't vouch for it, but
               | I have it on the backburner/log of things to try for such
               | a use case!
               | 
               | I think the real magic is that it leverages the WAL
               | (write-ahead log) of the pg engine itself, which you could
               | certainly hook into too, but I'm not a db expert here.
        
             | homebrewer wrote:
             | SQL 2011 defines temporal tables, which few FOSS databases
             | support. I used it in mariadb:
             | 
             | https://mariadb.com/kb/en/temporal-tables/
             | 
             | and if your schema doesn't change much, it's practically
             | free to implement, much easier and simpler than copypasting
             | audit tables, or relying on codegen to do the same.
        
           | boramalper wrote:
           | See also "Eventual Business Consistency"[0] by Kent Beck.
           | Really good read.
           | 
           | > _Double-dated data_ --we tag each bit of business data with
           | 2 dates:
           | 
           | > * The date on which the data changed out in the real world,
           | the _effective_ date.
           | 
           | > * The date on which the system found out about the change,
           | the _posting_ date.
           | 
           | > Using effective & posting dates together we can record all
           | the strange twists & turns of feeding data into a system.
           | 
           | [0] https://tidyfirst.substack.com/p/eventual-business-consisten...
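
A small sketch of double-dated data as quoted above, in plain Python. The `Fact` class, its field names and the sample addresses are invented for illustration and are not taken from the linked article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    value: str
    effective: date  # when the change happened in the real world
    posted: date     # when the system found out about it


def as_known_on(facts, effective_day, knowledge_day):
    """What did the system believe on knowledge_day about effective_day?"""
    visible = [
        f for f in facts
        if f.posted <= knowledge_day and f.effective <= effective_day
    ]
    if not visible:
        return None
    # Latest effective date wins; later postings break ties (corrections).
    return max(visible, key=lambda f: (f.effective, f.posted)).value


FACTS = [
    Fact("Old Street", effective=date(2020, 1, 1), posted=date(2020, 1, 2)),
    # The move happened on 1 March but was only recorded on 15 April.
    Fact("New Street", effective=date(2024, 3, 1), posted=date(2024, 4, 15)),
]

before_posting = as_known_on(FACTS, date(2024, 3, 10), date(2024, 3, 20))
after_posting = as_known_on(FACTS, date(2024, 3, 10), date(2024, 5, 1))
print(before_posting, after_posting)
```

The same question about 10 March gets two different answers depending on when it is asked, which is exactly the "strange twist" the two dates are meant to record.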
        
           | jandrewrogers wrote:
           | The tradeoff is that this is very expensive at the scale of
           | large geospatial data models both in terms of performance and
           | storage. In practice, it is much more common to just take
           | regular snapshots of the database. If you want to go back in
           | time, you have to spin-up an old snapshot of the database
           | model.
           | 
           | A less obvious issue is that to make this work well, you need
           | to do time interval intersection searches/joins at scale.
           | There is a dearth of scalable data structures and algorithms
           | for this in databases.
        
           | nradov wrote:
           | Anyone who works with human names should take a look at the
           | HL7 V3 and FHIR data models, which were designed for
           | healthcare. They support name validity ranges, and a bunch of
           | other related metadata. It can be challenging to efficiently
           | represent those abstract data models in a traditional
           | database because with a fully normalized schema
           | you end up needing a lot of joins.
        
         | AlotOfReading wrote:
         | GPS coordinates actually account for the motion of the Earth's
         | tectonic plates. The problem is that it's a highly approximate
         | model that doesn't accurately reflect areas like Australia very
         | well.
         | 
         | There's a great visualizer of the coordinate velocity from the
         | Earthscope team:
         | 
         | https://www.unavco.org/software/visualization/GPS-Velocity-V...
        
           | meindnoch wrote:
           | >GPS coordinates actually account for the motion of the
           | Earth's tectonic plates.
           | 
           | What?
        
             | 867-5309 wrote:
             | _accounts for (something)_ , phrasal verb meaning
             | "considers; incorporates; takes on board" as opposed to the
             | more obvious "gives rise to; is responsible for". I had to
             | read twice too
        
               | meindnoch wrote:
               | Yeah, I know what "accounts for" means.
               | 
               | I just can't comprehend how GPS coordinates could account
               | for the tectonic plates' motion. Never heard of such a
               | thing, and can't see how it would work on a conceptual,
               | mathematical level.
        
           | janzer wrote:
           | I'm pretty positive that is showing the reverse, i.e. how
           | much a given "location" is moving using gps coordinates. Not
           | adjusting the gps coordinates to refer to a constant
           | "location".
        
           | jandrewrogers wrote:
           | GPS coordinates do not account for tectonic motion. It is a
           | synthetic spheroidal model that is not fixed to any point on
           | Earth. The meridians are derived from the average motion of
           | many objects, some of which are not on the planetary surface.
           | 
           | The motion of tectonic plates can be calculated relative to
           | this spatial reference system but they are not part of the
           | spatial reference system and would kind of defeat the purpose
           | if they were.
        
             | AlotOfReading wrote:
             | The corrections are incorporated into the datum. WGS84 is
             | updated every 6 months to follow ITRF by changing the
             | tracking station locations as the plates move around.
        
               | meindnoch wrote:
               | That's about correcting the ground stations' coordinates.
               | It doesn't help keeping your house's GPS coordinates
               | fixed. If the tectonic plate your house is built on moves
               | a meter over the course of a decade, then your house's
               | GPS coords will change in the lower decimals, and
               | eventually your government's land registry will need to
               | update those values.
        
               | jandrewrogers wrote:
               | If WGS84 was correcting for tectonic drift it would imply
               | that the coordinates of the terrestrial fixed points used
               | to compute the reference meridian never change under
               | WGS84. Rebasing the coordinates of terrestrial fixed
               | points prior to calculation disregards tectonic drift in
               | the reference meridian calculation, it doesn't correct
               | it. It is a noise reduction exercise to minimize the
               | influence of plate tectonics on meridian drift. The
               | meridian uses non-terrestrial fixed points too that don't
               | have a concept of tectonic drift (but may introduce their
               | own idiosyncratic sources of noise).
               | 
               | Basically, these are corrections to their "fixed points"
               | to make them behave more like actual fixed points in the
               | reference meridian model. It doesn't eliminate tectonic
               | drift effects when using coordinates in that spatial
               | reference system.
        
         | xucheng wrote:
         | Can this be solved by storing a timestamp of the record along
         | with precise GPS coordinates? Could we then utilize some
         | database to compute the drift from then and now?
        
           | haneefmubarak wrote:
           | I mean, certainly - if you store both GPS time and derived
           | coordinates from the same sampling, then you can always later
           | interpret it as needed - whether relative to legal or
           | geographical boundaries etc as you might want to interpret in
           | the future.
        
           | jandrewrogers wrote:
           | Yes, in fact it should essentially be mandatory because the
           | spatial reference system for GPS is not fixed to a point on
           | Earth. This has become a major issue for old geospatial data
           | sets in the US where no one remembered to record _when_ the
           | coordinates were collected.
           | 
           | To correct for these cases you need to be able to separately
           | attribute drift vectors due to the spatial reference system,
           | plate tectonics, and other geophysical phenomena. Without a
           | timestamp that allows you to precisely subtract out the
           | spatial reference system drift vector, the magnitude of the
           | uncertainty is quite large.
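
A deliberately simplified sketch of why the timestamp matters: with an epoch stored next to each coordinate, a drift model can be applied later. The constant velocity, the spherical Earth and the `DatedFix` / `propagate` names are all assumptions for illustration; real corrections rely on published datum transformations and plate motion models, and have to separate the different drift sources described above.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # spherical approximation


@dataclass(frozen=True)
class DatedFix:
    lat: float    # degrees
    lon: float    # degrees
    epoch: float  # decimal year the coordinates were measured


def propagate(fix, north_m_per_year, east_m_per_year, to_epoch):
    """Shift a fix by a constant velocity from its own epoch to to_epoch."""
    years = to_epoch - fix.epoch
    dlat = math.degrees(north_m_per_year * years / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m_per_year * years
        / (EARTH_RADIUS_M * math.cos(math.radians(fix.lat)))
    )
    return DatedFix(fix.lat + dlat, fix.lon + dlon, to_epoch)


surveyed = DatedFix(lat=-25.0, lon=135.0, epoch=1994.0)
moved = propagate(surveyed, north_m_per_year=0.07, east_m_per_year=0.0,
                  to_epoch=2020.0)
shift_m = math.radians(moved.lat - surveyed.lat) * EARTH_RADIUS_M
print(moved, round(shift_m, 2))
```

Without the `epoch` field there is no value of `years` to plug in, which is the uncertainty the comment above describes.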
        
         | pavel_lishin wrote:
         | Damn! 7cm per year feels blazing fast when you consider the
         | fact that it's a whole continent.
        
           | anotherevan wrote:
           | We're coming for you!
        
           | XorNot wrote:
           | I mean I'm still mind blown that the Three Gorges dam in
           | China literally changed the rotational speed of the Earth,
           | and thus the length of the day.
        
         | akst wrote:
         | My knowledge of geospatial sets is fairly shallow, but I've
         | worked a bit with Australian map data and I'm assuming you are
         | referring to the different CRSs, GDA2020 and GDA1994?
         | 
         | I'd imagine older coordinates would work with the earlier CRS?
         | 
         | But I can understand not all coordinates specify their CRS.
         | This hasn't really been an issue for me personally, but I've
         | mostly worked with NSW spatial and the Australian Bureau of
         | statistics geodata.
        
         | jandrewrogers wrote:
         | Even accounting for tectonic drift, there is a concept of
         | positioning reproducibility that is separate from precision. In
         | general the precision of the measurements is much higher than
         | the reproducibility of the same measurements. That is, you may
         | be able to measure a fixed point on the Earth using an
         | instrument with 1cm precision at a specific point in time but
         | if you measure that same point every hour for a year with the
         | same instrument, the disagreement across measurements will
         | often be >10cm (sometimes much greater), which is much larger
         | than e.g. tectonic drift effects.
         | 
         | For this reason, many people use the reproducibility rather
         | than instrument precision as the noise floor. It doesn't matter
         | how precise an instrument you use if the "fixed point" you are
         | measuring doesn't sit still relative to any spatial reference
         | system you care to use.
        
           | Robotbeat wrote:
           | The whole accuracy vs precision thing.
        
             | jandrewrogers wrote:
             | Related but slightly different. The accuracy is real but it
             | is only valid at a point in time. Consequently, you can
             | have both high precision and high accuracy that nonetheless
             | give different measurements depending on when the
             | measurements were made.
             | 
             | In most scientific and engineering domains, a high-
             | precision, high-accuracy measurement is assumed to be
             | reproducible.
        
         | RainyDayTmrw wrote:
         | This is one of many reasons why property surveying records use
         | so many seemingly obscure or redundant points of reference. In
         | case anyone wonders why modern property surveying isn't only
         | recording lots of GPS coordinates.
        
       | vintermann wrote:
       | Genealogy applications run into this a lot. The person of
       | interest lived at Engeset. FamilySearch has geocoded a place
       | called "Engeset, More og Romsdal, Norway". So that's it, right?
       | Not so fast, [there are at least 3 Engesets in More og
       | Romsdal](https://www.google.com/maps/search/Engeset/@62.3358577,6.225...).
       | 
       | But that's at least better than when it's some local place name
       | which it's never heard of, and thinks sounds most similar to a
       | place in Afghanistan (this happens all the time).
       | 
       | And to add to it, there are administrative regions, and
       | ecclesiastical regions. Do you put them in the parish, or in the
       | municipality? The birth in the parish and the baptism in the
       | municipality, maybe? How about the burial then...
        
         | modeless wrote:
         | Converting from a name/address to coordinates is geocoding.
         | Reverse geocoding is mapping from coordinates to a
         | name/address.
        
       | punnerud wrote:
       | I created this to solve my own need for reverse geocoding:
       | https://github.com/punnerud/rgcosm (Saving me thousands of $
       | compared to Google API)
       | 
       | Uses an OpenStreetMap file, Python and SQLite3.
       | 
       | First it finds all addresses within a square of +/- some offset
       | around the lat/lon, then calculates the distance to each entry
       | in that smaller list (Pythagoras) and picks the closest. It
       | expands the square, up to a set maximum, if no address is found
       | in the first search.
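
A minimal sketch of the expanding-square search described above, using SQLite from Python. This is not the code from the linked repository; the table layout, the starting size, the doubling step and the sample addresses are assumptions made for the example.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (label TEXT, lat REAL, lon REAL);
CREATE INDEX address_lat_lon ON address (lat, lon);
""")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [  # invented sample rows
        ("1 Example Road", 51.5000, -0.1000),
        ("2 Example Road", 51.5005, -0.1005),
        ("9 Faraway Lane", 51.6000, -0.2000),
    ],
)


def nearest_address(lat, lon, start_deg=0.001, max_deg=0.064):
    """Search a growing lat/lon square and return the closest label, or None."""
    half = start_deg
    while half <= max_deg:
        rows = conn.execute(
            "SELECT label, lat, lon FROM address "
            "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
            (lat - half, lat + half, lon - half, lon + half),
        ).fetchall()
        if rows:
            # Pythagoras on raw degrees: crude, but fine for ranking neighbours.
            return min(rows, key=lambda r: math.hypot(r[1] - lat, r[2] - lon))[0]
        half *= 2  # nothing found, widen the square
    return None


print(nearest_address(51.5004, -0.1004))
print(nearest_address(51.59, -0.19))
```

Two caveats with this shortcut: treating degrees of longitude and latitude as equal distorts distances away from the equator, and an address in a corner of the square can be further away than one just outside its edge, so the result is approximate.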
        
         | davidmurdoch wrote:
         | Just curious if you looked into using S2 cells for this? It's
         | what Pokemon Go uses for its coordinate system.
         | http://s2geometry.io/devguide/s2cell_hierarchy.html
        
           | punnerud wrote:
           | Isn't the main purpose of S2 to be able to scan from
           | different "directions"? More a purpose of Google Maps when
           | viewing the world as a spherical object compared to SQLite3
           | just using a simple B-tree index on lat+lon?
        
             | davidmurdoch wrote:
             | The individual cells being sized and being able to easily
             | compute neighboring cells seems useful for the described
             | algorithm. I haven't given it much thought on applicability
             | here, but it sounded somewhat similar to a search pattern I
             | once implemented within pgsql to locate items on a map that
             | were within proximity of a given latlong.
        
       | gmoore wrote:
       | maybe the 'three words' model? Seems like it would be specific
       | enough to locate a bench
        
         | cjs_ac wrote:
         | what3words is a proprietary algorithm and has problems with
         | homophones.
        
         | Liquid_Fire wrote:
         | That's essentially equivalent to coordinates since you still
         | need to translate it to some human-understandable form, so it
         | doesn't solve the problem.
        
         | 1970-01-01 wrote:
         | Yeah, that would absolutely work, but it's not free and
         | donations may or may not cover their costs:
         | https://accounts.what3words.com/select-plan
        
       | sinuhe69 wrote:
       | Not my area of expertise, but is this not a form of perfectionist
       | problem? I mean, most places have a clear and simple address. For
       | the rest, either a human can solve it, or we can make a few
       | examples and let an AI do the work. We can go back to them later
       | and revise them if we need to. Addresses don't change often, so I
       | think things can stay the same for a long time.
       | 
       | Except for emergency dispatch and a few high-profile use cases,
       | you can have a good enough address to let the user find its
       | neighbourhood. But they still have the GPS or other form of
       | address coding, so they can find the exact location easily. I'd
       | say 99.9% of the cases are like that. The rest can be solved
       | quickly by looking at the map!
        
         | edent wrote:
         | I am deeply guilty of being a perfectionist!
         | 
         | Ultimately, I just want something which is a nice balance
         | between being useful for a human and not so long that it is
         | overwhelming.
        
           | curiousObject wrote:
           | You're the author?
           | 
           | The final step in the process "Wait for complaints" seems
           | like a smart acceptance of the "perfect is the enemy of good"
           | challenge
           | 
           | Publish and be damned, or as we say now: Move fast and break
           | things
        
         | smitty1e wrote:
         | I was going to take this tack.
         | 
         | 80% of the problem is just transforming floating point
         | coordinates into API calls.
         | 
         | Getting to something useful with it is the hard 20%, and it
         | will be a diminishing returns problem after that.
         | 
         | While not anybody's LLM proponent, that last mile might be a
         | good AI application.
        
         | ryandrake wrote:
         | You can call it perfectionism or you can call it "doing it
         | right." I think this gets at a fundamental difference in
         | philosophy among [software] engineers: We have a problem with a
         | lot of edge cases, where a "good enough" solution can be done
         | quickly. What do we do? There's a class of engineers who say 1.
         | Do the "good enough" solution and ignore/error on the edge
         | cases--we'll fix them later somehow (may or may not have an
         | actual plan to do this). And there's a class of engineers who
         | say 2. We cannot solve this problem correctly yet and need more
         | research and better data.
         | 
         | Unfortunately (in my view), group #1 is making all the products
         | and is responsible for the majority of applications of
         | technology that get deployed. Obviously this is the case
         | because they will take on projects that group #2 cannot, and
         | have no compunction against shipping them. And we can see the
         | results with our eyes. Terrible software that constantly
         | underestimates the number and frequency of these "edge cases"
         | and defects. Terrible software that still requires the user to
         | do legwork in many cases because the developers made an
         | incorrect assumption or had bad input data.
         | 
         | AI is making this problem even worse, because now we don't even
         | know what the systems can and cannot do. LLMs
         | nondeterministically fail in ways that sometimes can't even be
         | directly corrected with code, and all engineering can do is
         | stochastically fix defects by "training with better models."
         | 
         | I don't know how we get out of this: Every company is
         | understandably biased towards "doing now" rather than "waiting"
         | to research more and make a better product, and the doers
         | outcompete the researchers.
        
           | sbarre wrote:
           | > Unfortunately (in my view), group #1 is making all the
           | products and is responsible for the majority of applications
           | of technology that get deployed.
           | 
           | This is an interesting take, and I think I see where you're
           | coming from.
           | 
           | My first thought on "why" is that so many products today are
           | free to the user, meaning the money is made elsewhere, and so
           | the experience presented to the user can be a lot more
           | imperfect or non-exhaustive than it would otherwise have to
           | be if someone was paying to use that experience.
           | 
           | So edge cases can be ignored because really you're looking
           | for a critical mass of eyeballs to sell to advertisers or to
           | harvest usage data from, etc. If a small portion of your
           | users has a bad time or experiences errors, well, you get
           | what you pay for as they say..
           | 
           | And does that kind of pervasiveness now mean that many
           | engineers think this is just the way to go no matter what?
        
         | jandrewrogers wrote:
         | The update rate for a global map data model, all of which are
         | still woefully incomplete in many contexts, is surprisingly
         | high. The territory underlying the map is a lot less static
         | than people assume. Also, local reality is often much less
         | "regular" than people assume such that a person really can't
         | figure it out reliably. Currently there are literally thousands
         | of people tasked with incorporating these changes because it
         | has proven to be resistant to automation thus far due to the
         | pervasiveness of edge cases. For your basic global map data
         | model, these are the edge cases that are left _after_ several
         | thousand heuristic and empirically derived rules have been
         | applied.
         | 
         | It is a deeply complex data model that changes millions of
         | times a day in unpredictable ways. Unfortunately, many
         | applications are very sensitive to the local accuracy of the
         | model, which is much higher variance than average accuracy.
         | Only trying to be "good enough" in an 80/20 rule sense is the
         | same as "broken". The updates are also noisy and often contain
         | errors, so the process has to be resilient to those errors.
         | 
         | The resistance of the problem to automation and the high rate
         | of change has made it extremely expensive to asymptotically
         | converge on a model with consistently acceptable accuracy for the
         | vast majority of applications.
        
         | mootothemax wrote:
         | > most places have a clear and simple address
         | 
         | That depends on your definition of "clear and simple" and
         | "address" :) While a lot boils down to use case - are you
         | trying to navigate somewhere, or link a string to an address? -
         | even figuring out _what_ is an address can be hard work. Is an
         | address the entrance to a building? Or a building that accepts
         | postal deliveries? Is the "shell" of a building that contains
         | a bunch of flats/apartments but doesn't itself have a postal
         | delivery point or bills registered directly to it an address?
         | How about the address a location was known as 1 year ago? 2
         | years ago? 10 years ago?
         | 
         | Parks and other public spaces can be fun; they may have many
         | local names that are completely different to the "official"
         | name - and it's a big "if" whether an official name exists at
         | all. Heck, most _roads_ have a bunch of official names that are
         | anything but the names people refer to them as. I have a
         | screaming obsession with the road directly in front of
         | Buckingham Palace that, despite what you see on Google Maps, is
         | registered as "unnamed road" in all of the official sources.
         | 
         | > Addresses don't change often
         | 
         | At the individual level, perhaps. In aggregate? Addresses
         | change all the time, sometimes unrecognisably so. City and town
         | boundaries are forever expanding and contracting, and the
         | borders between countries are hardly static either (and if
         | you're ever near the Netherlands / Belgium border, make a quick
         | trip to Baarle-Hertog and enjoy the full madness). Thanks to
         | intercontinental relative movement, the coordinates we log
         | against locations have a limited shelf life too. All of the
         | things I used to think were certain...
         | 
         | If someone hasn't done "falsehoods programmers believe about
         | addresses," I think its time might be now!
         | 
         | Edit: answering myself with
         | https://www.mjt.me.uk/posts/falsehoods-programmers-believe-a...
        
       | dadadad100 wrote:
       | This problem seems to also exist for services like uber. Their
       | solution seems easier, drop a pin on a map. Perhaps working so
       | hard to find a textual description is missing the simpler
       | solution.
        
       | blacklight wrote:
       | As the developer of a GPS tracking app that relies a lot on
       | OpenStreetMap, I've faced many of these problems myself. A couple
       | of learned lessons/insights:
       | 
       | - I avoid relying on any generic location name/description
       | provided by these APIs. Always prefer structured data whenever
       | possible, and build the locality name from those components
       | (bonus points if you let the user specify a custom format).
       | 
       | - Identifying those components itself is tricky. As the author
       | mentioned, there are countries that have states, others that have
       | regions, others that have counties, or districts, or any
       | combination of those. And there are cities that have suburbs,
       | neighbourhoods, municipalities, or any combination. Oh, and let's
       | not even get started with address names - house numbers?
       | extensions? localization variants - e.g. even the same API may
       | sometimes return "Marrakesh" and sometimes "Marrakech"? and how
       | about places like India where nearby amenities are commonly used
       | instead of house numbers? I'm not aware of any public APIs out
       | there that provide these "expected" taxonomies, preferably from
       | lat/long input, but I'd love to be proven wrong. In the absence
       | of that, I would suggest that it is better to avoid second-guessing
       | - unless your software is only intended to run in a specific
       | country, or in a limited number of countries and you can afford
       | to hardcode those rules. It's probably a good option to provide a
       | sensible default, and then let the user override it. Oh, and good
       | catch about abbreviations - I'd say to avoid them unless the user
       | explicitly enables them, if you want to avoid the "does everybody
       | know that IL is Illinois?" problem. Just use "Illinois" instead,
       | at least by default.
       | 
       | - Localization of addresses is a tricky problem only on the
       | surface. My proposed approach is that, again, the user is king.
       | Provide English by default (unless you want to launch your
       | software in a specific country), and let the user override the
       | localization. I feel like Nominatim's API approach is
       | probably the cleanest: honor the `Accept-Language` HTTP header if
       | available, and if not available, fall back to English. And then
       | just expose that as a setting to the user.
       | 
       | - Bounding boxes/polygons can help a lot with solving the
       | proximity/perimeter issue. But they aren't always
       | present/sufficiently accurate in OSM data. And their proper usage
       | usually requires the client's code to run some non-trivial
       | lat/long geometry processing code, even to answer trivial
       | questions such as "is this point inside of this enclosed
       | amenity?" Oh, and let's not even get started with the "what's the
       | exact lat/long of this address?" problem. Is it the entrance of
       | the park? The middle of it? I remember that when I worked with
       | the Bing API in the past they provided more granular
       | information at the level of rooftop location, entrance location
       | etc.
       | 
       | - Providing localization information for public benches isn't
       | what I'd call an orthodox use-case for geo software, so I'm not
       | entirely sure of how to solve the "why doesn't everything have an
       | address?" problem :)
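
A minimal sketch of the "build the locality name from structured components" idea above, assuming a Nominatim-style `address` dictionary. The key names, the fallback order in `DEFAULT_SLOTS`, and the function name `locality_label` are illustrative assumptions, not part of any API; real data would need per-country tuning and a user override, as the comment suggests.

```python
from typing import Mapping, Sequence

# Hypothetical fallback order: each inner tuple is one "slot" in the label,
# and the first key present in the address wins for that slot.
DEFAULT_SLOTS: Sequence[Sequence[str]] = (
    ("suburb", "neighbourhood", "village", "hamlet"),
    ("city", "town", "municipality", "county"),
    ("state", "region"),
    ("country",),
)


def locality_label(
    address: Mapping[str, str],
    slots: Sequence[Sequence[str]] = DEFAULT_SLOTS,
    sep: str = ", ",
) -> str:
    """Join the first available component of each slot, skipping repeats."""
    parts = []
    for candidates in slots:
        # Pick the first non-empty component among this slot's candidate keys.
        value = next((address[k] for k in candidates if address.get(k)), None)
        # Skip empty slots and values already used (e.g. city == country).
        if value and value not in parts:
            parts.append(value)
    return sep.join(parts)


example = {
    "suburb": "Hyde Park",
    "city": "Chicago",
    "state": "Illinois",
    "country": "United States",
}
print(locality_label(example))
```

Passing a different `slots` value (for example `(("city", "town"), ("country",))`) is one way to let the user specify a custom format without touching the lookup logic.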
        
       | nerdralph wrote:
       | Part of the problem is the different ways addresses are expressed
       | throughout the world. I was born and grew up in Canada, and was
       | confused when I started dealing with companies in China. Instead
       | of street addresses, many are given by province, city, district,
       | sub-district, and a building number.
       | 
       | Another problem is choosing which authority to trust for the
       | "correct" address. I've seen many cases where the official postal
       | address city/town name differs from the one in the 911 database.
       | For example, Canada Post will say some street addresses are in
       | Dartmouth, while the official civic address is really Cole Harbour.
       | https://www.canadapost-postescanada.ca/ac/
       | https://nsgi.novascotia.ca/civic-address-finder/
       | 
       | Even streets can have multiple official names/aliases. People who
       | live on "East Bay Hwy", also live on "Highway 4", which is an
       | alias.
        
       | andrew_eu wrote:
       | I have a memorable reverse geocoding story.
       | 
       | I was working with a team that was wrapping up a period of many
       | different projects (including a reverse geocoding service) and
       | adopting one major system to design and maintain. The handover
       | was set to be after the new year holidays and the receiving teams
       | had their own exciting rewrites planned. I was on call the last
       | week of the year and got an alert that sales were halted in
       | Taiwan due to some country code issue and our system seemed at
       | fault. The customer facing application used an address to
       | determine all sorts of personalization stuff: what products
       | they're shown, regulatory links, etc. Our system was essentially
       | a wrapper around Google Maps' reverse geocoding API, building in
       | some business logic on top of the results.
       | 
       | That morning, at 3am, the API stopped serving the country code
       | for queries of Kinmen County. It would keep the rest of the
       | address the same, but just omit the country code, totally
       | botching assumptions downstream. Google Maps seemingly realized
       | all of a sudden what strait the island was in, and silently
       | removed what some people dispute.
       | 
       | Everyone else on the team was on holiday and I couldn't feasibly
       | get a review for any major mitigations (e.g. switching to OSM or
       | some other provider). So I drew a simple polygon around the
       | island, wrote a small function to check if the given coordinates
       | were in the polygon, and shipped the hotfix. Happily, the whole
       | reverse geocoding system was scrapped with a replacement by
       | February.
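
The hotfix described above (a hand-drawn polygon plus a containment check) could look roughly like the sketch below. This is not the commenter's code: the ray-casting test is a standard technique swapped in for illustration, `PLACEHOLDER_OUTLINE` is a made-up rectangle rather than a real boundary, and `country_code_with_override` is a hypothetical helper name.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Placeholder rectangle for illustration only; not a surveyed boundary.
PLACEHOLDER_OUTLINE: Sequence[Point] = (
    (118.2, 24.35),
    (118.5, 24.35),
    (118.5, 24.55),
    (118.2, 24.55),
)


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge meets the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def country_code_with_override(
    lon: float, lat: float, api_code: Optional[str], fallback_code: str
) -> Optional[str]:
    """Keep the upstream country code; fill it in only when missing and inside the outline."""
    if api_code:
        return api_code
    if point_in_polygon(lon, lat, PLACEHOLDER_OUTLINE):
        return fallback_code
    return None
```

Treating planar ray casting as good enough is itself an assumption: it is reasonable for a small polygon far from the poles and the antimeridian, and it does not define behaviour for points exactly on an edge.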
        
         | modeless wrote:
         | Wow, I had no idea that Taiwan controlled an island less than
         | three miles from mainland China, essentially surrounded by
         | China in a bay. (The main island is 80+ miles away.) I'm really
         | surprised China has allowed that for 80 years. Unsurprisingly,
         | the beach looks like this:
         | https://www.google.com/maps/place/Shuang+Kou+Zhan+Dou+Cun/@2...
         | 
         | Also interesting that there's a Japanese island only 60 miles
         | from Taiwan on the other side. I guess claims to small Pacific
         | islands have been weird for a long time.
        
           | nradov wrote:
           | If the Chinese Communist Party decides to escalate the
           | pressure on Taiwan then one likely scenario is some sort of
           | blockade against those small islands close to the mainland.
        
       | mvdtnz wrote:
       | I dealt with this exact issue and went with that exact solution
       | in my browser based geography game[0].
       | 
       | What the author is looking for is administrative divisions and
       | boundaries[1], in particular probably down to level 3 which is
       | the depth my game goes to. These differ in size greatly by
       | country. With admin boundaries you need to accept there is no
       | one-size-fits-all solution and embrace the quirks of the
       | different countries.
       | 
       | For my game I downloaded a complete database of global admin
       | boundaries[2] and imported them into PostgreSQL for lightning
       | fast querying using PostGIS.
       | 
       | [0] https://guesshole.com
       | 
       | [1]
       | https://en.wikipedia.org/wiki/List_of_administrative_divisio...
       | 
       | [2] https://gadm.org/data.html
        
       | jillesvangurp wrote:
       | This was a while ago, but about 12 years ago I experimented with
       | putting the whole of openstreetmap into Elasticsearch.
       | 
       | Reverse geocoding then becomes a problem of figuring out which
       | polygons contain the point with a simple query and which
       | POIs/streets/etc. are closest based on perpendicular distance.
       | For that, I simply did a radius search and some post processing
       | on any street segments. Probably not perfect for everything. But
       | it worked well enough. My goal was actually being able to group
       | things by neighborhood and microneighborhoods (e.g. squares,
       | nightlife areas, etc.).
       | 
       | This should work well enough with anything that allows for
       | geospatial queries. In a pinch you can use geohashes (I actually
       | did this because geospatial search was still a bit experimental
       | in ES).
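
As a rough illustration of the "closest street by perpendicular distance" post-processing step mentioned above, the sketch below measures the distance from a point to a street segment. It assumes a local flat-earth (equirectangular) approximation, which is only reasonable for the short distances a radius search returns; the function names and the `streets` data shape are hypothetical and not taken from the commenter's Elasticsearch setup.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0


def _to_local_xy(lat: float, lon: float, lat0: float, lon0: float) -> Tuple[float, float]:
    """Project to metres on a plane centred on (lat0, lon0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def distance_to_segment_m(point: LatLon, seg_start: LatLon, seg_end: LatLon) -> float:
    """Distance in metres from point to the nearest spot on the segment."""
    lat0, lon0 = point  # the query point becomes the origin (0, 0)
    ax, ay = _to_local_xy(*seg_start, lat0, lon0)
    bx, by = _to_local_xy(*seg_end, lat0, lon0)
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(ax, ay)  # degenerate segment: a single point
    # Projection of the origin onto the line, clamped to the segment ends.
    t = max(0.0, min(1.0, -(ax * dx + ay * dy) / length_sq))
    return math.hypot(ax + t * dx, ay + t * dy)


def nearest_street(point: LatLon, streets: Dict[str, Sequence[LatLon]]) -> Optional[str]:
    """Return the name of the polyline whose closest segment is nearest to point."""
    best_name, best_dist = None, math.inf
    for name, line in streets.items():
        for start, end in zip(line, line[1:]):
            d = distance_to_segment_m(point, start, end)
            if d < best_dist:
                best_name, best_dist = name, d
    return best_name
```

Clamping `t` to the segment ends matters: without it, a long street whose extension happens to pass near the point would beat a shorter street that is actually closer.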
        
       | 1970-01-01 wrote:
       | >But how do you go from a string of digits to something human
       | readable?
       | 
       | Hasn't What3Words already solved this?
        
         | robin_reala wrote:
         | It's a closed and proprietary system that contains locations
         | like master.beats.slave near Brooklyn and whites.power.life in
         | Washington (to pick just a couple of examples off the top of my
         | head). If your only requirement is "human readable" and assumes
         | the human knows how to read and pronounce English words then I
         | guess it kinda does.
        
           | 1970-01-01 wrote:
           | >and assumes the human knows how to read and pronounce
           | English words
           | 
           | I hit a dozen random benches. If these are the low
           | requirements, W3W would and does do a great job for
           | openbenches: https://openbenches.org/bench/random/
        
             | cmcconomy wrote:
             | so would google's plus-codes,
             | 
             | https://maps.google.com/pluscodes/
        
           | mootothemax wrote:
           | Yeesh slave.trade.market too!
        
       | osmanscam wrote:
       | you can use https://map.name for reverse geocoding
        
       | kylecazar wrote:
       | Good article. FWIW, some major cities offer seating data. New
       | York, for example, returns bench locations as a Point
       | (coordinates). They even have a column in the data for the
       | nearest address of the "seating feature".
       | 
       | https://data.cityofnewyork.us/Transportation/Seating-Locatio...
        
       ___________________________________________________________________
       (page generated 2025-04-27 23:00 UTC)