[HN Gopher] Reverse geocoding is hard
___________________________________________________________________
Reverse geocoding is hard
Author : pavel_lishin
Score : 154 points
Date : 2025-04-27 14:45 UTC (8 hours ago)
(HTM) web link (shkspr.mobi)
(TXT) w3m dump (shkspr.mobi)
| the_arun wrote:
| Nicely written article. So simple yet interesting. I wish more
| people made projects like these.
| edent wrote:
| Thank you! I appreciate that :-)
| johnlk wrote:
| It's almost more of a UX challenge than anything. The feedback
| widget idea at the end could offer a crowd sourced solution the
| same way Twitch solved translation via crowdsourcing.
| andrewaylett wrote:
| It's a lot more expensive, but measuring navigation distance
| rather than straight line distance would avoid the "river" issue.
| Although depending on the routing engine and dataset it might
| well introduce more issues where points can be really close on
| foot but the only known route is a driving route.
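For contrast with navigation distance, the straight-line figure is usually a great-circle distance. A minimal sketch, assuming a spherical Earth with a mean radius of 6371 km; the function name and the sample coordinates are illustrative and do not come from the article:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Two hypothetical points a few hundred metres apart as the crow flies.
# If a river lies between them, the walking route via the nearest bridge
# can be much longer than this number suggests.
d = haversine_m(51.5079, -0.1220, 51.5055, -0.1160)
```

A routing engine would replace `haversine_m` with a route length, at the cost of one routing query per candidate POI.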
| edent wrote:
| If you know of an API which does navigation distance to POI,
| I'd love to hear about it!
| nerdralph wrote:
| I've used OSRM and ArcGIS for addresses in Canada. I think
| one or both of them have POI support in their APIs.
| https://route.arcgis.com/arcgis/
| rahimnathwani wrote:
| Google has Routes API:
| https://developers.google.com/maps/documentation/routes
| petre wrote:
| Check out GraphHopper. But if your POIs are from OSM, OSRM
| might be okay as well.
| mootothemax wrote:
| You can self-host or run locally Valhalla
| (https://github.com/valhalla/valhalla), reading in data from
| OSM as a starting point.
|
| (For my purposes, I went with local running, generating
| walking-distance isochrones across pretty much the entire UK)
| morkalork wrote:
| If I were giving directions to another human and not using house
| addresses I'd say something like "Queen street about half way
| down the block between Crawford and Shaw"
| edent wrote:
| That's great for cities with a grid layout, but ignores most of
| the world.
|
| How would you give directions to something in the middle of a
| park?
| Propelloni wrote:
| Fascinating to read. Around here people would do something like
| "follow the road (hand pointing down the road) at the first
| t-crossing turn left into a smaller road (hand pointing in the
| meant direction) continue for, I don't know, a few minutes. On
| your right (hand shows the meant direction) you'll see the
| park. A road comes up on the left and directly opposite of it
| is an entrance into the park. Go into the park and follow the
| path until you reach the first crossing. Turn left (hand shows
| meant direction) then follow until you reach the end of the
| park. The park should make a right turn there. The bench should
| be to your left."
|
| Rarely if ever do people use road names to direct pedestrians,
| or car drivers. I guess, the people don't know them. I
| wouldn't.
| rovr138 wrote:
| Have you looked at the geonames database?
| https://www.geonames.org/
|
| Info and schema is here,
| https://download.geonames.org/export/dump/readme.txt
|
| Could be a good source. Not sure how good it is worldwide, but
| the countries I've used it for, it's been useful and pretty good.
|
| Try the search too,
| https://www.geonames.org/search.html?q=R%C3%ADo+grande&count...
|
| Not just roads, but there's rivers, and other things too
| edent wrote:
| That does look interesting. I _could_ search through it for a
| lat & long, but it looks like it only gives a name (e.g.
| "Silicon Oasis") without a corresponding country. Food for
| thought though.
|
| Thanks!
| rovr138 wrote:
| Yeah. It's not flat.
|
| You can use admin fields, and it's a recursive query to find.
|
| I have a recursive CTE (thanks to ChatGPT).
|
| Could also be done on save, since they shouldn't change for
| locations.
|
| The recursiveness though, gives you a benefit if you extract
| type and save the intermediate steps, it allows you to start
| grouping things together at different levels which is one of
| the use cases you mentioned.
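A recursive query of the kind described here can be sketched with SQLite. The `place` table, its columns, and the four rows are made up for illustration and are not the GeoNames schema; the point is only the shape of the recursive CTE, which walks from a place up to its country and keeps the intermediate levels:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL,      -- e.g. country, region, city, locality
    parent_id INTEGER REFERENCES place(id)
);
INSERT INTO place VALUES
    (1, 'United Arab Emirates', 'country',  NULL),
    (2, 'Dubai',                'region',   1),
    (3, 'Dubai',                'city',     2),
    (4, 'Silicon Oasis',        'locality', 3);
""")

# Start at one place and repeatedly join to its parent.
ANCESTORS = """
WITH RECURSIVE chain(id, name, kind, parent_id, depth) AS (
    SELECT id, name, kind, parent_id, 0 FROM place WHERE id = ?
    UNION ALL
    SELECT p.id, p.name, p.kind, p.parent_id, chain.depth + 1
    FROM place AS p JOIN chain ON p.id = chain.parent_id
)
SELECT name, kind FROM chain ORDER BY depth;
"""

def ancestry(place_id):
    """Return (name, kind) pairs from the place itself up to the root."""
    return conn.execute(ANCESTORS, (place_id,)).fetchall()

chain = ancestry(4)
country = [name for name, kind in chain if kind == "country"]
```

Because every level comes back with its `kind`, the same result can be stored at save time and later used to group places by city, region, or country.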
| juliansimioni wrote:
| Geonames is a great dataset, in fact it's one of the "OG" open-
| source databases of the modern era, dating back to 2005.
|
| It has fairly comprehensive coverage of countries, cities, and
| major landmarks. It also has stable, simple identifiers that
| are somewhat of a lingua-franca in the geospatial data world
| (i.e. Geonames ID 5139572 points to the Statue of Liberty and
| if you have other data that you need to unambiguously associate
| with the one Statue of Liberty in New York Harbor, putting a
| `geonames_id` column in your database with that integer will
| pretty much solve it, and will allow anyone else you work with
| to understand the connection clearly too).
|
| However, to be honest, it hasn't really kept pace with modern
| times. The velocity of changes and updates is pretty low, it
| doesn't actively grow the community anymore. The data format is
| simple and rigid and built on old tech that's increasingly hard
| to work with. You can trust Geonames to have the Statue of
| Liberty, but not the latest restaurants in NYC.
|
| For a problem like the post author has of finding ways everyday
| people can easily navigate to something like a park bench that
| might not have a single address associated with it, or even if
| it does, needs more granularity to find _that_ specific bench
| in a park with 100 benches, Geonames probably won't help.
|
| Source: I'm co-founder of Geocode Earth, one of the geocoding
| companies linked in the blog post. We use Geonames as one
| source of POI data amongst many others.
| jandrewrogers wrote:
| Most people don't have an intuitive sense of just how technically
| difficult mapping from real geospatial coordinates to feature
| spaces is. This is a great example of a relatively simple case.
| You are essentially doing inference on a sparse data model with
| complex local non-linearities throughout. If you add in dynamic
| relationships, like things that move in space, it becomes another
| order of magnitude worse. We frequently don't have enough data to
| make a reliable inference even in theory and you need a way of
| reliably determining that.
|
| This problem has been the subject of intense interest by the
| defense research community for decades. It has been conjectured
| to be an AI-complete type problem for at least ten years, i.e.
| solving it is equivalent to solving AGI. The current crop of LLM
| type AI persistently fails at this class of problems, which is
| one of the arguments for why LLM tech can't lead to true AGI.
| TimTheTinker wrote:
| Just putting this out there. This is one area where Esri's
| software really shines. They have _so many_ software offerings
| and so much is said about different things you can do with
| ArcGIS (and competing systems), but the capability of their
| projection engine and geocoding systems - the code that lies at
| its heart - is unmatched, by far, at least as of 5 years ago
| when I left for a different company.
|
| I had long conversations with Esri's projection engine lead.
| Really remarkable guy - he's got graduate degrees in geography
| and math (including a PhD) and he's an excellent C/C++
| developer. That kind of expertise trifecta is rare. I'd walk by
| his office and sometimes see him working out an integral of a
| massive equation on his whiteboard (not that he didn't also use
| a CAS). "Oh yeah, I'm adding support for a new projection this
| week."
| jandrewrogers wrote:
| Many people don't appreciate the extent that building robust
| geospatial systems requires seriously hardcore mathematics
| and physics skills. All of the mapping companies have really
| smart PhDs wrangling with these problems. I've always enjoyed
| talking with them about the subtleties of the challenges.
| There are so many nuances that never occurred to me until
| they mentioned them.
| AlotOfReading wrote:
| I haven't found a better way to do this than the Google Maps
| solution [0]:
|
| You write a query of all the different kinds of addresses you'd
| like to display. The query result is a list of valid candidate
| addresses for the point matching at least one format that you can
| rank based on whatever criteria you like.
|
| [0]
| https://developers.google.com/maps/documentation/geocoding/r...
| mvdtnz wrote:
| It sounds like the author is more interested in getting city or
| town names from a coordinate. Google maps is massively overkill
| and horrendously expensive for this use case. I mentioned in
| another comment I do this in a game I wrote and can complete
| queries in microseconds.
|
| https://news.ycombinator.com/item?id=43814231
| amelius wrote:
| Why not take the OpenStreetMap address (which is long), chop it
| into a list of short combinations, then do a lookup for each
| combination, and see which short address gives you the best
| (geographically closest) match?
| dpmdpm wrote:
| I read this as Reverse Genociding Is Hard, thought I was on a
| Nethack forum, and thought, No, it's pretty easy with a cursed
| scroll.
| mtmail wrote:
| At least one spellcheck software likes to correct genocide to
| geocode. On social media I saw rage posts how Jews and
| Palestinians are being geocoded.
| Dachande663 wrote:
| Fun fact that was dredged up because the author mentions
| Australia: GPS points change. Their example coordinates give 6
| decimal places, accurate to about 10-15cm. Australia a few years
| back shifted all locations 1.8m because of continental drift
| (they're moving north at ~7cm/year). So even storing coordinates
| as a source of truth can be hazardous. We had to move several
| thousand points for a client when this happened.
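The numbers in this comment can be checked with a little arithmetic. This sketch assumes a spherical Earth, a sample latitude of -33.9 degrees, and a 26-year gap between datum epochs (1994 to 2020); all three are assumptions for illustration rather than figures from the comment:

```python
import math

EARTH_RADIUS_M = 6_371_000  # spherical approximation

def metres_per_degree_lat():
    """Length of one degree of latitude on the sphere."""
    return math.pi * EARTH_RADIUS_M / 180

def metres_per_degree_lon(lat_deg):
    """Length of one degree of longitude, which shrinks towards the poles."""
    return metres_per_degree_lat() * math.cos(math.radians(lat_deg))

step = 1e-6  # the sixth decimal place of a coordinate in degrees
lat_step_m = step * metres_per_degree_lat()       # roughly 11 cm
lon_step_m = step * metres_per_degree_lon(-33.9)  # roughly 9 cm at this latitude

# Accumulated drift at 7 cm/year over the assumed 26 years.
drift_m = 0.07 * (2020 - 1994)
```

The accumulated drift comes out more than ten times larger than the resolution of the sixth decimal place, which is why the shift matters for coordinates stored at that precision.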
| atoav wrote:
| In the past year or so I have thought a lot about how to design
| tables and columns within databases and there is nearly nothing
| that wouldn't get more robust by adding in a "valid_from" and
| "valid_till" and make it accept multiple values. Someone's name
| is _Foo_? What if they change it to _Bar_ at some point and you
| need to access something from before with the old name?
|
| If you have only a name field that has a single value that is
| going to be a crazy workaround. If your names are referencing a
| person with a date that is much easier. But you need to make
| that decision pretty early.
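The "valid_from"/"valid_till" idea can be sketched with SQLite. The table name, the half-open interval convention (start inclusive, end exclusive), and the sample dates are assumptions chosen for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_name (
    person_id  INTEGER NOT NULL,
    name       TEXT    NOT NULL,
    valid_from TEXT    NOT NULL,  -- ISO 8601 date, inclusive
    valid_till TEXT               -- ISO 8601 date, exclusive; NULL = still valid
);
INSERT INTO person_name VALUES
    (1, 'Foo', '2010-01-01', '2020-06-01'),
    (1, 'Bar', '2020-06-01', NULL);
""")

def name_on(person_id, day):
    """Return the name that was valid on `day` (YYYY-MM-DD), or None."""
    row = conn.execute(
        """
        SELECT name FROM person_name
        WHERE person_id = ?
          AND valid_from <= ?
          AND (valid_till IS NULL OR ? < valid_till)
        """,
        (person_id, day, day),
    ).fetchone()
    return row[0] if row else None
```

ISO 8601 date strings compare correctly as plain text, which is what lets the range check work without a dedicated date type.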
| pavel_lishin wrote:
| If you have an "audit" table, where you write a copy of the
| data before updating it in the primary table, that's a
| decision you can make at any point.
|
| Of course, you don't get that historical data, but you do get
| it going forward from there.
| tough wrote:
| something like https://www.pgaudit.org/ ?
|
| Basically you keep a history of all changes so you can
| always roll-back / get that data if needed?
| pavel_lishin wrote:
| The last time we did this, we basically hand-rolled our
| own, with a database trigger to insert data into a
| different table whenever an `UPDATE` statement happened.
|
| But this seems like it's probably a better solution.
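A hand-rolled audit trigger along these lines might look like the sketch below. The comment does not say which database was used; SQLite is used here only to keep the example self-contained, and the `bench` and `bench_audit` tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bench (
    id  INTEGER PRIMARY KEY,
    lat REAL NOT NULL,
    lon REAL NOT NULL
);
CREATE TABLE bench_audit (
    bench_id   INTEGER NOT NULL,
    old_lat    REAL NOT NULL,
    old_lon    REAL NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Copy the outgoing row into the audit table before each update.
CREATE TRIGGER bench_before_update BEFORE UPDATE ON bench
BEGIN
    INSERT INTO bench_audit (bench_id, old_lat, old_lon)
    VALUES (OLD.id, OLD.lat, OLD.lon);
END;
""")

conn.execute("INSERT INTO bench VALUES (1, -33.8600, 151.2100)")
conn.execute("UPDATE bench SET lat = -33.8601 WHERE id = 1")

# The audit table now holds the values from before the update.
history = conn.execute(
    "SELECT bench_id, old_lat, old_lon FROM bench_audit"
).fetchall()
```

As noted in the thread, this only records changes made after the trigger is installed; earlier history is not recoverable this way.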
| tough wrote:
| I haven't used pgaudit yet, so I can't vouch for it, but I have it
| on the backburner/log of things to try for such a use case!
|
| I think the real magic is it leverages the WAL (write
| ahead logs) from pg engine itself, which you could
| certainly hook up into too, but im not a db expert here
| homebrewer wrote:
| SQL 2011 defines temporal tables, which few FOSS databases
| support. I used it in mariadb:
|
| https://mariadb.com/kb/en/temporal-tables/
|
| and if your schema doesn't change much, it's practically
| free to implement, much easier and simpler than copypasting
| audit tables, or relying on codegen to do the same.
| boramalper wrote:
| See also "Eventual Business Consistency"[0] by Kent Beck.
| Really good read.
|
| > _Double-dated data_ --we tag each bit of business data with
| 2 dates:
|
| > * The date on which the data changed out in the real world,
| the _effective_ date.
|
| > * The date on which the system found out about the change,
| the _posting_ date.
|
| > Using effective & posting dates together we can record all
| the strange twists & turns of feeding data into a system.
|
| [0] https://tidyfirst.substack.com/p/eventual-business-consisten...
| jandrewrogers wrote:
| The tradeoff is that this is very expensive at the scale of
| large geospatial data models both in terms of performance and
| storage. In practice, it is much more common to just take
| regular snapshots of the database. If you want to go back in
| time, you have to spin-up an old snapshot of the database
| model.
|
| A less obvious issue is that to make this work well, you need
| to do time interval intersection searches/joins at scale.
| There is a dearth of scalable data structures and algorithms
| for this in databases.
| nradov wrote:
| Anyone who works with human names should take a look at the
| HL7 V3 and FHIR data models, which were designed for
| healthcare. They support name validity ranges, and a bunch of
| other related metadata. It can be challenging to efficiently
| represent those abstract data models in a traditional
| database because with a fully normalized schema
| you end up needing a lot of joins.
| AlotOfReading wrote:
| GPS coordinates actually account for the motion of the Earth's
| tectonic plates. The problem is that it's a highly approximate
| model that doesn't accurately reflect areas like Australia very
| well.
|
| There's a great visualizer of the coordinate velocity from the
| Earthscope team:
|
| https://www.unavco.org/software/visualization/GPS-Velocity-V...
| meindnoch wrote:
| >GPS coordinates actually account for the motion of the
| Earth's tectonic plates.
|
| What?
| 867-5309 wrote:
| _accounts for (something)_ , phrasal verb meaning
| "considers; incorporates; takes on board" as opposed to the
| more obvious "gives rise to; is responsible for". I had to
| read twice too
| meindnoch wrote:
| Yeah, I know what "accounts for" means.
|
| I just can't comprehend how GPS coordinates could account
| for the tectonic plates' motion. Never heard of such a
| thing, and can't see how it would work on a conceptual,
| mathematical level.
| janzer wrote:
| I'm pretty positive that is showing the reverse, i.e. how
| much a given "location" is moving using gps coordinates. Not
| adjusting the gps coordinates to refer to a constant
| "location".
| jandrewrogers wrote:
| GPS coordinates do not account for tectonic motion. It is a
| synthetic spheroidal model that is not fixed to any point on
| Earth. The meridians are derived from the average motion of
| many objects, some of which are not on the planetary surface.
|
| The motion of tectonic plates can be calculated relative to
| this spatial reference system but they are not part of the
| spatial reference system and would kind of defeat the purpose
| if they were.
| AlotOfReading wrote:
| The corrections are incorporated into the datum. WGS84 is
| updated every 6 months to follow ITRF by changing the
| tracking station locations as the plates move around.
| meindnoch wrote:
| That's about correcting the ground stations' coordinates.
| It doesn't help keeping your house's GPS coordinates
| fixed. If the tectonic plate your house is built on moves
| a meter over the course of a decade, then your house's
| GPS coords will change in the lower decimals, and
| eventually your government's land registry will need to
| update those values.
| jandrewrogers wrote:
| If WGS84 was correcting for tectonic drift it would imply
| that the coordinates of the terrestrial fixed points used
| to compute the reference meridian never change under
| WGS84. Rebasing the coordinates of terrestrial fixed
| points prior to calculation disregards tectonic drift in
| the reference meridian calculation, it doesn't correct
| it. It is a noise reduction exercise to minimize the
| influence of plate tectonics on meridian drift. The
| meridian uses non-terrestrial fixed points too that don't
| have a concept of tectonic drift (but may introduce their
| own idiosyncratic sources of noise).
|
| Basically, these are corrections to their "fixed points"
| to make them behave more like actual fixed points in the
| reference meridian model. It doesn't eliminate tectonic
| drift effects when using coordinates in that spatial
| reference system.
| xucheng wrote:
| Can this be solved by storing a timestamp of the record along
| with precise GPS coordinates? Could we then utilize some
| database to compute the drift from then and now?
| haneefmubarak wrote:
| I mean, certainly - if you store both GPS time and derived
| coordinates from the same sampling, then you can always later
| interpret it as needed - whether relative to legal or
| geographical boundaries etc as you might want to interpret in
| the future.
| jandrewrogers wrote:
| Yes, in fact it should essentially be mandatory because the
| spatial reference system for GPS is not fixed to a point on
| Earth. This has become a major issue for old geospatial data
| sets in the US where no one remembered to record _when_ the
| coordinates were collected.
|
| To correct for these cases you need to be able to separately
| attribute drift vectors due to the spatial reference system,
| plate tectonics, and other geophysical phenomena. Without a
| timestamp that allows you to precisely subtract out the
| spatial reference system drift vector, the magnitude of the
| uncertainty is quite large.
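Storing the epoch with each fix makes a later correction possible. The sketch below applies a single constant horizontal velocity on a spherical Earth, which is a heavy simplification: the class, the function, and the purely northward 7 cm/year velocity are assumptions for illustration. A real correction would use a published plate-motion model or datum transformation and would separate the different drift sources mentioned here.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # spherical approximation

@dataclass
class Fix:
    lat: float    # degrees
    lon: float    # degrees
    epoch: float  # decimal year in which the fix was taken

def propagate(fix, north_m_per_yr, east_m_per_yr, target_epoch):
    """Shift a fix by a constant horizontal velocity to another epoch."""
    years = target_epoch - fix.epoch
    dlat = math.degrees(north_m_per_yr * years / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m_per_yr * years
        / (EARTH_RADIUS_M * math.cos(math.radians(fix.lat)))
    )
    return Fix(fix.lat + dlat, fix.lon + dlon, target_epoch)

# Hypothetical point recorded in 1994, moved forward to 2020.
old = Fix(lat=-33.86, lon=151.21, epoch=1994.0)
new = propagate(old, north_m_per_yr=0.07, east_m_per_yr=0.0, target_epoch=2020.0)

# Convert the latitude change back to metres as a sanity check.
shift_m = math.radians(new.lat - old.lat) * EARTH_RADIUS_M
```

Without the `epoch` field there is nothing to compute `years` from, which is the situation described for the old data sets.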
| pavel_lishin wrote:
| Damn! 7cm per year feels blazing fast when you consider the
| fact that it's a whole continent.
| anotherevan wrote:
| We're coming for you!
| XorNot wrote:
| I mean I'm still mind blown that the Three Gorges dam in
| China literally changed the rotational speed of the Earth,
| and thus the length of the day.
| akst wrote:
| My knowledge of geospatial sets is fairly shallow, but I've
| worked a bit with Australian map data and I'm assuming you are
| referring to the different CRSs, GDA2020 and GDA1994?
|
| I'd imagine older coordinates would work with the earlier CRS?
|
| But I can understand not all coordinates specify their CRS.
| This hasn't really been an issue for me personally, but I've
| mostly worked with NSW spatial and the Australian Bureau of
| statistics geodata.
| jandrewrogers wrote:
| Even accounting for tectonic drift, there is a concept of
| positioning reproducibility that is separate from precision. In
| general the precision of the measurements is much higher than
| the reproducibility of the same measurements. That is, you may
| be able to measure a fixed point on the Earth using an
| instrument with 1cm precision at a specific point in time but
| if you measure that same point every hour for a year with the
| same instrument, the disagreement across measurements will
| often be >10cm (sometimes much greater), which is much larger
| than e.g. tectonic drift effects.
|
| For this reason, many people use the reproducibility rather
| than instrument precision as the noise floor. It doesn't matter
| how precise an instrument you use if the "fixed point" you are
| measuring doesn't sit still relative to any spatial reference
| system you care to use.
| Robotbeat wrote:
| The whole accuracy vs precision thing.
| jandrewrogers wrote:
| Related but slightly different. The accuracy is real but it
| is only valid at a point in time. Consequently, you can
| have both high precision and high accuracy that nonetheless
| give different measurements depending on when the
| measurements were made.
|
| In most scientific and engineering domains, a high-
| precision, high-accuracy measurement is assumed to be
| reproducible.
| RainyDayTmrw wrote:
| This is one of many reasons why property surveying records use
| so many seemingly obscure or redundant points of reference. In
| case anyone wonders why modern property surveying isn't only
| recording lots of GPS coordinates.
| vintermann wrote:
| Genealogy applications run into this a lot. The person of
| interest lived at Engeset. FamilySearch has geocoded a place
| called "Engeset, More og Romsdal, Norway". So that's it, right?
| Not so fast, there are at least 3 Engesets in More og Romsdal:
| https://www.google.com/maps/search/Engeset/@62.3358577,6.225...
|
| But that's at least better than when it's some local place name
| which it's never heard of, and thinks sounds most similar to a
| place in Afghanistan (this happens all the time).
|
| And to add to it, there are administrative regions, and
| ecclesiastical regions. Do you put them in the parish, or in the
| municipality? The birth in the parish and the baptism in the
| municipality, maybe? How about the burial then...
| modeless wrote:
| Converting from a name/address to coordinates is geocoding.
| Reverse geocoding is mapping from coordinates to a
| name/address.
| punnerud wrote:
| I created this to solve my own need for reverse geocoding:
| https://github.com/punnerud/rgcosm (Saving me thousands of $
| compared to Google API)
|
| Uses OpenStreetmap file, Python and SQLite3.
|
| First it finds all addresses within a +/- square around the
| lat/lon, then calculates the distance for that smaller list
| (Pythagoras), and picks the closest. It expands until a set
| maximum if no address is found in the first search.
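The search described in this comment can be sketched as follows. This is not the rgcosm code: the table, the sample rows, the doubling step, and the cosine scaling of longitude are assumptions made for the example.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (label TEXT, lat REAL, lon REAL);
CREATE INDEX address_lat_lon ON address (lat, lon);
INSERT INTO address VALUES
    ('1 Example Street', 59.9130, 10.7500),
    ('2 Example Street', 59.9140, 10.7520),
    ('9 Faraway Road',   59.9500, 10.9000);
""")

def nearest_address(lat, lon, start=0.001, maximum=0.1):
    """Return the closest label, widening the box until `maximum` degrees."""
    half = start
    while half <= maximum:
        rows = conn.execute(
            "SELECT label, lat, lon FROM address "
            "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
            (lat - half, lat + half, lon - half, lon + half),
        ).fetchall()
        if rows:
            # Flat-plane (Pythagoras) distance in degrees, with longitude
            # scaled by cos(latitude) so both axes are comparable.
            scale = math.cos(math.radians(lat))
            best = min(
                rows,
                key=lambda r: math.hypot(r[1] - lat, (r[2] - lon) * scale),
            )
            return best[0]
        half *= 2  # nothing found: widen the square and retry
    return None
```

One limitation of stopping at the first non-empty square: a point just outside the square can be closer than a point in one of its corners, so a stricter version would run one more, slightly larger query before picking the winner.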
| davidmurdoch wrote:
| Just curious if you looked into using S2 cells for this? It's
| what Pokemon Go uses for its coordinate system.
| http://s2geometry.io/devguide/s2cell_hierarchy.html
| punnerud wrote:
| Isn't the main purpose of S2 to be able to scan from
| different "directions"? More a purpose of Google Maps when
| viewing the world as a spherical object compared to Sqlite3
| just using a simple B-tree index on lat+lon?
| davidmurdoch wrote:
| The individual cells being sized and being able to easily
| compute neighboring cells seems useful for the described
| algorithm. I haven't given it much thought on applicability
| here, but it sounded somewhat similar to a search pattern I
| once implemented within pgsql to locate items on a map that
| were within proximity of a given latlong.
| gmoore wrote:
| maybe the 'three words' model? Seems like it would be specific
| enough to locate a bench
| cjs_ac wrote:
| what3words is a proprietary algorithm and has problems with
| homophones.
| Liquid_Fire wrote:
| That's essentially equivalent to coordinates since you still
| need to translate it to some human-understandable form, so it
| doesn't solve the problem.
| 1970-01-01 wrote:
| Yeah, that would absolutely work, but it's not free and
| donations may or may not cover their costs:
| https://accounts.what3words.com/select-plan
| sinuhe69 wrote:
| Not my area of expertise, but is this not a form of perfectionist
| problem? I mean, most places have a clear and simple address. For
| the rest, either a human can solve it, or we can make a few
| examples and let an AI do the work. We can go back to them later
| and revise them if we need to. Addresses don't change often, so I
| think things can stay the same for a long time.
|
| Except for emergency dispatch and a few high-profile use cases,
| you can have a good enough address to let the user find its
| neighbourhood. But they still have the GPS or other form of
| address coding, so they can find the exact location easily. I'd
| say 99.9% of the cases are like that. The rest can be solved
| quickly by looking at the map!
| edent wrote:
| I am deeply guilty of being a perfectionist!
|
| Ultimately, I just want something which is a nice balance
| between being useful for a human and not so long that it is
| overwhelming.
| curiousObject wrote:
| You're the author?
|
| The final step in the process "Wait for complaints" seems
| like a smart acceptance of the "perfect is the enemy of good"
| challenge
|
| Publish and be damned, or as we say now: Move fast and break
| things
| smitty1e wrote:
| I was going to take this tack.
|
| 80% of the problem is just transforming floating point
| coordinates into API calls.
|
| Getting to something useful with it is the hard 20%, and it
| will be a diminishing returns problem after that.
|
| While not anybody's LLM proponent, that last mile might be a
| good AI application.
| ryandrake wrote:
| You can call it perfectionism or you can call it "doing it
| right." I think this gets at a fundamental difference in
| philosophy among [software] engineers: We have a problem with a
| lot of edge cases, where a "good enough" solution can be done
| quickly. What do we do? There's a class of engineers who say 1.
| Do the "good enough" solution and ignore/error on the edge
| cases--we'll fix them later somehow (may or may not have an
| actual plan to do this). And there's a class of engineers who
| say 2. We cannot solve this problem correctly yet and need more
| research and better data.
|
| Unfortunately (in my view), group #1 is making all the products
| and is responsible for the majority of applications of
| technology that get deployed. Obviously this is the case
| because they will take on projects that group #2 cannot, and
| have no compunction against shipping them. And we can see the
| results with our eyes. Terrible software that constantly
| underestimates the number and frequency of these "edge cases"
| and defects. Terrible software that still requires the user to
| do legwork in many cases because the developers made an
| incorrect assumption or had bad input data.
|
| AI is making this problem even worse, because now we don't even
| know what the systems can and cannot do. LLMs
| nondeterministically fail in ways that sometimes can't even be
| directly corrected with code, and all engineering can do is
| stochastically fix defects by "training with better models."
|
| I don't know how we get out of this: Every company is
| understandably biased towards "doing now" rather than "waiting"
| to research more and make a better product, and the doers
| outcompete the researchers.
| sbarre wrote:
| > Unfortunately (in my view), group #1 is making all the
| products and is responsible for the majority of applications
| of technology that get deployed.
|
| This is an interesting take, and I think I see where you're
| coming from..
|
| My first thought on "why" is that so many products today are
| free to the user, meaning the money is made elsewhere, and so
| the experience presented to the user can be a lot more
| imperfect or non-exhaustive than it would otherwise have to
| be if someone was paying to use that experience.
|
| So edge cases can be ignored because really you're looking
| for a critical mass of eyeballs to sell to advertisers or to
| harvest usage data from, etc.. If a small portion of your
| users has a bad time or experiences errors, well, you get
| what you pay for as they say..
|
| And does that kind of pervasiveness now mean that many
| engineers think this is just the way to go no matter what?
| jandrewrogers wrote:
| The update rate for a global map data model, all of which are
| still woefully incomplete in many contexts, is surprisingly
| high. The territory underlying the map is a lot less static
| than people assume. Also, local reality is often much less
| "regular" than people assume such that a person really can't
| figure it out reliably. Currently there are literally thousands
| of people tasked with incorporating these changes because it
| has proven to be resistant to automation thus far due to the
| pervasiveness of edge cases. For your basic global map data
| model, these are the edge cases that are left _after_ several
| thousand heuristic and empirically derived rules have been
| applied.
|
| It is a deeply complex data model that changes millions of
| times a day in unpredictable ways. Unfortunately, many
| applications are very sensitive to the local accuracy of the
| model, which is much higher variance than average accuracy.
| Only trying to be "good enough" in an 80/20 rule sense is the
| same as "broken". The updates are also noisy and often contain
| errors, so the process has to be resilient to those errors.
|
| The resistance of the problem to automation and the high rate
| of change has made it extremely expensive to asymptotically
| converge on a model with consistently acceptable accuracy for the
| vast majority of applications.
| mootothemax wrote:
| > most places have a clear and simple address
|
| That depends on your definition of "clear and simple" and
| "address" :) While a lot boils down to use case - are you
| trying to navigate somewhere, or link a string to an address? -
| even figuring out _what_ is an address can be hard work. Is an
| address the entrance to a building? Or a building that accepts
| postal deliveries? Is the "shell" of a building that contains
| a bunch of flats/apartments but doesn't itself have a postal
| delivery point or bills registered directly to it an address?
| How about the address a location was known as 1 year ago? 2
| years ago? 10 years ago?
|
| Parks and other public spaces can be fun; they may have many
| local names that are completely different to the "official"
| name - and it's a big "if" whether an official name exists at
| all. Heck, most _roads_ have a bunch of official names that are
| anything but the names people refer to them as. I have a
| screaming obsession with the road directly in front of
| Buckingham Palace that, despite what you see on Google Maps, is
| registered as "unnamed road" in all of the official sources.
|
| > Addresses don't change often
|
| At the individual level, perhaps. In aggregate? Addresses
| change all the time, sometimes unrecognisably so. City and town
| boundaries are forever expanding and contracting, and the
| borders between countries are hardly static either (and if
| you're ever near the Netherlands / Belgium border, make a quick
| trip to Baarle-Hertog and enjoy the full madness). Thanks to
| intercontinental relative movement, the coordinates we log
| against locations have a limited shelf life too. All of the
| things I used to think were certain...
|
| If someone hasn't done "falsehoods programmers believe about
| addresses," I think its time might be now!
|
| Edit: answering myself with
| https://www.mjt.me.uk/posts/falsehoods-programmers-believe-a...
| dadadad100 wrote:
| This problem seems to also exist for services like uber. Their
| solution seems easier, drop a pin on a map. Perhaps working so
| hard to find a textual description is missing the simpler
| solution.
| blacklight wrote:
| As the developer of a GPS tracking app that relies a lot on
| OpenStreetMap, I've faced many of these problems myself. A couple
| of learned lessons/insights:
|
| - I avoid relying on any generic location name/description
| provided by these APIs. Always prefer structured data whenever
| possible, and build the locality name from those components
| (bonus points if you let the user specify a custom format).
|
| - Identifying those components itself is tricky. As the author
| mentioned, there are countries that have states, others that have
| regions, others that have counties, or districts, or any
| combination of those. And there are cities that have suburbs,
| neighbourhoods, municipalities, or any combination. Oh, and let's
| not even get started with address names - house numbers?
| extensions? localization variants - e.g. even the same API may
| sometimes return "Marrakesh" and sometimes "Marrakech"? and how
| about places like India where nearby amenities are commonly used
| instead of house numbers? I'm not aware of any public APIs out
| there that provide these "expected" taxonomies, preferably from
| lat/long input, but I'd love to be proven wrong. In the absence
| of that, I would suggest that it's better to avoid double-guessing
| - unless your software is only intended to run in a specific
| country, or in a limited number of countries and you can afford
| to hardcode those rules. It's probably a good option to provide a
| sensible default, and then let the user override it. Oh, and good
| catch about abbreviations - I'd say to avoid them unless the user
| explicitly enables them, if you want to avoid the "does everybody
| know that IL is Illinois?" problem. Just use "Illinois" instead,
| at least by default.
|
| - Localization of addresses is a tricky problem only on the
| surface. My proposed approach is that, again, the user is king.
| Provide English by default (unless you want to launch your
| software in a specific country), and let the user override the
| localization. I feel like Nominatim's API approach is
| probably the cleanest: honor the `Accept-Language` HTTP header if
| available, and if not available, fall back to English. And then
| just expose that as a setting to the user.
|
| - Bounding boxes/polygons can help a lot with solving the
| proximity/perimeter issue. But they aren't always
| present/sufficiently accurate in OSM data. And their proper usage
| usually requires the client's code to run some non-trivial
| lat/long geometry processing code, even to answer trivial
| questions such as "is this point inside of this enclosed
| amenity?" Oh, and let's not even get started with the "what's the
| exact lat/long of this address?" problem. Is it the entrance of
| the park? The middle of it? I remember that when I worked with
| the Bing API in the past they provided more granular
| information at the level of rooftop location, entrance location
| etc.
|
| - Providing localization information for public benches isn't
| what I'd call an orthodox use-case for geo software, so I'm not
| entirely sure of how to solve the "why doesn't everything have an
| address?" problem :)
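|
| A minimal sketch of the first point above (build the locality name
| from structured components, with a user-overridable format). The
| key names ("city", "town", "state", ...) are assumptions modelled
| on the kind of address object such APIs return; `build_locality`
| and `DEFAULT_FORMAT` are hypothetical names, not part of any API.

```python
from typing import Mapping, Sequence

# Hypothetical default: each slot lists candidate keys in priority order,
# and the first key present in the address fills that slot.
DEFAULT_FORMAT: Sequence[Sequence[str]] = (
    ("city", "town", "village", "municipality"),
    ("state", "region", "county"),
    ("country",),
)


def build_locality(
    address: Mapping[str, str],
    fmt: Sequence[Sequence[str]] = DEFAULT_FORMAT,
    separator: str = ", ",
) -> str:
    """Join the first available component of each slot into one label."""
    parts = []
    for candidates in fmt:
        # Pick the first candidate key that has a non-empty value.
        value = next((address[k] for k in candidates if address.get(k)), None)
        # Skip empty slots and exact repeats (e.g. a city-state).
        if value and value not in parts:
            parts.append(value)
    return separator.join(parts)


# Example: no abbreviations, and the user can pass their own format.
example = {"town": "Evanston", "state": "Illinois", "country": "United States"}
label = build_locality(example)
short_label = build_locality(example, fmt=(("town",), ("state",)))
```

| Passing a different `fmt` is how a per-user or per-country override
| could be wired in, without hardcoding one taxonomy for everyone.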
| nerdralph wrote:
| Part of the problem is the different ways addresses are expressed
| throughout the world. I was born and grew up in Canada, and was
| confused when I started dealing with companies in China. Instead
| of street addresses, many are given by province, city, district,
| sub-district, and a building number.
|
| Another problem is choosing which authority for the "correct"
| address. I've seen many cases where the official postal address
| city/town name is different than the 911 database. For example
| Canada Post will say some street addresses are in Dartmouth,
| while the official civic address is really Cole Harbour.
| https://www.canadapost-postescanada.ca/ac/
| https://nsgi.novascotia.ca/civic-address-finder/
|
| Even streets can have multiple official names/aliases. People who
| live on "East Bay Hwy", also live on "Highway 4", which is an
| alias.
| andrew_eu wrote:
| I have a memorable reverse geocoding story.
|
| I was working with a team that was wrapping up a period of many
| different projects (including a reverse geocoding service) and
| adopting one major system to design and maintain. The handover
| was set to be after the new year holidays and the receiving teams
| had their own exciting rewrites planned. I was on call the last
| week of the year and got an alert that sales were halted in
| Taiwan due to some country code issue and our system seemed at
| fault. The customer facing application used an address to
| determine all sorts of personalization stuff: what products
| they're shown, regulatory links, etc. Our system was essentially
| a wrapper around Google Maps' reverse geocoding API, building in
| some business logic on top of the results.
|
| That morning, at 3am, the API stopped serving the country code
| for queries of Kinmen County. It would keep the rest of the
| address the same, but just omit the country code, totally
| botching assumptions downstream. Google Maps seemingly realized
| all of a sudden what strait the island was in, and silently
| removed what some people dispute.
|
| Everyone else on the team was on holiday and I couldn't feasibly
| get a review for any major mitigations (e.g. switching to OSM or
| some other provider). So I drew a simple polygon around the
| island, wrote a small function to check if the given coordinates
| were in the polygon, and shipped the hotfix. Happily, the whole
| reverse geocoding system was scrapped and replaced by
| February.
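|
| For illustration, a point-in-polygon check of that kind can be
| sketched with ray casting. This is not the code that was shipped;
| `point_in_polygon` is a hypothetical name and `PLACEHOLDER_BOX` is
| a made-up square, not a real boundary. It treats lat/long as planar
| coordinates, which is only reasonable for small areas away from
| the poles and the antimeridian.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def point_in_polygon(lat: float, lon: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside


# Made-up square for demonstration only; not any real island's outline.
PLACEHOLDER_BOX = [(24.0, 118.0), (24.0, 119.0), (25.0, 119.0), (25.0, 118.0)]

inside_box = point_in_polygon(24.5, 118.5, PLACEHOLDER_BOX)
outside_box = point_in_polygon(26.0, 118.5, PLACEHOLDER_BOX)
```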
| modeless wrote:
| Wow, I had no idea that Taiwan controlled an island less than
| three miles from mainland China, essentially surrounded by
| China in a bay. (The main island is 80+ miles away.) I'm really
| surprised China has allowed that for 80 years. Unsurprisingly,
| the beach looks like this:
| https://www.google.com/maps/place/Shuang+Kou+Zhan+Dou+Cun/@2...
|
| Also interesting that there's a Japanese island only 60 miles
| from Taiwan on the other side. I guess claims to small Pacific
| islands have been weird for a long time.
| nradov wrote:
| If the Chinese Communist Party decides to escalate the
| pressure on Taiwan then one likely scenario is some sort of
| blockade against those small islands close to the mainland.
| mvdtnz wrote:
| I dealt with this exact issue and went with that exact solution
| in my browser based geography game[0].
|
| What the author is looking for is administrative divisions and
| boundaries[1], in particular probably down to level 3 which is
| the depth my game goes to. These differ in size greatly by
| country. With admin boundaries you need to accept there is no
| one-size-fits-all solution and embrace the quirks of the
| different countries.
|
| For my game I downloaded a complete database of global admin
| boundaries[2] and imported them into PostgreSQL for lightning
| fast querying using PostGIS.
|
| [0] https://guesshole.com
|
| [1]
| https://en.wikipedia.org/wiki/List_of_administrative_divisio...
|
| [2] https://gadm.org/data.html
| jillesvangurp wrote:
| This was a while ago, but about 12 years ago I experimented with
| putting the whole of openstreetmap into Elasticsearch.
|
| Reverse geocoding then becomes a problem of figuring out which
| polygons contain the point with a simple query and which
| POIs/streets/etc. are closest based on perpendicular distance.
| For that, I simply did a radius search and some post processing
| on any street segments. Probably not perfect for everything. But
| it worked well enough. My goal was actually being able to group
| things by neighborhood and microneighborhoods (e.g. squares,
| nightlife areas, etc.).
|
| This should work well enough with anything that allows for
| geospatial queries. In a pinch you can use geohashes (I actually
| did this because geospatial search was still a bit experimental
| in ES).
| 1970-01-01 wrote:
| >But how do you go from a string of digits to something human
| readable?
|
| Hasn't What3Words already solved this?
| robin_reala wrote:
| It's a closed and proprietary system that contains locations
| like master.beats.slave near Brooklyn and whites.power.life in
| Washington (to pick just a couple of examples off the top of my
| head). If your only requirement is "human readable" and assumes
| the human knows how to read and pronounce English words then I
| guess it kinda does.
| 1970-01-01 wrote:
| >and assumes the human knows how to read and pronounce
| English words
|
| I hit a dozen random benches. If these are the low
| requirements, W3W would and does do a great job for
| openbenches: https://openbenches.org/bench/random/
| cmcconomy wrote:
| so would google's plus-codes,
|
| https://maps.google.com/pluscodes/
| mootothemax wrote:
| Yeesh slave.trade.market too!
| osmanscam wrote:
| you can use https://map.name for reverse geocoding
| kylecazar wrote:
| Good article. FWIW, some major cities offer seating data. New
| York, for example, returns bench locations as a Point
| (coordinates). They even have a column in the data for the
| nearest address of the "seating feature".
|
| https://data.cityofnewyork.us/Transportation/Seating-Locatio...
___________________________________________________________________
(page generated 2025-04-27 23:00 UTC)