[HN Gopher] Stop using zip codes for geospatial analysis (2019)
___________________________________________________________________
Stop using zip codes for geospatial analysis (2019)
Author : voxadam
Score : 137 points
Date : 2025-02-07 16:46 UTC (6 hours ago)
(HTM) web link (carto.com)
(TXT) w3m dump (carto.com)
| hammock wrote:
| Great article. Zip codes can be super expedient. But you have to
| be self-aware that for many use cases they function WORSE than a
| random grid, because they have built-in aggregation of a central
| post office (and its surroundings) with a certain radius of
| rural/less dense surroundings.
|
| So for example, if you are sorting "rural zips" vs "urban zips"
| it will only take you so far, and may actually be harmful.
|
| Same goes with MSAs/DMAs (media markets). These have to be used
| for buying media, but for geospatial analysis they are suboptimal
| for the same reasons.
|
| Easiest way to dip your toe into the water of something better is
| to start with A-D census counties.
| jihadjihad wrote:
| To put it in plain mathematical language, ZIP codes are not
| defined as polygons [0]. The consequence is that performing any
| analysis with an assumption that ZIP codes are polygons is bound
| to be error-prone.
|
| 0: https://manifold.net/doc/mfd8/zip_codes_are_not_areas.htm
| mholt wrote:
| Yeah. ZIP codes are sets in the abstract-dimensional space of
| carrier delivery points. I suppose you could think of them as
| lines, but definitely not polygons.
| cogman10 wrote:
| Zip codes (in the US) are machine readable numbers a mail
| sorter can use to send a parcel to the right delivery truck
| for final delivery. In the US, they represent the hierarchy
| of postal centers with the most significant digit
| representing the primary hub for a region and the smallest
| number the actual post office that will be in charge of
| delivering the letter (or truck if you do the extended post
| code).
|
| They don't represent geography at all, they represent the
| organizational structure of USPS.
|
| They work by making the address on a letter almost
| meaningless. For some smaller population zip codes you can
| practically just put the name and zip code down and achieve
| delivery.
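|
| A minimal sketch of the digit hierarchy described above, assuming
| the usual reading of a US ZIP (first digit = broad national area,
| first three digits = regional sorting hub, last two = local
| delivery area). The function and field names are illustrative
| only, not USPS identifiers.
|

```python
def split_zip(zip_code: str) -> dict:
    """Split a US ZIP or ZIP+4 string into its routing levels.

    Field names are hypothetical labels chosen for this sketch.
    """
    base, _, plus4 = zip_code.strip().partition("-")
    if len(base) != 5 or not base.isdigit():
        raise ValueError(f"expected a 5-digit ZIP, got {zip_code!r}")
    if plus4 and not (len(plus4) == 4 and plus4.isdigit()):
        raise ValueError(f"expected a 4-digit extension, got {zip_code!r}")
    return {
        "national_area": base[0],      # most significant digit: broad region
        "sectional_center": base[:3],  # regional sorting hub
        "delivery_area": base[3:],     # local post office / delivery area
        "plus4": plus4 or None,        # optional extension: finer delivery segment
    }


print(split_zip("09180"))
```

| Note that the ZIP is handled as a string throughout, so a leading
| zero survives, and that nothing here says anything about geography.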
| Spivak wrote:
| Right but this ends up being a good approximation for
| geography because the reality of logistics is that you end
| up doing a cute n-ary search of the geography. When you
| know the regional hub you can say for certain a huge chunk
| of the US the zip code doesn't represent. And then you keep
| n-secting. Sometimes the land-mass you get at the end is
| specific enough for your uses.
|
| You're not going to wind up with a situation where zip
| codes with the same regional marker end up on different
| coasts.
| mattforrest wrote:
| Just use a spatial query. That's what they are made for.
| makeitdouble wrote:
| > You're not going to wind up with a situation where zip
| codes with the same regional marker end up on different
| coasts.
|
| Couldn't this happen for military or proxy codes (PO
| boxes or other) ?
| alsodumb wrote:
| I agree that they weren't explicitly meant to represent
| geography, but implicitly they do, right? Are there cases
| where this is violated?
|
| In other words, is it safe to assume that any entity in a zip
| code is less than x distance away from the closest entity in the
| same zip code?
| freyfogle wrote:
| it is safe to assume nothing.
|
| Please see: https://opencagedata.com/guides/how-to-think-
| about-postcodes...
|
| I write this as someone who grew up in the ZIP code 09180
| makeitdouble wrote:
| It might be true, but does it help if the x varies from
| "on a nearby mountain" to "within a street block", and
| you sometimes have every inhabitant closer to another zip
| code than their own?
| mywittyname wrote:
| > For some smaller population zip codes you can practically
| just put the name and zip code down and achieve delivery.
|
| A 5+4 formatted ZIP code maps to just a handful of
| addresses. In cities with larger populations, the +4 could
| map to a single building, and in more sparsely populated
| places, it might include houses on a handful of roads.
|
| For smaller datasets, ZIP+4 might as well be a unique
| household identifier. I just checked a 10 million address
| database and 60% of entries had a unique ZIP+4, so one
| other bit of PII would be enough to be a 99.99% unique
| identifier per person.
|
| With a geo-coded ZIP+4 database, you could locate people
| with a precision that's proportional to the population
| density of their region.
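|
| A rough sketch of how one might measure the "share of entries with
| a unique ZIP+4" figure mentioned above. The sample codes are made
| up for illustration and the function name is hypothetical; this is
| not the query that was actually run against the 10 million address
| database.
|

```python
from collections import Counter


def unique_share(zip4_codes):
    """Fraction of entries whose ZIP+4 appears exactly once in the list."""
    if not zip4_codes:
        return 0.0
    counts = Counter(zip4_codes)
    singletons = sum(1 for code in zip4_codes if counts[code] == 1)
    return singletons / len(zip4_codes)


# Toy data, invented for this example.
sample = ["02134-0001", "02134-0002", "02134-0002", "99501-1234"]
print(f"{unique_share(sample):.0%} of entries have a unique ZIP+4")
```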
| mattforrest wrote:
| Yeah but we have that already in the census hierarchy.
| Plus you have to pay to access Zip+4 geospatial data and
| it changes, sometimes as frequently as quarterly.
| mattforrest wrote:
| Well put
| mcphage wrote:
| > The consequence is that performing any analysis with an
| assumption that ZIP codes are polygons is bound to be error-
| prone.
|
| Yeah, but any analysis you're likely to perform is approximate
| enough that the fact that ZIP codes aren't polygons is
| basically a rounding error.
|
| Plus, it's a lot easier to get ZIP codes, and they're more
| reliably correct, so you might still get better results, than
| you would going with another indicator that is either (a) less
| reliable or (b) less available.
| mattforrest wrote:
| They aren't reliably correct, actually. The boundaries that
| the Census publishes are called Zip Code Tabulation Areas
| which are approximations of zip codes and include overlaps.
| wombatpm wrote:
| ZCTA5 roughly corresponds to the area of a 5 digit zip
| code. Problem is there are large areas of the west that
| don't have permanent residents and no mail delivery. Plus
| they change over time.
| mattforrest wrote:
| Funny to see this one pop up today (I wrote this one way back
| when) but I just refreshed it into a video on my channel:
| https://www.youtube.com/watch?v=x-opv4REEic
| nancyp wrote:
| Instead of zip use the following?
|
| - Use Addresses
| - Use Census Units
| - Use your own Spatial Index
|
| Why not lat, long?
| ajfriend wrote:
| It depends on if you want to model a point or an area. lat/lng
| gives you a _point_, but you often want an _area_ to, for
| example, count how many people are in that area. A spatial
| index like H3 provides a grid of area units.
| HappMacDonald wrote:
| But so do lat long ranges.
| ajfriend wrote:
| You can use those if they work for your application. One
| downside would be that you're storing 4 numbers compared to
| a single `int64` index with H3.
|
| You also have to decide how you'll do that binning. Can
| bins overlap? What do you do at the poles? H3 provides some
| reasonable default choices for you so you don't have to worry
| about that part of your solution design.
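|
| To make the trade-off concrete, here is a minimal sketch of the
| "lat/long ranges" approach collapsed into a single integer id per
| bin. This is a plain rectangular lat/lon grid, not H3; the
| function name and the default cell size are assumptions made for
| illustration. It answers the binning questions in the crudest
| way: bins never overlap, and points on the upper edges (including
| the north pole) are clamped into the last row or column.
|

```python
import math


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> int:
    """Return one integer id for the lat/lon bin containing the point."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("lat must be in [-90, 90] and lon in [-180, 180]")
    n_rows = math.ceil(180.0 / cell_deg)
    n_cols = math.ceil(360.0 / cell_deg)
    # Clamp so lat == 90 and lon == 180 fall into the last row/column.
    row = min(int((lat + 90.0) / cell_deg), n_rows - 1)
    col = min(int((lon + 180.0) / cell_deg), n_cols - 1)
    return row * n_cols + col


# Two nearby points share a bin id, so they can be grouped by a single key.
print(grid_cell(40.7128, -74.0060) == grid_cell(40.7129, -74.0061))
```

| One caveat with this kind of grid is that a bin of fixed size in
| degrees covers less and less ground toward the poles, so counts
| per bin are not directly comparable across latitudes.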
| ww520 wrote:
| Lat/lon is a spherical coordinate system, so calculations are
| more complicated.
|
| Btw, I recently needed to compute the shortest distance from a
| point to a line defined by two points, all in lat/lon. Does
| anyone have a lead on how to do it?
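|
| One common way to approach the point-to-line question above is the
| great-circle "cross-track" distance. The sketch below is only
| that, a sketch: it assumes a spherical Earth with a mean radius of
| 6371.0088 km, treats the "line" as the shorter great-circle arc
| between the two points, and uses function names invented for this
| example. For survey-grade accuracy an ellipsoidal geodesic library
| would be the better tool.
|

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean radius; Earth treated as a sphere


def _unit(lat_deg, lon_deg):
    """Convert lat/lon in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _angle(u, v):
    """Central angle between two unit vectors, in radians."""
    c = _cross(u, v)
    return math.atan2(math.sqrt(_dot(c, c)), _dot(u, v))


def distance_to_segment_km(p, a, b):
    """Shortest distance from point p to the great-circle arc a-b.

    p, a and b are (lat, lon) tuples in degrees.
    """
    P, A, B = _unit(*p), _unit(*a), _unit(*b)
    n = _cross(A, B)  # normal of the plane containing the great circle
    n_len = math.sqrt(_dot(n, n))
    if n_len < 1e-12:
        # a and b coincide (or are antipodal): no unique arc, use the endpoint.
        return EARTH_RADIUS_KM * _angle(P, A)
    n = (n[0] / n_len, n[1] / n_len, n[2] / n_len)
    # The perpendicular foot lies on the arc if it is "after" A and "before" B.
    inside = _dot(_cross(A, P), n) >= 0.0 and _dot(_cross(P, B), n) >= 0.0
    if inside:
        s = max(-1.0, min(1.0, _dot(P, n)))
        return EARTH_RADIUS_KM * abs(math.asin(s))
    # Otherwise the nearest point on the arc is one of its endpoints.
    return EARTH_RADIUS_KM * min(_angle(P, A), _angle(P, B))


# A point one degree north of an arc that runs along the equator.
print(round(distance_to_segment_km((1.0, 5.0), (0.0, 0.0), (0.0, 10.0)), 1), "km")
```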
| ajfriend wrote:
| ...and use H3 instead! https://h3geo.org/
| sbrother wrote:
| Very different use case -- ZIPs/ZCTAs have some semblance of
| population normalization
| mattforrest wrote:
| Not necessarily true. The population isn't balanced at all
| between many. Census units are.
| ellisv wrote:
| Absolutely this. Use other Census areal units if you can
| and ZCTAs only if you have to.
| ajfriend wrote:
| If you care about that and have a data source, you can add,
| for example, population density per H3 cell as part of your
| analysis. That has the additional benefit of denoting this
| quantity of interest _explicitly_, rather than relying on some
| implicitly assumed correlation which may not be true.
| ingenieroariel wrote:
| Hey AJ, this is almost on topic, do you know of a more up
| to date version of the dataset you used on the blog post
| release for H3 v4.0.0 [1]? They stopped updating in Oct
| 2023. Thanks! [1] https://data.humdata.org/dataset/kontur-
| population-dataset
| ajfriend wrote:
| I don't. And maybe I should have emphasized "and have a
| data source" more, since it's doing a lot of the heavy
| lifting in my statement :)
| diggan wrote:
| What H3 do I belong to if my house is split between three
| different ones, pretty much equally? Any/all of them?
| maxmouchet wrote:
| You take a smaller H3 :-) The maximum area of a resolution 15
| H3 is 1 square meter, so unlikely to split a house in two.
| hammock wrote:
| What is the benefit of H3 over a rectangular grid?
| jpjoi wrote:
| Zip codes are just weird to use for anything other than mail in
| general because they're set up based off infrastructure.
|
| CGP Grey has a great video on this:
| https://m.youtube.com/watch?v=1K5oDtVAYzk
| diggan wrote:
| I've noticed more and more super/hypermarkets have started asking
| for your zip/postal code sometime during self-checkout. I'm
| guessing they use these as approximations of where people
| travel from, so they can evaluate whether to open more stores
| closer to popular areas, or something like that. Pretty sure
| there are more use cases for postal codes too.
| paraboli wrote:
| They use bulk mail to send out flyers, coupons, and can use
| zip codes to AB test these.
| kjellsbells wrote:
| Postcodes are very useful (but not perfect) proxies for
| household socioeconomic status, which is useful for marketing
| and sales analysis.
|
| That data, linked with the payment method that the register
| collects, pretty much tells the store exactly who you are and
| where you live even if you chose not to sign up for the
| store's loyalty program.
| Spivak wrote:
| Wait until you find out that this is the same way phones used
| to work. The number was the row/column the operator needed
| to plug your line into.
| paulddraper wrote:
| People use ZIP codes because they have ZIP codes.
|
| No one has census blocks.
|
| And coordinates can work but lack some inherent advantages, such
| as human readability and a semblance of pop density
| normalization.
| funkaster wrote:
| If you want to learn a bit more, there was a recent, really good
| Planet Money episode[1] about this exact same topic. They focus
| on the problems that you might face when using zip code for
| demographic analysis.
|
| [1]: https://www.npr.org/2025/01/08/1223466587/zip-code-history
| throw0101c wrote:
| CGP Grey recently posted a video on Zip codes, "The Hidden
| Pattern in Post Codes":
|
| * https://www.youtube.com/watch?v=1K5oDtVAYzk
| Cthulhu_ wrote:
| That's what I was thinking of earlier, the succinct version is
| "your address is where mail needs to go, the zip code is how to
| get it there". Or in other words, the zip code is the
| address(es) of the sorting centers and post offices to the
| destination.
| jonas21 wrote:
| ZIP codes are an emergent property of the mail delivery system.
| While the author might consider this a bad thing, this makes them
| "good enough" on multiple axes in practice. They tend to be:
|
| - Well-known (everybody knows their zip code)
|
| - Easily extracted (they're part of every address, no geocoding
| required)
|
| - Uniform-enough (not perfect, but in most cases close)
|
| - Granular-enough
|
| - Contiguous-enough by travel time
|
| Notably, the alternatives the author proposes all fail on one or
| more of these:
|
| - Census units: almost nobody knows what census tract they live
| in, and it can be non-trivial to map from address to tract
|
| - Spatial cells: uneven distribution of population, and arbitrary
| division of space (boundaries pass right through buildings), and
| definitely nobody knows what S2 or H3 cell they live in.
|
| - Address: this option doesn't even make sense. Yes, you can
| geocode addresses, but you still need to aggregate by something.
| mattforrest wrote:
| Well you hit on all the points that discuss the compromises
| that zip codes offer. Just because you have them in your data
| doesn't mean that they can produce anything useful. You are
| correct that no one knows what their census unit is (if you are
| thinking of someone entering this on a website) but
| collecting location or address will be a lot better.
|
| Fact is a lot of web data contains a zip but if you can collect
| something better it will usually render better results. Unless
| you are analyzing shipments then that is fine.
| ellisv wrote:
| There are point process models, but, yes, it's much more common
| to want to aggregate to a spatial area.
|
| Another consideration is what kind of reference information is
| available at different spatial units. There are plenty of
| Census Bureau data available by ZCTA but some data may only be
| available at other aggregate units. Zip Codes are often used as
| political boundaries.
|
| I'd also mention the "best" areal unit depends on the data.
| There is a well known phenomenon called the modifiable areal
| unit problem in which spatial effects appear and vanish at
| different spatial resolutions. It can sort of be thought of as
| a spatial variation of the ecological fallacy.
| ericrallen wrote:
| This is a tangent, but addresses are also way more complicated
| than most people realize - especially if you're relying on a
| user to input a correct address or if you need to support
| multiple countries, somewhere with unique addresses like
| Queens[0], or you need to differentiate between units of a
| specific street address that uses something other than unit
| numbers for a unit designation.
|
| At that point you need something like Smarty[1] to validate and
| parse addresses.
|
| [0]: https://stackoverflow.com/questions/2783155/how-to-
| distingui...
|
| [1]: https://www.smarty.com/
| nitwit005 wrote:
| Yes, unfortunately, their assertion that everyone knows their
| zip code is wrong. People often write a neighboring code, and
| the post office just delivers it.
|
| Similar issues for city name, of course.
| VWWHFSfQ wrote:
| Very common in NYC. People will use "New York, NY",
| "Queens, NY", or "Astoria, NY" interchangeably and the
| post office will still just deliver it to the same place.
| steezeburger wrote:
| This sounds like the person doesn't know the receiver's zip
| code. Why are you extending that to not knowing their own
| zip code? Are they mailing something to themselves?
| toast0 wrote:
| People often give out their mailing address, and may be
| misinformed about their zip code.
|
| If you get close enough, it usually gets handled in the
| local sort, but not always.
|
| On cities, the mailing address city really is the name of
| the post office that handles your delivery route. Often
| there's a relationship with the city you live in, but
| there's cases both ways --- I used to live outside city
| limits, we had a census designated place name, a
| municipal sanitary district and had a fire department at
| one time... but never a post office, so our mailing
| address used the nearby city name, where our post office
| resided. The place name had an incorporated city on the
| other side of the state, so using that wouldn't be great.
|
| Nowadays, post offices often have a list of alternative
| place names, so where I live now, I can pick between the
| incorporated city name, the nearby large city where a
| post office that processes all my mail is located, or any
| of the numerous small post offices that once served my
| city.
| paulddraper wrote:
| They know their ZIP code far, far better than any other
| plausible geographic cell.
| rented_mule wrote:
| An annoyance for me is that I've yet to see any address
| validator get my current home address right. They all insist
| my address is on the road that leads to my road rather than
| my actual road. It's understandable that they can't be 100%
| accurate given the scale / complexity of addresses.
|
| Most sites/apps will let me override the validator, but a few
| won't. The most common ones that insist on using the wrong
| address are financial institutions that say the law requires
| them to have my proper physical address and therefore they go
| with the (incorrectly) validated version.
|
| USPS does not do home delivery in our area, and
| UPS/FedEx/etc. usually figure it out given that street
| numbers alone uniquely identify properties in our town.
| killjoywashere wrote:
| Same! My wife ran a business from home during the pandemic
| and we actually went through the effort to work with Google
| Maps (they called us) to get it on the map. And of course
| USPS has no problem. But our address was originally a
| federal building with a letter, still only has a letter, no
| number, and there are now all sorts of work-arounds
| floating around on how to resolve addresses in our
| neighborhood. What's wild is the Post Office is literally
| down the street from our house, and our house predates the
| founding of most of the big delivery services, which all
| manage to deliver to us, given their preferred incantation.
| If I can't get the shipper to pass the right incantation to
| their shipping service, shenanigans ensue. My (least?)
| favorite was an item that went across the Pacific Ocean 3
| times over the course of 3 months.
| ghaff wrote:
| Just last week I had to deal with the fact that my house has
| the wrong address in multiple databases because things
| changed when an interstate went in 40-something years ago.
| It's not a big change--main st. vs N main st. but it was
| enough to mess up various things. Not as much as when I moved
| in 30 years ago but still enough to be wrong in old town and
| telco records. Took me a couple of days to get a permit
| issued to get electrical hooked back up after a fire as a
| result because apparently some town clerk insisted the
| address wasn't valid.
| bob1029 wrote:
| Addresses are a huge ordeal in banking. Easily one of the
| most tortured domain types when it comes to edge cases and
| integration pain.
|
| Every customer I've worked with insisted on having all
| addresses run through the USPS verification API so they could
| get their bulk mailing discounts.
|
| Even if you get the delivery/cost side under control, you
| still have to make sure you are talking about the right
| address from a logical perspective. Mailing, physical,
| seasonal, etc. address types add a whole extra dimension of
| fun.
| o11c wrote:
| Also, "use a different grid" is only masking the problem, not
| actually fixing it.
|
| The real problem is _ever_ using an average without also
| specifying some sort of bounds. For median-based data, this
| probably means the upper and lower quartiles (or possibly other
| percentiles); for mean-based data, this probably means standard
| deviation.
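|
| As a small illustration of reporting a spread alongside the
| central value, here is a sketch using Python's standard
| `statistics` module. The `summarize` helper and the sample values
| are made up for this example; which percentiles are appropriate
| depends on the data.
|

```python
import statistics


def summarize(values):
    """Return central values together with a measure of spread.

    Needs at least two data points.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": median,
        "q1": q1,  # lower quartile
        "q3": q3,  # upper quartile
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Invented per-area values, e.g. one number per grid cell or ZIP.
print(summarize([12.0, 15.0, 15.5, 18.0, 40.0]))
```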
| JumpCrisscross wrote:
| Would add that there are network effects with zip code data. If
| you collect H3 data, you have fewer sources with which to join.
| walrus01 wrote:
| In terms of "good enough", a Canadian postal code, broadly
| equivalent to a zip code, is much more granular and can often
| identify an individual apartment building, or single city
| block. Plenty of large office buildings in major Canadian
| cities also have their own postal code.
|
| The functionality of it is closer to the "Zip+4" with extension
| used to have a more granular routing of physical mail for USPS.
|
| https://www.canadapost-postescanada.ca/cpc/en/support/articl...
|
| https://en.wikipedia.org/wiki/Postal_codes_in_Canada
| ssl-3 wrote:
| Sure, and in the States, ZIP+4 could once nail my postal
| location to a subset of 4 (of a group of 16) mailboxes within
| a particular set of entry doors on a particular apartment
| building.
|
| But broadly speaking, nobody knows what their ZIP+4 is, while
| I imagine that most people in Canada know their postal code
| by heart.
|
| It is interesting.
| bluGill wrote:
| The plus four changes all the time so it isn't feasible to
| know it. Its use is that large mailers can get a discount by
| looking it up and presorting mail. If the mail coming into
| my post office has my mail next to my next door neighbor's,
| that saves them a lot of time.
| kstrauser wrote:
| Is that still true? I would imagine any reasonably modern
| computer could map every physical address in a huge
| region to a (route number, stop number) pair. I wouldn't
| think the +4 would add a lot of value anymore.
| mattforrest wrote:
| Yeah but Zip+4 represents a collection of houses, not a polygon,
| so it's not useful for aggregations or statistical work
| throw0101c wrote:
| > _In terms of "good enough", a Canadian postal code, broadly
| equivalent to a zip code, is much more granular and can often
| identify an individual apartment building, or single city
| block._
|
| To the point that StatCan and other agencies have rules on
| the number of characters that are collected/disseminated with
| other data to make sure it's not too identifying:
|
| * https://www.canada.ca/en/government/system/digital-
| governmen...
|
| * https://www12.statcan.gc.ca/nhs-enm/2011/ref/DQ-
| QD/guide_2-e...
| hinkley wrote:
| Contiguous enough by data travel time as well. A few people
| will get 5 ms more latency than the exact optimal route, but
| it's not like your routes are exactly optimal anyway.
|
| And don't forget sales tax. Which is state + county + city
| kstrauser wrote:
| ... + special entertainment district + business renovation
| area + exception + exception + exception + ...
| michaelmrose wrote:
| If you are worrying about address at all, rather than tax or
| legal jurisdiction, it's probable that you as a business have a
| physical presence. You can probably correlate better by
| predicting which location a given address would likely interact
| with, if you don't already know from prior purchases/interactions
| where they normally do so. I would suggest actual purchase data
| followed by travel time.
|
| Zip and distance as the crow flies often give shit data. My
| zip suggests I'm off in bum fuck and since I'm on Puget Sound,
| things that are relatively near as the crow flies can
| actually be hours away.
| killjoywashere wrote:
| ZIPs are also specifically used in a variety of medical,
| epidemiologic, public health contexts and HHS has explicit,
| fairly fine-grained rules on their use:
| https://www.hhs.gov/hipaa/for-professionals/special-topics/d...
| raphman wrote:
| One more advantage: ZIP codes are a good trade-off if you want
| to gather anonymous data in a survey or provide anonymized data
| to an outside entity. For example, we recently conducted a
| survey on mobility patterns within our university. To offer
| respondents a reasonable amount of anonymity, we just asked for
| their (German) ZIP code and the location of their primary
| workplace. This allows us to determine the distance and
| approximate route people would take between home and university
| campus - to a degree that is sufficient for our goals.
| trgn wrote:
| First the mercator projection, now they're coming after the zip
| codes.
| serjester wrote:
| H3 is awesome here! What I don't think many people realize is
| that H3 cells and normal geographic data (like zips) are not
| mutually exclusive. You can take zip outlines, and find all the
| h3 cells within them and allocate your metric accordingly
| (population, income, etc).
|
| This makes joining disparate data sources quite easy. And this
| also lets you do all sorts of cool stuff like aggregations,
| smoothing, flow modeling, etc.
|
| We do some geospatial stuff and I wrote a polars plugin to help
| with this a while back [1].
|
| [1] https://github.com/Filimoa/polars-h3
| hammock wrote:
| What is the benefit of H3 vs a rectangular grid?
| kylebarron wrote:
| Equal distances to each adjacent neighbor:
| https://www.uber.com/blog/h3/
| ajfriend wrote:
| They also only have one _type_ of neighbor. Square grids
| have 2 neighbor types. Triangular grids have 3.
| hammock wrote:
| Makes perfect sense. Thanks both
| agtech_andy wrote:
| Zip codes are great for anything with delivery logistics.
|
| Anything else is a loose correlation at best, that will likely
| change over time.
| PLenz wrote:
| I gave a talk at DataEngConf many years ago:
| https://www.datacouncil.ai/talks/zip-codes-and-other-lies-yo...
| zuhayeer wrote:
| This is interesting since zip codes came up in consideration for
| how we built out our pay choropleth map in the US:
| https://levels.fyi/heatmap
|
| Though ultimately it was far too granular (for example the Bay
| Area would be so many different zip codes). Instead we went with
| Nielsen's DMA (Designated Market Area) mappings within the US to
| abstract aggregated data a bit better. And of course this DMA
| dataset also had a different original use case. It was used for
| TV / media market surveys so it has some weird vestiges. Some
| regions are grouped very far and wide (you'll notice there's a
| bit of Denver within Nevada and it's just a remnant of how it used
| to be categorized), but it still provides a bit of a broader
| level grouping than something acute like zip code.
|
| I do like this map from the article though and the granularity
| you can get with zip code when zooming:
| https://clausa.app.carto.com/map/29fd0873-64cb-42a6-a90d-c83...
|
| We've also been considering using Combined Statistical Areas
| using population instead. This is something that is under way,
| and in the interim we've considered charting styles that don't
| necessarily need borders (for example this bubble map:
| https://www.levels.fyi/bubble-plot/europe/). The benefit with
| DMAs is that it offers full border coverage of the entire US
| whereas some hubs can still be missing from CSAs if relying on a
| population threshold. But the plan is to create some of our own
| regional definitions and borders using our own submissions
| combined with population. Will be an interesting project.
|
| GeoJSON data for the map borders:
| https://github.com/PublicaMundi/MappingAPI/blob/master/data/...
|
| Nielsen DMA regions:
| https://blocks.roadtolarissa.com/simzou/6459889
| dhunter_mn wrote:
| I used to work for a company that basically merged USPS and
| Census Bureau data on a monthly basis. The output would be a
| roadbase that was optimized for address ranges on road segments.
| ZIP Codes were extra fun to work with.
| eterevsky wrote:
| ZIP codes are a simple approximation, which does its job well
| enough in most cases.
|
| The alternatives that the author suggests are much more
| complicated, both in terms of the implementation and in terms of
| convincing the user to give you their full address.
| ej1 wrote:
| [flagged]
| dang wrote:
| Please stop. Automated comments are not allowed here.
| Zamicol wrote:
| I wrote the blackout system for Comcast TV scheduling. My
| understanding was that blackouts were used mostly for sports
| where games need to be available in one area and not others.
| Contractually, they were required to use zip codes, so I used the
| US Post office's zip code data to enforce blackouts.
| lacoolj wrote:
| For anyone curious, here is the official US Gov list of ZIP codes
| in CSV with lots of helpful related data (longitude, latitude,
| etc.)
|
| http://federalgovernmentzipcodes.us/free-zipcode-database-Pr...
| yellowbkpk wrote:
| There is no "official US Gov list of ZIP codes". They come from
| the US Postal Service, and those aren't published for free.
| ivell wrote:
| India is experimenting with Digipin
| https://www.indiapost.gov.in/Navigation_Documents/Static_Nav...
|
| Which is derived from longitude and latitude..
| extraduder_ire wrote:
| I've seen a few attempts like this, like loc8 and google's plus
| codes. Is there any advantage to Digipin over existing
| solutions other than avoiding splitting major cities into very
| different codes? None stood out to me from that document. The
| description is written pretty well.
|
| Always sad when these schemes don't include a check digit in
| them though, even if the layout of this one gets typo'd codes
| pretty close to their intended destination.
| mannyv wrote:
| Zip codes, zctas, and tiger/line are good enough for what most
| people need. Maybe you can find an edge by using something more
| granular...but I'm not sure what edge you'd be looking to get
| with geodata. Maybe for real estate trends and/or market
| analysis?
| clutchdude wrote:
| I agree.
|
| Reading their alternatives, it strikes me that "ZCTAs are the
| worst form of small area aggregation except for all others."
|
| It's not a great geography to use but it is _quite_ useful if
| you know its limitations and inaccuracies when you get into
| it. Stuff like multipolygon entities, island-polys, etc. aren't
| fun to resolve but can be accounted for.
|
| Add to that the fact that ZCTAs will historically follow some
| sort of actual boundary (rivers, highways, etc.), so they can
| tell a story in a way Census tracts can't.
| mmmlinux wrote:
| Can anyone tell me why I have to enter both my city / state and a
| zip code? Shouldn't one or the other of those plus my street
| address be enough information?
| ubermonkey wrote:
| Web devs not using a good library that will populate the former
| from the latter?
| jayknight wrote:
| Some libraries will insist that my address is in a different
| city because my zip code spans the border. I mean if my mail
| has the other city it still gets to me, but for anything
| other than mail, they now have the wrong city for me.
| mmmlinux wrote:
| Does it matter if the "city" is wrong if your street
| address + zip code is unique?
| jayknight wrote:
| It depends on what they're doing with it. But mostly
| probably not.
| sophacles wrote:
| Several posts in this thread have linked the recent CGP Grey
| video on the topic, and it addresses this question better than
| I can. It's pretty interesting actually.
| ubermonkey wrote:
| I'm reminded of this:
|
| https://www.npr.org/2004/04/01/1805651/post-office-calls-for...
| freyfogle wrote:
| There are many problems with zip codes / postal codes but the
| biggest two we see are:
|
| a. Excel treats them as numbers instead of strings of digits and
| thus drops the leading 0
|
| b. Developers make assumptions about postal codes based on how
| they work (or more usually how the developer incorrectly thinks
| they work) in their own country and these assumptions absolutely
| do NOT hold in other countries.
|
| A relevant guide to geocoding and postal codes:
| https://opencagedata.com/guides/how-to-think-about-postcodes...
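|
| For problem (a), a minimal sketch of repairing US ZIP codes after
| a spreadsheet has turned them into numbers. The function name is
| hypothetical, and in line with problem (b) it deliberately assumes
| US-style 5-digit codes only; it should not be applied to postal
| codes from other countries. The more robust fix is to import the
| column as text in the first place.
|

```python
def normalize_us_zip(value) -> str:
    """Restore a US ZIP that lost its leading zeros in a spreadsheet."""
    text = str(value).strip()
    if text.endswith(".0"):
        # Numeric columns often come back as floats, e.g. 9180.0
        text = text[:-2]
    base, sep, plus4 = text.partition("-")
    if not base.isdigit() or len(base) > 5:
        raise ValueError(f"does not look like a US ZIP: {value!r}")
    # Pad the 5-digit part back to full width; keep any +4 extension as-is.
    return base.zfill(5) + (sep + plus4 if plus4 else "")


for raw in (9180, "9180.0", "02134-1234"):
    print(repr(raw), "->", normalize_us_zip(raw))
```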
| paganel wrote:
| Also, "everybody" knows their zip-code/postal-code is mostly an
| American/British thing, I still remember my British former boss
| asking me about my zip-code about 20 years ago (I live in
| Romania, we were implementing the first google-maps-based
| mashup in this country) and me answering that I have no idea,
| and that no-one around these parts really knows his/her postal-
| code. We do know our address, though, or used to, before we had
| smart-phones.
| gibspaulding wrote:
| > Developers make assumptions about postal codes
|
| Until very recently I naively assumed that the area of a given
| zip code would be entirely within the area of some single city
| or town which would then be entirely within the area of a
| single county.
|
| It was quite a rude awakening working with software that tries
| to apply the correct local taxes to a given address and finding
| that the statement "A given X can contain multiple Y" is true
| for every possible combination of zip, city, and county.
| 0xbadcafebee wrote:
| Here's a recent podcast about why ZIP codes are not great for
| analysis: https://www.npr.org/2025/01/08/1223466587/zip-code-
| history
| spankalee wrote:
| Unironically, what a great sales blog post!
|
| It's so well written and informative that I completely didn't
| mind the "and here's how to do it in Carto" bit in the middle.
| Instead I thought they earned it.
| Anon84 wrote:
| This is an example of the well known Modifiable Areal Unit
| problem:
| https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem
|
| In general, your statistics depend on how you define your areas
| and you _will_ get different pictures with different definitions.
| flappyeagle wrote:
| No
| JackFr wrote:
| When doing your first ML project, zip codes are unsurpassed in
| providing a set of hand written digits to train on.
| OriPekelman wrote:
| Well, funny story: some twenty-something years ago I actually
| worked on an election-cycle volunteer infra thing in France.
| Living in Paris, which is department 75 and therefore 750xx (the
| prefecture being 75000), I assumed it was neatly hierarchical:
| 75004 won't be far away from 75003 (true)... The French thing
| being orderly and rational.
|
| I didn't need much precision so truncating seemed an easy way to
| group stuff.
|
| Oh the surprise. I never again made such assumptions, let's just
| say I should have gotten a clue from Corsica being 2A and 2B.
___________________________________________________________________
(page generated 2025-02-07 23:00 UTC)