[HN Gopher] Stop using zip codes for geospatial analysis (2019)
       ___________________________________________________________________
        
       Stop using zip codes for geospatial analysis (2019)
        
       Author : voxadam
       Score  : 137 points
       Date   : 2025-02-07 16:46 UTC (6 hours ago)
        
 (HTM) web link (carto.com)
 (TXT) w3m dump (carto.com)
        
       | hammock wrote:
       | Great article. Zip codes can be super expedient. But you have to
       | be aware that for many use cases they function WORSE than a
       | random grid, because they have a built-in aggregation of a central
       | post office (and its surroundings) with a certain radius of
       | rural/less dense area around it.
       | 
       | So for example, if you are sorting "rural zips" vs "urban zips"
       | it will only take you so far, and may actually be harmful.
       | 
       | Same goes with MSAs/DMAs (media markets). These have to be used
       | for buying media, but for geospatial analysis they are suboptimal
       | for the same reasons.
       | 
       | Easiest way to dip your toe into the water of something better is
       | to start with A-D census counties.
        
       | jihadjihad wrote:
       | To put it in plain mathematical language, ZIP codes are not
       | defined as polygons [0]. The consequence is that performing any
       | analysis with an assumption that ZIP codes are polygons is bound
       | to be error-prone.
       | 
       | 0: https://manifold.net/doc/mfd8/zip_codes_are_not_areas.htm
        
         | mholt wrote:
         | Yeah. ZIP codes are sets in the abstract-dimensional space of
         | carrier delivery points. I suppose you could think of them as
         | lines, but definitely not polygons.
        
           | cogman10 wrote:
           | Zip codes (in the US) are machine readable numbers a mail
           | sorter can use to send a parcel to the right delivery truck
           | for final delivery. In the US, they represent the hierarchy
           | of postal centers with the most significant digit
           | representing the primary hub for a region and the smallest
           | number the actual post office that will be in charge of
           | delivering the letter (or truck if you do the extended post
           | code).
           | 
           | They don't represent geography at all, they represent the
           | organizational structure of USPS.
           | 
           | They work by making the address on a letter almost
           | meaningless. For some smaller population zip codes you can
           | practically just put the name and zip code down and achieve
           | delivery.
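
A minimal Python sketch of the digit hierarchy described above. The function and field names are illustrative, and the three-level split (first digit, three-digit prefix, last two digits) is the commonly described layout, not something spelled out in this thread. ZIPs are kept as strings so leading zeros survive.

```python
def split_zip(zip5: str) -> dict:
    """Split a 5-digit ZIP into coarse-to-fine routing levels (illustrative names)."""
    if len(zip5) != 5 or not zip5.isdigit():
        raise ValueError(f"expected 5 digits, got {zip5!r}")
    return {
        "national_area": zip5[0],      # most significant digit: broad region
        "sectional_center": zip5[:3],  # three-digit prefix: regional sorting hub
        "delivery_area": zip5[3:],     # last two digits: post office / delivery area
    }


def shared_prefix_len(a: str, b: str) -> int:
    """Count how many leading digits two ZIPs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

A longer shared prefix only says two ZIPs share more of the sorting hierarchy; it is not a distance.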
        
             | Spivak wrote:
             | Right but this ends up being a good approximation for
             | geography because the reality of logistics is that you end
             | up doing a cute n-ary search of the geography. When you
             | know the regional hub you can say for certain a huge chunk
             | of the US the zip code doesn't represent. And then you keep
             | n-secting. Sometimes the land-mass you get at the end is
             | specific enough for your uses.
             | 
             | You're not going to wind up with a situation where zip
             | codes with the same regional marker end up on different
             | coasts.
        
               | mattforrest wrote:
               | Just use a spatial query. That's what they are made for.
        
               | makeitdouble wrote:
               | > You're not going to wind up with a situation where zip
               | codes with the same regional marker end up on different
               | coasts.
               | 
               | Couldn't this happen for military or proxy codes (PO
               | boxes or other) ?
        
             | alsodumb wrote:
             | I agree that they weren't explicitly meant to represent
             | geography, but implicitly they do, right? Are there cases
             | where this is violated?
             | 
             | In other words, is it safe to assume that any entity in a
             | zip code is less than x distance away from the closest
             | entity in the same zip code?
        
               | freyfogle wrote:
               | it is safe to assume nothing.
               | 
               | Please see: https://opencagedata.com/guides/how-to-think-
               | about-postcodes...
               | 
               | I write this as someone who grew up in the ZIP code 09180
        
               | makeitdouble wrote:
               | It might be true, but does it help if the x varies from
               | "on a nearby mountain" to "within a street block", and
               | you sometimes have every inhabitant closer to another zip
               | code than to their own?
        
             | mywittyname wrote:
             | > For some smaller population zip codes you can practically
             | just put the name and zip code down and achieve delivery.
             | 
             | A 5+4 formatted ZIP code maps to just a handful of
             | addresses. In cities with larger populations, the +4 could
             | map to a single building, and in more sparsely populated
             | places, it might include houses on a handful of roads.
             | 
             | For smaller datasets, ZIP+4 might as well be a unique
             | household identifier. I just checked a 10 million address
             | database and 60% of entries had a unique ZIP+4, so one
             | other bit of PII would be enough to be a 99.99% unique
             | identifier per person.
             | 
             | With a geo-coded ZIP+4 database, you could locate people
             | with a precision that's proportional to the population
             | density of their region.
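
The uniqueness check described above can be sketched in Python roughly as follows. The function name and the sample values are made up for illustration; a real address file would need normalizing first.

```python
from collections import Counter


def unique_share(zip4_values):
    """Fraction of records whose ZIP+4 value appears exactly once in the list."""
    if not zip4_values:
        raise ValueError("need at least one record")
    counts = Counter(zip4_values)
    singletons = sum(1 for value in zip4_values if counts[value] == 1)
    return singletons / len(zip4_values)


# Hypothetical sample: two records share a ZIP+4, three are unique.
sample = ["12345-0001", "12345-0001", "12345-0002", "54321-1111", "54321-2222"]
share = unique_share(sample)
```

The higher this share, the closer ZIP+4 comes to acting as a household identifier in that dataset.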
        
               | mattforrest wrote:
               | Yeah but we have that already in the census hierarchy.
               | Plus you have to pay to access Zip+4 geospatial data and
               | it sometimes changes as frequently as quarterly.
        
             | mattforrest wrote:
             | Well put
        
         | mcphage wrote:
         | > The consequence is that performing any analysis with an
         | assumption that ZIP codes are polygons is bound to be error-
         | prone.
         | 
         | Yeah, but any analysis you're likely to perform is approximate
         | enough that the fact that ZIP codes aren't polygons is
         | basically a rounding error.
         | 
         | Plus, it's a lot easier to get ZIP codes, and they're more
         | reliably correct, so you might still get better results, than
         | you would going with another indicator that is either (a) less
         | reliable or (b) less available.
        
           | mattforrest wrote:
           | They aren't reliably correct, actually. The boundaries that
           | the Census publishes are called Zip Code Tabulation Areas
           | which are approximations of zip codes and include overlaps.
        
             | wombatpm wrote:
             | ZCTA5 roughly corresponds to the area of a 5 digit zip
             | code. Problem is there are large areas of the west that
             | don't have permanent residents and no mail delivery. Plus
             | they change over time.
        
       | mattforrest wrote:
       | Funny to see this one pop up today (I wrote this one way back
       | when) but I just refreshed it into a video on my channel:
       | https://www.youtube.com/watch?v=x-opv4REEic
        
       | nancyp wrote:
       | Instead of zip use the following?
       | 
       | - Use Addresses
       | - Use Census Units
       | - Use your own Spatial Index
       | 
       | Why not lat, long?
        
         | ajfriend wrote:
         | It depends on if you want to model a point or an area. lat/lng
         | gives you a _point_, but you often want an _area_ to, for
         | example, count how many people are in that area. A spatial
         | index like H3 provides a grid of area units.
        
           | HappMacDonald wrote:
           | But so do lat long ranges.
        
             | ajfriend wrote:
             | You can use those if they work for your application. One
             | downside would be that you're storing 4 numbers compared to
             | a single `int64` index with H3.
             | 
             | You also have to decide how you'll do that binning. Can
             | bins overlap? What do you do at the poles? H3 provides some
             | reasonable default choices for you so you don't have to worry
             | about that part of your solution design.
        
         | ww520 wrote:
         | Lat/lon is a spherical coordinate system, so calculations are
         | more complicated.
         | 
         | Btw, I recently needed to compute the shortest distance from a
         | point to a line defined by two points, all in lat/lon. Does
         | anyone have a lead on how to do it?
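
One common answer to this question is the spherical cross-track distance. The Python sketch below assumes a spherical Earth with a mean radius of 6,371 km and treats the "line" as the full great circle through the two points, not just the segment between them. The function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def _angular_distance(lat1, lon1, lat2, lon2):
    """Great-circle angle in radians between two lat/lon points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def _initial_bearing(lat1, lon1, lat2, lon2):
    """Initial bearing in radians (clockwise from north) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.atan2(y, x)


def cross_track_distance_m(point, start, end):
    """Distance in meters from `point` to the great circle through `start` and `end`.

    Each argument is a (lat, lon) tuple in degrees.
    """
    d13 = _angular_distance(*start, *point)   # start -> point, as an angle
    t13 = _initial_bearing(*start, *point)    # bearing start -> point
    t12 = _initial_bearing(*start, *end)      # bearing start -> end
    return abs(math.asin(math.sin(d13) * math.sin(t13 - t12))) * EARTH_RADIUS_M
```

If only the segment between the two points matters, the along-track position would also need checking, with a fallback to the distance to the nearer endpoint. For high accuracy, an ellipsoidal geodesic library is the safer choice.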
        
       | ajfriend wrote:
       | ...and use H3 instead! https://h3geo.org/
        
         | sbrother wrote:
         | Very different use case -- ZIPs/ZCTAs have some semblance of
         | population normalization
        
           | mattforrest wrote:
           | Not necessarily true. The population isn't balanced at all
           | between many of them. Census units are.
        
             | ellisv wrote:
             | Absolutely this. Use other Census areal units if you can
             | and ZCTAs only if you have to.
        
           | ajfriend wrote:
           | If you care about that and have a data source, you can add,
           | for example, population density per H3 cell as part of your
           | analysis. That has the additional benefit of denoting this
           | quantity of interest _explicitly_, rather than some
           | implicitly assumed correlation which may not be true.
        
             | ingenieroariel wrote:
             | Hey AJ, this is almost on topic, do you know of a more up
             | to date version of the dataset you used on the blog post
             | release for H3 v4.0.0 [1]? They stopped updating in Oct
             | 2023. Thanks! [1] https://data.humdata.org/dataset/kontur-
             | population-dataset
        
               | ajfriend wrote:
               | I don't. And maybe I should have emphasized "and have a
               | data source" more, since it's doing a lot of the heavy
               | lifting in my statement :)
        
         | diggan wrote:
         | What H3 do I belong to if my house is split between three
         | different ones, pretty much equally? Any/all of them?
        
           | maxmouchet wrote:
           | You take a smaller H3 :-) The maximum area of a resolution 15
           | H3 is 1 square meter, so unlikely to split a house in two.
        
         | hammock wrote:
         | What is the benefit of H3 over a rectangular grid?
        
       | jpjoi wrote:
       | Zip codes are just weird to use for anything other than mail in
       | general because they're set up based off infrastructure.
       | 
       | CGP Grey has a great video on this:
       | https://m.youtube.com/watch?v=1K5oDtVAYzk
        
         | diggan wrote:
         | I've noticed more and more super/hypermarkets have started asking
         | for your zip/postal code sometime during self-checkout. I'm
         | guessing they use these as approximations of where people
         | travel from, so they can evaluate whether to open more stores
         | closer to popular areas, or something like that. Pretty sure
         | there are more use cases for postal codes too.
        
           | paraboli wrote:
           | They use bulk mail to send out flyers, coupons, and can use
           | zip codes to AB test these.
        
           | kjellsbells wrote:
           | Postcodes are very useful (but not perfect) proxies for
           | household socioeconomic status, which is useful for marketing
           | and sales analysis.
           | 
           | That data linked with the payment method that the register
           | collects pretty much gives the store exactly who you are and
           | where you live even if you chose not to sign up to the
           | store's loyalty program.
        
         | Spivak wrote:
         | Wait until you find out that this is the same way phones used
         | to work. The number was the row/column for the operator needed
         | to plug your line into.
        
       | paulddraper wrote:
       | People use ZIP codes because they have ZIP codes.
       | 
       | No one has census blocks.
       | 
       | And coordinates can work but lack some inherent advantages, such
       | as human readability and a semblance of pop density
       | normalization.
        
       | funkaster wrote:
       | If you want to learn a bit more, there was a recent, really good
       | Planet Money episode[1] about this exact same topic. They focus
       | on the problems that you might face when using zip codes for
       | demographic analysis.
       | 
       | [1]: https://www.npr.org/2025/01/08/1223466587/zip-code-history
        
       | throw0101c wrote:
       | CGP Grey recently posted a video on Zip codes, "The Hidden
       | Pattern in Post Codes":
       | 
       | * https://www.youtube.com/watch?v=1K5oDtVAYzk
        
         | Cthulhu_ wrote:
         | That's what I was thinking of earlier, the succinct version is
         | "your address is where mail needs to go, the zip code is how to
         | get it there". Or in other words, the zip code is the
         | address(es) of the sorting centers and post offices to the
         | destination.
        
       | jonas21 wrote:
       | ZIP codes are an emergent property of the mail delivery system.
       | While the author might consider this a bad thing, this makes them
       | "good enough" on multiple axes in practice. They tend to be:
       | 
       | - Well-known (everybody knows their zip code)
       | 
       | - Easily extracted (they're part of every address, no geocoding
       | required)
       | 
       | - Uniform-enough (not perfect, but in most cases close)
       | 
       | - Granular-enough
       | 
       | - Contiguous-enough by travel time
       | 
       | Notably, the alternatives the author proposes all fail on one or
       | more of these:
       | 
       | - Census units: almost nobody knows what census tract they live
       | in, and it can be non-trivial to map from address to tract
       | 
       | - Spatial cells: uneven distribution of population, and arbitrary
       | division of space (boundaries pass right through buildings), and
       | definitely nobody knows what S2 or H3 cell they live in.
       | 
       | - Address: this option doesn't even make sense. Yes, you can
       | geocode addresses, but you still need to aggregate by something.
        
         | mattforrest wrote:
         | Well you hit on all the points that discuss the compromises
         | that zip codes offer. Just because you have them in your data
         | doesn't mean that they can produce anything useful. You are
         | correct that no one knows what their census unit is (if you are
         | thinking of someone entering this on a website) but
         | collecting location or address will be a lot better.
         | 
         | Fact is a lot of web data contains a zip, but if you can collect
         | something better it will usually render better results. Unless
         | you are analyzing shipments, in which case zips are fine.
        
         | ellisv wrote:
         | There are point process models, but, yes, it's much more common
         | to want to aggregate to a spatial area.
         | 
         | Another consideration is what kind of reference information is
         | available at different spatial units. There are plenty of
         | Census Bureau data available by ZCTA but some data may only be
         | available at other aggregate units. Zip Codes are often used as
         | political boundaries.
         | 
         | I'd also mention the "best" areal unit depends on the data.
         | There is a well known phenomenon called the modifiable areal
         | unit problem in which spatial effects appear and vanish at
         | different spatial resolutions. It can sort of be thought of as
         | a spatial variation of the ecological fallacy.
        
         | ericrallen wrote:
         | This is a tangent, but addresses are also way more complicated
         | than most people realize - especially if you're relying on a
         | user to input a correct address or if you need to support
         | multiple countries, somewhere with unique addresses like
         | Queens[0], or you need to differentiate between units of a
         | specific street address that uses something other than unit
         | numbers for a unit designation.
         | 
         | At that point you need something like Smarty[1] to validate and
         | parse addresses.
         | 
         | [0]: https://stackoverflow.com/questions/2783155/how-to-
         | distingui...
         | 
         | [1]: https://www.smarty.com/
        
           | nitwit005 wrote:
           | Yes, unfortunately, their assertion that everyone knows their
           | zip code is wrong. People often write a neighboring code, and
           | the post office just delivers it.
           | 
           | Similar issues for city name, of course.
        
             | VWWHFSfQ wrote:
             | Very common in NYC. People will use all of "New York, NY",
             | "Queens, NY", or "Astoria, NY" all interchangeably and the
             | post office will still just deliver it to the same place.
        
             | steezeburger wrote:
             | This sounds like the person doesn't know the receiver's zip
             | code. Why are you extending that to not knowing their own
             | zip code? Are they mailing something to themselves?
        
               | toast0 wrote:
               | People often give out their mailing address, and may be
               | misinformed about their zip code.
               | 
               | If you get close enough, it usually gets handled in the
               | local sort, but not always.
               | 
               | On cities, the mailing address city really is the name of
               | the post office that handles your delivery route. Often
               | there's a relationship with the city you live in, but
               | there's cases both ways --- I used to live outside city
               | limits, we had a census designated place name, a
               | municipal sanitary district and had a fire department at
               | one time... but never a post office, so our mailing
               | address used the nearby city name, where our post office
               | resided. The place name had an incorporated city on the
               | other side of the state, so using that wouldn't be great.
               | 
               | Nowadays, post offices often have a list of alternative
               | place names, so where I live now, I can pick between the
               | incorporated city name, the nearby large city where a
               | post office that processes all my mail is located, or any
               | of the numerous small post offices that once served my
               | city.
        
             | paulddraper wrote:
             | They know their ZIP code far, far better than any other
             | plausible geographic cell.
        
           | rented_mule wrote:
           | An annoyance for me is that I've yet to see any address
           | validator get my current home address right. They all insist
           | my address is on the road that leads to my road rather than
           | my actual road. It's understandable that they can't be 100%
           | accurate given the scale / complexity of addresses.
           | 
           | Most sites/apps will let me override the validator, but a few
           | won't. The most common ones that insist on using the wrong
           | address are financial institutions that say the law requires
           | them to have my proper physical address and therefore they go
           | with the (incorrectly) validated version.
           | 
           | USPS does not do home delivery in our area, and
           | UPS/FedEx/etc. usually figure it out given that street
           | numbers alone uniquely identify properties in our town.
        
             | killjoywashere wrote:
             | Same! My wife ran a business from home during the pandemic
             | and we actually went through the effort to work with Google
             | Maps (they called us) to get it on the map. And of course
             | USPS has no problem. But our address was originally a
             | federal building with a letter, still only has a letter, no
             | number, and there are now all sorts of work-arounds
             | floating around on how to resolve addresses in our
             | neighborhood. What's wild is the Post Office is literally
             | down the street from our house, and our house predates the
             | founding of most of the big delivery services, which all
             | manage to deliver to us, given their preferred incantation.
             | If I can't get the shipper to pass the right incantation to
             | their shipping service, shenanigans ensue. My (least?)
             | favorite was an item that went across the Pacific Ocean 3
             | times over the course of 3 months.
        
           | ghaff wrote:
           | Just last week I had to deal with the fact that my house has
           | the wrong address in multiple databases because things
           | changed when an interstate went in 40-something years ago.
           | It's not a big change--main st. vs N main st. but it was
           | enough to mess up various things. Not as much as when I moved
           | in 30 years ago but still enough to be wrong in old town and
           | telco records. Took me a couple of days to get a permit
           | issued to get electrical hooked back up after a fire as a
           | result because apparently some town clerk insisted the
           | address wasn't valid.
        
           | bob1029 wrote:
           | Addresses are a huge ordeal in banking. Easily one of the
           | most tortured domain types when it comes to edge cases and
           | integration pain.
           | 
           | Every customer I've worked with insisted on having all
           | addresses run through the USPS verification API so they could
           | get their bulk mailing discounts.
           | 
           | Even if you get the delivery/cost side under control, you
           | still have to make sure you are talking about the right
           | address from a logical perspective. Mailing, physical,
           | seasonal, etc. address types add a whole extra dimension of
           | fun.
        
         | o11c wrote:
         | Also, "use a different grid" is only masking the problem, not
         | actually fixing it.
         | 
         | The real problem is _ever_ using an average without also
         | specifying some sort of bounds. For median-based data, this
         | probably means the upper and lower quartiles (or possibly other
         | percentiles); for mean-based data, this probably means standard
         | deviation.
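
As a rough illustration of reporting an average together with its spread, here is a small Python sketch using only the standard library. The function name and dictionary keys are illustrative, and `statistics.quantiles` is used with its default (exclusive) method.

```python
import statistics


def summarize(values):
    """Return center and spread together so neither is reported alone."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "median": median,
        "q1": q1,
        "q3": q3,
    }


# Hypothetical per-area values for one spatial unit.
summary = summarize([1, 2, 3, 4, 5, 6, 7])
```

Whatever spatial unit is chosen, publishing `q1`/`q3` next to the median (or `stdev` next to the mean) shows how much variation the aggregate is hiding.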
        
         | JumpCrisscross wrote:
         | Would add that there are network effects with zip code data. If
         | you collect H2 data, you have fewer sources with which to join.
        
         | walrus01 wrote:
         | In terms of "good enough", a Canadian postal code, broadly
         | equivalent to a zip code, is much more granular and can often
         | identify an individual apartment building, or single city
         | block. Plenty of large office buildings in major Canadian
         | cities also have their own postal code.
         | 
         | The functionality of it is closer to the "Zip+4" with extension
         | used to have a more granular routing of physical mail for USPS.
         | 
         | https://www.canadapost-postescanada.ca/cpc/en/support/articl...
         | 
         | https://en.wikipedia.org/wiki/Postal_codes_in_Canada
        
           | ssl-3 wrote:
           | Sure, and in the States, ZIP+4 could once nail my postal
           | location to a subset of 4 (of a group of 16) mailboxes within
           | a particular set of entry doors on a particular apartment
           | building.
           | 
           | But broadly speaking, nobody knows what their ZIP+4 is, while
           | I imagine that most people in Canada know their postal code
           | by heart.
           | 
           | It is interesting.
        
             | bluGill wrote:
             | The plus four changes all the time so it isn't feasible to
             | know it. The use is that large mailers can get a discount by
             | looking it up and presorting mail. If the mail coming into
             | my post office has my mail next to my next-door neighbor's,
             | that saves them a lot of time.
        
               | kstrauser wrote:
               | Is that still true? I would imagine any reasonably modern
               | computer could map every physical address in a huge
               | region to a (route number, stop number) pair. I wouldn't
               | think the +4 would add a lot of value anymore.
        
           | mattforrest wrote:
           | Yeah but Zip+4 represents a collection of houses, not a
           | polygon, so it's not useful for aggregations or statistical work
        
           | throw0101c wrote:
           | > _In terms of "good enough", a Canadian postal code, broadly
           | equivalent to a zip code, is much more granular and can often
           | identify an individual apartment building, or single city
           | block._
           | 
           | To the point that StatCan and other agencies have rules on
           | the number of characters that are collected/disseminated with
           | other data to make sure it's not too identifying:
           | 
           | * https://www.canada.ca/en/government/system/digital-
           | governmen...
           | 
           | * https://www12.statcan.gc.ca/nhs-enm/2011/ref/DQ-
           | QD/guide_2-e...
        
         | hinkley wrote:
         | Contiguous enough by data travel time as well. A few people
         | will get 5 ms more latency than the exact optimal route, but
         | it's not like your routes are exactly optimal anyway.
         | 
         | And don't forget sales tax. Which is state + county + city
        
           | kstrauser wrote:
           | ... + special entertainment district + business renovation
           | area + exception + exception + exception + ...
        
         | michaelmrose wrote:
         | If you are worrying about address at all instead of tax or
         | legal jurisdiction, it's probable that you as a business have a
         | physical presence. You can probably correlate better by
         | predicting which location a given address would likely interact
         | with, if you don't already know from prior purchases/interactions
         | which one they normally use. I would suggest actual purchase data
         | followed by travel time.
         | 
         | Zip and distance as the crow flies often gives shit data. My
         | zip suggests I'm off in bum fuck and since I'm on the Puget
         | Sound things that are relatively near as the crow flies can
         | actually be hours away.
        
         | killjoywashere wrote:
         | ZIPs are also specifically used in a variety of medical,
         | epidemiologic, public health contexts and HHS has explicit,
         | fairly fine-grained rules on their use:
         | https://www.hhs.gov/hipaa/for-professionals/special-topics/d...
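
A hedged Python sketch of the ZIP rule in HIPAA's Safe Harbor de-identification method, as commonly summarized: keep only the first three digits, and replace them with 000 when the combined area sharing that prefix has 20,000 or fewer people. The function name and the population table are hypothetical; real prefix populations would come from Census data.

```python
def safe_harbor_zip(zip5: str, prefix_population: dict, threshold: int = 20_000) -> str:
    """Return the 3-digit ZIP prefix, or "000" if that prefix area is too small."""
    if len(zip5) < 3 or not zip5[:3].isdigit():
        raise ValueError(f"cannot take a 3-digit prefix from {zip5!r}")
    prefix = zip5[:3]
    # Unknown prefixes are treated as population 0, i.e. suppressed.
    if prefix_population.get(prefix, 0) > threshold:
        return prefix
    return "000"


# Hypothetical populations per 3-digit prefix, for illustration only.
populations = {"981": 750_000, "036": 12_000}
```

Treating unknown prefixes as suppressed is a conservative choice made for this sketch, not something the thread specifies.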
        
         | raphman wrote:
         | One more advantage: ZIP codes are a good trade-off if you want
         | to gather anonymous data in a survey or provide anonymized data
         | to an outside entity. For example, we recently conducted a
         | survey on mobility patterns within our university. To offer
         | respondents a reasonable amount of anonymity, we just asked for
         | their (German) ZIP code and the location of their primary
         | workplace. This allows us to determine the distance and
         | approximate route people would take between home and university
         | campus - to a degree that is sufficient for our goals.
        
       | trgn wrote:
       | First the mercator projection, now they're coming after the zip
       | codes.
        
       | serjester wrote:
       | H3 is awesome here! What I don't think many people realize is
       | that H3 cells and normal geographic data (like zips) are not
       | mutually exclusive. You can take zip outlines, and find all the
       | h3 cells within them and allocate your metric accordingly
       | (population, income, etc).
       | 
       | This makes joining disparate data sources quite easy. And this
       | also lets you do all sorts of cool stuff like aggregations,
       | smoothing, flow modeling, etc.
       | 
       | We do some geospatial stuff and I wrote a polars plugin to help
       | with this a while back [1].
       | 
       | [1] https://github.com/Filimoa/polars-h3
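
A library-free Python sketch of the allocation idea described above: spread each ZIP-level value evenly over the grid cells that cover the ZIP, then sum per cell. It assumes the cell covering has already been computed elsewhere (for example with an H3 polygon fill), so the cell IDs below are placeholder strings and the function name is illustrative. Even splitting is the simplest choice; weighting by area or population would be a refinement.

```python
from collections import defaultdict


def allocate_to_cells(zip_metric: dict, zip_cells: dict) -> dict:
    """Split each ZIP's value equally across its covering cells and sum per cell."""
    per_cell = defaultdict(float)
    for zip_code, value in zip_metric.items():
        cells = zip_cells.get(zip_code, [])
        if not cells:
            continue  # no covering known for this ZIP; its value is not allocated
        share = value / len(cells)
        for cell in cells:
            per_cell[cell] += share
    return dict(per_cell)


# Hypothetical inputs: population per ZIP, and placeholder cell IDs per ZIP.
population = {"98101": 100.0, "98102": 50.0}
covering = {"98101": ["cell_a", "cell_b"], "98102": ["cell_b", "cell_c"]}
cells = allocate_to_cells(population, covering)
```

Once every source is expressed per cell, joining two datasets becomes a plain dictionary or table join on the cell ID.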
        
         | hammock wrote:
         | What is the benefit of H3 vs a rectangular grid?
        
           | kylebarron wrote:
           | Equal distances to each adjacent neighbor:
           | https://www.uber.com/blog/h3/
        
             | ajfriend wrote:
             | They also only have one _type_ of neighbor. Square grids
             | have 2 neighbor types. Triangular grids have 3.
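
The neighbor argument can be checked on a flat plane with a few lines of Python. This is an idealized sketch with unit spacing and illustrative function names; on a sphere, H3 cells are only approximately equidistant.

```python
import math


def square_neighbor_distances():
    """Distinct center-to-center distances to the 8 neighbors of a square cell."""
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return sorted({round(math.hypot(dx, dy), 6) for dx, dy in offsets})


def hex_neighbor_distances():
    """Distinct center-to-center distances to the 6 neighbors of a hexagonal cell."""
    # Axial (q, r) offsets of the six neighbors in a hex grid.
    axial = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

    def to_xy(q, r):
        # Convert axial coordinates to planar x/y with unit center spacing.
        return (q + r / 2, r * math.sqrt(3) / 2)

    return sorted({round(math.hypot(*to_xy(q, r)), 6) for q, r in axial})
```

The square grid yields two distances (edge neighbors and diagonal neighbors), while the hex grid yields one, which is what makes smoothing and flow calculations simpler on hexagons.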
        
               | hammock wrote:
               | Makes perfect sense. Thanks both
        
       | agtech_andy wrote:
       | Zip codes are great for anything with delivery logistics.
       | 
       | Anything else is a loose correlation at best, that will likely
       | change over time.
        
       | PLenz wrote:
       | I gave a talk at DataEngConf many years ago:
       | https://www.datacouncil.ai/talks/zip-codes-and-other-lies-yo...
        
       | zuhayeer wrote:
       | This is interesting since zip codes came up in consideration for
       | how we built out our pay choropleth map in the US:
       | https://levels.fyi/heatmap
       | 
       | Though ultimately it was far too granular (for example the Bay
       | Area would be so many different zip codes). Instead we went with
       | Nielsen's DMA (Designated Market Area) mappings within the US to
       | abstract aggregated data a bit better. And of course this DMA
       | dataset also had a different original use case. It was used for
       | TV / media market surveys so it has some weird vestiges. Some
       | regions are grouped very far and wide (you'll notice there's a
       | bit of Denver within Nevada and it's just a remnant of how it used
       | to be categorized), but it still provides a bit of a broader
       | level grouping than something acute like zip code.
       | 
       | I do like this map from the article though and the granularity
       | you can get with zip code when zooming:
       | https://clausa.app.carto.com/map/29fd0873-64cb-42a6-a90d-c83...
       | 
       | We've also been considering using Combined Statistical Areas
       | using population instead. This is something that is under way,
       | and in the interim we've considered charting styles that don't
       | necessarily need borders (for example this bubble map:
       | https://www.levels.fyi/bubble-plot/europe/). The benefit with
       | DMAs is that it offers full border coverage of the entire US
       | whereas some hubs can still be missing from CSAs if relying on a
       | population threshold. But the plan is to create some of our own
       | regional definitions and borders using our own submissions
       | combined with population. Will be an interesting project.
       | 
       | GeoJSON data for the map borders:
       | https://github.com/PublicaMundi/MappingAPI/blob/master/data/...
       | 
       | Nielsen DMA regions:
       | https://blocks.roadtolarissa.com/simzou/6459889
        
       | dhunter_mn wrote:
       | I used to work for a company that basically merged USPS and
       | Census Bureau data on a monthly basis. The output would be a
       | roadbase that was optimized for address ranges on road segments.
       | ZIP Codes were extra fun to work with.
        
       | eterevsky wrote:
       | ZIP codes are a simple approximation that does its job well
       | enough in most cases.
       | 
       | The alternatives that the author suggests are much more
       | complicated, both in terms of the implementation and in terms of
       | convincing the user to give you their full address.
        
       | ej1 wrote:
       | [flagged]
        
         | dang wrote:
         | Please stop. Automated comments are not allowed here.
        
       | Zamicol wrote:
       | I wrote the blackout system for Comcast TV scheduling. My
       | understanding was that blackouts were used mostly for sports
       | where games need to be available in one area and not others.
       | Contractually, they were required to use zip codes, so I used the
       | US Post office's zip code data to enforce blackouts.
        
       | lacoolj wrote:
       | For anyone curious, here is the official US Gov list of ZIP codes
       | in CSV with lots of helpful related data (longitude, latitude,
       | etc.)
       | 
       | http://federalgovernmentzipcodes.us/free-zipcode-database-Pr...
        
         | yellowbkpk wrote:
         | There is no "official US Gov list of ZIP codes". They come from
         | the US Postal Service, and those aren't published for free.
        
       | ivell wrote:
       | India is experimenting with Digipin
       | https://www.indiapost.gov.in/Navigation_Documents/Static_Nav...
       | 
       | It is derived from longitude and latitude.
        
         | extraduder_ire wrote:
         | I've seen a few attempts like this, like loc8 and google's plus
         | codes. Is there any advantage to Digipin over existing
         | solutions other than avoiding splitting major cities into very
         | different codes? None stood out to me from that document. The
         | description is written pretty well.
         | 
         | Always sad when these schemes don't include a check digit in
         | them though, even if the layout of this one gets typo'd codes
         | pretty close to their intended destination.
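
For readers wondering what a check character would look like on such a code, here is a minimal sketch using the Luhn mod N scheme. The 16-symbol alphabet and the sample code are hypothetical placeholders, not Digipin's actual alphabet or format, and nothing here is part of any published location-code spec.

```python
ALPHABET = "0123456789ABCDEF"  # hypothetical 16-symbol alphabet, NOT Digipin's
N = len(ALPHABET)

def _luhn_sum(chars, first_factor):
    """Luhn mod N sum, walking right to left with alternating factors 1 and 2."""
    total = 0
    factor = first_factor
    for ch in reversed(chars):
        addend = factor * ALPHABET.index(ch)  # raises ValueError on unknown symbols
        total += addend // N + addend % N     # fold the base-N "digits" together
        factor = 1 if factor == 2 else 2
    return total

def append_check_char(code):
    """Return the code with one check character appended."""
    remainder = _luhn_sum(code, first_factor=2) % N
    return code + ALPHABET[(N - remainder) % N]

def is_valid(code_with_check):
    """True when the trailing check character matches the rest of the code."""
    return _luhn_sum(code_with_check, first_factor=1) % N == 0

full = append_check_char("3F9A2C")  # made-up payload
print(full, is_valid(full))
```

With an even alphabet size, this scheme catches every single-character typo and most adjacent swaps, at the cost of one extra character per code.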
        
       | mannyv wrote:
       | Zip codes, ZCTAs, and TIGER/Line are good enough for what most
       | people need. Maybe you can find an edge by using something more
       | granular...but I'm not sure what edge you'd be looking to get
       | with geodata. Maybe for real estate trends and/or market
       | analysis?
        
         | clutchdude wrote:
         | I agree.
         | 
         | Reading their alternatives, it strikes me that "ZCTAs are the
         | worst form of small area aggregation except for all others."
         | 
         | It's not a great geography to use, but it is _quite_ useful if
         | you know its limitations and inaccuracies when you get into
         | it. Stuff like multipolygon entities, island-polys, etc. aren't
         | fun to resolve but can be accounted for.
         | 
         | Add to that that ZCTAs historically follow some sort of actual
         | boundary (rivers, highways, etc.), so they can tell a story in
         | a way Census tracts can't.
        
       | mmmlinux wrote:
       | Can anyone tell me why I have to enter both my city / state and a
       | zip code? Shouldn't one or the other of those, plus my street
       | address, be enough information?
        
         | ubermonkey wrote:
         | Web devs not using a good library that will populate the former
         | from the latter?
        
           | jayknight wrote:
           | Some libraries will insist that my address is in a different
           | city because my zip code spans the border. I mean if my mail
           | has the other city it still gets to me, but for anything
           | other than mail, they now have the wrong city for me.
        
             | mmmlinux wrote:
             | Does it matter if the "city" is wrong if your street
             | address + zip code is unique?
        
               | jayknight wrote:
               | It depends on what they're doing with it. But mostly
               | probably not.
        
         | sophacles wrote:
         | Several posts in this thread have linked the recent CGP Grey
         | video on the topic, and it addresses this question better than
         | I can. It's pretty interesting, actually.
        
       | ubermonkey wrote:
       | I'm reminded of this:
       | 
       | https://www.npr.org/2004/04/01/1805651/post-office-calls-for...
        
       | freyfogle wrote:
       | There are many problems with zip codes / postal codes but the
       | biggest two we see are:
       | 
       | a. Excel treats them as numbers instead of strings of digits and
       | thus drops the leading 0
       | 
       | b. Developers make assumptions about postal codes based on how
       | they work (or more usually how the developer incorrectly thinks
       | they work) in their own country and these assumptions absolutely
       | do NOT hold in other countries.
       | 
       | A relevant guide to geocoding and postal codes:
       | https://opencagedata.com/guides/how-to-think-about-postcodes...
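
Point (a) is easy to reproduce outside Excel. The sketch below uses a made-up two-row CSV and shows the leading zero surviving as text, vanishing on numeric conversion, and being restored by padding. The helper `restore_zip5` is a hypothetical name, and the padding repair assumes plain US 5-digit ZIP codes only; per point (b), it should not be applied to other countries' postal codes.

```python
import csv
import io

raw = "zip,orders\n02134,17\n90210,5\n"  # made-up sample data

# csv yields strings, so leading zeros survive unless we convert them.
rows = list(csv.DictReader(io.StringIO(raw)))
zips_as_text = [row["zip"] for row in rows]

# What treating the column as numeric does to the same values.
zips_as_numbers = [int(z) for z in zips_as_text]

def restore_zip5(value):
    """Left-pad a numeric ZIP back to five digits (US 5-digit ZIPs only)."""
    return str(value).zfill(5)

repaired = [restore_zip5(n) for n in zips_as_numbers]

print(zips_as_text)     # ['02134', '90210']
print(zips_as_numbers)  # [2134, 90210]
print(repaired)         # ['02134', '90210']
```

Keeping the column as text from the start is the safer choice; the padding step is only a repair for data that has already been mangled.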
        
         | paganel wrote:
         | Also, "everybody knows their zip-code/postal-code" is mostly an
         | American/British thing. I still remember my British former boss
         | asking me about my zip-code about 20 years ago (I live in
         | Romania, we were implementing the first google-maps-based
         | mashup in this country) and me answering that I have no idea,
         | and that no-one around these parts really knows his/her postal-
         | code. We do know our address, though, or used to, before we had
         | smart-phones.
        
         | gibspaulding wrote:
         | > Developers make assumptions about postal codes
         | 
         | Until very recently I naively assumed that the area of a given
         | zip code would be entirely within the area of some single city
         | or town which would then be entirely within the area of a
         | single county.
         | 
         | It was quite a rude awakening working with software that tries
         | to apply the correct local taxes to a given address and finding
         | that the statement "A given X can contain multiple Y" is true
         | for every possible combination of zip, city, and county.
        
       | 0xbadcafebee wrote:
       | Here's a recent podcast about why ZIP codes are not great for
       | analysis:
       | https://www.npr.org/2025/01/08/1223466587/zip-code-history
        
       | spankalee wrote:
       | Unironically, what a great sales blog post!
       | 
       | It's so well written and informative that I completely didn't
       | mind the "and here's how to do it in Carto" bit in the middle.
       | Instead I thought they earned it.
        
       | Anon84 wrote:
       | This is an example of the well known Modifiable Areal Unit
       | problem:
       | https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem In
       | general, your statistics depend on how you define your areas and
       | you _will_ get different pictures with different definitions.
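
A toy illustration of the zoning effect, using hypothetical event positions along a 12 km road and two arbitrary sets of zone boundaries. All numbers are invented for the sketch.

```python
from collections import Counter

# Hypothetical event locations along a 12 km road (positions in km).
events = [0.5, 1.5, 3.5, 4.2, 4.8, 5.1, 5.9, 6.4, 9.0, 11.0]

def count_by_zone(points, boundaries):
    """Count points per zone; zone i covers [boundaries[i], boundaries[i + 1])."""
    counts = Counter()
    for p in points:
        for i in range(len(boundaries) - 1):
            if boundaries[i] <= p < boundaries[i + 1]:
                counts[i] += 1
                break
    return [counts[i] for i in range(len(boundaries) - 1)]

zoning_a = [0, 4, 8, 12]  # three equal 4 km zones
zoning_b = [0, 5, 6, 12]  # same road, boundaries drawn elsewhere

counts_a = count_by_zone(events, zoning_a)
counts_b = count_by_zone(events, zoning_b)

print(counts_a)  # [3, 5, 2]: the middle zone looks busiest
print(counts_b)  # [5, 2, 3]: now the first zone does, and the middle looks quiet
```

The underlying points never change; only the boundaries do, which is why a conclusion drawn from one set of areas may not carry over to another.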
        
       | flappyeagle wrote:
       | No
        
       | JackFr wrote:
       | When doing your first ML project, zip codes are unsurpassed in
       | providing a set of hand written digits to train on.
        
       | OriPekelman wrote:
       | Well funny story, some twenty something years ago I actually
       | worked on an election cycle volunteer infra thing in France, and
       | living in Paris which is department 75 and therefore 750xx the
       | prefecture being 75000 I assumed it was neatly hierarchical 75004
       | won't be far away from 75003 (true)... The French thing being
       | orderly and rational.
       | 
       | I didn't need much precision so truncating seemed an easy way to
       | group stuff.
       | 
       | Oh the surprise. I never again made such assumptions, let's just
       | say I should have gotten a clue from Corsica being 2A and 2B.
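
One concrete way prefix truncation goes wrong, which the Corsica remark hints at: the first two characters of a Corsican postal code are "20", while the departments are 2A and 2B. The sketch below assumes that codes starting 200 or 201 belong to 2A and the remaining 20xxx codes to 2B; both function names are hypothetical, and overseas codes and other exceptions are not handled.

```python
def naive_department(postal_code):
    """Group by the first two characters, as in the anecdote."""
    return postal_code[:2]

def corsica_aware_department(postal_code):
    """Same truncation, but split the '20' prefix into 2A and 2B.

    Assumption: 200xx and 201xx map to 2A, other 20xxx codes to 2B.
    """
    prefix = postal_code[:2]
    if prefix != "20":
        return prefix
    return "2A" if postal_code[:3] in ("200", "201") else "2B"

for code in ("75004", "20000", "20200"):  # Paris 4e, Ajaccio, Bastia
    print(code, naive_department(code), corsica_aware_department(code))
```

The naive version puts Ajaccio and Bastia in the same "20" bucket even though they sit in different departments, so any grouping built on the prefix alone needs a lookup table or special cases.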
        
       ___________________________________________________________________
       (page generated 2025-02-07 23:00 UTC)