[HN Gopher] How to build a IP geolocation database from scratch?
___________________________________________________________________
How to build a IP geolocation database from scratch?
Author : incolumitas
Score : 343 points
Date : 2023-09-14 11:00 UTC (11 hours ago)
(HTM) web link (ipapi.is)
(TXT) w3m dump (ipapi.is)
| fasteo wrote:
| >>> Consider Open Source Geolocation Projects
|
| Not the definition of "from scratch" in my book
| dboreham wrote:
| Interesting but this isn't actually how geolocation is done,
| right? The ARIN/RIPE data isn't sufficiently accurate to be
| useful beyond country. Commercial geolocation involves
| correlating client IP vs known physical location e.g. from WiFi
| AP or mailing a package to the user. At least that's what I have
| been told over the decades.
| shortrounddev2 wrote:
| I work in adtech and this is how we do geolocation. There's
| also device geolocation but if the user doesn't consent to
| sharing their GPS data with us, we just use IP address for
| targeting. Common provider for this is Maxmind; they ship a
| database that you host locally and query
| tiffanyh wrote:
| Does Cloudflare have the same data as Maxmind?
|
| Because Cloudflare and Maxmind geolocate me to the exact same
| longitude/latitude.
| klaussilveira wrote:
| CloudFlare uses Maxmind: https://developers.cloudflare.com/
| support/network/configurin...
| dawnerd wrote:
| Even the free maxmind db is accurate enough for most
| applications.
| klaussilveira wrote:
| Since you are in adtech: do you buy MaxMind, or roll your
| own? Are there any providers for US-only data, and therefore,
| cheaper?
| shortrounddev2 wrote:
| We licensed Maxmind's DB recently (it's like $300 a year or
| something). idk if there are US-only databases. Our
| customers are all in the US, and we use geo IP to filter
| European users for compliance (GDPR and otherwise)
| ChopSticksPlz wrote:
| This is a very useful .csv, what is the license? Is it free for
| personal and commercial use?
| jl6 wrote:
| Is anybody maintaining a historical archive of "IP address
| metadata" (which would include geolocation)?
|
| If I have logs from 10 years ago, can I look up information about
| that IP as it was at the time?
| sneak wrote:
| I feel like a more useful and accurate way would be to buy client
| ip and GPS location data in bulk from one of the mobile data
| brokers who have their spyware embedded in zillions of popular
| apps/games and then group it by /24 or something.
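|
| A minimal sketch of the "group it by /24" step, assuming a
| hypothetical input of { ip, lat, lon } samples. The function
| names and the input shape are illustrative only, not taken from
| any broker's data format.

```javascript
// Map a dotted-quad IPv4 string to its /24 network,
// e.g. "203.0.113.77" -> "203.0.113.0/24".
function toSlash24(ip) {
  const octets = ip.split(".").map(Number);
  const valid =
    octets.length === 4 &&
    octets.every((o) => Number.isInteger(o) && o >= 0 && o <= 255);
  if (!valid) throw new Error(`not an IPv4 address: ${ip}`);
  return `${octets[0]}.${octets[1]}.${octets[2]}.0/24`;
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group samples by /24 and take the per-axis median of the GPS fixes.
// Caveat: a plain median of longitudes misbehaves near the antimeridian.
function locateBySlash24(samples) {
  const groups = new Map();
  for (const { ip, lat, lon } of samples) {
    const key = toSlash24(ip);
    if (!groups.has(key)) groups.set(key, { lats: [], lons: [] });
    groups.get(key).lats.push(lat);
    groups.get(key).lons.push(lon);
  }
  const result = new Map();
  for (const [key, { lats, lons }] of groups) {
    result.set(key, { lat: median(lats), lon: median(lons), samples: lats.length });
  }
  return result;
}
```

| The median is used instead of the mean so that a few outliers
| (VPN users, stale fixes) do not drag the estimate far away.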
| johnklos wrote:
| I think it's interesting that the one IP range I decided to check
| has correct information on the ipapi.is web site, but
| unambiguously incorrect information in the downloadable
| geolocationDatabaseIPv4.csv. Somehow Bedford, New Hampshire
| (which came straight from WHOIS) became Bedford, Texas.
|
| How'd that happen?
| alberth wrote:
| What are common use cases for needing IP geolocation?
| kiririn wrote:
| A modern version of the ping-based geoip mentioned
|
| https://github.com/Ne00n/yammdb
| JoshGlazebrook wrote:
| This just links to a mmdb file that is already compiled, there
| isn't anything relevant to show this is a "modern"
| implementation of anything if the implementation isn't
| available.
| mootothemax wrote:
| Any suggestions for geolocating datacenter IPs, even very
| roughly? I'm analysing traceroute data, and while I have known
| start and end locations, it's the bit in the middle I'm
| interested in.
|
| I can infer certain details from airport codes in node hostnames,
| for example.
|
| It would also be possible - I guess - to infer locations based on
| average RTT times, presuming a given node's not having a bad day.
|
| Anyone have any other ideas?
|
| Edit: A couple of troublesome example IPs are 193.142.125.129,
| 129.250.6.113, and 129.250.3.250. They come up in a UK traceroute
| - and I believe they're in London - but geolocate all over the
| world.
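|
| A rough sketch of the airport-code idea above. The lookup table
| is a tiny illustrative sample and the hostnames in the usage
| note are made up; real router naming varies by operator.

```javascript
// Illustrative sample only; a real table would cover far more codes.
const AIRPORT_CITIES = new Map([
  ["lhr", "London"],
  ["ams", "Amsterdam"],
  ["fra", "Frankfurt"],
  ["jfk", "New York"],
  ["sea", "Seattle"],
]);

// Split the hostname into alphanumeric tokens and return the city for the
// first token that is a known three-letter code, optionally followed by
// digits (for example "lhr01").
function cityFromHostname(hostname) {
  const tokens = hostname.toLowerCase().split(/[^a-z0-9]+/);
  for (const token of tokens) {
    const match = /^([a-z]{3})\d*$/.exec(token);
    if (match && AIRPORT_CITIES.has(match[1])) {
      return AIRPORT_CITIES.get(match[1]);
    }
  }
  return null; // no recognisable code
}
```

| For a made-up name such as "xe-0-0-1.lhr01.example.net" this
| returns "London". Ordinary words that happen to match a code
| ("sea", "fra") can produce false positives, so treat the
| result as a hint to be checked against RTT.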
| toast0 wrote:
| Those IPs are owned by Google and NTT, who both run large
| international networks and can redeploy their IPs around the
| world when they feel like it. So lookup based geolocation is
| going to be iffy, as you've seen.
|
| Traceroute to those IPs certainly looks like the networking
| goes to London.
|
| The google IP doesn't respond to ping, but the NTT/Verio ones
| do. I'd bet if you ping from London based hosting, you'll get
| single digit ms ping responses, which sets an upper bound on
| the distance from London. Ping from other hosting in the
| country and across the channel, and you can confirm the lowest
| ping you can get is from London hosting, and there you go. It
| could also be that its connectivity is through London, but it's
| elsewhere --- you can't really tell.
|
| Check from other vantage points, just to make sure it's not
| anycast; if you ping 8.8.8.8 from most networks around the
| world, you'll get something nearby; but these IPs give
| traceroutes to London from the Seattle area, so probably not
| anycast (at least at the moment, things can change).
|
| If you don't have hosting around the world, search for public
| looking glasses at well-connected networks that you can use for
| pings like this from time to time.
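|
| The "upper bound on the distance" reasoning above can be put
| into numbers. This sketch assumes light in fibre covers roughly
| 200 km per millisecond (about two thirds of c); the constant
| and function names are illustrative.

```javascript
// Approximate propagation speed in optical fibre, in km per millisecond.
const FIBRE_KM_PER_MS = 200;

// The packet travels to the target and back within the RTT, so the target
// can be at most half the RTT away, even over a perfectly straight path.
function maxDistanceKm(rttMs) {
  if (!(rttMs >= 0)) {
    throw new RangeError("RTT must be a non-negative number of milliseconds");
  }
  return (rttMs / 2) * FIBRE_KM_PER_MS;
}

// Queuing and routing detours only ever add delay, so the minimum RTT over
// several pings gives the tightest bound.
function tightestBoundKm(rttSamplesMs) {
  return maxDistanceKm(Math.min(...rttSamplesMs));
}
```

| A 4 ms ping therefore puts the host within about 400 km of
| the vantage point. It is only an upper bound: real paths are
| longer than the great-circle distance.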
| dontdoxxme wrote:
| https://ensa.fi/papers/geolocation_imc17.pdf has some ideas.
|
| Using RIPE Atlas probes to get RTT to the IPs from known
| locations is close to your idea and probably the best anyway.
| tyingq wrote:
| This looked promising:
|
| _"TULIP's purpose is to geolocate a specified target host
| (identified by IP name or address) using ping RTT delay
| measurements to the target from reference landmark hosts whose
| positions are well known (see map or table)."_
|
| https://tulip.slac.stanford.edu/
|
| But the endpoint it posts to seems dead.
| vinay_ys wrote:
| > A couple of troublesome example IPs are 193.142.125.129,
| 129.250.6.113, and 129.250.3.250. They come up in a UK
| traceroute - and I believe they're in London - but geolocate
| all over the world.
|
| If I'm running a popular app/web service, I would have my own
| AS number and I will have purchased a few blocks of IP
| addresses under this AS and then I would advertise these
| addresses from multiple owned/rented datacenters around the
| world.
|
| These BGP advertisements would be to my different upstream
| Internet service providers (ISPs) in different locations.
|
| For a given advertisement from a particular location, if you
| see a regional ISP as upstream, you can make an educated guess
| that this particular datacenter is in that region. If these are
| Tier 1 ISPs who provide direct connectivity around the world,
| then even that guess is not possible.
|
| You can see the BGP relationships in a looking glass tool like
| bgp.tools -
| https://bgp.tools/prefix/193.142.125.0/24#connectivity
|
| If you have ability to do traceroute from multiple probes
| sprinkled across the globe with known locations, then you could
| triangulate by looking at the fixed IPs of the intermediate
| router interfaces.
|
| Even this is defeated if I were to use a CDN like Cloudflare
| to advertise my IP blocks to their 200+ PoPs and ride their
| private networks across the globe to my datacenters.
| bullen wrote:
| Here is a solution for those that care about speed:
|
| https://www.miyuru.lk/geoiplegacy
| hddqsb wrote:
| Somewhat relevant: Google Maps can learn the location of your IP
| based on which locations you browse in the map. If you browse a
| specific location enough times, it will use that as the default
| location when you open Google Maps, even if you clear all
| cookies. (I discovered this just from using Google Maps, and I'm
| a little concerned by the privacy implications, considering that
| multiple people may share an IP address.)
| gniv wrote:
| I suspect it's the other way around. Google just has a very
| good IP geolocation db, so it uses that when you browse, absent
| any other info.
| hddqsb wrote:
| Google certainly uses its geolocation DB, but it _also_
| learns based on map browsing patterns.
|
| To clarify, the scenario I described is as follows: 1.
| Initially, when I open Google Maps in a clean browser it
| defaults to my real location. 2. I repeatedly browse some
| other location. 3. When I open Google Maps in a clean
| browser, it defaults to that other location. The _only_
| reason for Google Maps to pick that other location is my map
| browsing.
| gniv wrote:
| Thanks for clarifying. That is indeed surprising and you
| are probably right.
| netsharc wrote:
| Well it has reporting beacons all over the world with GPS
| receivers, in the form of Android phones, and perhaps Google
| Maps users on iPhone too..
| is_true wrote:
| That would explain why it sometimes thinks I'm in a river I
| paddle often and other times where I have my summer house.
| overcast wrote:
| Step 1: Download Geolocation Database
| Aachen wrote:
| Scroll down, the article is confusingly below that
| nonethewiser wrote:
| Step 1: Download Geolocation Data
|
| Unless you think CSV is a database?
| debesyla wrote:
| Maybe a dumb question (I have no knowledge), but why wouldn't
| we think of .CSV files as databases? It can have columns and
| rows filled with information and isn't that what makes a
| thing a database?
| nobleach wrote:
| Best I can guess here, the reply is considering relational
| databases as "real databases" and flat files.... not real.
| nobleach wrote:
| Are we really going to do the mincing of words here? Did you
| need the word "dump" or "export" before you understood?
| Although I wasn't wild about the original poster's "step 1"
| terseness, it's silly to think a normal person wouldn't be
| able to parse the sentence well enough to understand
| "download the database contents - perhaps stored in CSV
| format".
| tmpX7dMeXU wrote:
| If in your mind database implies a type of technology and not
| something conceptual, you're really just outing yourself as
| someone that needs someone between you and the boardroom.
| Certainly not something to show off on Hacker News.
| n2dasun wrote:
| Step 1. Download Visual Basic
| nanmu42 wrote:
| Thanks for sharing.
|
| I have heard there is much effort to use BGP data to build GeoIP
| database.
| bjornsing wrote:
| I expected traceroute to play a bigger part in this. If you know
| the route to an IP address and the location of routers, perhaps
| even from a few different servers, then you should be able to
| locate it fairly well.
| TZubiri wrote:
| "how to scrape an ip geolocation database"
|
| You know you can just run a whois query per ip you want to
| analyze, no point in scraping the whole ipvN space.
| incolumitas wrote:
| I have to scrape the whole IP address space since I offer
| location information as part of my API.
|
| Also I only need to scrape as many WHOIS records as there are
| different networks out there. So for example for the IPv4
| address space, there are far fewer networks than there are IPv4
| addresses (2^32).
|
| Also, most RIRs provide their WHOIS databases for download.
|
| Therefore, "scraping" is not really the correct word, it's a
| hybrid approach, but mostly based on publicly available data
| from the five RIRs.
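|
| A sketch of why one record per network is enough: store each
| network as a [start, end] range and binary-search it. The record
| fields (first, last, country) are hypothetical, and the ranges
| are assumed not to overlap, which real WHOIS data does not
| guarantee.

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit number.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Build a sorted index with one entry per network, not one per address.
function buildIndex(records) {
  return records
    .map((r) => ({ start: ipToInt(r.first), end: ipToInt(r.last), country: r.country }))
    .sort((a, b) => a.start - b.start);
}

// Binary search for the last range that starts at or before the address,
// then check that the range actually covers it.
function lookup(index, ip) {
  const target = ipToInt(ip);
  let lo = 0;
  let hi = index.length - 1;
  let found = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].start <= target) {
      found = index[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found !== null && target <= found.end ? found : null;
}
```

| Each lookup costs O(log n) in the number of networks. Nested
| or overlapping WHOIS ranges would need a most-specific-match
| rule on top of this.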
| notlukesky wrote:
| What was the easiest and the most frustrating part?
| djbusby wrote:
| The whois data for IP is not accurate.
| gsich wrote:
| whois has no sane format.
| louison11 wrote:
| If you don't want to do this yourself, you can actually just get
| Cloudflare to do it for you for free using a simple Worker since
| all Cloudflare requests contain approximate IP location
| information.
|
| You can also just send a request to my URL (Cloudflare Worker
| operated - so it should have global low latency):
| https://www.edenmaps.net/iplocation
|
| Use it for small applications, I don't mind. Just don't start
| sending me 10M requests per day ;-)
| tiffanyh wrote:
| This is excellent!
|
| Would you mind open sourcing the code for that?
| louison11 wrote:
| This is the code running this endpoint:
|     export function onRequest(context) {
|       const { longitude, latitude } = context.request.cf;
|       // Response order is [lon, lat].
|       return new Response(
|         JSON.stringify([parseFloat(longitude), parseFloat(latitude)]),
|         { headers: { "Content-Type": "application/json;charset=UTF-8" } }
|       );
|     }
|
| This is a function on Cloudflare Pages (which is just a
| different name for Cloudflare Workers). Minor adjustment
| needed for Workers (get rid of "context", I believe)
| emadda wrote:
| Does anyone know how accurate Cloudflare geolocation is (for
| workers requests)?
| reincoder wrote:
| I work for IPinfo and we do ping based geolocation. The best
| thing you can do to verify geolocation accuracy is the
| following:
|
| - Download a few free IP databases
| - Generate a random list of IP addresses
| - Do the IP address lookups across all those databases
| - Identify the IP addresses that can be pinged
| - Visit a site that can ping an IP address from multiple servers
| - Sort the results by lowest avg ping time
|
| Then check where the geolocation provider is locating the IP
| address and what is the nearest server from there.
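|
| The last two steps above can be sketched as follows. The input
| shapes (probeCity, rttMs, a provider-to-city map) are
| hypothetical, and matching on city names is a deliberate
| simplification.

```javascript
// pings: [{ probeCity: "London", rttMs: [3, 5] }, ...]
// Returns probes sorted by average RTT, lowest first.
function rankProbes(pings) {
  return pings
    .filter((p) => p.rttMs.length > 0)
    .map((p) => ({
      probeCity: p.probeCity,
      avgMs: p.rttMs.reduce((a, b) => a + b, 0) / p.rttMs.length,
    }))
    .sort((a, b) => a.avgMs - b.avgMs);
}

// claims: { providerName: claimedCity }
// For each provider, report whether its claimed city matches the city of
// the probe with the lowest average ping.
function compareClaims(pings, claims) {
  const ranked = rankProbes(pings);
  const nearest = ranked.length > 0 ? ranked[0] : null;
  return Object.entries(claims).map(([provider, claimedCity]) => ({
    provider,
    claimedCity,
    nearestProbeCity: nearest ? nearest.probeCity : null,
    agrees: nearest !== null && nearest.probeCity === claimedCity,
  }));
}
```

| A provider that places the IP in New York while the fastest
| probe sits in London is worth a closer look. Agreement is only
| as fine-grained as the probe locations you have.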
| banana_giraffe wrote:
| As accurate as MaxMind[1], since that's what they use [2]. In
| my experience, it's reasonably accurate for the US, less so
| for other countries. MaxMind publishes some accuracy data
| which might be an interesting starting point [3]
|
| That said, for any analytics use cases of this data, be aware
| that MaxMind will group a lot of what should be unknowns in
| the middle of a country. Or, in the case of the US now, I think
| they all end up in the middle of some lake, since some farm
| owners in Butler County, Kansas got tired of cops showing up
| and sued MaxMind. It can cause odd artifacts unless you
| filter the addresses out somehow.
|
| 1 https://developers.cloudflare.com/support/network/configuri
| n...
|
| 2 https://www.maxmind.com/en/geoip-demo
|
| 3 https://www.maxmind.com/en/geoip2-city-accuracy-comparison
| matwood wrote:
| Yeah, MaxMind is the best I have used with caveats. You
| need to update it frequently, and you need to allow for
| overrides.
| carstenhag wrote:
| I'm in Munich. Cloudflare tells a position that is 730km to the
| north in a random forest.
| Aachen wrote:
| Or you download an IP database rather than sharing with a third
| party which IP address is likely connecting to your service
| hotgeart wrote:
| Located 100km from the Somali coast... I'm in Brussels,
| Belgium, thx for protecting my privacy :D
| louison11 wrote:
| The result is [lon, lat]. You've most likely copied it onto
| Google maps, which works with [lat, lon]. Believe it or not,
| the industry still hasn't come up with a standard order.
| cstuder wrote:
| Question: What's the motivation to put coordinates in one's own
| WHOIS record? (geoloc/geofeed)
| incolumitas wrote:
| Many service providers actually want their clients to be able
| to locate them.
| dontdoxxme wrote:
| geofeed is used by big CDNs, it can actually help save money
| for the provider by meaning a CDN uses a more optimal network
| location.
| nonethewiser wrote:
| Comments seem fairly dismissive but I actually found this really
| interesting. It reminds me of a task I had in my first position
| to add PostGIS to our database and a location based search. That
| was based off addresses and zipcodes.
| mannyv wrote:
| That's relatively simple to do, even in mysql. One trick is to
| use a square instead of a circle, which avoids a lot of math.
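|
| A sketch of the square-instead-of-circle trick, assuming about
| 111.32 km per degree of latitude. It ignores the poles and the
| antimeridian; the function names are illustrative.

```javascript
const KM_PER_DEGREE_LAT = 111.32; // approximate length of one degree of latitude

// Compute a lat/lon box that encloses a circle of radiusKm around a point.
function boundingBox(lat, lon, radiusKm) {
  const dLat = radiusKm / KM_PER_DEGREE_LAT;
  // A degree of longitude shrinks with the cosine of the latitude.
  const dLon = radiusKm / (KM_PER_DEGREE_LAT * Math.cos((lat * Math.PI) / 180));
  return {
    minLat: lat - dLat,
    maxLat: lat + dLat,
    minLon: lon - dLon,
    maxLon: lon + dLon,
  };
}

// Plain comparisons only: no trigonometry per candidate row.
function inBox(box, lat, lon) {
  return (
    lat >= box.minLat && lat <= box.maxLat &&
    lon >= box.minLon && lon <= box.maxLon
  );
}
```

| In SQL the same box becomes two BETWEEN conditions on indexed
| lat and lon columns. If a true circle is needed, the exact
| distance check can then run on the few rows that survive.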
| junto wrote:
| As someone that lives in a country where the national language is
| not my first language, I hate websites that use IP location to
| make assumptions about my choice of language and it being forced
| on me based on a lazy assumption, when my browser is sending
| language headers quite clearly, and they are ignored.
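|
| A sketch of honouring the Accept-Language header first and only
| falling back to something else (such as an IP-based guess) when
| nothing matches. The parsing is simplified and the function
| names are illustrative.

```javascript
// Parse e.g. "de-CH,de;q=0.9,en;q=0.8" into lowercase tags ordered by
// descending quality value; entries with q=0 are dropped.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? Number(qParam.slice(2)) : 1;
      return { tag: tag.trim().toLowerCase(), q: Number.isFinite(q) ? q : 0, index };
    })
    .filter((entry) => entry.tag && entry.q > 0)
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((entry) => entry.tag);
}

// Return the first supported language the user asked for, trying an exact
// match and then the primary subtag ("en-gb" -> "en"). The fallback is used
// only when the header yields nothing usable.
function pickLanguage(header, supported, fallback) {
  for (const tag of parseAcceptLanguage(header || "")) {
    if (tag === "*") continue;
    const primary = tag.split("-")[0];
    const hit =
      supported.find((s) => s.toLowerCase() === tag) ||
      supported.find((s) => s.toLowerCase() === primary);
    if (hit) return hit;
  }
  return fallback;
}
```

| With supported languages ["de", "en"], a visitor in Germany
| sending "en-GB,en;q=0.9,de;q=0.8" gets "en", whatever their IP
| address suggests.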
| jwie wrote:
| The easiest way to get a geolocation is to ask the user. Maybe
| they'll just tell you, and if that's good enough for your
| application there's no need for such solutions.
| jedberg wrote:
| It all depends on what you want to use it for and how accurate it
| needs to be.
|
| The best way to build a geolocation service is to have a billion
| devices that report their location to you at the same time they
| report their IP to you. That's basically Apple and Google. They
| have by far the best geolocation databases in the world, because
| they get constant updates of IP and location.
|
| The trick is basically to make an app where people willingly give
| you their location, and then get a lot of people to use it.
| That's the best way to build an accurate geo-location database,
| and why every app in the world now asks for your location.
|
| 4-square had the right idea, they were just ahead of their time.
| flounder3 wrote:
| Even 10 years ago, Apple's internal privacy policies prevented
| it from collecting precise lat/long. We had to use HTTP
| session telemetry to determine which endpoints were best for a
| given IP (or subnet, but not ASN), which informed our own
| pseudo-geoIP database so we knew which endpoint to connect to
| based on real world conditions.
|
| Even still, it had to be as ephemeral as possible for the sake
| of privacy. We weren't allowed to use or record results from
| Apple Maps' reverse geo service outside of the context of a
| live user request (finding nearby restaurants, etc).
| jedberg wrote:
| You don't need precise lat/lon to make a good database. Even
| a 1km circle would be more than enough.
|
| > but not ASN
|
| Why wasn't ASN allowed? That's what Netflix used to make
| endpoint routing decisions and worked really well.
| flounder3 wrote:
| You're not wrong, but privacy concerns were paramount.
|
| ASNs were allowed but too vague. We needed more
| granularity. Corporate proxies, subdelegations, many
| providers aggregating announcements below /24, etc.
| bagels wrote:
| Surely someone is using online shopping shipping addresses for
| this?
| SirMaster wrote:
| These IP geolocation lookups never seem to work for me.
|
| They are always multiple states off, and checking multiple
| different services pretty much never even seem to agree.
| reincoder wrote:
| First, I am a big fan of your articles even before I joined IPinfo,
| where we provide IP geolocation data service.
|
| Our geolocation methodology expands on the methodology you
| described. We utilize some of the publicly available datasets
| that you are using. However, the core geolocation data comes from
| our ping-based operation.
|
| We ping an IP address from multiple servers across the world and
| identify the location of the IP address through a process called
| multilateration. Pinging an IP address from one server gives us
| one dimension of location information meaning that based on
| certain parameters the IP address could be in any place within a
| certain radius on the globe. Then as we ping that IP from our
| other servers, the location information becomes more precise.
| After enough pings, we have very precise IP location
| information that almost reaches zip code level precision with a
| high degree of accuracy. Currently, we have more than 600 probe
| servers across the world and it is expanding.
|
| The publicly available information that you are referring to is
| sometimes not very reliable in providing IP location data as:
|
| - They are often stale and not frequently updated.
|
| - They are not precise enough to be generally useful.
|
| - They provide location context at a large IP range level or
| even at organization level scale.
|
| And last but not least, there is no verification process with
| these public datasets. With IPv4 trade and VPN services being
| more and more popular we have seen evidence that in some
| instances inaccurate information is being injected in these
| datasets. We are happy and grateful to anyone who submits IP
| location corrections to us but we do verify these correction
| submissions for that reason.
|
| From my experience with our probe network, I can definitely say
| that it is far easier and cheaper to buy a server in New York
| than in any country in the middle of Africa. Location of an IP
| address greatly influences the value it can provide.
|
| We have a free IP to Country ASN database that you can use in
| your project if you like.
|
| https://ipinfo.io/developers/ip-to-country-asn-database
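|
| A toy illustration of the narrowing-down idea described above:
| each probe's RTT limits the target to a disc, and the estimate
| is taken from the cells where all discs overlap. This is a
| constraint-intersection sketch under assumed values (200 km per
| ms in fibre, a coarse 1-degree grid), not IPinfo's actual
| method.

```javascript
const EARTH_RADIUS_KM = 6371;
const FIBRE_KM_PER_MS = 200; // assumed propagation speed in fibre

// Great-circle distance between two { lat, lon } points in kilometres.
function haversineKm(a, b) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// measurements: [{ lat, lon, rttMs }] with one entry per probe.
// Returns every grid cell that lies inside all probes' RTT discs.
function feasibleRegion(measurements, stepDeg = 1) {
  const cells = [];
  for (let lat = -90; lat <= 90; lat += stepDeg) {
    for (let lon = -180; lon < 180; lon += stepDeg) {
      const ok = measurements.every(
        (m) => haversineKm(m, { lat, lon }) <= (m.rttMs / 2) * FIBRE_KM_PER_MS
      );
      if (ok) cells.push({ lat, lon });
    }
  }
  return cells;
}

// Average the feasible cells into a single estimate, or null if the
// constraints contradict each other. Averaging longitudes is naive and
// breaks near the antimeridian.
function estimate(measurements, stepDeg = 1) {
  const cells = feasibleRegion(measurements, stepDeg);
  if (cells.length === 0) return null;
  const lat = cells.reduce((sum, c) => sum + c.lat, 0) / cells.length;
  const lon = cells.reduce((sum, c) => sum + c.lon, 0) / cells.length;
  return { lat, lon, cells: cells.length };
}
```

| Adding probes shrinks the feasible region, which is the
| "becomes more precise" effect. A production system would need
| a finer grid and a model of routing detours.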
| Daviey wrote:
| Would you consider no-signup inspection of the data you hold on
| the requester's IP address? I would love to see what you have on
| MY IP address, and if sufficiently accurate it feels that it
| would be a good incentive to sign up to use commercially.
|
| It feels like it couldn't be abused by 'freeloaders', because
| I'd guess their use-case is viewing other people's.
| reincoder wrote:
| We have a very open approach to our data. In fact, our
| website is extremely accessible. It is quite useful for
| researching IP addresses and does not require signing up. The
| data is largely available to view on the website. Although we
| display all IP address meta data on the home page, if you
| intend to use our website frequently, I recommend utilizing
| the IP data pages.
|
| You can enter IP addresses on the right side to look up
| information here: https://ipinfo.io/what-is-my-ip
|
| Additionally, we offer some enjoyable tools that you can use
| here: https://ipinfo.io/tools
|
| The CLI tool is particularly entertaining.
|
| You can also use our API service without signing up, with a
| limit of 1000 requests per day.
|
| If you do choose to sign up for a free account, you will
| receive 50,000 requests per month, free IP databases, a bulk
| lookup feature, and more.
| kam wrote:
| This is literally the most prominent thing on the
| https://ipinfo.io home page.
| qingcharles wrote:
| Huh, that's cool. It got my home IP about 15 miles from
| where I am, but still not bad.
|
| Wait - how does this work for cell IPs? A lot of cellphone
| v4 IPs are now shared between hundreds or thousands of
| devices, right?
| reincoder wrote:
| I work there, and I am supposed to know these things, but
| I don't exactly :/
|
| It probably has something to do with important routers.
| What tags do we show when you visit the IP data page? The
| IP data page can be accessed by visiting
| ipinfo.io/<IP_address>.
|
| We use the generic term "data experts," but it actually
| consists of about 2 dozen engineers, including data
| engineers, data scientists, infrastructure engineers,
| backend engineers, and a great technical CEO working on
| all that. All those folks have gone on a boating trip off
| the coast of Spain for a retreat.....except for me.
|
| I will ask them and try to circle back with some answers.
| Daviey wrote:
| That's embarrassing for me... I thought that was a static
| image of an example. And I did look through the site
| looking for a search. Oops.
| theogravity wrote:
| How does that work with edge servers that use anycast to assume
| the same IP across different regions?
| SnorkelTan wrote:
| Aren't anycast addresses a specific subset of IPs and thus
| knowable? IIRC, each autonomous system is allocated anycast
| IP space?
| TheClassic wrote:
| Your comment is extremely interesting and what I was hoping to
| learn from the article (without an existing source of
| information, how do we determine the location of an IP
| address). Thank you!
| reincoder wrote:
| I really appreciate it. Thank you. We are very transparent about
| our process. If you have any questions, you can always reach
| out to us.
|
| We have a simplified explanation of our probe network here:
| https://ipinfo.io/blog/probe-network-how-we-make-sure-our-
| da...
|
| The only update is the number of servers is like 600+ now.
| The probe network is growing extremely rapidly.
|
| Our IP geolocation process is quite complicated, and we have
| a team of data engineers, infrastructure engineers, and data
| scientists working on various aspects of it. Therefore, our
| approach is users can ask us questions, and we will try our
| best to answer them.
| freedomben wrote:
| Just wanted to let you know, it's this transparency that
| turned me into a customer!
|
| I love your company and service, but I hate your pricing. I
| work with a lot of small clients/apps that paying for usage
| would be a no-brainer, but the defined monthly price
| buckets don't make any economical sense at their scale. If
| you added a "pay as you go" tier that a small app could
| reasonably start by using dollars worth of API calls per
| month and grow from there, I'd be spreading your seed all
| over the place. I'm not saying this to rag on you, just
| trying to provide some constructive feedback as a thank you
| for your info sharing!
| reincoder wrote:
| Thank you very much; I really appreciate your feedback.
| This is not the first time I have heard this. The
| solution is to try to take as much advantage as you can
| from the free tier.
|
| # Check out the free IP databases
|
| https://ipinfo.io/products/free-ip-database
|
| The free databases come with commercial usage permission,
| and because they are databases, you can make unlimited
| lookups from them. The databases provide full accuracy
| and are updated daily. They are just a subset of our IP
| geolocation database that only provides IP to Country
| information.
|
| # Complement the database with the API service
|
| If you only want city-level information, switch to the
| API service. Use the database to look up IP-to-country
| information as many times as you want. However, use the
| API service only when necessary.
|
| Additionally, if you include a credit link to us, we will
| double your API limit to 100k/month. Visit
| https://ipinfo.io/contact/creditlink.
|
| # Cache data
|
| All of our API libraries have native caching support. We
| strongly recommend that users reduce their number of
| requests by caching the response. I highly recommend you
| check out our libraries: https://github.com/ipinfo
|
| ---
|
| The only challenge with the free IP databases is that you
| need to host the database somewhere to lookup the IP to
| Country information. Having an API service with nearly
| unlimited lookups for IP to Country information will be
| fantastic.
|
| If you know someone who has an IP to Country as API
| service, please let me know. We only require an
| attribution for using our database. If you have a similar
| service that is popular but don't want to maintain it, let
| us know as well; we can take over the site and host it
| ourselves with the IP to Country data.
| freedomben wrote:
| Thank you, that's super useful info. I didn't realize you
| had an Erlang library! I'm definitely going to be putting
| that to use :-)
| detourdog wrote:
| I just noticed that my wife's iPhone uses the same mycingular IP
| address while driving across 3 states over 5 hours while
| checking mail.
| inemesitaffia wrote:
| There's several options/techniques for doing it. But just
| imagine you have a permanent zero overhead VPN.
|
| I don't know if that provider terminates long running calls,
| but the calls would stay up too regardless of tower.
| detourdog wrote:
| Yes, I'm sure it is iOS anti-tracking and directly related
| to why firewall apps inside SIP may not know what is going
| on.
| Vendan wrote:
| More likely to be just standard Mobile IP
| https://en.wikipedia.org/wiki/Mobile_IP. Fairly standard
| stuff, can cause some false positives around traveling
| (I've seen people get freaked out about stuff like "This
| person just logged in from their home state and then less
| than an hour later logged in from France!" when it was
| just mobile IP treating their phone as still in the US
| while they were in France on a trip, but their laptop
| connected over normal internet was seen as coming from
| France)
| detourdog wrote:
| this was a consistent ip address nothing to do with
| location and nobody was freaked out.
| matsur wrote:
| ICMP response time is not useful for "locating" an anycasted
| address, some of which have logical location associated with
| them. See https://blog.cloudflare.com/icloud-private-relay/ for
| an example
| cuu508 wrote:
| Well, at least you can detect it is an anycast address, and
| mark it as such.
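|
| One way to detect anycast from pings alone is a speed-of-light
| check: if two probes both see RTTs so low that no single place
| could be that close to both, more than one host must be
| answering. A sketch, assuming 200 km per ms in fibre and
| hypothetical probe records:

```javascript
const EARTH_RADIUS_KM = 6371;
const FIBRE_KM_PER_MS = 200; // assumed propagation speed in fibre

// Great-circle distance between two { lat, lon } points in kilometres.
function haversineKm(a, b) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// measurements: [{ lat, lon, rttMs }] with one entry per probe.
// Each RTT limits the target to a disc of radius (rtt / 2) * speed around
// the probe. If two discs cannot touch, one location cannot explain both.
function looksAnycast(measurements) {
  for (let i = 0; i < measurements.length; i++) {
    for (let j = i + 1; j < measurements.length; j++) {
      const a = measurements[i];
      const b = measurements[j];
      const combinedReachKm = ((a.rttMs + b.rttMs) / 2) * FIBRE_KM_PER_MS;
      if (haversineKm(a, b) > combinedReachKm) return true;
    }
  }
  return false;
}
```

| A 2 ms reply seen from both London and Sydney fails the check,
| so the address can be marked as anycast. Passing the check does
| not prove unicast; it only means these probes found no
| contradiction.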
| EwanToo wrote:
| Have you considered making your database available for download
| as Parquet format so people could just copy the file to S3,
| Google Cloud, etc, and query it immediately with various tools?
|
| I know it can be done with CSV but it's not as smooth.
| chaps wrote:
| Not gonna lie, this creeps the heck out of me.
| fragmede wrote:
| Your IP address is LEAKING!
| reincoder wrote:
| Thousands of people live in a zip code, while hundreds of
| thousands of people live in a city. We are literally giving
| away that data for free through our API and database. The
| creepiness of IP geolocation is mostly a meme.
|
| IP geolocation is mainly used in cybersecurity and marketing
| analytics. There are many ways to geolocate someone. I once
| came across a project that could estimate the country a user
| is from based on their writing style and grammar mistakes.
| For example, American people sometimes use "should of"
| instead of "should have". Knowing the geolocation of an IP
| address isn't super creepy. It's just how things work on the
| internet.
| chaps wrote:
| And you're literally advertising this project as being
| helpful for targeted ads. So it's pretty clear from the get
| go that what you consider creepy isn't what I consider
| creepy. And having done enough reidentification work to
| scare myself, "thousands of people" might as well be a
| couple dozen or less. I get why you're defensive and why
| you think it's not creepy, but calling it a "meme" is
| insultingly dismissive.
|
| Just because it's "how things work on the internet" doesn't
| make its mass collection right. Under the same logic, any
| side channel attack is just "how it works", and its abuse
| warrants no ethical question.
| reincoder wrote:
| I grok and understand your concern. I am not being
| defensive; I am just trying to provide an explanation. I
| really enjoy having conversations like this with
| developers as honestly and empathetically as possible.
|
| I apologize if I was rude in any way by saying the word
| "meme". I saw a sister comment and thought you were being
| sarcastic. There is a popular meme about "I have your IP
| address", so I thought you were referencing that. I have
| had conversations with many young people who were
| concerned about their IP address being leaked through a
| game server. Therefore, I try to use humor to alleviate
| their stress. However, I now realize that this situation
| was different, and I am sorry for not understanding that.
|
| We provide a service that helps users keep their
| internet-connected services secure by providing IP
| metadata information. Are you being attacked by malicious
| actors? Use our free IP database to identify the location
| and ASN to block them. Do you want to restrict access to
| your service to certain regions? Do that for free with
| our services.
|
| We have the most accurate data available, and yet we
| offer the most generous free tier. We provide a full
| accuracy IP database for free, without any range
| aggregation, and with daily updates and a commercially
| permissible license. We have built a community forum
| solely dedicated to answering users' questions. We invest
| in website tools and open-source tools, all with the goal
| of helping users maintain the security and functionality
| of their services.
|
| We do have premium tier services, but if you use our free
| data as a foundation, you can always replicate those
| premium features to a reliable degree.
|
| Our IP metadata information is being used in marketing
| and sales intelligence. It is the same data that you use
| to protect your internet connected devices, used by our
| customers to sell you something.
|
| IP metadata information that we provide is a cornerstone
| of keeping the internet safe and accessible for everyone.
| That is how things just are. The deepweb is immune to IP
| meta data information, and that is why it is such a messy
| and chaotic place.
|
| That is just the truth of the internet. We are essential and
| we prefer to be open about our process and listen to our
| stakeholders (users + customers + non-users).
| chaps wrote:
| Thank you for the well thought out response. I disagree
| with just about everything you say, but I understand
| where you're coming from and I appreciate the validation
| that the use of a VPN is more important than it's ever
| been. As a professional courtesy: calling yourself
| "essential" is an enormous red flag and you might want to
| consider different phrasing.
| reincoder wrote:
| I should have used a different phrasing. :) I was reading
| an article about essential workers today, and that word
| popped up in my head when I wrote the comment.
|
| It's good that you are using a VPN. I advocate for the
| usage of VPNs, and many VPN companies actually use our
| data to verify their server locations. In the VPN
| industry, VPN companies get their VPN servers from
| specialized hosting services that cater to dozens of VPN
| companies. You can check out the ASNs of the VPN IP
| addresses to find them.
|
| - https://ipinfo.io/AS136787
|
| - https://ipinfo.io/AS16247
|
| VPN companies use our IP geolocation data to confirm the
| actual location of their servers. Let me tell you a fun
| story. One VPN company claimed to have a server in the
| Bahamas, but upon investigation, we discovered that the
| server was actually located in New York. It was a
| surprising find. Getting a server in the Bahamas is more
| challenging than getting one in NY. Just imagine users
| thinking their internet activity is immune to US
| jurisdiction because they are using a VPN service based
| in the Bahamas, when in fact it is located in NY. So,
| we might not be essential, but we are certainly very
| useful!
|
| Thank you for the great conversation, dude. Appreciate
| it.
| wpietri wrote:
| For sure. When people work in any industry long enough,
| it's easy to stop thinking about the basics. E.g., a
| retail butcher thinks of his work very differently than a
| cow or a vegan does.
|
| When people work in advertising, they mostly forget that
| the core of their business is for-profit manipulation of
| people with little or no regard for truth or the people
| concerned. But I personally think that's kinda creepy,
| and only getting more so as it goes from broad
| manipulation of millions via mass media down to
| thousands, hundreds, or single individuals.
| goodpoint wrote:
| Together with the tons of data leaked by browsers it makes it
| very easy to track people across places and devices.
| giantrobot wrote:
| You might want to unplug your router then. A conceit of being
| connected to a network is you're connected to the network. If
| you can see other nodes they can see you.
| welder wrote:
| Great comment. I'm a big fan and customer of IPinfo, using your
| API in our login notification emails to say "You just logged in
| from Berlin, Germany. If this wasn't you click here." We also
| use it to provide country data for customers in their audit
| logs, and for anti-spam and fraud detection.
| chankstein38 wrote:
| That's pretty neat! You're basically using ping triangulation!
| sib wrote:
| Trilateration (same technique as used for mobile network
| location - in addition to the GPS on the phone)
| incolumitas wrote:
| Big fan of what articles? On https://incolumitas.com/ or on
| https://ipapi.is/?
|
| Great idea with latency triangulation, I used latency
| information for a lot of things, especially VPN and Proxy
| detection.
|
| But I didn't expect that you could obtain such an accurate
| location. I am honestly impressed. But latency triangulation
| with 600 servers gives a very good approximation. Nice man!
|
| Some questions:
|
| - ICMP traffic is penalised/degraded by some ISPs. How do you
| deal with that?
|
| - In order to geolocate every IPv4 address, you need to
| constantly ping billions of IPv4 addresses. How do you do
| that? Do you only ping an arbitrary IP of each allocated
| inetnum/NetRange?
|
| - Most IP addresses do not respond to ICMP packets. Only some
| servers do. How do you deal with that? Do you find the router
| in front of the target IP and you geolocate the closest router
| to the target IP (traceroute)?
| carlhjerpe wrote:
| You can guess pretty well how IPs are related from BGP
| announcements, so you only need to locate a few addresses per
| announced block (or per ASN, if it is small) and apply that
| logic to the rest.
| withinboredom wrote:
| I'm very curious why you'd do VPN/proxy detection...
|
| But at a previous company I worked at that ran a very large
| chunk of the internet, we did indexing of nearly the entire
| internet (even large portions of the dark web) approximately
| every two weeks. There were about 500 servers doing that non-
| stop. So, I think it is relatively reasonable if you have 600
| servers to do that.
| meroje wrote:
| In the business of media streaming, rightsholders will
| require that you check for VPNs and proxies in addition to
| countries when deciding if a given viewer will be able to
| stream a given media.
| withinboredom wrote:
| Does that actually work? That could explain an issue with
| a particular streaming service I use. There are currently
| some ongoing routing issues in BGP land and my ISP. When
| trying to stream, it says I'm using a proxy, so due to
| the incredible route my packets are taking, that might be
| it. What's funny is that the only way to watch this
| service is to use a vpn right now.
| vGPU wrote:
| They probably just keep a list of known VPN server IPs.
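|
| A minimal sketch of what such a list check could look like,
| assuming the list is stored as CIDR ranges. The name
| KNOWN_VPN_RANGES and the ranges themselves are hypothetical
| (they are RFC 5737 documentation ranges), not any streaming
| service's real data:

```python
import ipaddress

# Hypothetical list of ranges attributed to VPN/hosting providers.
# These are RFC 5737 documentation ranges, used here as placeholders.
KNOWN_VPN_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_vpn(ip: str) -> bool:
    """Return True if the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    # A membership test across IP versions (v6 address, v4 network)
    # is simply False, so mixed input is safe here.
    return any(addr in net for net in KNOWN_VPN_RANGES)
```

| A linear scan is fine for a short list; a real list with many
| thousands of ranges would probably want a prefix tree instead.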
| sitzkrieg wrote:
| of course it doesn't work but they gotta try clutching
| pearls and applying whatever pressure they can think of
| on these fronts
| wpietri wrote:
| Why is this getting downvoted? It seems to me that a lot
| of the media-focused anti-piracy tooling is essentially a
| performance of toughness to make rightsholder execs
| comfortable. Everybody accepts you can't stop piracy
| entirely, and nobody's willing to say, "Fuck it, we'll
| compete on convenience and strong consumer
| relationships," so we all put up with this weird middle
| ground of performative DRM and the like. With only the
| rare occasional bit of honesty, as from Weird Al:
| https://sfba.social/@williampietri/110906012997848549
| at_a_remove wrote:
| This is correct. Imagine in the days of yore, some two
| decades and change ago, when I was charged with putting
| some music reserves "online" for streaming ...
|
| [Harp music, progressive diagonal wave distortions
| through the viewport ...]
|
| We had _two_ layers of passwords (one to get to the
| webpage for the class, one when actually streaming via
| the client, which was RealPlayer) as well as an IP range
| restriction to campus (you live off campus? So sorry)
| because our lawyers were worried about what the RIAA's
| lawyers would find sufficient in the wake of a bunch of
| Napster-baited lawsuits launched at universities. The
| material itself was largely limited to snippets.
|
| I wanted to say, "Calm down, have a martini or something.
| College students are just not going to go wild to
| download 128 kbps segments of old classical music," but
| alas I was not in charge.
| reincoder wrote:
| https://incolumitas.com/
|
| This is my all-time favorite article:
| https://incolumitas.com/2021/11/03/so-you-want-to-scrape-
| lik...
|
| I used to do freelance web scraping, and that article felt
| like some kind of forbidden knowledge. After reading the
| article, I went down the rabbit hole and actually found a
| Discord server that provided carrier-grade traffic relay from
| a van which contained dozens of phones.
|
| For the questions..... we have to kinda wait a bit, someone
| from our engineering team might come here and reply.
|
| By the way, as I have you here, have you considered
| converting the CSV files to MMDB format? I was planning to do
| that with our mmdbctl tool later today.
|
| https://github.com/ipinfo/mmdbctl
| sambazi wrote:
| > I used to do freelance web scraping
|
| "don't sell warez"
| voltagex_ wrote:
| Can your probes be identified and blocked?
| kube-system wrote:
| iptables -A INPUT -p icmp -j DROP
| chaps wrote:
| This isn't helpful. The comment was specifically asking
| about the probes, not ICMP traffic.
| kube-system wrote:
| Anybody can do this same thing, if you're worried about
| this, you probably don't want inbound ICMP.
| chaps wrote:
| Cool. Thanks. But let's say I do.
| kube-system wrote:
| Then there's nothing you can do. If you respond to pings,
| then others can take note of the responses you send.
| chaps wrote:
| You're missing the point that the question is effectively
| asking for a list of hosts that they can block.
|
| Edit: they provided a method:
| https://news.ycombinator.com/item?id=37510063
| kube-system wrote:
| I understand that was the initial question. I am saying
| that is a fool's errand. Anyone with a few VPSes, a
| calculator, and a map can do this. It isn't just
| ipinfo.io doing this. There are a lot of ip geolocation
| services.
| j16sdiz wrote:
| This breaks PMTU and is the source of many mystery download
| stalls
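|
| To make the PMTU point concrete: path MTU discovery on IPv4
| relies on ICMP type 3, code 4 ("fragmentation needed"), so a
| blanket ICMP drop discards it along with pings. A rough sketch
| of a narrower policy that only drops echo requests (type 8);
| the function names are made up for illustration and this is
| not a firewall implementation:

```python
ICMP_ECHO_REQUEST = 8        # what `ping` sends
ICMP_DEST_UNREACHABLE = 3    # carries several error codes
ICMP_CODE_FRAG_NEEDED = 4    # "fragmentation needed and DF set"


def is_pmtud_message(icmp_type: int, icmp_code: int) -> bool:
    """True for the ICMPv4 message that path MTU discovery depends on."""
    return (icmp_type == ICMP_DEST_UNREACHABLE
            and icmp_code == ICMP_CODE_FRAG_NEEDED)


def should_drop(icmp_type: int, icmp_code: int) -> bool:
    """Drop inbound echo requests only; let every other ICMP message in."""
    return icmp_type == ICMP_ECHO_REQUEST
```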
| eptyc1 wrote:
| Indeed. Openwrt for some reason defaults to reply to pings.
| I see the value of ICMP for servers, but I don't see the
| value for home ISP routers.
|
| I disabled ICMP reply on my home router.
| sambazi wrote:
| > Openwrt for some reason defaults to reply to pings.
|
| it's a bit like greeting-back ppl on the street.
|
| not doing it will not make you invisible. it will break
| somebody's assumption of decency, but most ppl don't care
| either way.
| voltagex_ wrote:
| http://shouldiblockicmp.com/
|
| (But the guy running the probes is making a good
| counterargument)
| reincoder wrote:
| It is just ping data. We ping an IP address, get the RTT,
| draw a circle on the globe whose radius that RTT allows, and
| say that the IP could be anywhere inside that circle. Then we
| ping from another probe and draw another circle, and the IP
| address must lie where the two circles overlap. If we do it
| enough times, we can get an estimate of where the IP address
| is located.
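|
| A rough sketch of that idea, not IPinfo's actual code. It
| assumes signals cover about 200 km per millisecond in fibre, so
| half the RTT gives an upper bound on distance. It then scans a
| coarse lat/lon grid for points inside every probe's bound and
| averages them. All names and constants are illustrative, and the
| naive average misbehaves near the antimeridian and the poles:

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: roughly two thirds of the speed of light in vacuum.
KM_PER_MS = 200.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against rounding just above 1.0 near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def max_distance_km(rtt_ms):
    """Upper bound on probe-to-target distance: half the RTT is one way."""
    return rtt_ms / 2.0 * KM_PER_MS


def estimate_location(probes, step_deg=1.0):
    """probes: list of (probe_lat, probe_lon, rtt_ms).

    Returns the mean (lat, lon) of all grid points that lie inside
    every probe's distance bound, or None if the bounds do not overlap
    on the grid.
    """
    circles = [(lat, lon, max_distance_km(rtt)) for lat, lon, rtt in probes]
    feasible = []
    for i in range(int(180 / step_deg) + 1):
        lat = -90.0 + i * step_deg
        for j in range(int(360 / step_deg)):
            lon = -180.0 + j * step_deg
            if all(haversine_km(lat, lon, plat, plon) <= radius
                   for plat, plon, radius in circles):
                feasible.append((lat, lon))
    if not feasible:
        return None
    count = len(feasible)
    return (sum(p[0] for p in feasible) / count,
            sum(p[1] for p in feasible) / count)
```

| Real RTTs include queueing and routing detours, so the bounds are
| loose; more probes shrink the overlap.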
|
| The data is not derived from the IP address itself, but
| rather from the process itself. And it's just a ping.
| Moreover, the majority of the IP addresses are not pingable.
| So, we rely on other in-house statistical and scientific
| models to estimate the location. The probe infrastructure is
| extremely complicated and there are billions and billions of
| IP addresses, which is why we do not have a robust range
| filter mechanism.
|
| You can implement a dynamic ping blocking mechanism or use
| our data to find hosting ASNs and block ranges of those ASNs.
| You can download the database for free:
| https://ipinfo.io/developers/ip-to-country-asn-database
| spacedcowboy wrote:
| So, at the risk of outing myself, I wrote http://www.hostip.info
| a long time ago* which used a community approach to get ip
| address location ("is this guess wrong? Fix it please").
|
| The last time I checked (maybe a decade ago [grin]) it worked
| pretty much perfectly for a country, imperfectly for a region,
| and better-than-a-coin-toss for city resolution. All the data is
| free.
|
| I don't think they have it on the site any more, but I used to
| have a rotating 3D-cube thing (x,y,z were the first 3 octets of
| the address) for things like known-addresses, recent lookups,
| etc. I used different colours for different groups (country,
| continent,...) It was so old it was written as a Java applet.
| Yeah. I guess if I were to do it again, it'd be WebGL.
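|
| The octet-to-coordinate mapping is simple enough to sketch. This
| is an illustration of the idea rather than the original applet's
| code, and the function name is made up:

```python
import ipaddress


def ip_to_cube_point(ip: str):
    """Map an IPv4 address to (x, y, z) using its first three octets."""
    octets = ipaddress.IPv4Address(ip).packed  # 4 bytes, network order
    return octets[0], octets[1], octets[2]
```

| Every address in the same /24 lands on the same point, so each
| axis runs from 0 to 255.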
|
| --
|
| *: I sold it a long time ago, with the proviso that the data must
| always remain free. I actually didn't believe the offer at first
| (it came as an email, and looked like a scam) but it went through
| escrow.com just fine, and I think we both walked away happy. That
| was almost 2 decades ago now though.
___________________________________________________________________
(page generated 2023-09-14 23:00 UTC)