https://www.netmeister.org/blog/whois.html
Signs of Triviality
Opinions, mostly my own, on the importance of being and other things.
---------------------------------------------------------------------
[homepage] [blog] [jschauma@netmeister.org] [@jschauma] [RSS]
---------------------------------------------------------------------
WHOIS: Fragile, unparseable, obsolete... and universally relied upon
January 9th, 2022
The WHOIS protocol is one of the older internet protocols around.
It's infuriatingly simple, by and large considered obsolete, and the
data provided by it unpredictable, unreliable, incomplete, and, of
course, still one of the cornerstones of internet operations. In
other words, it's the kind of thing I like to waste my time on trying
to understand.
Originally set up in the 1970s at the Stanford Research Institute
Network Information Center (aka SRI-NIC) by the mother of the DNS and
overall ARPANET boss Elizabeth J. Feinler, WHOIS was first described
in RFC812 (1982). Based on the FINGER protocol, it was as dead simple
as you could imagine:
Connect to the service host (SRI-NIC)
TCP: service port 43 decimal
NCP: ICP to socket 43 decimal, establishing two 8-bit connections
Send a single "command line", ending with <CRLF>.
Receive information in response to the command line.
Yep, that was it. And that's still the full protocol specification
(now RFC3912 (2004)). Here, give it a try:
$ telnet whois.iana.org 43
Trying 2620:0:2d0:200::59...
Connected to ianawhois.vip.icann.org.
Escape character is '^]'.
org
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
domain: ORG
[...]
Congratulations - you just spoke WHOIS!
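Since the entire exchange is "connect to port 43, send one CRLF-terminated line, read until the server hangs up," a client fits in a dozen lines. Here is a minimal Python sketch of that exchange; the function name `whois_query`, the 10-second timeout, and the choice to encode the query as UTF-8 and decode the reply leniently are my own assumptions, not anything RFC3912 prescribes:

```python
import socket


def whois_query(query: str, server: str = "whois.iana.org",
                port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the server's raw reply as text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The whole request is a single line terminated by CR LF.
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        # The reply has no length field or terminator; it ends when the
        # server closes the connection, i.e. when recv() returns b"".
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # The protocol says nothing about character sets; UTF-8 with
    # replacement characters is merely a pragmatic guess.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_query("org")` should produce roughly what the telnet session above shows, network permitting. Note that there is no framing at all: the only way to know the response is complete is that the server closed the connection.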
The data you get back is intentionally not structured and is designed
to be human-, not machine-readable (more on that a bit below). It was
originally intended to provide contact information including "mailing
address, telephone number, and network mailbox" "for ARPANET users"
like so:
Command line: dyer
Response:
Dyer, David A. (DAD2) DDYER@USC-ISIB (213) 822-1511
Dyer, Fred S. (FSD) Dyer@RADC-MULTICS (315) 330-7275
Dyer, Mary K. (MARY) DYER@SRI-NIC (415) 859-4775
Dyer, William R. (WRD) WRDyer@RADC-MULTICS (315) 330-7791
Command line: mary
Response:
Dyer, Mary K. (MARY) DYER@SRI-NIC
SRI International
Network Information Center
Telecommunications Sciences Center
333 Ravenswood Avenue
Menlo Park, California 94025
Phone: (415) 859-4775
And you thought the DNS was the phonebook of the internet...
How to find the responsible WHOIS server
When the internet grew too large for SRI-NIC to continue functioning
as the global phonebook, and eventually with the transfer of the
operation of the DNS root to ICANN, WHOIS also became decentralized.
Information about IP address space and the various (and increasing
number of) TLDs was now provided by the Regional Internet Registries
(RIRs), registries, and registrars. Some of them run a so-called
"thick" server, which provides all the information; others are "thin"
servers, providing only partial information plus a pointer to the
WHOIS server that does have the full information. Different TLDs, for
example, may operate
in either mode, but the protocol does not provide any means to
differentiate the two. In other words: if you wanted to find out
information about a domain, you'd have to know who the responsible
registry is to ask them.
How do you know what WHOIS server to query for a given domain? Well,
you just gotta know. There's no standardized way. Some domains use
SRV DNS records, as suggested in an internet draft:
$ host -t srv _nicname._tcp.co.uk
_nicname._tcp.co.uk has SRV record 0 0 43 whois.nic.uk.
$ host -t srv _nicname._tcp.arab
_nicname._tcp.arab has SRV record 10 10 0 your-dns-needs-immediate-attention.arab.
$ host -t srv _nicname._tcp.cpa
_nicname._tcp.cpa has SRV record 10 10 0 your-dns-needs-immediate-attention.cpa.
$ host -t srv _nicname._tcp.music
_nicname._tcp.music has SRV record 10 10 0 your-dns-needs-immediate-attention.music.
$ host -t srv _nicname._tcp.xn--fiqs8s
_nicname._tcp.中国 is an alias for wildcard.cnnic.cn.
$ host -t srv _nicname._tcp.xn--fiqz9s
_nicname._tcp.中國 is an alias for wildcard.cnnic.cn.
$ host -t srv _nicname._tcp.xn--mxtq1m
_nicname._tcp.政府 has SRV record 10 10 0 your-dns-needs-immediate-attention.政府
$ host -t srv _nicname._tcp.xn--ngbrx
_nicname._tcp.عرب has SRV record 10 10 0 your-dns-needs-immediate-attention.عرب
$
...but that seems to function primarily as an indicator of a TLD
compromise: out of 1489 TLDs, only nic.uk has a valid entry. Instead,
some TLDs use <tld>.whois-servers.net, and the "new" TLDs after 2003
are supposed to have whois.nic.<tld>; ccTLDs pretty much all do their
own thing, why not. Hence, your whois(1) client likely contains some
optimistic logic and a number of hardcoded RIR WHOIS servers like
this:
#define ANICHOST "whois.arin.net"
#define BNICHOST "whois.registro.br"
#define CNICHOST "whois.corenic.net"
#define DNICHOST "whois.nic.mil"
#define FNICHOST "whois.afrinic.net"
#define GNICHOST "whois.nic.gov"
#define IANAHOST "whois.iana.org"
#define INICHOST "whois.networksolutions.com"
#define LNICHOST "whois.lacnic.net"
#define MNICHOST "whois.ra.net"
#define NICHOST "whois.crsnic.net"
#define PDBHOST "whois.peeringdb.com"
#define PNICHOST "whois.apnic.net"
#define QNICHOST_TAIL ".whois-servers.net"
#define RNICHOST "whois.ripe.net"
#define RUNICHOST "whois.ripn.net"
[...]
/*
* If no country is specified determine the top level domain from the query
* If the TLD is a number, query ARIN, otherwise, use TLD.whois-server.net.
* If the domain does not contain '.', check to see if it is an NSI handle
* (starts with '!') or a CORE handle (COCO-[0-9]+ or COHO-[0-9]+) or an
* ASN (starts with AS) or IPv6 address (contains ':'). Fall back to
* NICHOST for the non-handle and non-IPv6 case.
*/
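The discovery logic above boils down to a handful of guesses tried in order. A rough Python sketch, assuming you've already looked up the _nicname._tcp SRV record (for example with host(1)) and are handing over its data part; the function names and the order of the fallbacks are hypothetical, loosely modeled on the conventions described above rather than on any particular client:

```python
from typing import List, Optional, Tuple

# Target name used by the broken SRV records shown above.
SENTINEL_PREFIX = "your-dns-needs-immediate-attention"


def srv_target(rdata: str) -> Optional[Tuple[str, int]]:
    """Parse SRV record data ("priority weight port target") as printed by
    host(1); return (host, port), or None if the record is unusable."""
    fields = rdata.split()
    if len(fields) != 4:
        return None
    try:
        port = int(fields[2])
    except ValueError:
        return None
    target = fields[3].rstrip(".")
    # Port 0 or the sentinel target means "no usable WHOIS server here".
    if port == 0 or target.startswith(SENTINEL_PREFIX):
        return None
    return target, port


def candidate_servers(zone: str,
                      srv_rdata: Optional[str] = None) -> List[Tuple[str, int]]:
    """Ordered guesses for a zone's WHOIS server (a heuristic, not a standard)."""
    zone = zone.strip(".").lower()
    candidates = []
    if srv_rdata:
        hit = srv_target(srv_rdata)
        if hit:
            candidates.append(hit)
    # Fallback conventions: whois.nic.<tld> and <tld>.whois-servers.net.
    candidates.append((f"whois.nic.{zone}", 43))
    candidates.append((f"{zone}.whois-servers.net", 43))
    return candidates
```

Against the records shown earlier, only the co.uk answer survives the filter; every other TLD falls through to the whois.nic.<tld> and <tld>.whois-servers.net guesses, neither of which is guaranteed to exist.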
Otherwise, if you don't know the WHOIS server to query, you can try
your luck asking IANA, which runs a "thick" server for all TLDs. It
should return to you the referral to the responsible WHOIS server,
which you can then ask for who might be responsible for the final
domain you care about:
$ echo netmeister.org | nc whois.iana.org 43 | grep refer
refer: whois.pir.org
$ echo netmeister.org | nc whois.pir.org 43 | grep refer
$ echo netmeister.org | nc whois.pir.org 43 | grep "Registrar WHOIS Server"
Registrar WHOIS Server: whois.gandi.net
$ echo netmeister.org | nc whois.gandi.net 43 | grep Creation
Creation Date: 2000-04-24T02:15:22Z
$
Notice something? When we ask IANA, we grep for "refer", but when we
ask PIR, we need to grep for "Registrar WHOIS Server". This is because
the WHOIS protocol does not specify the output format of the data,
nor what data should be provided. At all. It's all free form,
unstructured ASCII text -- if you're lucky, that is. (More on that
(again) a bit below.)
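That grep-and-hop dance can be automated, but only by hardcoding which keys mean "ask this server next." A minimal sketch, assuming the two key names seen above ("refer" and "Registrar WHOIS Server") are the only ones that matter, which is not true in general; `fetch` is any function you supply that takes (query, server) and returns the response text, so the walk itself can be tested without touching the network:

```python
from typing import Callable, List, Optional, Tuple

# Keys that, in the output shown above, name the next server to ask.
# Observed conventions only; other servers use other names.
REFERRAL_KEYS = ("refer", "Registrar WHOIS Server")


def find_referral(response: str) -> Optional[str]:
    """Return the first referral found in a WHOIS response, if any."""
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in REFERRAL_KEYS and value.strip():
            return value.strip()
    return None


def follow_referrals(query: str, fetch: Callable[[str, str], str],
                     start: str = "whois.iana.org",
                     max_hops: int = 5) -> Tuple[List[str], str]:
    """Walk the referral chain; return (referral chain, last response)."""
    chain = [start]
    response = ""
    for _ in range(max_hops):
        response = fetch(query, chain[-1])
        nxt = find_referral(response)
        # Stop when there is no referral or it points back into the chain
        # (e.g. a registrar server that refers to itself).
        if nxt is None or nxt.lower() in (s.lower() for s in chain):
            break
        chain.append(nxt)
    return chain, response
```

The `max_hops` cap is there because nothing in the protocol prevents a misconfigured server from sending you in circles.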
Data Privacy
But what data would you expect to be found in WHOIS? Since the early
days, ICANN has had a requirement for registries and registrars to
provide
unrestricted and public access to accurate and
complete WHOIS information, including registrant,
technical, billing, and administrative contact
information.
ICANN Policies
This includes the actual postal address, phone numbers, and email
addresses of the various contact persons or departments (see above re
"phonebook"), which is, of course, routinely abused by all sorts of
people, including scammers and phishers, and for general OSINT. On
the other hand, Law Enforcement really wants this information to be
readily available, and as a geek with at least half a dozen random
domains registered, you are likely familiar with the legal
requirement to keep this information up to date.
Quite obviously this poses a dilemma: the information is required by
ICANN to be openly provided, but for a variety of reasons and privacy
concerns, you don't want your phone number and address out there on
the internet. But more than just a cosmetic concern, the ICANN
requirement now does indeed conflict with modern privacy laws, such
as the EU's GDPR, meaning all domains registered by European
registries are in violation of either GDPR or ICANN's requirement.
Fun!
(ICANN promised not to take action against violators, and registries/
registrars nowadays provide redacted information to the public but
promise to provide detailed information upon "legitimate requests".)
Data Format
As I noted above, the data provided via WHOIS is completely
unstructured and undefined. It is intended for human consumption, and
the service operator is free to decide how to display the
information. Most WHOIS servers use a simple "key: value" format, but
that's far from universal. Similarly, different servers use different
methods to, e.g., show that certain pieces of information logically
belong together.
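For the common "key: value" case, a naive parser is short. This Python sketch (the name `parse_key_value` is made up) skips the "%" and "#" comment lines seen in the output below and collects repeated keys into lists:

```python
from collections import defaultdict
from typing import Dict, List, Union


def parse_key_value(text: str) -> Dict[str, Union[str, List[str]]]:
    """Parse "key: value" lines; repeated keys collect their values in a list."""
    result = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "%" / "#" comment or banner lines.
        if not line or line[0] in "%#":
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key: value line
        result[key.strip()].append(value.strip())
    # Collapse single-element lists to plain strings for convenience.
    return {k: v[0] if len(v) == 1 else v for k, v in result.items()}
```

Note what it throws away: IANA's output repeats "organisation" and "address" for the registry and for its administrative contact, and this parser happily merges both into one list, losing exactly the grouping that a human reader picks up from the layout.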
For example, consider the information returned by the different WHOIS
servers involved in a simple lookup of this website:
$ whois netmeister.org
% IANA WHOIS server
% for more information on IANA, visit
% http://www.iana.org
% This query returned 1 object
refer: whois.pir.org
domain: ORG
organisation: Public Interest Registry (PIR)
address: 11911 Freedom Drive 10th Floor,
address: Suite 1000
address: Reston, VA 20190
address: United States
contact: administrative
name: Director of Operations, Compliance and Customer Support
organisation: Public Interest Registry (PIR)
address: 11911 Freedom Drive 10th Floor,
address: Suite 1000
address: Reston, VA 20190
address: United States
phone: +1 703 889 5778
fax-no: +1 703 889 5779
e-mail: ops@pir.org
[...]
# whois.pir.org
Domain Name: NETMEISTER.ORG
Registry Domain ID: D25516943-LROR
Registrar WHOIS Server: whois.gandi.net
Registrar URL: http://www.gandi.net
Updated Date: 2021-02-20T17:59:09Z
Creation Date: 2000-04-24T02:15:22Z
[...]
# whois.gandi.net
Domain Name: netmeister.org
[...]
Registry Registrant ID: REDACTED FOR PRIVACY
Registrant Name: REDACTED FOR PRIVACY
[...]
Registry Admin ID: REDACTED FOR PRIVACY
Admin Name: REDACTED FOR PRIVACY
[...]
Registry Tech ID: REDACTED FOR PRIVACY
Tech Name: REDACTED FOR PRIVACY
[...]
>>>Last update of WHOIS database: 2022-01-09T00:16:58Z <<<
Ok, so far, so good. Different grouping, but still, reasonably easy
to parse. Now compare this to the following other queries returning
results from various WHOIS servers:
$ whois stevens.edu
# whois.educause.edu
Domain Name: STEVENS.EDU
Registrant:
Stevens Institute of Technology
Castle Point on Hudson
Information Technology
Hoboken, NJ 07030
USA
Administrative Contact:
Domain Name Administration
Stevens Institute of Technology
Information Technology
Castle Point on the Hudson
Hoboken, NJ 07030
USA
+1.2012165457
webmaster@stevens.edu
[...]
$ whois nic.tg
This is JWhoisServer serving ccTLD tg
Java Whois Server 0.4.1.3 (c) 2006 - 2015 Klaus Zerwes zero-sys.net
All rights reserved.
Copyright "NICTogo2 - http://www.nic.tg"
Domain:.............nic.tg
Registrar:..........NETMASTER SARL
Activation:.........2021-11-11
Expiration:.........2030-06-26
Status:.............Activé
Contact Type:.......[PRIVEE]
Last Name:..........[PRIVEE]
First Name:.........[PRIVEE]
Address:............[PRIVEE]
Tel:................[PRIVEE]
Fax:................[PRIVEE]
e-mail:.............[PRIVEE]
Name Server (DB):...ns1.nic.tg
Name Server (DB):...ns2.nic.tg
$ whois norid.no
[...]
Domain Information
NORID Handle...............: NIC311D-NORID
Domain Name................: nic.no
Registrar Handle...........: REG1-NORID
Tech-c Handle..............: NH55R-NORID
Tech-c Handle..............: NS7R-NORID
DNSSEC.....................: Signed
Additional information:
Created:         2004-02-25
Last updated:    2021-02-25
$ whois jprs.jp
Domain Information: [ドメイン情報]
[Domain Name]                   JPRS.JP
[登録者名]                      株式会社日本レジストリサービス
[Registrant]                    Japan Registry Services Co.,Ltd.
[Name Server]                   ns1.jprs.jp
[Name Server]                   ns2.jprs.jp
[Name Server]                   ns3.jprs.jp
[Name Server]                   ns4.jprs.jp
[Signing Key]                   59551 8 2 (
F7700A9A545DD57075E545AFE2D823CB
90A2C9A1305E1696C61F91BEA26FA137 )
Given how useful the information in WHOIS can be, it's no surprise
that there are many businesses offering proprietary services to
monetize the munging of the public information into a data format
that's easy to process in an automated fashion, such as in XML or
JSON. As you can tell from the above examples, it's fairly obvious
how the information belongs together for a human: Humans are really,
really good at identifying patterns visually, and you can look at
the output and immediately see which data represents what information,
but trying to convince a computer to understand all these different
formats is a major PITA and exactly what these services build their
profit model on.
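To get a feel for why this munging is annoying, here is a Python sketch that recognizes just the line styles from the four examples above: dot leaders after or before the colon (nic.tg, NORID), bracketed labels (JPRS), and plain "key: value". The regular expressions are my own guesses and not how jswhois(1) or any commercial service does it:

```python
import re
from typing import Optional, Tuple

# "Domain:.............nic.tg" (colon, then dot leaders) or
# "NORID Handle.......: NIC311D-NORID" (dot leaders, then colon)
DOTTED = re.compile(r"^(?P<key>[^.:\[]+?)\s*(?::\.+|\.{2,}\s*:)\s*(?P<value>.*)$")
# "[Domain Name]       JPRS.JP" (bracketed key, whitespace, value)
BRACKETED = re.compile(r"^\[(?P<key>[^\]]+)\]\s+(?P<value>.+)$")
# "Created:         2004-02-25" (plain key: value); the colon must be
# followed by whitespace, so a bare URL line is not mistaken for a key.
PLAIN = re.compile(r"^(?P<key>[^:]+?):\s+(?P<value>.+)$")


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) for one line of WHOIS output, or None when the
    line matches none of the styles (e.g. an address continuation line)."""
    line = line.strip()
    for pattern in (DOTTED, BRACKETED, PLAIN):
        match = pattern.match(line)
        if match:
            return match.group("key").strip(), match.group("value").strip()
    return None
```

Lines that match none of the patterns, such as the address lines under "Registrant:" in the stevens.edu output, come back as None; attaching them to the right header requires tracking context, and that's where the special cases start to pile up.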
Paying for an online service to access public data is a bit annoying,
so I wrote a tool to JSONify WHOIS data: jswhois(1). This tool will
attempt to turn the unstructured, human-readable output above into
structured JSON as shown below:
$ jswhois stevens.edu | jq
{
"chain": [
"whois.iana.org",
"whois.educause.edu"
],
"query": "stevens.edu",
"whois.educause.edu": {
"Administrative Contact": [
"Domain Name Administration",
"Stevens Institute of Technology",
"Information Technology",
"Castle Point on the Hudson",
"Hoboken, NJ 07030",
"USA",
"+1.2012165457",
"webmaster@stevens.edu"
],
[...]
$ jswhois nic.tg | jq
{
"chain": [
"whois.iana.org",
"whois.nic.tg"
],
"query": "nic.tg",
"whois.nic.tg": {
"Activation": "2021-11-11",
"Address": "[PRIVEE]",
"Domain": "nic.tg",
"Expiration": "2030-06-26",
"First Name": "[PRIVEE]",
"Last Name": "[PRIVEE]",
"Name Server (DB)": [
"ns1.nic.tg",
"ns2.nic.tg"
],
[...]
$ jswhois norid.no | jq
{
"chain": [
"whois.iana.org",
"whois.norid.no"
],
"query": "norid.no",
"whois.norid.no": {
"Algorithm 1": "8",
"Created": "1999-11-15",
"DNSSEC": "Signed",
"DS Key Tag 1": "44384",
"Digest 1": "ac8f61c8a538d1e6dbfd98fd86d788b0222994a8842ebabc0df159b354a09f8d",
"Digest Type 1": "2",
"Domain Name": "norid.no",
"Last updated": "2021-12-14",
"NORID Handle": "NOR18456D-NORID",
"Name Server Handle": [
"AUTH681H-NORID",
"AUTH682H-NORID",
"Y4H-NORID",
"Z11H-NORID"
],
[...]
$ jswhois jprs.jp | jq
{
"chain": [
"whois.iana.org",
"whois.jprs.jp"
],
"query": "jprs.jp",
"whois.jprs.jp": {
"Domain Information": {
"Domain Information": "[ドメイン情報]",
"[Domain Name]": "JPRS.JP",
"[Name Server]": [
"ns1.jprs.jp",
"ns2.jprs.jp",
"ns3.jprs.jp",
"ns4.jprs.jp"
],
"[Registrant]": "Japan Registry Services Co.,Ltd.",
"[Signing Key]": [
"59551 8 2 (",
"F7700A9A545DD57075E545AFE2D823CB",
"90A2C9A1305E1696C61F91BEA26FA137 )"
],
}
[...]
This is tedious, sure, but what's even more annoying is that it is still
only of limited usefulness: aside from the lack of a data format,
there is also no standard specification of what data is to be
provided, and for the data that is required at least by ICANN, there
is no requirement or specification of how that data is to be named.
That is, if you want to use jswhois(1) to return to you the email
address of the administrative contact of the domain in question, then
you still have to know what the fields returned by the registrar's
WHOIS server are named. Commercial services may attempt to reformat
or rename fields so that you have consistent keys to extract, but
will that work for all domains? How many different WHOIS formats are
there?
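Even the examples in this post already use three different names for roughly the same thing, the date a domain was registered. A sketch of the alias-table approach, assuming a flat dict of key/value pairs per server; the function name and the alias list are hypothetical, the list contains only names that appear above, and treating nic.tg's "Activation" as a creation date is a guess:

```python
from typing import Mapping, Optional, Sequence

# Names under which the servers shown above report (roughly) when a
# domain was registered. Certainly not exhaustive.
CREATION_DATE_KEYS = ("Creation Date", "Created", "Activation")


def first_value(record: Mapping[str, object],
                aliases: Sequence[str]) -> Optional[object]:
    """Return the value of the first alias present in record (case-insensitive)."""
    lowered = {key.lower(): value for key, value in record.items()}
    for alias in aliases:
        value = lowered.get(alias.lower())
        if value:
            return value
    return None
```

Every new server you encounter potentially means another entry in the table, and nothing tells you when a name you've never seen before means the same thing.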
Registrars and Registries
Looking at a subset of TLDs from my previous adventure, I found a
total of 1021 distinct WHOIS servers for 1489 TLDs. Here's the top
ten breakdown of which WHOIS servers are responsible for the largest
number of TLDs:
244 whois.iana.org
67 whois.afilias-srs.net
46 whois.nic.google
24 whois.uniregistry.net
16 whois.registry.in
14 whois.nic.gmo
8 whois.gtld.knet.cn
7 whois.teleinfo.cn
6 whois.gtlds.nic.br
5 whois.publicinterestregistry.net
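A tally like this is the Python equivalent of `sort | uniq -c | sort -rn | head`. A sketch, assuming you have already collected a dict mapping each TLD to its WHOIS server; the function name is hypothetical:

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_servers(tld_to_server: Dict[str, str],
                      top: int = 10) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (number of distinct servers, the `top` servers by TLD count)."""
    counts = Counter(tld_to_server.values())
    # most_common() sorts by count, highest first.
    return len(counts), counts.most_common(top)
```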
IANA, Afilias, and Uniregistry not surprisingly manage the largest
number of TLDs, and as you may remember from the new-TLD-landrush,
Google had applied for over 100 TLDs and today runs 46 TLDs. (The
largest number of TLDs registered by a single company goes to Donuts
Inc. with 248, but they run a separate WHOIS server for each of those
TLDs at whois.nic.<tld>.)
But that's only TLDs. There are over 2500 registrars accredited by
ICANN, of which e.g., GoDaddy, currently the largest with over 72
million (!) domains, is just one. In theory, for each of the millions
of second-level domains, there might be a different WHOIS server
responsible, each with its own human-readable output format.
Data in WHOIS
The data found in WHOIS varies from registry to registry, not only in
structure (as shown above), but of course also in content. Some
include nameserver IP addresses, some don't. Some include DNSSEC
information, others don't. I even found an (expired) x509 cert in the
WHOIS data for 2001:dcd::/32.
If you search for IP addresses or CIDRs, you get back rather
different data than if you search for domain names. APNIC, RIPE, and
AFRINIC, for example, even give you some routing and geolocation
information:
$ jswhois 2001:dd8:9:2::101:61 | jq
{
"query": "2001:dd8:9:2::101:61",
"whois.apnic.net": {
"inet6num": {
"geoloc": "-27.473058 153.014208",
"inet6num": "2001:dd8:8::/45",
[...]
}
"route6": {
"country": "AU",
"descr": "APNIC Network",
"last-modified": "2018-11-20T03:36:54Z",
"mnt-by": "MAINT-APNIC-IS-AP",
"origin": "AS4608",
"route6": "2001:dd8:9::/48",
"source": "APNIC"
}
[...]
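Both objects in that response cover the queried address: the /45 inet6num and the /48 route6. If you want the most specific one, Python's ipaddress module can do the longest-prefix comparison. This sketch assumes you've already pulled the prefixes out of the JSON, and it also splits the APNIC "geoloc" value, assuming its "latitude longitude" layout holds; both function names are made up:

```python
import ipaddress
from typing import Iterable, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def most_specific(address: str, prefixes: Iterable[str]) -> Optional[Network]:
    """Return the longest prefix among `prefixes` that contains `address`."""
    ip = ipaddress.ip_address(address)
    # Membership tests across IPv4/IPv6 simply return False.
    matches = [net for net in map(ipaddress.ip_network, prefixes) if ip in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def parse_geoloc(value: str) -> Tuple[float, float]:
    """Split a "geoloc" value of the form "latitude longitude" into floats."""
    lat, lon = (float(part) for part in value.split())
    return lat, lon
```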
Given the loose specification, you can use the WHOIS protocol and
server for just about any data. Team Cymru, for example, lets you
look up AS numbers for the given IP addresses using WHOIS:
$ whois -h whois.cymru.com 2001:470:30:84:e276:63ff:fe72:3900
AS | IP | AS Name
2033 | 2001:470:30:84:e276:63ff:fe72:3900 | PANIX, US
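Team Cymru's reply is at least a simple pipe-separated table with a header row, so it parses easily. A sketch, assuming exactly that layout (header first, one row per address); the function name is made up:

```python
from typing import Dict, List


def parse_cymru(text: str) -> List[Dict[str, str]]:
    """Turn Team Cymru's pipe-separated reply (header row first) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [name.strip() for name in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        # maxsplit keeps any stray "|" inside the last column (the AS name).
        fields = [field.strip() for field in line.split("|", len(header) - 1)]
        rows.append(dict(zip(header, fields)))
    return rows
```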
And as you've no doubt noticed, some international WHOIS servers may
return data to you in non-ASCII charsets, such as whois.kr or
whois.jprs.jp. How well do the various WHOIS API services handle what
effectively amounts to random data that may be returned? I wonder...
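RFC3912 explicitly notes that WHOIS has no mechanism for indicating the character set in use, so a client has to guess. One pragmatic approach, sketched here: try a list of candidate encodings in order and fall back to lossy UTF-8. The function name and the candidate list (UTF-8, EUC-KR, EUC-JP, then Latin-1) are assumptions, and because EUC-KR and EUC-JP accept many of the same byte sequences, a "successful" decode can still be the wrong one:

```python
from typing import Sequence, Tuple

# Candidate encodings, tried in order. This list is a guess; adjust it to
# the servers you actually query.
DEFAULT_CANDIDATES = ("utf-8", "euc_kr", "euc_jp", "latin-1")


def decode_whois(raw: bytes,
                 candidates: Sequence[str] = DEFAULT_CANDIDATES) -> Tuple[str, str]:
    """Decode a raw WHOIS reply; return (text, encoding that was used)."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep going with replacement characters.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Latin-1 maps every byte to some character, so with the default list it acts as a catch-all, and the lossy fallback only kicks in if you leave it out.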
$ whois -h whois.netmeister.org log4j
___________________________________________________
< ${jndi:ldap://www.netmeister.org/blog/whois.html} >
---------------------------------------------------
\ ^___^
\ (ooo)\_______
(___)\ )\/\
||----w |
|| ||
Old and busted...
Since the data in WHOIS is unpredictable (who knows what data is
returned to you and what the format might be), unreliable (who knows
if the data you're looking for, if it is present at all, is up to
date), difficult to discover (bouncing from IANA along unpredictable,
unreliable referral entries or betting on a few hard-coded servers),
possibly available via different mechanisms (besides the standard TCP
port 43, several WHOIS servers provide an HTTP API endpoint), and
often obscured or redacted (e.g., due to GDPR, but several WHOIS
servers also require registration before either TCP port 43 or API
access is granted)... why haven't we replaced it with Something
Better(tm)?
There were some attempts to overhaul WHOIS, like the "Referral Whois"
protocol (RWhois, RFC2167) or the now obsolete "WHOIS++", but it
seems like one of those things everybody depends on, so changing it
isn't going to be easy.
ICANN decided years ago to replace WHOIS, with work dating back to
2012, and the Registration Data Access Protocol (RDAP, RFC9082)
certainly seems like a much better alternative. RDAP is RESTful and
standardized based on an analysis by the IETF of the TLD WHOIS server
responses; since 2019, ICANN requires registrars and registries to
implement an RDAP service.
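Unlike WHOIS, RDAP answers the discovery and format questions by specification: IANA publishes bootstrap files mapping TLDs to RDAP base URLs (RFC7484, since obsoleted by RFC9224), queries are plain HTTPS GETs on paths such as domain/<name> (RFC9082), and responses are JSON with standardized members such as an "events" array (RFC9083). A Python sketch of that flow; the function names are made up, and it assumes the two-element [entries, URLs] service layout used by the DNS bootstrap file:

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def rdap_domain_url(domain: str, bootstrap: Dict[str, Any]) -> Optional[str]:
    """Build an RDAP domain query URL from an IANA-style bootstrap document,
    whose "services" entries look like [[labels...], [base URLs...]]."""
    domain = domain.lower().rstrip(".")
    labels = domain.split(".")
    # Check the longest suffix first ("example.co.uk", then "co.uk", then "uk").
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        for entries, urls in bootstrap.get("services", []):
            if suffix in entries and urls:
                # Prefer an https:// base URL when several are listed.
                base = next((u for u in urls if u.startswith("https://")), urls[0])
                if not base.endswith("/"):
                    base += "/"
                return f"{base}domain/{domain}"
    return None


def registration_date(rdap: Dict[str, Any]) -> Optional[str]:
    """Return the eventDate of the "registration" event, if the server sent one."""
    for event in rdap.get("events", []):
        if event.get("eventAction") == "registration":
            return event.get("eventDate")
    return None


def fetch_json(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch and decode a JSON document (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With network access, `rdap_domain_url("netmeister.org", fetch_json("https://data.iana.org/rdap/dns.json"))` should give you the URL to fetch, and `registration_date()` on that response pulls out the creation date without guessing at field names.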
Fully replacing WHOIS does not, however, seem to be on the horizon
yet, and we're still relying on what started out as perhaps the
simplest possible protocol intended for human consumption. Sometimes
the internet moves really slowly, and all I can hope is that nobody
comes along and tries to put it on the blockchain...
-------------------------------------------------------
Links:
* This blog post as a Twitter thread
* jswhois(1) -- a tool to turn WHOIS results into json
* TLDs -- Putting the '.fun' in the top of the DNS
* (All) DNS Resource Records
References:
* WHOIS Protocol Specification (RFC3912)
* ARIN whois command reference
* ICANN WHOIS information
* Team Cymru IP to ASN mapping using WHOIS