[HN Gopher] The Sisyphean Task of DNS Client Config on Linux
       ___________________________________________________________________
        
       The Sisyphean Task of DNS Client Config on Linux
        
       Author : smitop
       Score  : 100 points
       Date   : 2021-04-15 15:01 UTC (8 hours ago)
        
 (HTM) web link (tailscale.com)
 (TXT) w3m dump (tailscale.com)
        
       | jiqiren wrote:
       | If you are just trying to use curl or some other straightforward
       | small application - this is enough. But running Kubernetes,
       | docker, or some other container system? This mess goes much
       | deeper...
       | 
       | For example: there are known issues with systemd-resolved &
       | Kubernetes (at least on Ubuntu that defaults to systemd-resolved)
       | https://kubernetes.io/docs/tasks/administer-cluster/dns-debu...
        
         | dijit wrote:
         | It also messes up docker-compose, probably in the same way.
         | 
         | I have to disable systemd-resolved and manually configure
         | resolv.conf to get my containers to resolve each other
         | properly. (Also Ubuntu)
        
         | denysvitali wrote:
         | FYI, if you don't bother opening the link, systemd-resolved sets
         | the /etc/resolv.conf to have only a loopback address
         | (127.0.0.53) as a DNS server.
         | You can see how things get bad when the CoreDNS pod tries to
         | get the "upstream" DNS servers from the host /etc/resolv.conf
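
To make the failure mode above concrete, here is a minimal sketch of the check a pod or container runtime might do before inheriting the host's resolv.conf: parse the nameserver lines and see whether they are all loopback addresses, which would be unreachable from another network namespace. The function names and the sample file contents are made up for illustration; this is not how CoreDNS or the kubelet actually implements it.

```python
import ipaddress

def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

def loopback_only(servers):
    """True if there is at least one nameserver and all are loopback."""
    if not servers:
        return False
    # Strip an IPv6 zone suffix such as "%eth0" before parsing.
    return all(
        ipaddress.ip_address(s.split("%", 1)[0]).is_loopback for s in servers
    )

# Hypothetical file contents, as a local stub resolver might write them.
sample = "# managed by a local stub resolver\nnameserver 127.0.0.53\noptions edns0\n"
servers = parse_nameservers(sample)
print(servers, loopback_only(servers))
```

A malformed address raises ValueError from ipaddress.ip_address; a real tool would probably want to catch that and skip the line.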
        
           | kelnos wrote:
           | I guess it shouldn't be doing that? I mean, that also breaks
           | in the case where you're running something like dnsmasq as
           | your local resolver, which will also require 127.0.0.1 in
           | your resolv.conf.
           | 
           | As much as I'm not always comfortable with the larger
           | complexity, the CoreDNS pod really needs to be doing
           | something like the flowchart described in the article. If
           | systemd-resolved or NetworkManager are in play, it should be
           | using D-Bus to talk to them to get the information it needs.
           | 
           | resolv.conf is an old idea that is too inflexible for today's
           | DNS needs; a more complex solution with more complex
           | interface is unfortunately warranted here.
        
             | vetinari wrote:
             | > I mean, that also breaks in the case where you're running
             | something like dnsmasq as your local resolver, which will
             | also require 127.0.0.1 in your resolv.conf.
             | 
             | Dnsmasq DNS support is redundant when you are running
             | systemd-resolved. It can do everything dnsmasq does and
             | more.
             | 
             | > resolv.conf is an old idea that is too inflexible for
             | today's DNS needs; a more complex solution with more
             | complex interface is unfortunately warranted here.
             | 
             | That's exactly what systemd-resolved is.
        
       | venamresm__ wrote:
       | I also tried once to make sense of domain name resolution:
       | https://venam.nixers.net/blog/unix/2020/11/01/resolving-a-ho...
        
       | Koffiepoeder wrote:
       | To be honest I hate all of the aforementioned programs trying to
       | battle over /etc/resolv.conf. That's why by default I have the
       | file marked as immutable, and pointing to 127.0.0.1. This way I
       | cannot have accidental DNS traffic leaking. If I need local
       | network DNS I send a manual dhcp command to get the wifi's DNS
       | and add it temporarily to my dnscrypt config. Similar thing for
       | VPN DNS.
        
         | mindslight wrote:
         | I do something similar in that I just point DNS traffic at the
         | gateway itself and configure it along with the rest of the WAN
         | horizon on the router (eg direct goes to local recursor,
         | wireguard tunnel goes to the recursor on the other side of the
         | tunnel or suitable public resolver, commercial VPN gets DNATted
         | to whatever their recommended resolver is, pihole eventually).
         | Running multiple apps / security contexts / nyms (or whatever
         | else you want to call them) on the same OS instance is just
         | asking for trouble.
        
       | corndoge wrote:
       | I recommend the series "Anatomy of a Linux DNS lookup"[0].
       | 
       | Everything you never wanted to know.
       | 
       | https://zwischenzugs.com/2018/06/08/anatomy-of-a-linux-dns-l...
        
         | tyingq wrote:
         | That appears to have been written prior to systemd-resolved
         | becoming enabled by default on many distros.
        
       | tptacek wrote:
       | Tailscale is awesome, you should use it for everything.
       | 
       | This Linux DNS stuff drives us batty at Fly.io. We run user
       | containers as Firecracker VMs; users belong to "organizations",
       | and organizations share a private IPv6 network. We do DNS for
       | that private network under the fake "internal" TLD, so if you
       | have an app "phoenix-frontend" and another app "rabbitmq-
       | cluster", they can see each other at "phoenix-frontend.internal"
       | and "rabbitmq-cluster.internal".
       | 
       | What you'd want in a perfect world is an option in
       | `/etc/resolv.conf` that sends `.internal` to a special
       | nameserver, and everything else to a normal nameserver. But as
       | far as I can tell, there's no way to take a bare Linux VM that
       | can accept an arbitrary container and set that capability up.
       | 
       | So instead, we end up (by default; you could override) serving
       | _all_ customer DNS, and our `.internal` server has to forward
       | recursive queries to things that aren't `.internal` somewhere
       | else. This sucks; we shouldn't have to be inline for arbitrary
       | customer DNS.
       | 
       | If there's a clean way to resolve this, so that the VM itself can
       | just send `.internal` queries to us, and everything else to
       | `1.1.1.1` or `8.8.8.8` or whatever the customer's container had,
       | I would _love_ to hear it. I've come pretty close to breaking out
       | preload in anger over this problem.
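
The routing rule being asked for here is small when written down; the hard part is that resolv.conf has no way to express it. A rough sketch of the decision, assuming a table that maps domain suffixes to resolver addresses (the `fd00::53` address and the function name are placeholders, not anything Fly.io uses):

```python
def pick_upstream(qname, routes, default):
    """Choose a resolver for qname by longest matching domain suffix."""
    labels = qname.rstrip(".").lower().split(".")
    # Try the longest suffix first: a.b.internal, then b.internal, then internal.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return default

# Hypothetical split-DNS table: .internal goes to a private resolver,
# everything else goes to whatever the container had configured.
routes = {"internal": "fd00::53"}
print(pick_upstream("phoenix-frontend.internal", routes, "1.1.1.1"))
print(pick_upstream("example.com", routes, "1.1.1.1"))
```

Matching whole labels rather than raw string suffixes matters: a name like "notinternal" must not be routed to the private resolver.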
        
         | js2 wrote:
         | On macOS, you could do this trivially by creating
         | /etc/resolver/internal with the name servers in it.
         | 
         | Linux should copy that.
        
           | vetinari wrote:
           | Systemd-resolved does much more than MacOS resolver. It can
           | make the additional resolvers active conditional on whether
           | the link is up, for example (think VPN). MacOS resolver can't
           | do that.
        
         | ptomato wrote:
         | ... I realize I'm a monster, but aren't y'all already
         | extensively using BPF for things? Sticking in a BPF egress
         | filter on the VM that rewrites outbound DNS packets with
         | .internal in the question to point at your DNS server seems
         | like it would be lighter weight than just handling all queries
         | recursively.
        
           | YarickR2 wrote:
           | Why do you need a kernel-mode parser for that? iptables -t nat
           | -I OUTPUT -p udp --dport 53 -j DNAT --to-destination <your
           | internal recursor, sending .internal to authoritative dns
           | server for that zone, and resolving globally available
           | hostnames by itself>
        
             | ptomato wrote:
             | Well, because he doesn't want to be inline for
             | non-.internal DNS queries.
        
         | dave_universetf wrote:
         | This is also what we're trying to do in Tailscale (to grab your
         | MagicDNS domain, and whatever corporate split DNS you set, but
         | not anything else). And yeah, on linux, basically systemd-
         | resolved is the only thing that gets this right (with dynamic
         | config - with fully static config, other recursors are an
         | option, with varying tradeoffs), everything else assumes "one
         | resolver should be enough for everything".
         | 
         | So, your solution to proxy all the traffic and split out as
         | needed is the right way to do it without adding more software
         | into each VM :(
         | 
         | Doing stuff via LD_PRELOAD would be hilarious, you should
         | definitely do that and report back _ducks behind the blast
         | shield_
        
         | dijit wrote:
         | I used to have a very similar problem (didn't want my company
         | vpn to handle all dns traffic, only what was directed at the
         | company), I managed to solve it with a local dnsmasq on my
         | machine.
         | 
         | Unlikely to fit your usecase. But it's not impossible for us
         | plebs.
        
         | geofft wrote:
         | If you're willing to assume glibc, a reasonably clean approach
         | is to write your own nss_fly module that returns results for
         | .internal.
         | 
         | The downside is that Go code will drop to cgo for all
         | resolutions (because it will see something it can't handle in
         | pure Go in resolv.conf) and non-glibc code (like Alpine
         | containers using musl, or non-libc resolvers like ares) won't
         | get anywhere at all.
         | 
         | But you're kind of doomed in that latter case, anyway, because
         | since there _currently_ isn't a standard for how resolv.conf
         | should express the rules you want, the effort to get it adopted
         | by glibc, musl, Go, Chrome, ares, and all the other various
         | interpreters of resolv.conf will take way too long to make a
         | practical impact on a startup's product.
        
         | jeffbee wrote:
         | Feels like the actual clean way to resolve this is to not use
         | DNS for discovery of named services.
        
         | 4rtyui9xd wrote:
         | I can think of a variety of ways. Some involve running some
         | code on the VM. For example, dnscache or dqcache would work.
         | This can also be accomplished using tinydns. Tiny programs that
         | use very little memory. But it sounds like you are trying to
         | avoid asking the user to install anything other than flyctl on
         | the "bare Linux VM". What is not clear is what programs are
         | installed by default in the "bare Linux VM". The other question
         | is how many ".internal" domains the VM will need to resolve.
         | The simplest solution that comes to mind is when the user
         | provisions an IP, flyctl writes the ".internal" domain(s) for
         | that IP to /etc/hosts. /etc/resolv.conf can then point to
         | whatever the user prefers. IMO, users should be running their
         | own DNS servers, not setting /etc/resolv.conf to point to third
         | party DNS addresses like 1.1.1.1 or 8.8.8.8 or whatever. In the
         | case they are running their own DNS server on the VM, then it
         | becomes trivial to segregate internal from external domains. The
         | configuration for dnscache is so easy that it could be done by
         | the flyctl program, requiring no user interaction. (Something
         | like tinydns-config.) One could even have configurations for a
         | variety of DNS servers in flyctl, in case the user prefers
         | unbound, etc.
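
For the /etc/hosts idea, a sketch of what the provisioning tool's side could look like: keep a clearly marked block in the hosts file and rewrite only that block each time the set of names changes. The marker strings, function name, and addresses are invented for the example; nothing here is taken from flyctl.

```python
BEGIN = "# BEGIN internal names (managed)"
END = "# END internal names (managed)"

def update_hosts(existing, records):
    """Replace any previous managed block in hosts-file text with a fresh one.

    `records` is a list of (address, [names]) pairs.
    """
    kept = []
    inside = False
    for line in existing.splitlines():
        if line == BEGIN:
            inside = True
        elif line == END:
            inside = False
        elif not inside:
            # Lines outside the managed block are preserved untouched.
            kept.append(line)
    block = [BEGIN]
    block += [f"{address}\t{' '.join(names)}" for address, names in records]
    block.append(END)
    return "\n".join(kept + block) + "\n"

# Hypothetical addresses for the two example apps.
records = [
    ("fd00::10", ["phoenix-frontend.internal"]),
    ("fd00::11", ["rabbitmq-cluster.internal"]),
]
print(update_hosts("127.0.0.1\tlocalhost\n", records), end="")
```

Running it twice on its own output gives the same text, so it is safe to re-apply. The obvious limitation is the one that makes DNS attractive in the first place: every VM has to be rewritten whenever an address changes.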
        
         | mwcampbell wrote:
         | > This sucks; we shouldn't have to be inline for arbitrary
         | customer DNS.
         | 
         | Why not? Someone has to do it, and if I were paying Fly for
         | service, I'd rather have Fly handle my DNS queries than be
         | another freeloader on Google or Cloudflare.
        
         | mjevans wrote:
         | Couldn't you run dnsmasq, unbound, or some other configurable
         | resolver on the localhost? Though that's probably already
         | what's done with the internal smart resolver.
        
           | YarickR2 wrote:
           | This is proper advice. My recursor of choice is pdns-recursor,
           | but bind in forward-only mode with several forwarders for
           | different zones will work too
        
         | friseurtermin wrote:
         | Funnily enough, that is the _exact_ same problem I'm facing
         | right now, down to Firecracker and a custom internal TLD. I'm
         | excited to see the solutions. I think the only difference is
         | that I need to run this DNS service on the same host as my VMs,
         | so I will need to use a different port than systemd-resolved.
        
           | YarickR2 wrote:
           | I'd recommend running it on the same port but a different IP
           | (127.0.1.2, for example), because some programs cannot use a
           | non-standard port.
        
         | dnr wrote:
         | systemd-resolved can do this, though I haven't played with it
         | too much yet.
         | 
         | Here's a random post I just found:
         | https://gist.github.com/brasey/fa2277a6d7242cdf4e4b7c720d42b...
         | 
         | (Ha, I did the thing where I read the comments before the post,
         | and the post describes how to do this. So what's still
         | missing?)
        
           | tptacek wrote:
           | I'm not sure how reasonable it is for us to run systemd-
           | resolved (or systemd itself) on our VMs. We're trying to
           | provide a clean environment for any random container to run
           | on; we provide our own init, and that's almost the whole of
           | it.
           | 
           | What, I think, you really care about here is glibc's
           | behavior. But then, you can't depend on any one libc, either.
        
             | YarickR2 wrote:
             | It's unreasonable to bring the whole cow just to get a
             | gallon of milk. Use a standalone resolver.
        
             | recuter wrote:
             | The cliffs of cloud native software defined networking
             | insanity await you:
             | 
             | https://github.com/containernetworking/cni
             | 
             | https://github.com/firecracker-microvm/firecracker-go-sdk
             | 
             | Firecracker, by design, only supports Linux tap devices.
             | The SDK provides facilities to:
             | 
             | - Attach a pre-created tap device, optionally with static IP
             |   configuration, to the VM. This is referred to as a "static
             |   network interface".
             | - Create a tap device via CNI plugins, which will then be
             |   attached to the VM automatically by the SDK. This is
             |   referred to as a "CNI-configured network interface".
        
         | mdlowman wrote:
         | It depends on what you're using for the resolver. I'm assuming
         | you only care about gethostbyname(3) and friends. With glibc
         | that means nss; generally you're also looking at libnss_dns.so,
         | which uses glibc's resolv (copied from BIND). This doesn't
         | include enough configuration to do what you suggest; it pretty
         | much just points everything towards a server.
         | 
         | So you have two options: use a different NSS module (maybe
         | write your own?) or have a proxy DNS resolver that sends
         | different requests to different places.
         | 
         | systemd-resolved actually handles the first option pretty well
         | (although it would prefer that you use the dbus interface over
         | gai). It can handle multiple interfaces with separate domains
         | and split DNS fairly well! (Not so good with reverse DNS,
         | unfortunately. But I get it, reverse DNS is pretty hacky
         | anyways.)
         | 
         | If you prefer the forwarder route, dnsmasq seems to be fairly
         | popular these days in the embedded world and elsewhere.
         | 
         | If I were you, I think I'd write a short NSS module or use
         | dnsmasq, depending upon your needs.
        
       | bonzini wrote:
       | The most interesting takeaway from this article is that,
       | according to people who actually do the work, NetworkManager and
       | systemd-resolved do get things right.
        
         | kevinoid wrote:
         | I agree! As a non-expert, I've also had good luck with
         | NetworkManager, but wouldn't recommend systemd-resolved yet if
         | you use DNSSEC:
         | 
         | https://github.com/systemd/systemd/issues/6490 (fixed in v248)
         | 
         | https://github.com/systemd/systemd/issues/8451
         | 
         | https://github.com/systemd/systemd/issues/9867
         | 
         | https://github.com/systemd/systemd/issues/12388
        
         | YarickR2 wrote:
         | Mostly right. The article never mentions /etc/hosts, which is
         | still largely a thing, and works wonders in difficult cases
         | (and makes other difficult cases much worse to debug).
        
           | vetinari wrote:
           | Because /etc/hosts is not really part of the chain handled by
           | resolver or nss-dns. It is handled by different nss module
           | which usually has higher priority.
        
         | liveoneggs wrote:
         | one reason resolv.conf is/was so sticky is that gethostbyname
         | (and friends) is a libc thing and pretty low-level
        
           | corty wrote:
           | Problem is, modern browsers do things differently yet again.
           | So the insanity continues, just on different levels.
        
             | liveoneggs wrote:
             | browsers are operating systems in user space. They will
             | continue to replace kernel parts.
        
               | vetinari wrote:
               | Technically, the existing dns resolving mechanism is a
               | part of glibc, so it is userspace already.
        
       | denysvitali wrote:
       | Ok, now let's talk about how messed up the same thing is in Mac
       | OS.
       | 
       | Hint: it is waaay worse
        
         | xrd wrote:
         | Dave, one of the authors of these posts, has been regularly
         | documenting on Twitter his struggles with DNS on all the
         | platforms.
         | 
         | If you like sports, it is like watching a basketball game, with
         | three teams and Dave as the play by play announcer.
         | 
         | I'm a linux user, so I'm always cheering when OSX or Windows
         | "loses" but then very quickly Dave will, paraphrasing say:
         | "And, OSX hits a deep 3 pointer..." ("And, OSX does this right,
         | and Linux sucks").
         | 
         | But, it is great theater and very informative. On Dave's
         | recommendation, I switched to systemd-resolved, and it is
         | working flawlessly for me (with my own wireguard setup, not
         | using Tailscale yet).
        
         | dave_universetf wrote:
         | It's a pity, because macOS got the general idea right, but
         | seemingly every single particular wrong thereafter.
         | 
         | In general, a modern DNS client wants: a set of "default route"
         | resolvers; a set of "DNS routes" that point certain suffixes to
         | other resolver configs; a set of search paths to expand single-
         | label queries; integration with mdns and LLMNR, for seamless
         | zero-config resolution on LANs (super important for printers,
         | in particular); all of the above tied to interface lifetimes,
         | so you can tie resolver reachability to underlying network
         | state; very detailed documentation on the algorithm used to
         | resolve a name, and how you traverse all the above
         | configuration.
         | 
         | macOS has default resolvers, DNS routes, mdns integration (but
         | no LLMNR), interface-tied configs, and knows about search
         | paths.
         | 
         | But then you look at the NetworkExtension API for configuring
         | DNS, and it turns out the search paths field doesn't actually
         | configure the search paths in ways you'd expect, instead all
         | the suffixes you install as "routes" end up also becoming
         | search paths, and your only option is have all or none of them
         | be search paths. Meanwhile, the search paths you specified do
         | get installed... In an interface-scoped config that doesn't
         | actually get used in the majority of name lookups that need
         | name expansion.
         | 
         | It's so frustrating because it's _this_ close to being
         | excellent, and instead ends up being the most limiting of APIs
         | we have to work with, because being Apple, it's either their
         | API or go screw yourself and don't configure DNS.
         | 
         | Oh dear, I've ranted again, haven't I. Anyway, every OS is its
         | own beautiful little snowflake of weirdery and brokenness.
         | Linux's particular flavor is "there's 15 ways to do it, most of
         | which require polyfills". macOS's flavor is "we have an API
         | that should be amazing but somehow does the wrong thing almost
         | always". Windows's flavor is "we can do really cool things but
         | the main source of documentation is people exchanging
         | superstitions about registry keys on stack overflow".
         | 
         | Given that choice, I think I prefer linux. It's way more code
         | to write to make it work, but at least the code can be derived
         | from documentation+source code, and has half a chance of
         | working as desired.
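
The "search paths to expand single-label queries" item in the wish list above is easier to reason about with the expansion order written out. This sketch follows the traditional ndots convention described in resolv.conf(5), in simplified form; it is not a description of what macOS, Windows, or systemd-resolved do internally, and the function name and search domains are made up.

```python
def candidate_names(name, search, ndots=1):
    """List the fully qualified names to try for `name`, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain.strip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [absolute] + expanded
    # Too few dots (e.g. a single label): search list first, bare name last.
    return expanded + [absolute]

print(candidate_names("printer", ["corp.example", "lan"]))
print(candidate_names("www.example.com", ["lan"]))
```

Each candidate would then go through the suffix-based "DNS routes" lookup separately, which is where the interaction between search paths and routes gets subtle.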
        
       | schmichael wrote:
       | Where does /etc/nsswitch.conf factor in? I don't see it
       | mentioned.
        
         | LukeShu wrote:
         | nsswitch.conf tells it whether to consult `/etc/hosts` (`hosts:
         | files`), `/etc/resolv.conf` (`hosts: dns`), or systemd-resolved
         | (`hosts: resolve`); and if multiple of those, in what order
         | (`hosts: first second third...`).
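
As a quick way to see which of those sources a given machine consults, and in what order, the `hosts:` line can be pulled out of nsswitch.conf with a few lines of code. This is a minimal sketch with a made-up sample file; the function name is hypothetical and bracketed action items such as `[!UNAVAIL=return]` are simply dropped rather than interpreted.

```python
import re

def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, in order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Remove bracketed action items; keep only the service names.
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []

# Hypothetical file contents for illustration.
sample = "passwd: files systemd\nhosts: files resolve [!UNAVAIL=return] dns\n"
print(hosts_sources(sample))
```

On a real system the input would be the contents of /etc/nsswitch.conf.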
        
         | daenney wrote:
         | It's probably omitted as nsswitch doesn't affect DNS client
         | configuration.
         | 
         | It affects which sources are consulted to lookup a name
         | (including maybe asking DNS), but it doesn't configure things
         | like which DNS servers to ask or what options to set.
         | 
         | > The Name Service Switch (NSS) configuration file,
         | /etc/nsswitch.conf, is used by the GNU C Library and certain
         | other applications to determine the sources from which to
         | obtain name-service information in a range of categories, and
         | in what order.
        
           | LukeShu wrote:
           | nsswitch.conf is what tells it whether to use the traditional
           | nss_dns (the thing that looks at /etc/resolv.conf), or
           | whether to use the newer nss_resolve (the thing that talks to
           | systemd-resolved).
           | 
           | So yeah, if the article is contrasting resolv.conf vs
           | systemd-resolved, nsswitch.conf is how you select which of
           | those approaches is used.
        
             | daenney wrote:
             | The article focussed on how /etc/resolv.conf is being
             | managed and populated. That's not affected by nsswitch.
             | 
             | I only addressed why nsswitch was likely not mentioned, not
             | that it doesn't serve a purpose.
        
             | vetinari wrote:
             | Even with nss_resolve, you still have to have
             | /etc/resolv.conf correctly populated, because some apps
             | will ignore gethostbyname() and just parse resolv.conf and
             | do DNS by themselves. For example, golang stdlib does that
             | (but to the credit of golang runtime, it checks whether
             | nsswitch.conf is in expected state and falls back to glibc
             | if it is not).
             | 
             | Some apps go even further and do their DNS entirely on
             | their own (Firefox DoH controversy).
             | 
             | nsswitch however doesn't just configure mechanism for
             | public resolver, it is a hook where alternate mechanisms
             | like nss-mdns or nss-mymachines can hook up.
        
       | tgbugs wrote:
       | An enlightening article that explains why I think I am losing my
       | mind every time I try to get the network configured correctly on
       | any number of distros. My main take home is to never install
       | NetworkManager or a resolvconf and to avoid systemd if at all
       | possible, unless I absolutely know that I am going to need those
       | state of the art DNS capabilities.
       | 
       | I have mostly managed to avoid resolv.conf issues since the
       | default Gentoo image has a sane setup (even if using dhcp on a
       | laptop and switching wifi and wired). However, every single time
       | I have tried to set up a system using another distro something
       | has gone wrong, and in support of the theory that NetworkManager
       | is a major contributor to the problem the one Gentoo system with
       | NetworkManager installed had the same issues.
       | 
       | To echo the plea in the original article, in nearly every case
       | the primary challenge has been to figure out exactly what
       | documentation actually applies to the system at hand because
       | distros seem to change this completely out of sync with any
       | attempt to correct or align the documentation.
       | 
       | I suspect that on an individual user level this leads to a happy
       | path situation where everyone who does the right thing by
       | accident is quiet and the ones who need something slightly
       | different are never heard from because they weren't able to even
       | connect to the internet (probably not quite that bad).
        
         | vetinari wrote:
         | That's exactly the wrong take home; the only sane way to handle
         | dns is systemd-resolved, as the article said. If you use your
         | machine as anything resembling a desktop/laptop, you should
         | configure your network using Network Manager, especially if you
         | have connections that come and go. Switching between Lan and
         | wifi with dhcp is really low bar.
         | 
         | Anything else is just prolonging the agony that the linux
         | networking configuration had to endure for years.
        
       | bkus wrote:
       | But wait, there's more! nsswitch.conf and nscd.conf also affect
       | the stub resolver's behavior.
        
       ___________________________________________________________________
       (page generated 2021-04-15 23:01 UTC)