https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119

Ubuntu systemd package

Update to systemd 237-3ubuntu10.54 broke DNS

Bug #1988119 reported by Pieter, 18 hours ago. This bug affects 34 people.

Affects: systemd (Ubuntu) | Status: Confirmed | Importance: Undecided | Assigned to: Unassigned

Bug Description

Two servers that updated systemd to 237-3ubuntu10.54 today
(https://ubuntu.com/security/notices/USN-5583-1) could not resolve DNS anymore.
No DNS servers; they are normally set through DHCP. Ubuntu 18.04.

Temporary fix:

1. Edit /etc/systemd/resolved.conf
2. Add or uncomment the line: FallbackDNS=168.63.129.16
3. Restart systemd-resolved: sudo systemctl restart systemd-resolved.service
4. Confirm DNS is working: systemd-resolve google.com

Tags: regression-update

#1 Launchpad Janitor (janitor) wrote 17 hours ago:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu): status: New -> Confirmed

#2 Pieter Lexis (pieter-lexis-tt) wrote 13 hours ago (download full text: 15.3 KiB):

We've just had the same problem, on multiple VMs running in Azure.
In the dpkg log we can see that systemd was indeed updated (times in UTC):

    2022-08-30 06:31:18 status unpacked udev:amd64 237-3ubuntu10.54
    2022-08-30 06:31:18 status half-configured udev:amd64 237-3ubuntu10.54
    2022-08-30 06:31:19 status installed udev:amd64 237-3ubuntu10.54
    2022-08-30 06:31:19 status triggers-pending initramfs-tools:all 0.130ubuntu3.13
    2022-08-30 06:31:19 trigproc man-db:amd64 2.8.3-2ubuntu0.1
    2022-08-30 06:31:19 status half-configured man-db:amd64 2.8.3-2ubuntu0.1
    2022-08-30 06:31:19 status installed man-db:amd64 2.8.3-2ubuntu0.1
    2022-08-30 06:31:19 trigproc ureadahead:amd64 0.100.0-21
    2022-08-30 06:31:19 status half-configured ureadahead:amd64 0.100.0-21
    2022-08-30 06:31:20 status installed ureadahead:amd64 0.100.0-21
    2022-08-30 06:31:20 trigproc libc-bin:amd64 2.27-3ubuntu1.5
    2022-08-30 06:31:20 status half-configured libc-bin:amd64 2.27-3ubuntu1.5
    2022-08-30 06:31:20 status installed libc-bin:amd64 2.27-3ubuntu1.5
    2022-08-30 06:31:20 trigproc systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:20 status half-configured systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:20 status installed systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:20 trigproc initramfs-tools:all 0.130ubuntu3.13
    2022-08-30 06:31:20 status half-configured initramfs-tools:all 0.130ubuntu3.13
    2022-08-30 06:31:34 status installed initramfs-tools:all 0.130ubuntu3.13
    2022-08-30 06:31:37 startup archives unpack
    2022-08-30 06:31:38 upgrade libnss-systemd:amd64 237-3ubuntu10.53 237-3ubuntu10.54
    2022-08-30 06:31:38 status triggers-pending libc-bin:amd64 2.27-3ubuntu1.5
    2022-08-30 06:31:38 status half-configured libnss-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status unpacked libnss-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status half-installed libnss-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status triggers-pending man-db:amd64 2.8.3-2ubuntu0.1
    2022-08-30 06:31:38 status half-installed libnss-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status unpacked libnss-systemd:amd64 237-3ubuntu10.54
    2022-08-30 06:31:38 status unpacked libnss-systemd:amd64 237-3ubuntu10.54
    2022-08-30 06:31:38 upgrade libpam-systemd:amd64 237-3ubuntu10.53 237-3ubuntu10.54
    2022-08-30 06:31:38 status half-configured libpam-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status unpacked libpam-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status half-installed libpam-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status half-installed libpam-systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status unpacked libpam-systemd:amd64 237-3ubuntu10.54
    2022-08-30 06:31:38 status unpacked libpam-systemd:amd64 237-3ubuntu10.54
    2022-08-30 06:31:38 upgrade systemd:amd64 237-3ubuntu10.53 237-3ubuntu10.54
    2022-08-30 06:31:38 status half-configured systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status unpacked systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:38 status half-installed systemd:amd64 237-3ubuntu10.53
    2022-08-30 06:31:39 status triggers-pending ureadahead:amd64 0.100.0-21
    2022-08-30 06:31:39 status triggers-pending dbus:amd64 1.12.2-1ubuntu1.3
    2022-08-...

#3 Lutz Willek (willek) wrote 13 hours ago:

Seems to be a duplicate of https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1938791 - same symptoms.

[Workaround] Reboot the node; DNS should return to normal.

#4 Pieter Lexis (pieter-lexis-tt) wrote 13 hours ago:

Microsoft has created an incident for this. https://azure.status.microsoft/en-us/status reports:

    Azure customers running Canonical Ubuntu 18.04 experiencing DNS errors - Investigating

    Starting at approximately 06:00 UTC on 30 Aug 2022, a number of customers running
    Ubuntu 18.04 (bionic) VMs recently upgraded to systemd version 237-3ubuntu10.54
    reported experiencing DNS errors when trying to access their resources. Reports of
    this issue are confined to this single Ubuntu version.

    This bug and a potential fix have been highlighted on the Canonical / Ubuntu site,
    which we encourage impacted customers to read:
    https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119

    An additional potential workaround customers can consider is to reboot impacted VM
    instances so that they receive a fresh DHCP lease and new DNS resolver(s).

    Any Azure service, including AKS, that uses Canonical Ubuntu version 18.04 of Linux
    may have some impact from this issue. We are working on mitigations across Azure
    services that are impacted. More information will be provided within 60 minutes,
    when we expect to know more about the root cause and mitigation workstreams.

    This message was last updated at 09:20 UTC on 30 August 2022.

#5 Iain Lane (laney) wrote 11 hours ago:

I've removed the update from bionic-security and bionic-updates, and restored the versions which were previously in there.

This won't help anyone that has already received the broken update - I think the advice there is to restart, or there is a workaround in the OP here - but it should prevent any further occurrences.

Note that there will be a delay of up to an hour or so for mirrors to receive the deletion.
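The workaround from the original report (uncomment FallbackDNS, restart systemd-resolved, verify) can be sketched as a small shell helper. This is a sketch, not official tooling: the function name enable_fallback_dns is invented here; the file path, the Azure fallback IP 168.63.129.16, and the restart/verify commands are taken from the report.

```shell
#!/bin/sh
# Sketch of the temporary fix from the bug description (Ubuntu 18.04 on Azure).
# enable_fallback_dns: set FallbackDNS=168.63.129.16 in a resolved.conf-style
# file, rewriting an existing (possibly commented-out) key or appending one.
# The helper name is invented for this sketch.
enable_fallback_dns() {
    conf="$1"
    if grep -qE '^#?FallbackDNS=' "$conf"; then
        # Key present (commented or not): rewrite it in place.
        sed -E -i 's/^#?FallbackDNS=.*/FallbackDNS=168.63.129.16/' "$conf"
    else
        # Key absent: append it.
        echo 'FallbackDNS=168.63.129.16' >> "$conf"
    fi
}

# On an affected host you would then run (not executed here):
#   enable_fallback_dns /etc/systemd/resolved.conf
#   sudo systemctl restart systemd-resolved.service
#   systemd-resolve google.com   # confirm DNS resolution works again
```

The helper is idempotent, so re-running it (e.g. from a fleet-wide run-command) does not duplicate the key.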
#6 Pieter Lexis (pieter-lexis-tt) wrote 11 hours ago:

> This won't help anyone that has already received the broken update - I think the advice there is to restart, or there is a workaround in the OP here - but it should prevent any further occurrences.

Do note this is not a solution for those using non-Azure resolvers provided via DHCP through their VNET. These users must reboot, or manually set the fallback servers to their custom DNS resolver addresses.

Vasili (vasili.namatov), 10 hours ago: no longer affects: systemd

#7 Lee Van Steerthem (leevs) wrote 9 hours ago:

Not sure if this is the best place to help people understand whether their nodes are impacted. We already saw two different types of impact on our Azure AKS clusters:

- Pods not able to terminate
- New images failing to pull from ACR (or any container registry)

Sometimes it was very clear, because we saw the nodes as "Not Ready"; in other cases it's very hard to detect.

We have found a way to detect if your nodes are affected: kubectl logs. When you get the following error, you know the node is impacted:

    Error from server (InternalError): Internal error occurred: Authorization error (user=masterclient, verb=get, resource=nodes, subresource=proxy)

So restarting the node will help, and especially if your cluster is sensitive, you can be more granular about the restart.
I hope it helps some visitors from the Azure status page.

> Could this be a related issue, when deployment to AKS fails due to a connection refused when pulling images from Azure Container Registry (ImagePullBackOff)?

If you look closer at the message accompanying the ImagePullBackOff, you should see something like:

    dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:36288->[::1]:53: read: connection refused

Port 53 is the port a DNS server usually listens on. If this is what you're seeing, then yes: your problems are caused by the issue described here.

#14 Mark Lopez (silvenga) wrote 7 hours ago:

Yes @richardprammer, it appears ImagePullBackOff is one of the symptoms of this issue.
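The diagnostic above can be expressed as a small shell predicate. This is a sketch with a made-up function name; only the error text itself comes from the thread. It pattern-matches a pull-error message against the loopback-resolver signature (a lookup sent to [::1]:53 and refused):

```shell
#!/bin/sh
# Sketch: classify a container pull error as the loopback-DNS symptom
# described above. The function name is invented for this sketch.
is_loopback_dns_error() {
    case "$1" in
        # Quoted pattern segments match literally: a DNS lookup that went
        # to the loopback resolver on port 53 and was refused.
        *'lookup '*' on [::1]:53'*'connection refused'*) return 0 ;;
        *) return 1 ;;
    esac
}

msg='dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:36288->[::1]:53: read: connection refused'
if is_loopback_dns_error "$msg"; then
    echo "pull failure matches the systemd-resolved DNS regression"
fi
```

A timeout or "no such host" from a working resolver would not match, which is the point: only the specific loopback-refused shape indicates this regression.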
#15 William Bergmann Borresen (williambb) wrote 7 hours ago:

To temporarily mitigate the ImagePullBackOff, I scaled up a new functional node (DNS-wise) and used this command to reconcile the AKS cluster:

    az resource update --resource-group --name --namespace Microsoft.ContainerService --resource-type ManagedClusters

This recovered CoreDNS in the kube-system namespace, which fixed the ImagePullBackOff.

#16 Liam Macgillavry (cjdmax) wrote 6 hours ago:

az cli from cmd.exe, something like this for AKS nodes experiencing the issue:

    az vmss list-instances -g -n vmss --query "[].id" --output tsv | az vmss run-command invoke --scripts "echo FallbackDNS=168.63.129.16 >> /etc/systemd/resolved.conf; systemctl restart systemd-resolved.service" --command-id RunShellScript --ids @-

#17 Anton Tykhyi (atykhyy) wrote 6 hours ago:

Is it safe to downgrade from systemd 237-3ubuntu10.54 to the previous 237-3ubuntu10.50?

Andreas Hasenack (ahasenack), 5 hours ago: tags: added: regression-update

#18 James Adler (jamesadler) wrote 5 hours ago:

@atykhyy thank you, that worked for VMSS! I also had some VMs without scale sets; fixed those with:

    az vm availability-set list -g --query "[].virtualMachines[].id" --output tsv | az vm run-command invoke --scripts "echo FallbackDNS=168.63.129.16 >> /etc/systemd/resolved.conf; systemctl restart systemd-resolved.service" --command-id RunShellScript --ids @-

#19 Sebastien Tardif (sebastientardifverituity) wrote 4 hours ago:

Microsoft Support provided a fix for AKS, which I also tested successfully:

    kubectl get no -o json | jq -r '.items[].spec.providerID' | cut -c 9- | az vmss run-command invoke --ids @- \
      --command-id RunShellScript \
      --scripts 'grep nameserver /etc/resolv.conf || { dhclient -x; dhclient -i eth0; sleep 10; pkill dhclient; grep nameserver /etc/resolv.conf; }'

#20 Adrian Joian (ajoian-2) wrote 4 hours ago:

I've added a few alternative ways to fix the problem, mainly using az cli for VMSS, Ansible, or running a DaemonSet, in this gist: https://gist.github.com/naioja/eb8bac307a711e704b7923400b10bc14

#21 bob sacamano (bobsacamano) wrote 2 hours ago:

This worked for us: https://github.com/joaguas/aksdnsfallback#if-the-above-method-fails-because-dhclient-might-stall-another-alternative-is-to-configure-resolved-to-use-a-fallback-dns-server-which-we-can-hardcode-in-its-configuration

#22 ForEachToil (foreachtoil) wrote 19 minutes ago:

You can find here a simple Python script to run a command on the VMSS instances for all subscriptions (or filtered ones): https://github.com/foreachtoil/execute-command-on-all-vmss

I still lack threading, so this might take a little bit.
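Whichever of the fixes above is applied, the end state is the same: the resolver gets usable nameservers again. A minimal sketch of the post-fix check (the helper name has_nameserver is invented here; the grep nameserver probe itself comes from comment #19):

```shell
#!/bin/sh
# Sketch: verify a resolv.conf-style file lists at least one nameserver,
# mirroring the `grep nameserver /etc/resolv.conf` probe from comment #19.
# Takes a file argument so it can be tried against any copy of the file;
# the helper name is invented for this sketch.
has_nameserver() {
    grep -q '^nameserver[[:space:]]' "$1"
}

# On a repaired host you would check the real file (not executed here):
#   has_nameserver /etc/resolv.conf && echo "DNS configuration restored"
```

This is the same signal the dhclient-based run-command uses: it only re-runs dhclient when the nameserver line is missing.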
See full activity log

Duplicates of this bug: Bug #1938791