[HN Gopher] How Does NTP (Network Time Protocol) Work?
       ___________________________________________________________________
        
       How Does NTP (Network Time Protocol) Work?
        
       Author : aemreunal
       Score  : 107 points
       Date   : 2022-02-20 09:25 UTC (13 hours ago)
        
 (HTM) web link (sookocheff.com)
 (TXT) w3m dump (sookocheff.com)
        
       | elcapitan wrote:
       | I liked this video from a lecture series by Martin Kleppmann
       | (author of the "Designing Data-Intensive Applications" book) on clock
       | synchronisation and NTP:
       | https://www.youtube.com/watch?v=mAyW-4LeXZo
       | 
       | I found the last part especially helpful: hints on how to
       | correctly measure elapsed time when doing manual profiling (use
       | the monotonic clock, not the wall clock. Probably everybody else
       | knows this, but I didn't :D).
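       | 
       | A minimal Python sketch of that advice, assuming a hypothetical
       | helper named timed(): time.perf_counter() reads a monotonic
       | clock, so a wall-clock step (for example from NTP) in the middle
       | of a measurement cannot distort the elapsed time.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()  # monotonic; never jumps when wall time is set
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.6f} s")
```

       | time.time() would work most of the time too, which is exactly
       | why the bug is easy to miss: it only misbehaves when the wall
       | clock is adjusted during the measurement.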
        
       | yoobetrue wrote:
       | For a good history of NTP, check out
       | https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19....
       | I had Dr. Mills for an electronics class in college. Truly a
       | hacker's hacker.
        
       | magicalhippo wrote:
       | I had a momentary struggle to imagine how a digital phase
       | comparator would work, but the first few slides from this
       | lecture[1] made it very clear.
       | 
       | [1]:
       | https://pallen.ece.gatech.edu/Academic/ECE_6440/Summer_2003/...
        
       | addaon wrote:
       | PTP is a higher-performing alternative to NTP. The major downside
       | is that it requires hardware support, but in exchange it gives
       | much better timing precision with less software overhead.
       | Hardware support is nearly ubiquitous in embedded PHYs; I assume
       | it's similar for consumer grade components as well.
       | 
       | One of the products I've been impressed by recently is a GPS
       | receiver and PTP master built into an SFP connector [1]. No
       | affiliation with the company, but a super simple way to get a
       | local GPS-disciplined PTP source into a network.
       | 
       | [1] https://www.oscilloquartz.com/en/products-and-services/ptp-g...
        
         | sdsaga12 wrote:
         | This had been my impression of PTP too, and I think in many
         | circumstances it's true, but I recently listened to this
         | episode from Jane Street's Signals and Threads podcast that
         | gave some interesting explanation of why they chose to base
         | their synchronization system on NTP:
         | https://signalsandthreads.com/clock-synchronization/
         | 
         | Preview: "Yeah. I think that's roughly the conclusion I came
         | to, that that's what makes PTP more accurate than NTP, which
         | was surprising to me. And then I did a bunch of research and
         | was talking to various people in the industry, and at various
         | conferences and stuff, and there was some agreement that you
         | can make NTP also very accurate you just have to control some
         | of these things, so there are... in addition to being able to
         | do hardware timestamping with PTP packets some cards, these
         | days, support the ability to hardware timestamp all the
         | packets, and if your machine is just acting as an NTP server
         | and most of the packets it receives are NTP packets, well then
         | you're effectively timestamping NTP packets. Some cards also
         | will timestamp just NTP packets. They can sort of recognize
         | them and timestamp only those, but it was sort of like "Okay if
         | we have the right hardware, we can get the timestamping bit of
         | it." That's kind of an interesting thing. With the different
         | NTPD implementation, chrony being the other implementation I'm
         | talking about as opposed to the reference one, you can turn
         | that knob for how frequently you should poll your server, I
         | think as much as like 16 times a second. There's a bit of like
         | diminishing returns there, it's not always better to go
         | lower... point being, you can tune it to at least match sort of
         | what PTP's default of once a second.
         | 
         | And the more I dug, and the more I talked to people, the more
         | people told me, "Hey, you definitely do not want to involve
         | your switches in your time distribution. If you can figure out
         | a way to leave them out of it, you should do so." I was happy
         | to hear that in some ways, because right now the reliability,
         | or the sort of, the responsibility of the time distribution
         | kind of lies with one group, and that's fine. When you then
         | have this responsibility shared across multiple groups, right,
         | it becomes a lot more complicated. Every switch upgrade,
         | suddenly, you're concerned. "Well, Is it possible that this new
         | version of the firmware you're putting on that version of that
         | particular switch has a bug related to this PTP stuff and is
         | causing problems?"
         | 
         | Given all of that, I started to believe that it was possible
         | that we could solve this problem of getting within 100
         | microseconds using NTP and I sort of set out to try and see if
         | I could actually do that."
        
           | addaon wrote:
           | My familiarity with PTP is in the context of distributed
           | embedded systems, sometimes using the PTP hardware for
           | relative synchronization without even having an absolute
           | reference; but in that world, PTP precision is an order of
           | magnitude or two better than "within 100 microseconds" -- 1
           | us is a sane target, and 5 us is very comfortable.
        
             | Unklejoe wrote:
             | For what it's worth, we normally aim for < 50 nanoseconds
             | RMS on our systems with PTP. You can get even better if you
             | combine it with synchronous Ethernet.
        
         | jeffbee wrote:
         | Ubiquitous, eh? Definitely not in consumer space. The RTL8125
         | that came on my PC doesn't support it. I added an Intel I210
         | which does support it, after trying an Intel I225 which claims
         | to support it but has a broken implementation.
        
           | hacker_newz wrote:
           | Why do you need microsecond precision on a home PC?
        
           | addaon wrote:
           | I'm obviously not familiar with desktop chipsets, but "the
           | RTL8125BG/RTL8125BGS supports IEEE 1588, IEEE 1588-2008, and
           | IEEE 802.1AS, also known as Precision Time Protocol (PTP)"
           | [1] -- don't know if the -BG suffix is a substantially
           | different part.
           | 
           | [1] https://www.realtek.com/en/products/communications-network-i...
        
             | jeffbee wrote:
             | Mine says it has no hardware clock and consequently no
             | timestamping.
        
               | gsich wrote:
               | Maybe no driver support in mainline kernel.
        
       | PopAlongKid wrote:
       | In the mid 1990s I had to figure out how to set up a radio-
       | based[0] NTP server. There is a radio broadcast from Ft. Collins
       | Colorado (NIST) that is highly accurate. Someone else at the
       | large company I worked for had purchased an antenna (as described
       | in the article, it was "a ferrite bar inside a plastic enclosure")
       | but no one so far had figured out how to use it. I got a telecom
       | tech to temporarily install the antenna on a post in a small
       | outdoor atrium at our computer center, then I put a Sparc station
       | on a cart so I could wheel it out there to connect the antenna to
       | the serial port. (Got some strange looks from co-workers on their
       | coffee breaks). I did a lot of reading, spent some time getting
       | the source code to compile correctly, and finally got it all
       | working, so that a more permanent installation could be made and
       | we would now have another low-stratum NTP server for our internal
       | network spread across the state.
       | 
       | [0] http://www.articlesfactory.com/articles/computers/using-wwvb...
        
         | newman314 wrote:
         | Something that might be of interest:
         | https://github.com/hzeller/txtempus
         | 
         | I live in an area without decent WWVB reception. So one of
         | these days, my plan is to build a house range WWVB antenna so
         | that I can get all the radio clocks in my house working
         | consistently.
        
       | Aachen wrote:
       | Honestly there are so many tangents in this post that it's
       | really quite hard to get through without losing track of what's
       | what, or what's real and what's hypothetical.
       | 
       | The article mentions network delay filtering algorithms that fall
       | in broad categories x and y, but doesn't mention what NTP
       | actually uses (linking to some pdf presentation elsewhere). Then
       | there's a section about clock selection (which I understand to
       | mean server selection) where it sounds like clocks are selected
       | at random, no matter if they're on another continent, it's just
       | favored if it has a low stratum. Then NTP talks to up to 5
       | servers and averages the resulting clock diff and, uh, applies
       | that "using the PLL/FLL clock control system". This refers to
       | some phase lock loop stuff from what earlier seemed to be a
       | tangent: how quartz clocks work internally. So NTP actually tells
       | the hardware to adjust how fast it counts? That sounds like
       | something I'd have heard of before, but okay. If the offset is
       | too large to correct by slowing down or speeding up your internal
       | quartz for a while, it'll instead just set the time directly.
       | 
       | The TL;DR seems to be this picture:
       | https://sookocheff.com/post/time/how-does-ntp-work/assets/nt...
       | 
       | Where you can understand "selection and clustering algorithms" to
       | mean "removing outlier data points" and substitute "combining
       | algorithm" with "average the values". The abbreviation VFO is
       | never mentioned on the page but this must be variable frequency
       | oscillator, commonly referred to as "clock" -- if I understood it
       | correctly. What this "filter" is, that is applied to each peer,
       | is unclear to me.
        
         | ReactiveJelly wrote:
         | The article feels a bit like they just re-wrote the Wikipedia
         | article on NTP:
         | https://en.wikipedia.org/wiki/Network_Time_Protocol
         | 
         | Which... I know editing Wikipedia is hard, because of the
         | moderators, but if you think you can do better than a given
         | Wikipedia article, please consider just fixing the Wikipedia
         | article.
        
       | egberts1 wrote:
       | There is MitM NTP going on so a bit of hardening is needed.
       | 
       | This isn't commonly discussed, but I wrote a script that adds
       | Chrony configuration to mitigate it.
       | 
       | https://github.com/egberts/easy-admin/blob/main/480-ntp-chro...
        
       | drexlspivey wrote:
       | Doesn't this system make the assumption that the latency between
       | the server and the client is symmetric? What happens if the
       | packet takes 100ms to go but 50ms to return?
        
         | josephcsible wrote:
         | Yes, but measuring one-way latency is impossible unless the
         | endpoints already have synchronized clocks, so this isn't
         | avoidable.
        
         | toast0 wrote:
         | The reference implementation does make that assumption.
         | OpenBSD's ntp does too. phk, author of ntimed, had written some
         | blog posts exploring asymmetric paths, so I think ntimed
         | doesn't assume symmetry, but assigns it higher probability. I
         | don't remember what chrony does.
         | 
         | The thing is, it's rather difficult to determine the individual
         | components of the path latency without an out of band reference
         | clock. Especially if all of your network clocks are about the
         | same round trip away.
        
         | magicalhippo wrote:
         | If Alice pings Bob first, and then sends the total round-trip
         | time to Bob along with the timestamp, then can't Bob easily
         | discover the asymmetry as a constant offset compared to the
         | adjusted clock?
         | 
         | Say Bob assumes the round-trip is symmetrical and adjusts his
         | local time to the time Alice sends minus half the round-trip
         | time, then any asymmetry should be visible as a consistent
         | offset between the local time and the time from Alice no?
         | 
         | This of course assumes the latency and local clocks are
         | relatively stable over the measurement period.
        
           | josephcsible wrote:
           | No, that doesn't work. See
           | https://cs.stackexchange.com/q/602/86141 for an explanation
           | of why it's impossible. If you tried your method with
           | concrete numbers, you'd see that it would always say the
           | delay is perfectly symmetric, even when it actually isn't.
        
           | toast0 wrote:
           | If all you have is Alice and Bob, there's no way to determine
           | the time from Alice to Bob, or from Bob to Alice, you can
           | only determine the sum of the two. This is because Alice and
           | Bob have different local time scales.
           | 
           | To put it another way, you can't distinguish being in sync
           | and Bob -> Alice taking 4 ms, and Alice -> Bob taking 6 ms,
           | from being out of sync by 1 ms, and both directions taking 5
           | ms.
           | 
           | If you've got an external reference, such as GPS or a radio
           | clock adjusted for time in transit, you can figure this
           | out. If you've got an asymmetric first hop, but afterwards a
           | mostly symmetric path (common for residential customers
           | connecting to commercially hosted ntp servers) and a set of
           | servers that are different round trips away, you can get an
           | estimate. But if all the servers are the same/similar round
           | trips, you likely won't have enough information.
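           | 
           | The 4 ms / 6 ms example above can be checked with the
           | standard NTP offset and delay formulas. A sketch in Python,
           | assuming Alice is the client and Bob the server; the
           | function names and the zero server processing time are
           | made up for illustration:

```python
def exchange(theta, d_out, d_back, t1=0.0):
    """Simulate one request/response and return the four timestamps (ms).

    theta:  how far the server clock is ahead of the client clock
    d_out:  true client -> server delay
    d_back: true server -> client delay
    """
    t2 = t1 + d_out + theta   # server receive time, read on the server clock
    t3 = t2                   # server replies immediately (assumption)
    t4 = t3 - theta + d_back  # client receive time, read on the client clock
    return t1, t2, t3, t4


def ntp_estimate(t1, t2, t3, t4):
    """Offset and round-trip delay computed from the four timestamps."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


in_sync_asymmetric = exchange(theta=0.0, d_out=6.0, d_back=4.0)
off_by_1ms_symmetric = exchange(theta=1.0, d_out=5.0, d_back=5.0)

# Both scenarios yield identical timestamps, so the client cannot tell them apart.
print(in_sync_asymmetric == off_by_1ms_symmetric)
print(ntp_estimate(*in_sync_asymmetric))
```

           | Both scenarios give an estimated offset of 1.0 ms and a
           | delay of 10.0 ms. In the in-sync case that estimate is
           | wrong by half the difference between the two one-way
           | delays, and nothing in the timestamps reveals it.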
        
         | mmh0000 wrote:
         | NTP is designed to deal with unreliable networks, from
         | wikipedia:
         | 
         | "It uses the intersection algorithm, a modified version of
         | Marzullo's algorithm, to select accurate time servers and is
         | designed to mitigate the effects of variable network
         | latency."[0][1]
         | 
         | [0] https://en.wikipedia.org/wiki/Network_Time_Protocol
         | 
         | [1] https://en.wikipedia.org/wiki/Intersection_algorithm
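         | 
         | For reference, a sketch of the original Marzullo's algorithm in
         | Python, not NTP's modified intersection algorithm. Each source
         | is assumed to report an interval (lo, hi) that should contain
         | the true time; the function name and tuple layout are
         | illustrative only:

```python
def marzullo(intervals):
    """Return (count, lo, hi): the interval covered by the most sources."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # an interval opens here
        edges.append((hi, +1))  # an interval closes here
    edges.sort()  # at equal offsets, opens (-1) sort before closes (+1)

    best = count = 0
    best_lo = best_hi = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind  # +1 on open, -1 on close
        if count > best:
            # A new maximum can only be reached on an open edge,
            # so a following edge always exists.
            best = count
            best_lo = offset
            best_hi = edges[i + 1][0]
    return best, best_lo, best_hi


print(marzullo([(8, 12), (11, 13), (10, 12)]))
```

         | With the three intervals above, all three sources agree on
         | [11, 12]. A source whose interval does not overlap that range
         | would simply not be counted, which is how a wildly wrong
         | server gets outvoted.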
        
           | toast0 wrote:
           | Variable network latency is different than static asymmetry.
           | 
           | Variable latency means sometimes there's extra delay, and is
           | solved by throwing away measurements that are outside the
           | norm. Asymmetry in delay is not handled by the reference
           | implementation, which assumes equal delay to and from each
           | server.
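           | 
           | A rough Python sketch of the "throw away the outliers" part,
           | loosely modeled on the idea behind the reference clock
           | filter: keep the most recent samples per server and trust
           | the one with the lowest round-trip delay, since it was least
           | likely to sit in a queue. The class name and the window of 8
           | are assumptions here, and the real filter does more than
           | this:

```python
from collections import deque


class MinDelayFilter:
    """Keep the last `size` samples from one server; trust the fastest one."""

    def __init__(self, size=8):
        self.samples = deque(maxlen=size)  # oldest sample drops off automatically

    def add(self, offset, delay):
        # Store delay first so tuples compare by round-trip delay.
        self.samples.append((delay, offset))

    def best_offset(self):
        """Offset from the sample with the smallest round-trip delay, or None."""
        if not self.samples:
            return None
        _delay, offset = min(self.samples)
        return offset


f = MinDelayFilter()
f.add(offset=1.0, delay=10.0)
f.add(offset=5.0, delay=40.0)  # delayed somewhere: large delay, suspect offset
f.add(offset=1.2, delay=9.0)
print(f.best_offset())
```

           | This handles occasional extra delay, but a constant
           | asymmetry shifts every sample by the same amount, so the
           | lowest-delay sample is biased just like the rest.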
        
           | ReactiveJelly wrote:
           | So if one server is asymmetric one way (the uplink is slow)
           | and another server is asymmetric another way (the downlink is
           | slow), you can use that to narrow down the possibility space
           | and get an answer that's more accurate than just talking to
           | one server or the other?
           | 
           | It works as long as all your uplinks aren't too slow or too
           | fast...
        
       ___________________________________________________________________
       (page generated 2022-02-20 23:01 UTC)