[HN Gopher] How Does NTP (Network Time Protocol) Work?
___________________________________________________________________
How Does NTP (Network Time Protocol) Work?
Author : aemreunal
Score : 107 points
Date : 2022-02-20 09:25 UTC (13 hours ago)
(HTM) web link (sookocheff.com)
(TXT) w3m dump (sookocheff.com)
| elcapitan wrote:
| I liked this video from a lecture series by Martin Kleppmann
| (author of the "Designing Data-Intensive Applications" book) on clock
| synchronisation and NTP:
| https://www.youtube.com/watch?v=mAyW-4LeXZo
|
| I found especially the last part with some hints on how to
| correctly measure time passed when doing manual profiling helpful
| (by not using the wall clock, but the monotonic clock. Probably
| everybody else knows this, but I didn't :D).
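A minimal Python sketch of the monotonic-clock tip above. The helper name `timed` is made up for illustration; `time.monotonic()` is the standard-library clock that cannot go backwards and is not affected by system clock updates.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds).

    Uses time.monotonic() rather than time.time(), so an NTP step of the
    wall clock during the call cannot produce a negative or inflated duration.
    """
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(f"sum={total}, took {seconds:.6f} s")
```

For very short intervals, `time.perf_counter()` is a drop-in alternative with the highest available resolution.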
| yoobetrue wrote:
| For a good history of NTP, check out
| https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19....
| I had Dr. Mills for an electronics class in college. Truly a
| hacker's hacker.
| magicalhippo wrote:
| I had a momentary struggle to imagine how a digital phase
| comparator would work, but the first few slides from this
| lecture[1] made it very clear.
|
| [1]:
| https://pallen.ece.gatech.edu/Academic/ECE_6440/Summer_2003/...
| addaon wrote:
| PTP is a higher-performing alternative to NTP. The major downside
| is that it requires hardware support, but in exchange it gives
| much better timing precision with less software overhead.
| Hardware support is nearly ubiquitous in embedded PHYs; I assume
| it's similar for consumer grade components as well.
|
| One of the products I've been impressed by recently is a GPS
| receiver and PTP master built into an SFP connector [1]. No
| affiliation with the company, but a super simple way to get a
| local GPS-disciplined PTP source into a network.
|
| [1] https://www.oscilloquartz.com/en/products-and-
| services/ptp-g...
| sdsaga12 wrote:
| This had been my impression of PTP too, and I think in many
| circumstances it's true, but I recently listened to this
| episode from Jane Street's Signals and Threads podcast that
| gave some interesting explanation of why they chose to base
| their synchronization system on NTP:
| https://signalsandthreads.com/clock-synchronization/
|
| Preview: "Yeah. I think that's roughly the conclusion I came
| to, that that's what makes PTP more accurate than NTP, which
| was surprising to me. And then I did a bunch of research and
| was talking to various people in the industry, and at various
| conferences and stuff, and there was some agreement that you
| can make NTP also very accurate you just have to control some
| of these things, so there are... in addition to being able to
| do hardware timestamping with PTP packets some cards, these
| days, support the ability to hardware timestamp all the
| packets, and if your machine is just acting as an NTP server
| and most of the packets it receives are NTP packets, well then
| you're effectively timestamping NTP packets. Some cards also
| will timestamp just NTP packets. They can sort of recognize
| them and timestamp only those, but it was sort of like "Okay if
| we have the right hardware, we can get the timestamping bit of
| it. That's kind of an interesting thing. With the different
| NTPD implementation, chrony being the other implementation I'm
| talking about as opposed to the reference one, you can turn
| that knob for how frequently you should poll your server, I
| think as much as like 16 times a second. There's a bit of like
| diminishing returns there, it's not always better to go
| lower... point being, you can tune it to at least match sort of
| what PTP's default of once a second.
|
| And the more I dug, and the more I talked to people, the more
| people told me, "Hey, you definitely do not want to involve
| your switches in your time distribution. If you can figure out
| a way to leave them out of it, you should do so." I was happy
| to hear that in some ways, because right now the reliability,
| or the sort of, the responsibility of the time distribution
| kind of lies with one group, and that's fine. When you then
| have this responsibility shared across multiple groups, right,
| it becomes a lot more complicated. Every switch upgrade,
| suddenly, you're concerned. "Well, Is it possible that this new
| version of the firmware you're putting on that version of that
| particular switch has a bug related to this PTP stuff and is
| causing problems?"
|
| Given all of that, I started to believe that it was possible
| that we could solve this problem of getting within 100
| microseconds using NTP and I sort of set out to try and see if
| I could actually do that."
| addaon wrote:
| My familiarity with PTP is in the context of distributed
| embedded systems, sometimes using the PTP hardware for
| relative synchronization without even having an absolute
| reference; but in that world, PTP precision is an order of
| magnitude or two better than "within 100 microseconds" -- 1
| us is a sane target, and 5 us is very comfortable.
| Unklejoe wrote:
| For what it's worth, we normally aim for < 50 nanoseconds
| RMS on our systems with PTP. You can get even better if you
| combine it with synchronous Ethernet.
| jeffbee wrote:
| Ubiquitous, eh? Definitely not in consumer space. The RTL8125
| that came on my PC doesn't support it. I added an Intel I210
| which does support it, after trying an Intel I225 which claims
| to support it but has a broken implementation.
| hacker_newz wrote:
| Why do you need microsecond precision on a home PC?
| addaon wrote:
| I'm obviously not familiar with desktop chipsets, but "the
| RTL8125BG/RTL8125BGS supports IEEE 1588, IEEE 1588-2008, and
| IEEE 802.1AS, also known as Precision Time Protocol (PTP)"
| [1] -- don't know if the -BG suffix is a substantially
| different part.
|
| [1] https://www.realtek.com/en/products/communications-
| network-i...
| jeffbee wrote:
| Mine says it has no hardware clock and consequently no
| timestamping.
| gsich wrote:
| Maybe no driver support in mainline kernel.
| PopAlongKid wrote:
| In the mid 1990s I had to figure out how to set up a radio-
| based[0] NTP server. There is a radio broadcast from Ft. Collins
| Colorado (NIST) that is highly accurate. Someone else at the
| large company I worked for had purchased an antenna (as described
| in the article, it was "a ferrite bar inside a plastic enclosure")
| but no one so far had figured out how to use it. I got a telecom
| tech to temporarily install the antenna on a post in a small
| outdoor atrium at our computer center, then I put a Sparc station
| on a cart so I could wheel it out there to connect the antenna to
| the serial port. (Got some strange looks from co-workers on their
| coffee breaks). I did a lot of reading, spent some time getting
| the source code to compile correctly, and finally got it all
| working, so that a more permanent installation could be made and
| we would now have another low-stratum NTP server for our internal
| network spread across the state.
|
| [0]http://www.articlesfactory.com/articles/computers/using-
| wwvb...
| newman314 wrote:
| Something that might be of interest:
| https://github.com/hzeller/txtempus
|
| I live in an area without decent WWVB reception. So one of
| these days, my plan is to build a house range WWVB antenna so
| that I can get all the radio clocks in my house working
| consistently.
| Aachen wrote:
| Honestly there are so many tangents in this post that it's
| really quite hard to get through without losing track of what's
| what, or what's real and what's hypothetical.
|
| The article mentions network delay filtering algorithms that fall
| in broad categories x and y, but doesn't mention what NTP
| actually uses (linking to some pdf presentation elsewhere). Then
| there's a section about clock selection (which I understand to
| mean server selection) where it sounds like clocks are selected
| at random, no matter if they're on another continent, it's just
| favored if it has a low stratum. Then NTP talks to up to 5
| servers and averages the resulting clock diff and, uh, applies
| that "using the PLL/FLL clock control system". This refers to
| some phase lock loop stuff from what earlier seemed to be a
| tangent: how quartz clocks work internally. So NTP actually tells
| the hardware to adjust how fast it counts? That sounds like
| something I'd have heard of before, but okay. If the offset is
| large and it's not enough to slow down or speed up your internal
| quartz for a time, it'll instead just update the time.
|
| The TL;DR seems to be this picture:
| https://sookocheff.com/post/time/how-does-ntp-work/assets/nt...
|
| Where you can understand "selection and clustering algorithms" to
| mean "removing outlier data points" and substitute "combining
| algorithm" with "average the values". The abbreviation VFO is
| never mentioned on the page but this must be variable frequency
| oscillator, commonly referred to as "clock" -- if I understood it
| correctly. What this "filter" is, that is applied to each peer,
| is unclear to me.
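A rough Python sketch of the "adjust the rate, or just set the time" choice described in this comment. It is a simplification, not the PLL/FLL loop itself. The function name is hypothetical, and the two constants are assumptions modelled on ntpd's documented defaults (a 128 ms step threshold and a 500 ppm maximum slew rate).

```python
STEP_THRESHOLD_S = 0.128  # assumption: above this offset, set the clock outright
MAX_SLEW_PPM = 500        # assumption: fastest allowed rate adjustment, in ppm


def plan_correction(offset_s):
    """Return ("step", 0.0) or ("slew", seconds_needed) for a measured offset.

    Slewing runs the clock slightly fast or slow until the offset is gone;
    stepping jumps the clock in one go. This ignores the feedback loop that a
    real implementation uses and assumes slewing at the maximum rate throughout.
    """
    if abs(offset_s) > STEP_THRESHOLD_S:
        return ("step", 0.0)
    seconds_needed = abs(offset_s) * 1e6 / MAX_SLEW_PPM
    return ("slew", seconds_needed)


if __name__ == "__main__":
    for offset in (0.010, -0.100, 0.500):
        print(offset, plan_correction(offset))
```

At the assumed 500 ppm rate, even a 10 ms offset takes on the order of 20 seconds to slew away, which is why large offsets are stepped instead.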
| ReactiveJelly wrote:
| The article feels a bit like they just re-wrote the Wikipedia
| article on NTP:
| https://en.wikipedia.org/wiki/Network_Time_Protocol
|
| Which... I know editing Wikipedia is hard, because of the
| moderators, but if you think you can do better than a given
| Wikipedia article, please consider just fixing the Wikipedia
| article.
| egberts1 wrote:
| There is MitM NTP going on so a bit of hardening is needed.
|
| Not commonly discussed, but I wrote a script that adds Chrony
| configuration to mitigate this.
|
| https://github.com/egberts/easy-admin/blob/main/480-ntp-chro...
| drexlspivey wrote:
| Doesn't this system make the assumption that the latency between
| the server and the client is symmetric? What happens if the
| packet takes 100ms to go but 50ms to return?
| josephcsible wrote:
| Yes, but measuring one-way latency is impossible unless the
| endpoints already have synchronized clocks, so this isn't
| avoidable.
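To put numbers on the 100 ms / 50 ms question, here is a small Python sketch of the standard four-timestamp offset and delay calculation. The function names and the simulated exchange are illustrative, and times are in milliseconds. It assumes the clocks are actually in sync, so any non-zero offset it reports is pure error.

```python
def simulate_exchange(true_offset_ms, out_ms, back_ms, client_send_ms=0, server_proc_ms=0):
    """Return (t1, t2, t3, t4) for one request/response.

    t1 and t4 are read from the client clock, t2 and t3 from the server clock.
    true_offset_ms is (server clock - client clock).
    """
    t1 = client_send_ms
    t2 = t1 + out_ms + true_offset_ms   # server receive, server clock
    t3 = t2 + server_proc_ms            # server send, server clock
    t4 = t3 - true_offset_ms + back_ms  # client receive, client clock
    return t1, t2, t3, t4


def ntp_offset_delay(t1, t2, t3, t4):
    """Offset and round-trip delay, assuming both directions take equally long."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    stamps = simulate_exchange(true_offset_ms=0, out_ms=100, back_ms=50)
    print(ntp_offset_delay(*stamps))  # (25.0, 150)
```

The reported offset is off by half the difference between the two one-way delays, here 25 ms, while the measured round trip of 150 ms is correct.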
| toast0 wrote:
| The reference implementation does make that assumption.
| OpenBSD's ntp does too. phk, author of ntimed, had written some
| blog posts exploring asymmetric paths, so I think ntimed
| doesn't assume symmetry, but assigns it higher probability. I
| don't remember what chrony does.
|
| The thing is, it's rather difficult to determine the individual
| components of the path latency without an out of band reference
| clock. Especially if all of your network clocks are about the
| same round trip away.
| magicalhippo wrote:
| If Alice pings Bob first, and then sends the total round-trip
| time to Bob along with the timestamp, then can't Bob easily
| discover the asymmetry as a constant offset compared to the
| adjusted clock?
|
| Say Bob assumes the round-trip is symmetrical and adjusts his
| local time to the time Alice sends minus half the round-trip
| time, then any asymmetry should be visible as a consistent
| offset between the local time and the time from Alice no?
|
| This of course assumes the latency and local clocks are
| relatively stable over the measurement period.
| josephcsible wrote:
| No, that doesn't work. See
| https://cs.stackexchange.com/q/602/86141 for an explanation
| of why it's impossible. If you tried your method with
| concrete numbers, you'd see that it would always say the
| delay is perfectly symmetric, even when it actually isn't.
| toast0 wrote:
| If all you have is Alice and Bob, there's no way to determine
| the time from Alice to Bob, or from Bob to Alice, you can
| only determine the sum of the two. This is because Alice and
| Bob have different local time scales.
|
| To put it another way, you can't distinguish being in sync
| and Bob -> Alice taking 4 ms, and Alice -> Bob taking 6 ms,
| from being out of sync by 1 ms, and both directions taking 5
| ms.
|
| If you've got an external reference, such as GPS or an
| adjusted for time in transit radio clock, you can figure this
| out. If you've got an asymmetric first hop, but afterwards a
| mostly symmetric path (common for residential customers
| connecting to commercially hosted ntp servers) and a set of
| servers that are different round trips away, you can get an
| estimate. But if all the servers are the same/similar round
| trips, you likely won't have enough information.
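The 4 ms / 6 ms example in this comment can be checked with a few lines of Python. The `observe` helper is hypothetical. It returns the four timestamps Alice and Bob can actually see, with Alice as the client and times in milliseconds.

```python
def observe(bob_minus_alice_ms, alice_to_bob_ms, bob_to_alice_ms):
    """Timestamps visible to the two parties for one exchange started by Alice.

    t1, t4 are read from Alice's clock; t2, t3 from Bob's clock.
    Bob is assumed to reply instantly to keep the numbers small.
    """
    t1 = 0
    t2 = t1 + alice_to_bob_ms + bob_minus_alice_ms
    t3 = t2
    t4 = t3 - bob_minus_alice_ms + bob_to_alice_ms
    return t1, t2, t3, t4


# Scenario A: clocks in sync, Alice -> Bob takes 6 ms, Bob -> Alice takes 4 ms.
in_sync_asymmetric = observe(0, 6, 4)

# Scenario B: Bob's clock is 1 ms ahead, both directions take 5 ms.
skewed_symmetric = observe(1, 5, 5)

if __name__ == "__main__":
    print(in_sync_asymmetric)  # (0, 6, 6, 10)
    print(skewed_symmetric)    # (0, 6, 6, 10)
```

Both scenarios produce identical timestamps, so no calculation on them alone can tell the two cases apart.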
| mmh0000 wrote:
| NTP is designed to deal with unreliable networks, from
| Wikipedia:
|
| "It uses the intersection algorithm, a modified version of
| Marzullo's algorithm, to select accurate time servers and is
| designed to mitigate the effects of variable network
| latency."[0][1]
|
| [0] https://en.wikipedia.org/wiki/Network_Time_Protocol
|
| [1] https://en.wikipedia.org/wiki/Intersection_algorithm
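For a feel of what the quoted passage refers to, here is a short Python sketch of the original Marzullo's algorithm. It is not NTP's modified intersection algorithm and not code from any NTP implementation. Each source is assumed to report an interval `(lo, hi)` that should contain the true time, and the function finds the range agreed on by the most sources.

```python
def marzullo(intervals):
    """Return (count, lo, hi): the range covered by the largest number of intervals.

    intervals is an iterable of (lo, hi) pairs with lo <= hi.
    Returns (0, None, None) for empty input.
    """
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # -1 marks the start of an interval
        edges.append((hi, +1))  # +1 marks the end of an interval
    # Sorting tuples puts starts before ends at equal offsets,
    # so intervals that merely touch still count as overlapping.
    edges.sort()

    best = count = 0
    best_lo = best_hi = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind  # a start raises the overlap count, an end lowers it
        if count > best:
            best = count
            best_lo = offset
            best_hi = edges[i + 1][0]  # overlap lasts until the next edge
    return best, best_lo, best_hi


if __name__ == "__main__":
    print(marzullo([(8, 12), (11, 13), (10, 12)]))  # (3, 11, 12)
    print(marzullo([(8, 12), (11, 13), (14, 15)]))  # (2, 11, 12)
```

In the second call the source reporting `(14, 15)` disagrees with the other two and is outvoted, which is the sense in which the algorithm selects accurate servers.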
| toast0 wrote:
| Variable network latency is different than static asymmetry.
|
| Variable latency means sometimes there's extra delay, and is
| solved by throwing away measurements that are outside the
| norm. Asymmetry in delay is not handled by the reference
| implementation, which assumes equal delay to and from each
| server.
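One simple way to "throw away measurements that are outside the norm" is to trust the sample with the shortest round trip among the most recent few, since queueing only ever adds delay. The Python sketch below assumes that minimum-delay rule and an eight-sample window. The function name and sample data are invented for illustration, and this is not the reference implementation's code.

```python
def best_sample(samples, window=8):
    """Pick the (offset_s, delay_s) sample with the smallest delay.

    Only the last `window` samples are considered. A sample that hit a
    congested queue shows up as a large delay and is simply never chosen.
    """
    recent = samples[-window:]
    if not recent:
        raise ValueError("no samples to choose from")
    return min(recent, key=lambda sample: sample[1])


if __name__ == "__main__":
    # (offset in seconds, round-trip delay in seconds); the second one was delayed
    samples = [(0.0021, 0.020), (0.0150, 0.095), (0.0019, 0.018), (0.0024, 0.022)]
    print(best_sample(samples))  # (0.0019, 0.018)
```

Note that this does nothing about a constant asymmetry: if every sample has the same lopsided path, the lowest-delay sample carries the same offset error as the rest.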
| ReactiveJelly wrote:
| So if one server is asymmetric one way (the uplink is slow)
| and another server is asymmetric another way (the downlink is
| slow), you can use that to narrow down the possibility space
| and get an answer that's more accurate than just talking to
| one server or the other?
|
| It works as long as all your uplinks aren't too slow or too
| fast...
___________________________________________________________________
(page generated 2022-02-20 23:01 UTC)