Re: Time keeping while suspended in the presence of persistent clock drift

From: Joel Daniels
Date: Tue Dec 14 2021 - 12:43:31 EST


Thomas,

On Tue, Dec 14, 2021, at 6:57 AM, Thomas Gleixner wrote:
> thanks for making sure that this is really a RTC issue on that machine.

And thank you for taking an interest. I've measured the RTC drift over
a number of days and it is stable at around 3.8 seconds per day (or 44
ppm).
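
For reference, the conversion between the two units is a one-liner. This is only a sketch; the helper name drift_s_per_day_to_ppm is made up for illustration and does not exist in the kernel or in any tool:

```c
/* Seconds in one day, used to turn "seconds per day" into a ratio. */
#define SECONDS_PER_DAY 86400.0

/*
 * Hypothetical helper: convert a drift expressed in seconds gained
 * (or lost) per day into parts per million.
 */
static double drift_s_per_day_to_ppm(double seconds_per_day)
{
	return seconds_per_day / SECONDS_PER_DAY * 1e6;
}
```

With this, drift_s_per_day_to_ppm(3.8) comes out just under 44.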

>> The "if" branch does not apply as I have no clock sources flagged as
>> CLOCK_SOURCE_SUSPEND_NONSTOP but the "else if" branch does apply.
>
> Which CPU is in that box?

Intel Celeron N4120. This is a Gemini Lake Refresh (Atom) chip.

The relevant bit from the early_init_intel function
(linux/arch/x86/kernel/cpu/intel.c) is:

	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
	if (c->x86 == 6) {
		switch (c->x86_model) {
		case INTEL_FAM6_ATOM_SALTWELL_MID:
		case INTEL_FAM6_ATOM_SALTWELL_TABLET:
		case INTEL_FAM6_ATOM_SILVERMONT_MID:
		case INTEL_FAM6_ATOM_AIRMONT_NP:
			set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC_S3);
			break;
		default:
			break;
		}
	}

> The kernel does not believe. It relies on the accuracy of the CMOS clock
> which is usually pretty good.

The references I have found on CMOS clock accuracy [1, 2, 3, 4]
indicate that a drift of 1 or 2 seconds per day (roughly 12 to 23 ppm)
is typical. Hopefully people on linux-rtc can confirm?

If that is correct then my clock, at +44ppm, is an outlier but I
suspect that people with a consistent drift of only 1 second per day
would still benefit from being able to correct for it. Indeed, people
have been using hwclock and /etc/adjtime to correct for CMOS RTC
drift for decades.
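
As a rough illustration of what userspace already has available: the first line of /etc/adjtime holds three blank-separated numbers, the first being the systematic drift in seconds per day and the second the time of the last adjustment in seconds since the epoch. The sketch below pulls those two values out of such a line. The function name parse_adjtime_line1 is hypothetical, and this is not how hwclock itself is implemented:

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the first line of an adjtime-style file.
 * On success, stores the drift factor (seconds per day) and the time
 * of the last adjustment (seconds since the epoch) and returns 0.
 * Returns -1 if the line does not start with two numbers.
 */
static int parse_adjtime_line1(const char *line, double *drift_s_per_day,
			       long long *last_adj)
{
	if (sscanf(line, "%lf %lld", drift_s_per_day, last_adj) != 2)
		return -1;
	return 0;
}
```

Given a line such as "3.800000 1639503811 0.000000", this yields a drift factor of 3.8 seconds per day, which is the figure a drift parameter for the kernel would have to be derived from.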

> > I would like to provide a way for user space to inform the kernel
> > that the persistent clock drifts so it can make a corresponding
> > adjustment when resuming from a long suspend period.
> >
> > ...
>
> That needs some thought. The RTC people (cc'ed now) might have opinions
> on that.

I agree that this needs thought. Three issues that I am particularly
worried about:

[A] On machines with a persistent clock how is userspace supposed
to be sure what drift to measure? Can it assume that the drift
of the persistent clock used for sleep time injection is the
same as the drift of /dev/rtc? This seems dangerous.

[B] Sleep time injection can come from the "persistent clock" or,
if there is no persistent clock, from an RTC driver. I'd like
to correct for drift from the persistent clock but not touch
the RTC driver sleep time injection mechanism. Is this
acceptable or do people feel that any drift correction should
work with both mechanisms in order to ensure a polished
interface?

[C] Some users may want to correct for drift during suspend-to-RAM
but during suspend-to-disk they might boot into some other
operating system which itself sets the CMOS RTC. Hopefully,
this could be solved from userspace by changing the drift
correction parameter to 0 just before a suspend-to-disk
operation.
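
To make the discussion concrete, here is a minimal sketch of the kind of adjustment in question. It assumes a drift parameter in ppm supplied by userspace, with a positive value meaning the persistent clock runs fast. The name correct_sleep_ns and the sign convention are assumptions for illustration, not existing kernel interfaces. It uses a first-order correction (subtract measured * ppm / 1e6), which differs from the exact measured / (1 + d) by a factor of about d squared, negligible at these drift rates:

```c
#include <stdint.h>

#define PPM_SCALE 1000000LL

/*
 * Hypothetical helper: correct a suspend interval, as measured by a
 * drifting persistent clock, by a userspace-supplied drift in ppm.
 * drift_ppm > 0 means the clock ran fast, so the measured interval
 * overstates the real elapsed time. drift_ppm == 0 disables the
 * correction entirely.
 */
static int64_t correct_sleep_ns(int64_t measured_ns, int32_t drift_ppm)
{
	/* Split before multiplying so long intervals cannot overflow. */
	int64_t whole = measured_ns / PPM_SCALE;
	int64_t rem = measured_ns % PPM_SCALE;
	int64_t error_ns = whole * drift_ppm + rem * drift_ppm / PPM_SCALE;

	return measured_ns - error_ns;
}
```

For a suspend of exactly one day measured at +44 ppm, the error term is 3801600000 ns (about 3.8 seconds). Passing 0 returns the measured interval unchanged, which is the behaviour wanted for case [C].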

I suspect that there are other things about which I should also be
worried if only I were less ignorant. That is why I am asking here.

Thanks,
Joel Daniels

[1] http://www.ntp.org/ntpfaq/NTP-s-trbl-spec.htm#AEN5674 :
"A PC used a stratum 1 server with PPS had had a hardware fault,
and it had been powered off for about 18 days. ... when the system
was rebooted the RTC clock was off by 18 seconds. That would be
an error of roughly 12 PPM."

[2] https://www.hindawi.com/journals/jcnc/2008/583162/ :
"In IBM PC compatible computers, the RTC circuit is the Motorola
146818, with a resolution of approximately one second and a
significant drift"

[3] https://www.maximintegrated.com/en/design/technical-documents/app-notes/5/58.html :
Tables 1, 2 and 3 list 32.768 kHz crystals with typical frequency
tolerances of around +/- 20 ppm at 25 degrees Celsius.

[4] https://www.greyware.com/software/domaintime/technical/accuracy/pcclocks.asp :
"The resolution of most PC real-time clocks is one full second,
and most RTCs drift considerably over time."