Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
From: Peter Zijlstra
Date: Thu Sep 28 2017 - 07:58:34 EST

On Thu, Sep 28, 2017 at 06:03:05PM +0800, Dou Liyang wrote:
> At 09/28/2017 02:09 AM, Peter Zijlstra wrote:
> > On Wed, Sep 27, 2017 at 08:05:48PM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 27, 2017 at 09:52:36PM +0800, Dou Liyang wrote:
> > > > We do not want to do that, because we use "notsc" to support Dynamic
> > > > Reconfiguration[1].
> > > >
> > > > AFAIK, this feature enables hot-adding a system board that contains CPUs
> > > > and memory. But the CPUs on different boards may have TSCs that are
> > > > not consistent with the TSCs of the existing CPUs. If we hot-add
> > > > a board directly, the machine may end up with inconsistent
> > > > TSCs.
> > > >
> > > > We make an effort to set the same TSC value as the existing one through
> > > > hardware and firmware, but it is hard. So we recommend specifying the
> > > > "notsc" option on the command line for users who want to use Dynamic
> > > > Reconfiguration.
> > >
> > > Oh gawd, that's horrific. And in my book a good reason to kill that
> > > option.
> >
> > That is, even with unsynchronized TSC we're better off using RDTSC. The
> > whole mess in kernel/sched/clock.c is all about getting semi sensible
> > results out of unsynchronized TSC.
> >
>
> It would be best if we could support TSC sync capability in x86, but it
> seems that is not easy.

Sure, your hardware achieving sync would be best, but even if it does
not, we can still use TSC. Using notsc simply because you fail to sync
TSCs is quite crazy.

The thing is, we need to support unsync'ed TSC in any case, because
older chips (pre-Nehalem) didn't have synchronized TSCs, and
it still happens on recent chips if the BIOS mucks it up, which happens
surprisingly often :-(

I would suggest you try your reconfigurable setup with "tsc=unstable"
and see if that works for you. That marks the TSC unconditionally
unstable at boot and avoids any further wobbles once the TSC watchdog
notices (although that too _should_ more or less work).

I do however hope you have a custom clocksource driver placed at higher
priority than the HPET.
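
To illustrate what "getting semi sensible results out of unsynchronized
TSC" can mean, here is a minimal user-space sketch of the clamping idea.
It is loosely modelled on the filter in kernel/sched/clock.c but is not
the kernel code: struct cpu_clock, cpu_clock_tick(), cpu_clock_read() and
the 1 ms TICK_NSEC are assumptions made for this example, and it ignores
counter wraparound and concurrent updates from other CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick length of 1 ms (HZ=1000); purely illustrative. */
#define TICK_NSEC 1000000ULL

/* Hypothetical per-CPU state, not the kernel's struct sched_clock_data. */
struct cpu_clock {
    uint64_t tick_raw;   /* raw TSC-derived ns sampled at the last tick */
    uint64_t tick_gtod;  /* trusted global time in ns at the last tick */
    uint64_t clock;      /* last value handed out on this CPU */
};

static inline uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static inline uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Once per tick: re-anchor the raw counter against the trusted clock. */
void cpu_clock_tick(struct cpu_clock *c, uint64_t raw_now, uint64_t gtod_now)
{
    assert(c != 0);
    c->tick_raw = raw_now;
    c->tick_gtod = gtod_now;
}

/*
 * Read the filtered clock: take the trusted time at the last tick plus the
 * raw delta since then, and clamp the result so that it
 *   - never goes backwards on this CPU, and
 *   - never runs more than one tick ahead of the trusted time.
 */
uint64_t cpu_clock_read(struct cpu_clock *c, uint64_t raw_now)
{
    assert(c != 0);

    /* A raw counter that appears to go backwards contributes no delta. */
    uint64_t delta = raw_now > c->tick_raw ? raw_now - c->tick_raw : 0;

    uint64_t lo = max_u64(c->tick_gtod, c->clock);
    uint64_t hi = max_u64(c->clock, c->tick_gtod + TICK_NSEC);

    uint64_t v = c->tick_gtod + delta;
    v = max_u64(v, lo);
    v = min_u64(v, hi);

    c->clock = v;
    return v;
}
```

With a tick recorded at raw=1000 and gtod=5000, a read at raw=1500
returns 5500, and a later read at raw=1200 (the raw counter stepping
backwards) still returns 5500 rather than going back in time. A wildly
advanced raw value is capped at tick_gtod + TICK_NSEC.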