Re: [GIT PULL] timer fixes
From: Peter Zijlstra
Date: Tue Dec 17 2019 - 15:35:18 EST
On Tue, Dec 17, 2019 at 12:16:52PM -0800, Linus Torvalds wrote:
> On Tue, Dec 17, 2019 at 11:30 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > What alternatives are there? That is, we normally only use HPET to
> > double check nobody messed up the TSC.
>
> The thing is HPET seems to be _less_ reliable than the TSC we're
> checking these days.
>
> If that's the only use-case for HPET, we should just stop doing it.
>
> > We can't just blindly trust TSC across everything x86.
>
> No, but we can trust it when it's a modern CPU.

Pray.. the TSC MSR is still writable from SMM, so BIOS monkeys could
still do what they've been doing for decades. Which is to try and 'hide'
SMM latency by taking the TSC timestamp on SMM entry and writing the
timestamp back into the TSC MSR on exit.

Yes, ever since TSC_ADJUST we can better recover from it, but
we still need to first detect it went sideways and by then time has been
observed buggered and any recovery is basically too late :/

Granted, this is happening less (at least, I really do hope so).

Also, what constitutes a 'modern' CPU?
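
To make the SMM save/restore trick and the TSC_ADJUST recovery concrete,
here is a toy userspace model in C. It is only a sketch: nothing in it is
kernel code, and struct toy_cpu and all the toy_* helpers are made-up names
for illustration. The one hardware behaviour it assumes is that a write to
the TSC MSR folds the delta into IA32_TSC_ADJUST, and that a write to
IA32_TSC_ADJUST moves the TSC by the same delta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one CPU's TSC state (hypothetical, not a kernel structure). */
struct toy_cpu {
	uint64_t tsc;        /* value software would read via RDTSC */
	int64_t  tsc_adjust; /* models the IA32_TSC_ADJUST MSR */
};

/* Time passes: the counter advances, TSC_ADJUST is untouched. */
static void toy_advance(struct toy_cpu *cpu, uint64_t cycles)
{
	assert(cpu);
	cpu->tsc += cycles;
}

/* A write to the TSC MSR: the delta is folded into TSC_ADJUST. */
static void toy_write_tsc(struct toy_cpu *cpu, uint64_t val)
{
	assert(cpu);
	cpu->tsc_adjust += (int64_t)(val - cpu->tsc);
	cpu->tsc = val;
}

/* The SMM trick: save the TSC on entry, burn cycles, write it back on exit. */
static void toy_smm_hide_latency(struct toy_cpu *cpu, uint64_t smm_cycles)
{
	uint64_t entry = cpu->tsc;

	toy_advance(cpu, smm_cycles);
	toy_write_tsc(cpu, entry);
}

/* Detection: TSC_ADJUST no longer matches the value the OS last saw. */
static bool toy_tsc_was_tampered(const struct toy_cpu *cpu, int64_t expected)
{
	assert(cpu);
	return cpu->tsc_adjust != expected;
}

/* Recovery: restoring TSC_ADJUST moves the TSC back by the hidden delta. */
static void toy_restore_adjust(struct toy_cpu *cpu, int64_t expected)
{
	assert(cpu);
	cpu->tsc += (uint64_t)(expected - cpu->tsc_adjust);
	cpu->tsc_adjust = expected;
}
```

Starting from tsc = 1000, advancing 100 cycles and then spending 500 cycles
in the fake SMM handler leaves tsc at 1100 with tsc_adjust at -500.
Restoring the adjust value to 0 puts the TSC at 1600, but anything that read
the clock between SMM exit and the check has already seen the stale value,
which is the 'too late' part.
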
> The HPET seems to get disabled on all the modern platforms, why do we
> even have it enabled by default?

These new ones yeah, cuz they wrecked HPET in PC10 :/

> We should do the HPET cross-check only when we know the TSC might be
> unreliable, I suspect.

But how do we know? Ever since Nehalem TSC has basically been good
hardware-wise -- there's a few exceptions on large (>4) socket machines,
but nobody has those anyway.

It has always been the BIOS messing it up. Now, the reason I think it
has gotten better is because Windows is now also relying on TSC (like
we've been doing forever).

Maybe I'm too scarred by too much TSC wreckage over the years...