Re: [PATCH 2/2] cpu/hotplug: Unfreeze sibling CPU first on resume from S3

From: Thomas Gleixner
Date: Tue Jan 29 2019 - 18:02:52 EST


Jan,

On Tue, 29 Jan 2019, Jan H. Schönherr wrote:

> At least one system declares the TSC unstable after resume from S3,
> because the TSC is observed going backwards up to roughly 500 cycles
> every now and then, when bringing secondary CPUs back online.
>
> The system in question is an AMD Ryzen Threadripper 2950X, microcode
> 0x800820b, on an ASRock Fatal1ty X399 Professional Gaming, BIOS P3.30.
>
> This unexplained behavior goes away as soon as the sibling CPU of the
> boot CPU is brought back up. Hence, add a hack to restore the sibling
> CPU before all others on unfreeze. This keeps the TSC stable.

Uurgh, no. As you said, that's a hack and I'm pretty sure that it just works
by chance. It merely makes the underlying wreckage no longer observable.

I'm pretty sure this is a BIOS bug and I'm really not going to make a
special case here just to accommodate that particular broken firmware. This
would just set a precedent for random ordering requests based on DMI
strings and other data to make it work on all kinds of broken
motherboard/firmware/microcode combinations.

Surely nice detective work, but we really don't want to open this can of
worms.

Too bad that AMD does not have the TSC_ADJUST register. It would tell you
immediately what's wrong and the code we have for that would probably cure
the mess.
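To illustrate why a per-CPU TSC_ADJUST register helps here: on CPUs that have
it, a write to the TSC on one CPU shows up as a change in that CPU's
TSC_ADJUST value, so comparing the values across CPUs points directly at the
one that was tampered with. The following is only a simplified model of that
idea, not the kernel's actual code. The function names, the array standing in
for the per-CPU register values, and the policy of resetting every CPU to the
reference CPU's value are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model, NOT kernel code. adjust[i] stands in for the value
 * of the per-CPU TSC_ADJUST register as seen on CPU i. How the values
 * are read or written on real hardware is outside this sketch.
 */

/*
 * Return the index of the first CPU whose adjust value differs from the
 * reference CPU's value, or -1 if all CPUs agree (or the input is invalid).
 */
static int first_adjust_mismatch(const int64_t *adjust, size_t ncpus,
                                 size_t ref_cpu)
{
    if (ncpus == 0 || ref_cpu >= ncpus)
        return -1;

    for (size_t i = 0; i < ncpus; i++) {
        if (adjust[i] != adjust[ref_cpu])
            return (int)i;
    }
    return -1;
}

/*
 * Hypothetical repair step: force every CPU back to the reference CPU's
 * adjust value. Returns the number of CPUs that had to be corrected.
 */
static size_t resync_adjust(int64_t *adjust, size_t ncpus, size_t ref_cpu)
{
    size_t fixed = 0;

    if (ncpus == 0 || ref_cpu >= ncpus)
        return 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (adjust[i] != adjust[ref_cpu]) {
            adjust[i] = adjust[ref_cpu];
            fixed++;
        }
    }
    return fixed;
}
```

With made-up values such as {0, 0, -500, 0} and CPU 0 as the reference, the
check singles out CPU 2. The -500 only echoes the "roughly 500 cycles" from
the report and is not a measured register value.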

Sigh, I've been complaining to both Intel and AMD for more than 20 years
now about the complete trainwreck they made out of TSC and it's still not
fixed. Though I still have the illusion that by the time I retire I will get
my hands on a machine with a sane TSC implementation. Hope dies last ....

Oh well, enough ranting. With that I hand off the further proceedings to
Tom Lendacky, who surely can give you more technical help than I can in
that particular matter.

Thanks,

tglx