Re: [PATCH v2] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume

From: Josh Poimboeuf
Date: Wed May 29 2019 - 12:14:09 EST


On Wed, May 29, 2019 at 12:32:02PM +0200, Jiri Kosina wrote:
> From: Jiri Kosina <jkosina@xxxxxxx>
>
> As explained in
>
> 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
>
> we always, no matter what, have to bring up x86 HT siblings during boot at
> least once in order to avoid first MCE bringing the system to its knees.
>
> That means that whenever 'nosmt' is supplied on the kernel command-line,
> all the HT siblings are as a result sitting in mwait or cpuidle after
> going through the online-offline cycle at least once.
>
> This causes a serious issue though when a kernel, which saw 'nosmt' on its
> commandline, is going to perform resume from hibernation: if the resume
> from the hibernated image is successful, cr3 is flipped in order to point
> to the address space of the kernel that is being resumed, which in turn
> means that all the HT siblings are all of a sudden mwaiting on address
> which is no longer valid.
>
> That results in triple fault shortly after cr3 is switched, and machine
> reboots.
>
> Fix this by always waking up all the SMT siblings before initiating the
> 'restore from hibernation' process; this guarantees that all the HT
> siblings will be properly carried over to the resumed kernel waiting in
> resume_play_dead(), and acted upon accordingly afterwards, based on the
> target kernel configuration.

hibernation_restore() is called by user space at runtime, via ioctl or
sysfs. So I think this still doesn't fix the case where you've disabled
CPUs at runtime via sysfs, and then resumed from hibernation. Or are we
declaring that this is not a supported scenario?

Would it be possible for mwait_play_dead() to instead just monitor a
fixmap address which doesn't change for kaslr?
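
To make the idea concrete, here is a toy user-space model, not kernel
code. It assumes a "page table" is just an array from virtual page to
physical page, and that the monitored location is "valid" after the cr3
switch only if the new table still maps that virtual page. Every name in
it (addr_space, make_space, monitor_still_valid, FIXMAP_VPAGE) is made up
for illustration and none of them exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VPAGES    8
#define UNMAPPED     (-1)
#define FIXMAP_VPAGE 7	/* hypothetical: same virtual slot in every kernel */

/* Toy "page table": virtual page index -> physical page, or UNMAPPED. */
struct addr_space {
	int vpage_to_ppage[NR_VPAGES];
};

/*
 * Build a toy address space. Kernel data lands on a per-boot randomized
 * virtual page (standing in for KASLR); the fixmap slot never moves.
 */
static struct addr_space make_space(int kaslr_vpage, int data_ppage,
				    int fixmap_ppage)
{
	struct addr_space as;

	assert(kaslr_vpage >= 0 && kaslr_vpage < NR_VPAGES);
	assert(kaslr_vpage != FIXMAP_VPAGE);

	for (size_t i = 0; i < NR_VPAGES; i++)
		as.vpage_to_ppage[i] = UNMAPPED;
	as.vpage_to_ppage[kaslr_vpage] = data_ppage;
	as.vpage_to_ppage[FIXMAP_VPAGE] = fixmap_ppage;
	return as;
}

/* Is the virtual page a sibling is waiting on still mapped in this table? */
static bool monitor_still_valid(const struct addr_space *as,
				int monitored_vpage)
{
	return monitored_vpage >= 0 && monitored_vpage < NR_VPAGES &&
	       as->vpage_to_ppage[monitored_vpage] != UNMAPPED;
}
```

With a boot kernel built as make_space(2, 10, 20) and a resumed image as
make_space(5, 11, 20), a sibling waiting on virtual page 2 loses its
mapping once the image's table is active, while one waiting on
FIXMAP_VPAGE does not. Whether a real fixmap slot can be used this way
from mwait_play_dead() is exactly the open question.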

Is there a reason why maxcpus= doesn't do the CR4.MCE booted_once
dance?

--
Josh