Re: [PATCH] Fix resume on x86-32 machines
From: Andy Lutomirski
Date: Mon Dec 11 2017 - 13:42:09 EST

On Mon, Dec 11, 2017 at 10:31 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Dec 11, 2017 at 6:22 AM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
>> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>>
>>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>>
>> I *think* you are right.
>>
>> Anyway, that should be easy enough to verify.
>>
>> Pavel, can you please check if the below change works too?
>
> So Jarkko confirmed this works for him, but the more I look at this
> crap, the less I like it.
>
> Why do we save fs/ds/es/ss at all on x86-32? Don't they all have fixed
> values in the kernel, with %fs being __KERNEL_PERCPU, and the others
> being __USER_DS?
>
> Nothing else can possibly be valid, as far as I can tell.
>
> I think we actually leave the user-space percpu segment in %gs (or the
> stack canary base), so that one we should actually save/restore, but
> I'm getting the feeling that we should just reset the other segment
> registers to known values on 32-bit.
>
> Also, why does the 32-bit code do
>
>     loadsegment(es, ctxt->es);
>
> but the 64-bit code does
>
>     asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
>
> And look at that confusion between MSR_GS_BASE and MSR_KERNEL_GS_BASE
> all within the 64-bit case.
>
> In particular, note how we reload the %gs segment in between the two -
> wouldn't that mess with the currently active gs base if %gs can be
> non-zero?
>
> Christ, what a mess.
>
> So I think that whole sequence is garbage. It has been written as some
> kind of "save and restore registers", but that's not what it really
> then does - or what it should do.
>
> It should make sure to restore a sane kernel state, not some random
> register state.
>
> And the 32-bit and 64-bit code really should strive to be at least
> _sanely_ different, not this randomly and insanely different mess.
>
> But yes, Rafael's patch looks like the minimal one-liner. But I think
> we should do the %gs load early too for the 32-bit stack canary case,
> kind of like we need to do %fs for percpu base.

I'll try to get to this in a day or so -- is that okay? Or should we
do some trivial fix/revert and fix it for real next time around?