Re: Linux 4.15-rc2: Regression in resume from ACPI S3

From: Andy Lutomirski
Date: Tue Dec 12 2017 - 13:06:13 EST


On Tue, Dec 12, 2017 at 9:27 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> That said, this *all* smells wrong. Why is there a special
>> fix_processor_context() function at all with different 32-bit and
>> 64-bit behavior? This code is all written to be maximally confusing.
>
> Hmm. Looking a bit more at this, I think it should be solved by:
>
> - load the original read-write GDT early, along with the IDT.
>
> We have already saved it off in __save_processor_state(), and it may
> have already gotten loaded very early in at least some of the paths,
> but it definitely hasn't gotten reloaded in all the paths (think
> "suspend/resume testing" etc).
>
> - add the LDT descriptor to the save area too, exactly like we
> already have IDT/GDT.
>
> Then, we can do "load_ldt()" early (along with IDT and GDT).
>
> - now we can just load all the segments early, and get the percpu and
> stack canary stuff without any special cases
>
> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
> and plays with interrupt state. Just load the segment register, and
> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
> need for the swapgs dance.

Using what helper? On x86_64, it can fault, and IIRC we explicitly
don't allow loadsegment(gs, ...).

>
> - now that we have a fully working system, then the
> "fix_processor_context()" code can do the "fancy" stuff like setting
> up the RO fixmap GDT and re-initializing the TLB state. Those want
> percpu stuff.
>
> In other words, why not try to organize this so that we do a simple
> and fairly mindless restore of the low-level state first? Avoid all
> the "system is halfway up" crud.
>
> Would that work for people? Andy?

Other than the above, more or less.

But we should really do all the user segments last. They're not at
all needed for normal kernel execution, so I think they should be the
very last part.

I'll try to get to this today.