Re: [patch 13/16] x86/ldt: Introduce LDT write fault handler
From: Thomas Gleixner
Date: Tue Dec 12 2017 - 16:41:20 EST
On Tue, 12 Dec 2017, Thomas Gleixner wrote:
> On Tue, 12 Dec 2017, Dave Hansen wrote:
>
> > On 12/12/2017 11:21 AM, Thomas Gleixner wrote:
> > > The only critical interaction is the return to user path (user CS/SS) and
> > > we made sure with the LAR touching that these are precached in the CPU
> > > before we go into fragile exit code.
> >
> > How do we make sure that it _stays_ cached?
> >
> > Surely there is weird stuff like WBINVD or SMI's that can come at very
> > inconvenient times and wipe it out of the cache.
>
> This does not look like cache in the sense of memory cache. It seems to be
> CPU internal state and I just stuffed WBINVD and alternatively CLFLUSH'ed
> the entries after the 'touch' via LAR. Still works.

Dave pointed me once more to the following paragraph in the SDM, which
Peter and I had looked at before; we tried what it suggests, w/o success:

   If the segment descriptors in the GDT or an LDT are placed in ROM, the
   processor can enter an indefinite loop if software or the processor
   attempts to update (write to) the ROM-based segment descriptors. To
   prevent this problem, set the accessed bits for all segment descriptors
   placed in a ROM. Also, remove operating-system or executive code that
   attempts to modify segment descriptors located in ROM.
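
For reference, here is a minimal sketch of what "set the accessed bits"
means at the descriptor level. It is not code from the patch series: the
helper names (desc_is_accessed, desc_set_accessed, table_preset_accessed)
are made up for illustration, and the table is assumed to be a plain array
of legacy 8-byte descriptors. The accessed bit is bit 0 of the type field,
i.e. bit 40 of the descriptor, and only exists for code/data descriptors
(S = 1); for system descriptors the same bit is part of the type encoding
and must not be touched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of the type field (access byte, byte 5) of an 8-byte descriptor. */
#define DESC_ACCESSED	(UINT64_C(1) << 40)
/* S bit: 1 = code/data descriptor, 0 = system descriptor (TSS, LDT, gate). */
#define DESC_S		(UINT64_C(1) << 44)

/* Illustrative helper: has the CPU (or software) marked this descriptor? */
static bool desc_is_accessed(uint64_t desc)
{
	return (desc & DESC_ACCESSED) != 0;
}

/* Illustrative helper: return the descriptor with the accessed bit set. */
static uint64_t desc_set_accessed(uint64_t desc)
{
	return desc | DESC_ACCESSED;
}

/*
 * Illustrative helper: pre-set the accessed bit on every code/data
 * descriptor in a table, so the CPU has no reason to write to the table
 * when one of these segments is loaded. System descriptors and the null
 * entry (S = 0) are left alone.
 */
static void table_preset_accessed(uint64_t *table, size_t nr_entries)
{
	for (size_t i = 0; i < nr_entries; i++) {
		if (table[i] & DESC_S)
			table[i] = desc_set_accessed(table[i]);
	}
}
```

With this encoding a flat user data descriptor 0x00cff2000000ffff
(type 0x2, not accessed) becomes 0x00cff3000000ffff (type 0x3, accessed).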

Now that made me go back to the state of the patch series which made us
add that magic 'touch' and the write fault handler. The difference to the
code today is that it did not prepopulate the user visible mapping.

We added the prepopulation later, because we were worried about not being
able to populate the mapping in the #PF due to memory pressure, but we did
so without ripping out the magic cure again.

I did that now, and removing both the user exit magic 'touch' code and the
write fault handler actually keeps it working.

Removing the prepopulate code makes it break again with a #GP in
IRET/SYSRET.

What happens there is that the IRET pops SS (with a minimal testcase) which
causes the #PF. That populates the PTE and returns happily. Right after
that the #GP comes in with IP pointing to the user space instruction right
after the syscall.

That simplifies and descaryfies that code massively.

Darn, I should have gone back and checked every part again as I usually do,
but my fried brain failed.

Thanks,

	tglx