Re: [RFC PATCH 0/3] x86/mm/tlb: Defer TLB flushes with PTI

From: Andy Lutomirski
Date: Tue Aug 27 2019 - 19:18:50 EST


On Fri, Aug 23, 2019 at 11:07 PM Nadav Amit <namit@xxxxxxxxxx> wrote:
>
> INVPCID is considerably slower than INVLPG of a single PTE, but it is
> currently used to flush PTEs in the user page-table when PTI is used.
>
> Instead, it is possible to defer TLB flushes until after the user
> page-tables are loaded. Preventing speculation over the TLB flushes
> should keep the whole thing safe. In some cases, deferring TLB flushes
> in such a way can result in more full TLB flushes, but arguably this
> behavior is oftentimes beneficial.

I have a somewhat horrible suggestion.

Would it make sense to refactor this so that it works for user *and*
kernel tables? In particular, if we flush a *kernel* mapping (vfree,
vunmap, set_memory_ro, etc), we shouldn't need to send an IPI to a
CPU that is running user code to flush most kernel mappings or even
to free kernel pagetables. The same trick could be done if we treat
idle like user mode for this purpose.

In code, this could mostly consist of changing all the "user" data
structures involved to something like struct deferred_flush_info and
having one for user and one for kernel.
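
To make that concrete, here is a rough userspace model of what I have in
mind. It is only a sketch: apart from struct deferred_flush_info, every
name in it (flush_domain, cpu_flush_state, defer_flush,
request_kernel_flush, enter_kernel, FULL_FLUSH_THRESHOLD) is made up for
illustration, the threshold is arbitrary, and it ignores all of the
locking and ordering against a CPU that enters the kernel concurrently.
The counters stand in for the real flush instructions.

```c
#include <stdbool.h>

/* Hypothetical: one deferred-flush slot per page-table "domain". */
enum flush_domain { FLUSH_USER, FLUSH_KERNEL, FLUSH_NR_DOMAINS };

struct deferred_flush_info {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive; start == end: nothing pending */
	bool full;		/* range grew too large: flush everything */
};

/* Arbitrary illustrative cutoff (32 pages of 4 KiB), not a kernel value. */
#define FULL_FLUSH_THRESHOLD (32UL << 12)

struct cpu_flush_state {
	bool in_kernel;		/* false while running user code or idle */
	struct deferred_flush_info deferred[FLUSH_NR_DOMAINS];
	unsigned long ranged_flushes;	/* stand-ins for real flushes */
	unsigned long full_flushes;
};

static bool flush_pending(const struct cpu_flush_state *cpu,
			  enum flush_domain d)
{
	const struct deferred_flush_info *info = &cpu->deferred[d];

	return info->full || info->start != info->end;
}

/* Merge [start, end) into the pending range for domain d. */
static void defer_flush(struct cpu_flush_state *cpu, enum flush_domain d,
			unsigned long start, unsigned long end)
{
	struct deferred_flush_info *info = &cpu->deferred[d];

	if (start >= end || info->full)
		return;

	if (info->start == info->end) {
		info->start = start;
		info->end = end;
	} else {
		if (start < info->start)
			info->start = start;
		if (end > info->end)
			info->end = end;
	}

	/* Merging can cover far more than was asked for; give up and go full. */
	if (info->end - info->start > FULL_FLUSH_THRESHOLD)
		info->full = true;
}

/* Perform (here: just count) whatever is pending for domain d, then reset. */
static void apply_deferred_flush(struct cpu_flush_state *cpu,
				 enum flush_domain d)
{
	struct deferred_flush_info *info = &cpu->deferred[d];

	if (info->full)
		cpu->full_flushes++;
	else if (info->start != info->end)
		cpu->ranged_flushes++;

	info->start = info->end = 0;
	info->full = false;
}

/*
 * Kernel-mapping flush aimed at a remote CPU. Returns true if that CPU
 * would need an IPI right now; otherwise the flush is queued for its
 * next kernel entry.
 */
static bool request_kernel_flush(struct cpu_flush_state *cpu,
				 unsigned long start, unsigned long end)
{
	if (cpu->in_kernel)
		return true;

	defer_flush(cpu, FLUSH_KERNEL, start, end);
	return false;
}

/* Kernel entry (or leaving idle): catch up on deferred kernel flushes. */
static void enter_kernel(struct cpu_flush_state *cpu)
{
	cpu->in_kernel = true;
	apply_deferred_flush(cpu, FLUSH_KERNEL);
}
```

The user side would presumably mirror this: defer_flush() with FLUSH_USER
while the CPU is in the kernel, and apply_deferred_flush() once the user
page tables are loaded on the way out. The merge in defer_flush() is also
where the "more full TLB flushes" effect from the cover letter shows up.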

I think this is horrible because it will enable certain workloads to
work considerably faster with PTI on than with PTI off, and that would
be a barely excusable moral failing. :-p

For what it's worth, other than register clobber issues, the whole
"switch CR3 for PTI" logic ought to be doable in C. I don't know a
priori whether that would end up being an improvement.