Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI
From: Valentin Schneider
Date: Wed Nov 20 2024 - 12:25:28 EST
On 20/11/24 16:32, Peter Zijlstra wrote:
> On Wed, Nov 20, 2024 at 04:22:16PM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 19, 2024 at 04:35:00PM +0100, Valentin Schneider wrote:
>>
>> > +void noinstr __flush_tlb_all_noinstr(void)
>> > +{
>> > + /*
>> > + * This is for invocation in early entry code that cannot be
>> > + * instrumented. A RMW to CR4 works for most cases, but relies on
>> > + * being able to flip either of the PGE or PCIDE bits. Flipping CR4.PCID
>> > + * would require also resetting CR3.PCID, so just try with CR4.PGE, else
>> > + * do the CR3 write.
>> > + *
>> > + * XXX: this gives paravirt the finger.
>> > + */
>> > + if (cpu_feature_enabled(X86_FEATURE_PGE))
>> > + __native_tlb_flush_global_noinstr(this_cpu_read(cpu_tlbstate.cr4));
>> > + else
>> > + native_flush_tlb_local_noinstr();
>> > +}
>>
>> Urgh, so that's a lot of ugleh, and cr4 has that pinning stuff and gah.
>>
>> Why not always just do the CR3 write and call it a day? That should also
>> work for paravirt, no? Just make the whole write_cr3 thing noinstr and
>> voila.
>
> Oh gawd, just having looked at xen_write_cr3() this might not be
> entirely trivial to mark noinstr :/

... I hadn't even seen that.

AIUI the CR3 RMW is not "enough" if we have PGE enabled, because then
global pages aren't flushed.

The question becomes: what is held in global pages and do we care about
that when it comes to vmalloc()? I'm starting to think no, but this is x86,
I don't know what surprises are waiting for me.

I see e.g. ds_clear_cea() clears PTEs that can have the _PAGE_GLOBAL flag,
and it correctly uses the non-deferrable flush_tlb_kernel_range().
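
To make the PGE point concrete, here is a toy userspace model of the two
flush flavours. It is only a sketch, not kernel code: every toy_* name is
made up for illustration, and it encodes just the architectural rule that
a CR3 write leaves global entries alone while CR4.PGE is set, whereas
toggling CR4.PGE drops everything.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_TLB_ENTRIES 4

/* Hypothetical model of one cached translation. */
struct toy_tlb_entry {
	bool valid;
	bool global;	/* stands in for _PAGE_GLOBAL on the cached PTE */
};

/* Hypothetical model of the per-CPU state we care about. */
struct toy_cpu {
	bool cr4_pge;
	struct toy_tlb_entry tlb[TOY_TLB_ENTRIES];
};

/*
 * Fill the TLB with valid entries. With PGE enabled, the first half is
 * marked global; with PGE disabled, nothing is cached as global.
 */
static void toy_cpu_init(struct toy_cpu *cpu, bool pge)
{
	cpu->cr4_pge = pge;
	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
		cpu->tlb[i].valid = true;
		cpu->tlb[i].global = pge && i < TOY_TLB_ENTRIES / 2;
	}
}

/* Models a CR3 write: global entries survive while CR4.PGE is set. */
static void toy_write_cr3(struct toy_cpu *cpu)
{
	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
		if (cpu->cr4_pge && cpu->tlb[i].global)
			continue;
		cpu->tlb[i].valid = false;
	}
}

/* Models flipping CR4.PGE off and back on: every entry is dropped. */
static void toy_toggle_cr4_pge(struct toy_cpu *cpu)
{
	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
		cpu->tlb[i].valid = false;
}

/* Number of translations still cached. */
static size_t toy_count_valid(const struct toy_cpu *cpu)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
		n += cpu->tlb[i].valid;
	return n;
}
```

In this model, toy_write_cr3() on a PGE-enabled CPU leaves the global half
of the TLB populated, which is the case that matters if anything reachable
through vmalloc() can end up behind a _PAGE_GLOBAL mapping.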