Re: [tip:efi/core] x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings
From: Andy Lutomirski
Date: Wed Feb 24 2016 - 11:40:29 EST
On Wed, Feb 24, 2016 at 6:10 AM, Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 23 Feb, at 06:43:04PM, Andy Lutomirski wrote:
>> On Tue, Feb 23, 2016 at 4:50 PM, Sai Praneeth Prakhya
>> <sai.praneeth.prakhya@xxxxxxxxx> wrote:
>> >
>> > As you rightly said the code is about making a page GLOBAL if PRESENT is
>> > set and we do set PRESENT bit before mapping so that page is GLOBAL.
>> > This code was taken from the other parts of pageattr.c. The point is
>> > that we don't want differences between whether things were mapped in the
>> > EFI page tables directly (i.e. using populate_pte()) or later split from
>> > large pages via the split_large_page() code path. If this is still
>> > confusing could you please elaborate on it further.
>>
>> At least the comment should say "Set the GLOBAL flag if and only
>> if...". But why is this code here in the first place? What is
>> passing a pgprot with global unset into this code in the first place?
>
> This comes from populate_pgd(),
>
> static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
> {
> pgprot_t pgprot = __pgprot(_KERNPG_TABLE);
which reminds me: aren't we supposed to *not* set GLOBAL on _PAGE_TABLE entries?
>>
>> You're making a choice of whether to set _PAGE_GLOBAL, and I think
>> you've made the wrong choice.
>>
>> Normally, the only pages with are _PAGE_GLOBAL are those that are in
>> the normal kernel mappings (swapper_pg_dir and normal mm_struct pgds).
>> By allowing _PAGE_GLOBAL to be set in EFI mappings, you're breaking
>> that convention, which forces you to use extra-expensive
>> __flush_tlb_all calls in efi_call_virt.
>>
>> I think you should explicitly *clear* _PAGE_GLOBAL in the EFI mappings
>> instead. This would allow you to use write_cr3 by itself, which would
>> make the code simpler and faster.
>
> This is interesting.
>
> When I suggested to Sai that he write this patch my main motivation
> was consistency for all mappings. We've got that now, but perhaps it's
> the wrong consistency ;)
>
> If we go with the no-PAGE_GLOBAL approach, we need changes to ensure
> we never set _PAGE_GLOBAL for the EFI mappings, because before this
> patch was applied sometimes we did and sometimes we didn't, depending
> on whether a page was split or not.
>
> I'm racking my brain to think of how your suggestion might have
> unintended consequences because diagnosing stale TLB entry bugs is
> simply the worst job ever. I can't think of anything. The only
> scenarios where we'd see problems is if a) we have new global mappings
> in the EFI page tables or b) we have different global mappings.
>
> Since we reference swapper_pg_dir from the PMD level downwards b)
> shouldn't be a problem, and the only differences between
> swapper_pg_dir and efi_pgd should be the EFI mappings, which saves us
> from a).
:)
Anyway, there's certainly no need to do this right now.
>
>> > This is a valid point. I know that EFI runtime regions persist during
>> > and after boot if we have a UEFI firmware and other commits made EFI
>> > regions have separate page table but I am not clear about the effect of
>> > global flush. I think Matt/Boris could comment on it.
>>
>> It's straightfoward on existing kernels. If _PAGE_GLOBAL is set, TLB
>> entries persist across cr3 writes. If _PAGE_GLOBAL is clear, then TLB
>> entries are flushed by cr3 writes.
>
> This is safe for EFI right now because of the big __flush_tlb_all() in
> efi_call_virt().
>
>> With PCID enabled (which is only in a not-quite-ready patch set I
>> have), the story is a bit more complicated, but it works essentially
>> the same way unless you explicitly opt out.
>
> Hmm... is series that posted somewhere?
It's living here:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/pcid
at least until someone figures out how to squash the races in the bookkeeping.
I intentionally never load PCID == 0 with the "don't flush" bit set
specifically so that direct PCID-unaware CR3 loads (like the EFI code
does all over the place) keep working.
>
>> > We touch this code path only when mapping EFI runtime regions to VA
>> > space, i.e. we added pgd field in cpa only as a support for mapping efi
>> > runtime regions.
>>
>> populate_pgd is called from non-EFI code as well though, isn't it?
>
> Nope. The "if (cpa->pgd)" guard ensures that we only call that
> function for the EFI mapping code - no one else sets ->pgd.
OK, although a comment might be nice.
--Andy