Re: [tip:efi/core] x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings

From: Matt Fleming
Date: Wed Feb 24 2016 - 09:10:54 EST

On Tue, 23 Feb, at 06:43:04PM, Andy Lutomirski wrote:
> On Tue, Feb 23, 2016 at 4:50 PM, Sai Praneeth Prakhya
> <sai.praneeth.prakhya@xxxxxxxxx> wrote:
> >
> > As you rightly said the code is about making a page GLOBAL if PRESENT is
> > set and we do set PRESENT bit before mapping so that page is GLOBAL.
> > This code was taken from the other parts of pageattr.c. The point is
> > that we don't want differences between whether things were mapped in the
> > EFI page tables directly (i.e. using populate_pte()) or later split from
> > large pages via the split_large_page() code path. If this is still
> > confusing could you please elaborate on it further.
> At least the comment should say "Set the GLOBAL flag if and only
> if...". But why is this code here in the first place? What is
> passing a pgprot with global unset into this code in the first place?

This comes from populate_pgd(),

static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
pgprot_t pgprot = __pgprot(_KERNPG_TABLE);

> > I don't think previous implementation is broken and this is not a bug
> > fix as such. Before this patch some EFI region mappings had GLOBAL bit
> > set (which followed split large page path) and some aren't (which used
> > populate_pte). This patch just aligns the mappings done via two
> > different code paths as mentioned above. As a whole it also maintains
> > consistency with kernel mappings.
> But your code also *clears* GLOBAL if PRESENT is clear, and your
> comment talks about that. Does this ever actually happen in practice?

Not when mapping EFI regions, no (we explicitly set _PAGE_PRESENT in
kernel_map_pages_in_pgd(), but this logic is duplicated from
__split_large_page() which can be called from non-EFI code.

> I'd much rather see WARN_ON_ONCE((pgprot_val(pgprot) & (_PAGE_PRESENT
> | _PAGE_GLOBAL)) == _PAGE_GLOBAL) in here, if that makes sense in the
> context of the callers of the function.

I think that'd be a nice cleanup, along with pulling all the pgprot
twiddling out into a single function.

> > We did this for consistency among EFI mappings. This has some advantages
> > as mentioned in the commit message. It also makes it less confusing when
> > starting at the PGT_DUMP traces if _PAGE_GLOBAL is used consistently.
> >
> > We don't actually do anything special with _PAGE_GLOBAL in EFI.
> You're making a choice of whether to set _PAGE_GLOBAL, and I think
> you've made the wrong choice.
> Normally, the only pages with are _PAGE_GLOBAL are those that are in
> the normal kernel mappings (swapper_pg_dir and normal mm_struct pgds).
> By allowing _PAGE_GLOBAL to be set in EFI mappings, you're breaking
> that convention, which forces you to use extra-expensive
> __flush_tlb_all calls in efi_call_virt.
> I think you should explicitly *clear* _PAGE_GLOBAL in the EFI mappings
> instead. This would allow you to use write_cr3 by itself, which would
> make the code simpler and faster.

This is interesting.

When I suggested to Sai that he write this patch my main motivation
was consistency for all mappings. We've got that now, but perhaps it's
the wrong consistency ;)

If we go with the no-PAGE_GLOBAL approach, we need changes to ensure
we never set _PAGE_GLOBAL for the EFI mappings, because before this
patch was applied sometimes we did and sometimes we didn't, depending
on whether a page was split or not.

I'm racking my brain to think of how your suggestion might have
unintended consequences because diagnosing stale TLB entry bugs is
simply the worst job ever. I can't think of anything. The only
scenarios where we'd see problems is if a) we have new global mappings
in the EFI page tables or b) we have different global mappings.

Since we reference swapper_pg_dir from the PMD level downwards b)
shouldn't be a problem, and the only differences between
swapper_pg_dir and efi_pgd should be the EFI mappings, which saves us
from a).

> > This is a valid point. I know that EFI runtime regions persist during
> > and after boot if we have a UEFI firmware and other commits made EFI
> > regions have separate page table but I am not clear about the effect of
> > global flush. I think Matt/Boris could comment on it.
> It's straightfoward on existing kernels. If _PAGE_GLOBAL is set, TLB
> entries persist across cr3 writes. If _PAGE_GLOBAL is clear, then TLB
> entries are flushed by cr3 writes.

This is safe for EFI right now because of the big __flush_tlb_all() in

> With PCID enabled (which is only in a not-quite-ready patch set I
> have), the story is a bit more complicated, but it works essentially
> the same way unless you explicitly opt out.

Hmm... is series that posted somewhere?

> > We touch this code path only when mapping EFI runtime regions to VA
> > space, i.e. we added pgd field in cpa only as a support for mapping efi
> > runtime regions.
> populate_pgd is called from non-EFI code as well though, isn't it?

Nope. The "if (cpa->pgd)" guard ensures that we only call that
function for the EFI mapping code - no one else sets ->pgd.