Re: [PATCH] x86: use the right protections for split-up pagetables

From: Ingo Molnar
Date: Fri Feb 20 2009 - 12:00:42 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 20 Feb 2009, Ingo Molnar wrote:
> >
> > Agreed, split_large_page() was just plain confused here - there
> > was no hidden reason for this logic. It makes no sense to bring
> > any pte level protection information to the PMD level because a
> > pmd entry covers a set of 512 ptes so there's no singular
> > protection attribute that can be carried to it.
>
> Btw, I think split_large_page() is confused in another way
> too, although I'm not entirely sure that it matters. I suspect
> that it doesn't, if I read things correctly.
>
> The confusion? When it moves the 'ref_prot' bits from the
> upper level, it doesn't do the right thing for the PAT bit.
> That bit is special, and moves around depending on level. In
> the upper levels, it's bit#12, and in the final 4k pte level
> it's bit#7.
>
> So _if_ the PAT bit ever matters, it looks like
> split_large_page() does the wrong thing.
>
> Now, it looks like we avoid the PAT bit on purpose, and we
> only ever encode four PAT values (ie we use only the PCD/PWT
> bits, and leave the PAT bit clear - we don't need any more
> cases), _but_ we actually do end up looking at the PAT bit
> anyway in cache_attr(). So it looks like at least some of the
> code is _trying_ to handle the PAT bit, but I can pretty much
> guarantee that at least split_large_page() is broken if it is
> ever set.

Yeah. This our current PAT encodings table:

/*
* PTE encoding used in Linux:
* PAT
* |PCD
* ||PWT
* |||
* 000 WB _PAGE_CACHE_WB
* 001 WC _PAGE_CACHE_WC
* 010 UC- _PAGE_CACHE_UC_MINUS
* 011 UC _PAGE_CACHE_UC
* PAT bit unused
*/
pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);

it's intentionally left compressed and the extended PAT bit is
never set, we only need 4 caching types.

( Speculation: in theory it would be possible for some CPU's
TLB-fill fastpath to have some small penalty on having a
non-zero extended-PAT bit. So eliminating weird bits and
compressing pte bit usage is always a good idea. )

Nevertheless you are right that there's a disconnect here and
that were it ever set we'd unconditionally lift the 2MB/1GB PAT
bit [bit 12] over into the 4K level.

If we ever set the PAT bit on a large page then the
split_large_page() behavior would become rather nasty: we'd
corrupt pte bit 12, i.e. we'd lose linear mappings, we'd map
every even page twice (and not map any uneven pages), and we'd
start corrupting memory and would crash in interesting ways.

There's two solutions:

- make the ref_prot opt-in and explicitly enumerate all the
bits we handle correctly today

- add a debug warning for the bits we know we dont handle

I went for the second as the first one would include basically
all the meaningful bits we have:

[_PAGE_BIT_PRESENT]
_PAGE_BIT_RW
_PAGE_BIT_USER
_PAGE_BIT_PWT
_PAGE_BIT_PCD
[_PAGE_BIT_ACCESSED]
[_PAGE_BIT_DIRTY]
_PAGE_BIT_GLOBAL
_PAGE_BIT_NX

( the ones in brackets are not important because we set/clear
them anyway, but they dont hurt either. )

And if we do not include PAT and it gets used in the future the
function could break too - just in different ways. (by not
carrying over the PAT bit.)

So i think it's safest to put in a sharp debug check for the
known-unhandled bit. I've queued up the fix below in tip:x86/mm,
do you think this approach is the best?

Ingo

-------------------->