[PATCH] x86: use the right protections for split-up pagetables

From: Ingo Molnar
Date: Fri Feb 20 2009 - 02:30:02 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> So the whole
>
> ref_prot = pte_pgprot(pte_mkexec(pte_clrhuge(*kpte)));
> pgprot_val(ref_prot) |= _PAGE_PRESENT;
> __set_pmd_pte(kpte, address, mk_pte(base, ref_prot));
>
> sequence is utter crap, I think. The whole "ref_prot" there
> should be just _pgprot(_KERNPG_TABLE), I think. I don't think
> there is any other valid value.

Agreed, split_large_page() was just plain confused here - there
was no hidden reason for this logic. It makes no sense to bring
any pte level protection information to the PMD level because a
pmd entry covers a set of 512 ptes so there's no singular
protection attribute that can be carried to it.

The right solution is what you suggested: to use the most
permissive protection bits for the pmd, i.e. _KERNPG_TABLE.
Since the protection bits get combined, this makes the pte
protections control the final behavior of the mapping - so
subsequent code patching and similar activities will work fine.

The bug was mostly harmless until Steve hacked his kernel to
have the right (large) size of readonly, text and data areas. I
never hit such an ftrace hang even with allyesconfig bzImage
bootups [which has obscenely large text and data sections], so i
think something in Steve's tree was also needed to trigger it:
an unusually large readonly data section.

I've queued up the fix below in tip:x86/urgent and will send a
pull request later today if it passes testing. Steve, does this
solve the bug you've hit?

With this fix i dont think the other bits from Steve's series
(patch 1-4) are needed at all - those patches expose PMD details
in various places that iterate over ptes - that's ugly and
unnecessary as well if the PMD's protection is permissive.

[ Also, since you suggested the fix i've added your Acked-by,
let me know if you dont agree with any aspect of the fix. ]

Ingo

---------------->