Re: [git head] X86_PAT & mprotect

From: Venki Pallipadi
Date: Wed May 07 2008 - 19:23:42 EST


On Wed, May 07, 2008 at 08:18:45PM +0100, Hugh Dickins wrote:
> On Wed, 7 May 2008, Ingo Molnar wrote:
> > the pte_modify() change looks correct at first sight.
> >
> > The _PAGE_PROT_PRESERVE_BITS solution looks a bit ugly (although we do
> > have a couple of other similar #ifndefs in the MM already).
> >
> > at minimum we should add vm_get_page_prot_preserve() as an inline
> > function to mm.h if _PAGE_PROT_PRESERVE_BITS is not defined, and make it
> > call vm_get_page_prot(). Also, vm_get_page_prot() in mm/mmap.c should
> > probably be marked inline so that we'll have only a single function call
> > [vm_get_page_prot() is trivial].
>
> I've tried doing it slightly differently below,
> don't know whether you'll consider it an improvement or not.

Hugh: Thanks for looking into this. Yes. I like your modified patch. Simpler
and smaller.

> > ------------------------------------->
> > Subject: generic, x86, PAT: fix mprotect
> > From: Venki Pallipadi <venkatesh.pallipadi@xxxxxxxxx>
> > Date: Tue, 6 May 2008 15:42:40 -0700
> >
> > There is a hole in mprotect, which lets the user to change the page
> > cache type bits by-passing the kernel reserve_memtype and free_memtype
> > wrappers. Fix the hole by not letting mprotect change the PAT bits.
>
> Maybe say "defect" rather than "hole"? I stupidly thought for a
> while that you were talking about some gap in the address space.

Yes. we can call it a loophole or defect.

> > @@ -289,7 +291,8 @@ static inline pte_t pte_modify(pte_t pte
> > * Chop off the NX bit (if present), and add the NX portion of
> > * the newprot (if present):
> > */
> > - val &= _PAGE_CHG_MASK & ~_PAGE_NX;
>
> (Nothing to do with your patch, but that's a silly line, isn't it?
> _PAGE_NX isn't in _PAGE_CHG_MASK so it doesn't need further masking out.)

Yes. That PAGE_NX part indeed looks bogus.

> > + /* We also preserve PAT bits from existing pte */
> > + val &= (_PAGE_CHG_MASK | _PAGE_PROT_PRESERVE_BITS) & ~_PAGE_NX;
> > val |= pgprot_val(newprot) & __supported_pte_mask;
> >
> > return __pte(val);
>
> An odd thing is that you end up keeping the PAT bits from the old
> pte, and the PAT bits from vm_page_prot. So if they can get out
> of synch, you'd be oring them together; and if they can't get out
> of synch, then do you need a change here?

They cannot get out of sync. So, we can do without the change here. But, I felt
it is safer to take things from old pte.

> I've followed that dup in my version below, but it's probably wrong.
> Yes, you want the PAT bits in vm_page_prot (so various places
> will automatically set them), but I think here it's more correct
> to take the PAT bits from the old pte and mask out those from the
> vm_page_prot?

Yes. We can mask out these bits from vm_page_prot, that way we will not
'or' them if they get out of sync.
If your below modified patch is OK, I can send an incremental patch to
change pte_modify() to inherit PAT bits only from oldpte.

Thanks,
Venki

> Here's my more minimal alternative; but I think we need to clarify
> on that apparent duplication of the PAT bits in pte_modify.
>
> arch/x86/pci/i386.c | 4 +---
> include/asm-x86/pgtable.h | 14 +++++++++++++-
> mm/mprotect.c | 11 ++++++++++-
> 3 files changed, 24 insertions(+), 5 deletions(-)
>
> --- 2.6.26-rc1/arch/x86/pci/i386.c 2008-05-03 21:54:41.000000000 +0100
> +++ linux/arch/x86/pci/i386.c 2008-05-07 17:14:53.000000000 +0100
> @@ -301,15 +301,13 @@ int pci_mmap_page_range(struct pci_dev *
> prot = pgprot_val(vma->vm_page_prot);
> if (pat_wc_enabled && write_combine)
> prot |= _PAGE_CACHE_WC;
> - else if (pat_wc_enabled)
> + else if (pat_wc_enabled || boot_cpu_data.x86 > 3)
> /*
> * ioremap() and ioremap_nocache() defaults to UC MINUS for now.
> * To avoid attribute conflicts, request UC MINUS here
> * aswell.
> */
> prot |= _PAGE_CACHE_UC_MINUS;
> - else if (boot_cpu_data.x86 > 3)
> - prot |= _PAGE_CACHE_UC;
>
> vma->vm_page_prot = __pgprot(prot);
>
> --- 2.6.26-rc1/include/asm-x86/pgtable.h 2008-05-03 21:55:10.000000000 +0100
> +++ linux/include/asm-x86/pgtable.h 2008-05-07 19:09:42.000000000 +0100
> @@ -57,7 +57,8 @@
> #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
> _PAGE_DIRTY)
>
> -#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY)
> +#define _PAGE_CHG_MASK (PTE_MASK | _PAGE_PWT | _PAGE_PCD | \
> + _PAGE_ACCESSED | _PAGE_DIRTY)
>
> #define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT)
> #define _PAGE_CACHE_WB (0)
> @@ -294,6 +295,17 @@ static inline pte_t pte_modify(pte_t pte
> return __pte(val);
> }
>
> +/*
> + * mprotect needs to preserve PAT bits when updating vm_page_prot
> + */
> +#define pgprot_modify pgprot_modify
> +static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
> +{
> + pgprotval_t preservebits = pgprot_val(oldprot) & _PAGE_CHG_MASK;
> + pgprotval_t addbits = pgprot_val(newprot);
> + return __pgprot(preservebits | addbits);
> +}
> +
> #define pte_pgprot(x) __pgprot(pte_val(x) & (0xfff | _PAGE_NX))
>
> #define canon_pgprot(p) __pgprot(pgprot_val(p) & __supported_pte_mask)
> --- 2.6.26-rc1/mm/mprotect.c 2008-01-24 22:58:37.000000000 +0000
> +++ linux/mm/mprotect.c 2008-05-07 18:42:07.000000000 +0100
> @@ -26,6 +26,13 @@
> #include <asm/cacheflush.h>
> #include <asm/tlbflush.h>
>
> +#ifndef pgprot_modify
> +static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
> +{
> + return newprot;
> +}
> +#endif
> +
> static void change_pte_range(struct mm_struct *mm, pmd_t *pmd,
> unsigned long addr, unsigned long end, pgprot_t newprot,
> int dirty_accountable)
> @@ -192,7 +199,9 @@ success:
> * held in write mode.
> */
> vma->vm_flags = newflags;
> - vma->vm_page_prot = vm_get_page_prot(newflags);
> + vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
> + vm_get_page_prot(newflags));
> +
> if (vma_wants_writenotify(vma)) {
> vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
> dirty_accountable = 1;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/