Re: [PATCH 1/3] x86: Honour passed pgprot in track_pfn_insert() and track_pfn_remap()

From: Matthew Wilcox
Date: Tue Jan 26 2016 - 23:41:12 EST

On Mon, Jan 25, 2016 at 09:33:35AM -0800, Andy Lutomirski wrote:
> On Mon, Jan 25, 2016 at 9:25 AM, Matthew Wilcox
> <matthew.r.wilcox@xxxxxxxxx> wrote:
> > From: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
> >
> > track_pfn_insert() overwrites the pgprot that is passed in with a value
> > based on the VMA's page_prot. This is a problem for people trying to
> > do clever things with the new vm_insert_pfn_prot() as it will simply
> > overwrite the passed protection flags. If we use the current value of
> > the pgprot as the base, then it will behave as people are expecting.
> >
> > Also fix track_pfn_remap() in the same way.
> Well that's embarrassing. Presumably it worked for me because I only
> overrode the cacheability bits and lookup_memtype did the right thing.
> But shouldn't the PAT code change the memtype if vm_insert_pfn_prot
> requests it? Or are there no callers that actually need that? (HPET
> doesn't, because there's a plain old ioremapped mapping.)

I'm confused. Here's what I understand:

- on x86, the bits in pgprot can be considered as two sets of bits;
the 'cacheability bits' -- those in _PAGE_CACHE_MASK and the
'protection bits' -- PRESENT, RW, USER, ACCESSED, NX
- The purpose of track_pfn_insert() is to ensure that the cacheability bits
are the same on all mappings of a given page, as strongly advised by the
Intel manuals [1]. So track_pfn_insert() is really only supposed to
modify _PAGE_CACHE_MASK of the passed pgprot, but in fact it ends up
modifying the protection bits as well, due to the bug.

I don't think you overrode the cacheability bits at all. It looks to
me like your patch ends up mapping the HPET into userspace writable.

I don't think the vm_insert_pfn_prot() call gets to change the memtype.
For one, that page may already be mapped into a differet userspace using
the pre-existing memtype, and [1] continues to bite you. Then there
may be outstanding kernel users of the page that's being mapped in.

So I think track_pfn_insert() is doing the right thing with respect to
the cacheability bits (overwrite the ones passed in), it's just doing
an unexpected thing with regard to the protection bits, which my patch
should fix.

[1] "The PAT allows any memory type to be specified in the page tables,
and therefore it is possible to have a single physical page mapped to
two or more different linear addresses, each with different memory
types. Intel does not support this practice because it may lead to
undefined operations that can result in a system failure. In particular,
a WC page must never be aliased to a cacheable page because WC writes
may not check the processor caches."