Re: [PATCH 1/3] x86: Honour passed pgprot in track_pfn_insert() and track_pfn_remap()

From: Andy Lutomirski
Date: Wed Jan 27 2016 - 00:45:26 EST

On Tue, Jan 26, 2016 at 8:40 PM, Matthew Wilcox <willy@xxxxxxxxxxxxxxx> wrote:
> On Mon, Jan 25, 2016 at 09:33:35AM -0800, Andy Lutomirski wrote:
>> On Mon, Jan 25, 2016 at 9:25 AM, Matthew Wilcox
>> <matthew.r.wilcox@xxxxxxxxx> wrote:
>> > From: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
>> >
>> > track_pfn_insert() overwrites the pgprot that is passed in with a value
>> > based on the VMA's page_prot. This is a problem for people trying to
>> > do clever things with the new vm_insert_pfn_prot() as it will simply
>> > overwrite the passed protection flags. If we use the current value of
>> > the pgprot as the base, then it will behave as people are expecting.
>> >
>> > Also fix track_pfn_remap() in the same way.
>> Well that's embarrassing. Presumably it worked for me because I only
>> overrode the cacheability bits and lookup_memtype did the right thing.
>> But shouldn't the PAT code change the memtype if vm_insert_pfn_prot
>> requests it? Or are there no callers that actually need that? (HPET
>> doesn't, because there's a plain old ioremapped mapping.)
> I'm confused. Here's what I understand:
> - on x86, the bits in pgprot can be considered as two sets of bits;
> the 'cacheability bits' -- those in _PAGE_CACHE_MASK and the
> 'protection bits' -- PRESENT, RW, USER, ACCESSED, NX
> - The purpose of track_pfn_insert() is to ensure that the cacheability bits
> are the same on all mappings of a given page, as strongly advised by the
> Intel manuals [1]. So track_pfn_insert() is really only supposed to
> modify _PAGE_CACHE_MASK of the passed pgprot, but in fact it ends up
> modifying the protection bits as well, due to the bug.
> I don't think you overrode the cacheability bits at all. It looks to
> me like your patch ends up mapping the HPET into userspace writable.

I sure hope not. If vm_page_prot was writable, something was already
broken, because this is the vvar mapping, and the vvar mapping is
VM_READ (and not even VM_MAYREAD).

> I don't think the vm_insert_pfn_prot() call gets to change the memtype.
> For one, that page may already be mapped into a differet userspace using
> the pre-existing memtype, and [1] continues to bite you. Then there
> may be outstanding kernel users of the page that's being mapped in.

So why was remap_pfn_range different? I'm sure there was a reason.

I don't think that whatever_pfn_prot should ever map a page
inconsistently, but I find it surprising that some of the variants
call reserve_memtype to change the memtype and others don't.

Anyway, this is in no way an objection to your patches.