Re: 2.6.29 pat issue

From: Thomas Hellström
Date: Fri Feb 06 2009 - 04:43:36 EST


Eric W. Biederman wrote:
> Thomas Hellstrom <thellstrom@xxxxxxxxxx> writes:
>
>> Indeed, it's crucial to keep the mappings consistent, but failure to do so is a
>> kernel driver bug; it should never be the result of invalid user data.
>
> It easily can be. Think of an X server mmapping frame buffers, or other
> device BARs.

Hmm, yes, you're right, although I'm still a bit doubtful about RAM pages.

Wait. Now I see what's causing the problems. The code assumes that VM_PFNMAP VMAs never map RAM pages. That is also an invalid assumption; see the comments in mm/memory.c.

So the attribute check should probably be done on the insert_pfn path of VM_MIXEDMAP as well. That is not done today.

So there are three distinct bugs at this point:

1) VMAs with VM_PFNMAP are incorrectly assumed to be linear if vma->vm_pgoff is non-zero.
2) VM_PFNMAP VMA PTEs are incorrectly assumed to never point to physical RAM.
3) There is no check for the insert_pfn path of vm_insert_mixed().
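
To make bug 1 concrete, here is a minimal user-space sketch, not kernel code. The flag value, struct model_vma and both helpers are hypothetical. It contrasts the heuristic (any VM_PFNMAP VMA with a non-zero vm_pgoff is treated as one contiguous pfn range) with an explicit check of whether the pfns actually inserted are contiguous. A driver that fills a VMA page by page from its fault handler, while vm_pgoff still holds a non-zero mmap() offset (for example one used as a lookup key), is misclassified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; NOT the kernel's actual VM_PFNMAP value. */
#define MODEL_VM_PFNMAP (1UL << 0)

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct model_vma {
    unsigned long vm_flags;
    unsigned long vm_pgoff; /* mmap() offset in pages, or a pfn */
};

/* The heuristic from bug 1: a VM_PFNMAP vma with a non-zero
 * vm_pgoff is assumed to map one contiguous pfn range. */
static bool heuristic_is_linear(const struct model_vma *vma)
{
    return (vma->vm_flags & MODEL_VM_PFNMAP) && vma->vm_pgoff != 0;
}

/* Ground truth for the model: the mapping is linear only if page i
 * of the vma maps pfn pfns[0] + i for every i. */
static bool pfns_are_linear(const unsigned long *pfns, size_t n)
{
    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        if (pfns[i] != pfns[0] + i)
            return false;
    }
    return true;
}
```

For a fault-driven vma `{ MODEL_VM_PFNMAP, 0x1000 }` whose pages map pfns 0x500, 0x9a0 and 0x123, heuristic_is_linear() returns true while pfns_are_linear() returns false, which is exactly the misclassification.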

>> IMHO checking each vm_insert_pfn() for caching attribute correctness is not
>> something that should be enabled by default, due to the CPU overhead. Production
>> drivers should never violate this.
>
> If it is a problem, the implementation should become more efficient. Userspace
> as well as drivers can generate these mappings, so even with a perfect driver
> you cannot guarantee that someone else does not have that area of memory
> mapped differently.
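
The point above means the conflict check must be global: a new mapping has to be compared against every existing reservation, whoever created it. A minimal user-space sketch of such a check follows, loosely modeled on the idea of a memtype reservation list; struct memtype_range, reserve_range() and the fixed-size table are hypothetical and are not the x86 PAT code. It also shows where the per-call cost comes from: every reservation, even a single page, scans the whole list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_attr { ATTR_WB, ATTR_UC, ATTR_WC };

/* One reserved physical range, [start, end) in page frame numbers. */
struct memtype_range {
    unsigned long start, end;
    enum cache_attr attr;
};

#define MODEL_MAX_RANGES 32
static struct memtype_range ranges[MODEL_MAX_RANGES];
static size_t nr_ranges;

/* Record [start, end) with attribute attr. Returns -1 if any
 * overlapping range, whoever reserved it (a driver or a user-space
 * mmap), uses a different attribute, or if the table is full.
 * Cost is O(nr_ranges) per call, even for a single page. */
static int reserve_range(unsigned long start, unsigned long end,
                         enum cache_attr attr)
{
    assert(start < end);
    for (size_t i = 0; i < nr_ranges; i++) {
        const struct memtype_range *r = &ranges[i];
        bool overlap = start < r->end && r->start < end;
        if (overlap && r->attr != attr)
            return -1;
    }
    if (nr_ranges == MODEL_MAX_RANGES)
        return -1;
    ranges[nr_ranges].start = start;
    ranges[nr_ranges].end = end;
    ranges[nr_ranges].attr = attr;
    nr_ranges++;
    return 0;
}
```

For example, once pfns 0x100-0x1ff are reserved as UC (say, a frame buffer mapped from user space), a single-page WC request for pfn 0x180 is refused, while a UC request for the same page succeeds. The model has no release path and does not merge entries.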

OK, so there seem to be a couple of things that can be done for performance here:

1) A fastpath for single pages.
2) RAM pages are tracked with a page bit today. Why not say that all memory backed by a struct page is tracked with a page bit? Then pfn_valid() could be used instead of page_is_ram(). This, combined with 1), should make tracking of struct-page-backed pages extremely fast.
3) If vm_insert_pfn() happens to be used on a linear VMA, it looks like the whole VMA is validated on each vm_insert_pfn() call, which seems extremely inefficient given the extensive tests in pagerange_is_ram().
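
Points 1) and 2) can be sketched together in user space: pfns that have a struct page (decided by a hypothetical model_pfn_valid(), standing in for pfn_valid()) keep their attribute in a per-page slot, so a single-page check is O(1), and only pfns without a struct page fall back to scanning a range list. All names are hypothetical; a byte-sized enum array stands in for the proposed page bit, and the model ignores releasing reservations and reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_attr { ATTR_NONE, ATTR_WB, ATTR_UC, ATTR_WC };

/* Pfns below this have a "struct page" in the model. */
#define MODEL_MAX_PFN 64
/* Per-page attribute standing in for a page bit; ATTR_NONE = untracked. */
static enum cache_attr page_attr[MODEL_MAX_PFN];

/* Stand-in for pfn_valid(). */
static bool model_pfn_valid(unsigned long pfn)
{
    return pfn < MODEL_MAX_PFN;
}

/* Slow path for pfns without a struct page: linear scan of a range list. */
struct range { unsigned long start, end; enum cache_attr attr; };
#define MODEL_MAX_RANGES 16
static struct range ranges[MODEL_MAX_RANGES];
static size_t nr_ranges;
static size_t slow_path_calls; /* counts how often the scan ran */

static int slow_reserve(unsigned long start, unsigned long end,
                        enum cache_attr attr)
{
    slow_path_calls++;
    for (size_t i = 0; i < nr_ranges; i++) {
        bool overlap = start < ranges[i].end && ranges[i].start < end;
        if (overlap && ranges[i].attr != attr)
            return -1;
    }
    if (nr_ranges == MODEL_MAX_RANGES)
        return -1;
    ranges[nr_ranges].start = start;
    ranges[nr_ranges].end = end;
    ranges[nr_ranges].attr = attr;
    nr_ranges++;
    return 0;
}

/* Single-page fast path: O(1) for struct-page-backed pfns,
 * range-list scan for everything else. */
static int reserve_one_page(unsigned long pfn, enum cache_attr attr)
{
    assert(attr != ATTR_NONE);
    if (model_pfn_valid(pfn)) {
        if (page_attr[pfn] != ATTR_NONE && page_attr[pfn] != attr)
            return -1;
        page_attr[pfn] = attr;
        return 0;
    }
    return slow_reserve(pfn, pfn + 1, attr);
}
```

Repeated single-page requests for a struct-page-backed pfn never touch the range list; only pfns outside the model's struct page range pay for the scan.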

/Thomas

> Eric

