Re: [PATCH] mm: hugetlb: break COW earlier for resv owner

From: Hugh Dickins
Date: Wed Feb 22 2012 - 03:18:08 EST


On Sat, 18 Feb 2012, Hillf Danton wrote:
> When a process owning a MAP_PRIVATE mapping fails to COW, due to references
> held by a child and insufficient huge page pool, page is unmapped from the
> child process to guarantee the original mappers reliability, and the child
> may get SIGKILLed if it later faults.

I think I understand you there.

>
> With that guarantee, COW is broken earlier on behalf of owners, and they will
> go less page faults.

As usual, I have to guess that here you're describing what (you think)
happens after your patch.

But I don't understand, or it doesn't seem to describe what happens in
your patch. "COW is broken" only in the sense that you are breaking
the way COW is supposed to behave, and I believe your patch is wrong.

>
> Signed-off-by: Hillf Danton <dhillf@xxxxxxxxx>
> ---
>
> --- a/mm/hugetlb.c Tue Feb 14 20:10:46 2012
> +++ b/mm/hugetlb.c Sat Feb 18 13:29:58 2012
> @@ -2145,10 +2145,12 @@ int copy_hugetlb_page_range(struct mm_st
> struct page *ptepage;
> unsigned long addr;
> int cow;
> + int owner;
> struct hstate *h = hstate_vma(vma);
> unsigned long sz = huge_page_size(h);
>
> cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> + owner = is_vma_resv_set(vma, HPAGE_RESV_OWNER);
>
> for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
> src_pte = huge_pte_offset(src, addr);
> @@ -2164,10 +2166,19 @@ int copy_hugetlb_page_range(struct mm_st
>
> spin_lock(&dst->page_table_lock);
> spin_lock_nested(&src->page_table_lock, SINGLE_DEPTH_NESTING);
> - if (!huge_pte_none(huge_ptep_get(src_pte))) {
> + entry = huge_ptep_get(src_pte);
> + if (!huge_pte_none(entry)) {
> if (cow)
> - huge_ptep_set_wrprotect(src, addr, src_pte);
> - entry = huge_ptep_get(src_pte);
> + if (owner) {
> + /*
> + * Break COW for resv owner to go less
> + * page faults later
> + */
> + entry = huge_pte_wrprotect(entry);

So, the change you are making is that if the vma being copied (and I had
to check with dup_mmap() to see that vma here is indeed the src vma) is
the original "owner", then its pte is left (perhaps) writable, and only
the child's is write-protected.

But that means that modifications made to this page by the parent after
the fork will still be visible to the child, until such time as the child
writes to this area, if ever.

That is not how COW protection behaves for a normal page: it's symmetric,
neither parent nor child sees modifications made by the other after fork.

Now, hugetlb pages are not normal in all kinds of ways, but here you
appear to be changing the semantics of private hugetlb mappings, in
a way that could break applications, and has security implications.

Or am I misunderstanding?

Hugh

> + } else {
> + huge_ptep_set_wrprotect(src, addr, src_pte);
> + entry = huge_ptep_get(src_pte);
> + }
> ptepage = pte_page(entry);
> get_page(ptepage);
> page_dup_rmap(ptepage);
> --
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/