Re: [PATCH] mm/hugetlb: Use the right pte val for compare in hugetlb_cow

From: Aneesh Kumar K.V
Date: Wed Oct 19 2016 - 01:11:49 EST


Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:

> On Tue, 18 Oct 2016 21:12:45 +0530 "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:
>
>> We cannot use the pte value passed to set_pte_at for the pte_same
>> comparison, because archs like ppc64 filter/add pte flags in set_pte_at.
>> Instead, fetch the pte value inside hugetlb_cow. We compare the pte value
>> to make sure the pte didn't change since we dropped the page table lock.
>> hugetlb_cow gets called with the page table lock held, so we can take a
>> copy of the pte value before we drop the page table lock.
>>
>> With hugetlbfs, we optimize the MAP_PRIVATE write fault path with no
>> previous mapping (huge_pte_none entries) by forcing a cow in the fault
>> path. This avoids taking an additional fault to convert a read-only
>> mapping to read/write. Here we were comparing a recently instantiated pte
>> (via set_pte_at) to the pte value from the linux page table. As explained
>> above, on ppc64 such a pte_same check returned the wrong result, resulting
>> in us taking an additional fault on ppc64.
>
> From my reading this is a minor performance improvement and a -stable
> backport isn't needed. But it is unclear whether the impact warrants a
> 4.9 merge.

This patch works around the issue reported at https://lkml.kernel.org/r/57FF7BB4.1070202@xxxxxxxxxx
The reason for that OOM was a reserve count accounting issue which
happens in the error path of hugetlb_cow. With this patch we no longer
take that error path, and hence we don't hit the reported OOM.

An actual fix for that issue is being worked on by Mike Kravetz.

>
> Please be careful about describing end-user visible impacts when fixing
> bugs, thanks.

-aneesh