Re: [PATCH -mm] mm, hugetlb: Pass fault address to no page handler

From: Mike Kravetz
Date: Wed May 16 2018 - 15:33:37 EST

In reply to: Michal Hocko: "Re: [PATCH -mm] mm, hugetlb: Pass fault address to no page handler"

On 05/16/2018 02:12 AM, Michal Hocko wrote:
> On Tue 15-05-18 08:57:56, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@xxxxxxxxx>
>>
>> This is to take better advantage of the huge page clearing
>> optimization (c79b57e462b5d, "mm: hugetlb: clear target sub-page last
>> when clearing huge page"), which clears the sub-page to be accessed
>> last so that its cache lines are not evicted while the other
>> sub-pages are cleared. This requires the address of the sub-page to
>> be accessed, that is, the fault address inside the huge page, so the
>> hugetlb no page fault handler is changed to pass that information.
>> This will benefit workloads which don't access the beginning of the
>> huge page after the page fault.
>>
>> With this patch, the throughput increases ~28.1% in the vm-scalability
>> anon-w-seq test case with 88 processes on a 2 socket Xeon E5 2699 v4
>> system (44 cores, 88 threads). The test case creates 88 processes,
>> each of which mmaps a big anonymous memory area and writes to it from
>> the end to the beginning. For each process, the other processes can be
>> seen as another workload which generates heavy cache pressure. At the
>> same time, the cache miss rate is reduced from ~36.3% to ~25.6%, the
>> IPC (instructions per cycle) increases from 0.3 to 0.37, and the time
>> spent in user space is reduced by ~19.3%.
>
> This paragraph is confusing, as Mike already mentioned. It would
> probably be more helpful to see how the test was configured to use
> hugetlb pages and what the end benefit is.
>
> I do not have any real objection to the implementation so feel free to
> add
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> I am just wondering what use case is driving this. Or is it just a
> generic optimization that always makes sense to do? Indicating that in
> the changelog would be helpful as well.

I just noticed that the optimization was not added for 'gigantic' pages.
Should we consider adding support for gigantic pages as well? It may be
that the cache miss cost is insignificant when added to the time required
to clear a 1GB (for x86) gigantic page.
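
To make the sub-page arithmetic concrete, here is a minimal user-space
sketch, assuming 4KB base pages as on x86. It is not the kernel's
clear_huge_page(); the function names are hypothetical and it only
illustrates deriving the target sub-page from the fault address and
clearing that sub-page last. With 4KB base pages, a 2MB huge page has 512
sub-pages and a 1GB gigantic page has 262144.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUBPAGE_SIZE 4096UL	/* assumption: 4KB base pages (x86) */

/* Number of base pages in a huge page of the given size. */
static size_t nr_subpages(size_t huge_size)
{
	return huge_size / SUBPAGE_SIZE;
}

/*
 * Index of the sub-page that contains fault_addr.
 * huge_size must be a power of two.
 */
static size_t target_subpage(uintptr_t fault_addr, size_t huge_size)
{
	return (fault_addr & (huge_size - 1)) / SUBPAGE_SIZE;
}

/*
 * Clear every sub-page of the huge page at base, touching the target
 * sub-page last.  If order is non-NULL, the index of each sub-page is
 * recorded in the order it was cleared (illustration only).
 */
static void clear_target_last(unsigned char *base, size_t huge_size,
			      size_t target, size_t *order)
{
	size_t n = nr_subpages(huge_size);
	size_t k = 0;

	assert(target < n);

	for (size_t i = 0; i < n; i++) {
		if (i == target)
			continue;
		memset(base + i * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
		if (order)
			order[k++] = i;
	}

	/* The sub-page about to be accessed is cleared last. */
	memset(base + target * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
	if (order)
		order[k] = target;
}
```

For example, a fault at offset 0x1ff010 into a 2MB huge page gives
target_subpage() == 511, so sub-pages 0 through 510 are cleared first and
sub-page 511 is cleared last.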

One more thing: I'm guessing the copy_huge/gigantic_page() routines would
see a similar benefit, specifically for copies made as a result of COW.
Is that another area to consider?

That gets back to Michal's question of a specific use case versus a generic
optimization. Unless the code is simple (as in this patch), it seems like we
should hold off on additional optimizations until there is a specific
use case.

I'm still OK with this change.
--
Mike Kravetz
