Re: [PATCH v2] mm, zone_device: replace {get, put}_zone_device_page() with a single reference

From: Dan Williams
Date: Sun Apr 30 2017 - 21:42:49 EST


On Sun, Apr 30, 2017 at 4:14 PM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> On Sat, Apr 29, 2017 at 01:17:26PM +0300, Kirill A. Shutemov wrote:
>> On Fri, Apr 28, 2017 at 03:33:07PM -0400, Jerome Glisse wrote:
>> > On Fri, Apr 28, 2017 at 12:22:24PM -0700, Dan Williams wrote:
>> > > Are you sure about needing to hook the 2 -> 1 transition? Could we
>> > > change ZONE_DEVICE pages to not have an elevated reference count when
>> > > they are created so you can keep the HMM references out of the mm hot
>> > > path?
>> >
>> > 100% sure on that :) I need to call back into the driver on the 2->1
>> > transition; there is no way around that. If we change ZONE_DEVICE to not
>> > have an elevated reference count, then you need to make a lot more
>> > changes to mm so that ZONE_DEVICE is never used as a fallback for memory
>> > allocation. You would also need to make changes to ensure that
>> > ZONE_DEVICE pages never end up in one of the paths that try to put them
>> > back on the LRU. There are a lot of places that would need to be updated,
>> > and it would be highly intrusive and add a lot of special cases to other
>> > hot code paths.
>>
>> Could you explain more about where the requirement comes from, or point
>> me to where I can read about this?
>>
>
> HMM ZONE_DEVICE pages are used like other pages (anonymous or file-backed
> pages) in _any_ vma. So I need to know when a page is freed, i.e. as a
> result of unmap, exit, migration, or anything else that would free the
> memory. For ZONE_DEVICE, a page is free once its refcount reaches 1, so I
> need to catch the refcount transition from 2->1.
>
> This is the only way I can inform the device that the page is now free. See
>
> https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-v21&id=52da8fe1a088b87b5321319add79e43b8372ed7d
>
> There is _no_ way around that.

Ok, but I need to point out that this is not a ZONE_DEVICE requirement.
This is an HMM-specific need. So, this extra reference counting should
be clearly delineated as part of the MEMORY_DEVICE_PRIVATE use case.

Can we hide the extra reference counting behind a static branch so
that the common-case fast path doesn't get slower until an HMM device
shows up?
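
To make the static-branch idea concrete, here is a rough userspace model
of the shape I have in mind. It is only a sketch, not a patch: every
name in it (model_page, model_put_page(), devmap_private_enabled, and so
on) is hypothetical, and the plain bool stands in for what would
presumably be a DEFINE_STATIC_KEY_FALSE() key tested with
static_branch_unlikely() in the kernel, so that the check is patched out
until an HMM device registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_page {
    int refcount;           /* device pages idle at 1, not 0 */
    bool is_device_private; /* stands in for MEMORY_DEVICE_PRIVATE */
    void (*page_free)(struct model_page *page); /* driver callback */
};

/* Stand-in for a static key: false until the first HMM device appears. */
static bool devmap_private_enabled;

/* Assumed to be called once when an HMM device registers. */
static void model_hmm_device_register(void)
{
    devmap_private_enabled = true;
}

/* Sample driver callback that only counts how often it was invoked. */
static int model_free_calls;

static void model_count_free(struct model_page *page)
{
    (void)page;
    model_free_calls++;
}

/*
 * Drop one reference. Returns true when the count reached 0, i.e. the
 * ordinary "free this page" case. Device-private pages never take that
 * path once the key is enabled; their 2->1 transition is reported to
 * the driver instead.
 */
static bool model_put_page(struct model_page *page)
{
    assert(page->refcount > 0);
    int count = --page->refcount;

    /* Only evaluated meaningfully after an HMM device has registered. */
    if (devmap_private_enabled && page->is_device_private) {
        if (count == 1 && page->page_free)
            page->page_free(page);
        return false;
    }

    return count == 0;
}
```

With the flag clear, model_put_page() does nothing beyond the decrement
and the zero test, which is the property I care about for the common
case. Once model_hmm_device_register() has run, a device-private page
going from 2 to 1 triggers the driver callback.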