Re: [PATCH] mm, zone_device: replace {get, put}_zone_device_page() with a single reference

From: Dan Williams
Date: Thu Apr 27 2017 - 12:14:19 EST


On Thu, Apr 27, 2017 at 9:11 AM, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
>
> On 26/04/17 06:55 PM, Dan Williams wrote:
>> @@ -277,7 +269,10 @@ struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
>> *
>> * Notes:
>> * 1/ @ref must be 'live' on entry and 'dead' before devm_memunmap_pages() time
>> - * (or devm release event).
>> + * (or devm release event). The expected order of events is that @ref has
>> + * been through percpu_ref_kill() before devm_memremap_pages_release(). The
>> + * wait for the completion of kill and percpu_ref_exit() must occur after
>> + * devm_memremap_pages_release().
>> *
>> * 2/ @res is expected to be a host memory range that could feasibly be
>> * treated as a "System RAM" range, i.e. not a device mmio range, but
>> @@ -379,6 +374,7 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
>> */
>> list_del(&page->lru);
>> page->pgmap = pgmap;
>> + percpu_ref_get(ref);
>> }
>> devres_add(dev, page_map);
>> return __va(res->start);
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 5dabf444d724..01267dda6668 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -97,6 +97,16 @@ static void __put_compound_page(struct page *page)
>>
>> void __put_page(struct page *page)
>> {
>> + if (is_zone_device_page(page)) {
>> + put_dev_pagemap(page->pgmap);
>> +
>> + /*
>> + * The page belong to device, do not return it to
>> + * page allocator.
>> + */
>> + return;
>> + }
>> +
>> if (unlikely(PageCompound(page)))
>> __put_compound_page(page);
>> else
>>
>
> Forgive me if I'm missing something but this doesn't make sense to me.
> We are taking a reference once when the region is initialized and
> releasing it every time a page within the region's reference count drops
> to zero. That does not seem to be symmetric and I don't see how it
> tracks that pages are in use. Shouldn't get_dev_pagemap be called when
> any page is allocated or something like that (ie. the inverse of
> __put_page)?

You're overlooking that the page reference count 1 after
arch_add_memory(). So at the end of time we're just dropping the
arch_add_memory() reference to release the page and related
dev_pagemap.