Ugh, when did all this HMM-specific manipulation sneak into the
generic ZONE_DEVICE path? It used to be gated by pgmap type with its
own put_zone_device_private_page(). For example, calling
mem_cgroup_uncharge() on a DAX page is certainly unnecessary and
might be broken (I would need to check). ZONE_DEVICE users are not a
monolith, and the HMM use case leaks pages into code paths that DAX
explicitly avoids.

It's been this way for a while and I did not react previously;
apologies for that. I think __ClearPageActive(), __ClearPageWaiters(),
and mem_cgroup_uncharge() belong behind a device-private conditional.
The history here is:

Move some, but not all HMM specifics to hmm_devmem_free():

    2fa147bdbf67 mm, dev_pagemap: Do not clear ->mapping on final put

Remove the clearing of mapping since no upstream consumers needed it:

    b7a523109fb5 mm: don't clear ->mapping in hmm_devmem_free

Add it back in once an upstream consumer arrived:

    7ab0ad0e74f8 mm/hmm: fix ZONE_DEVICE anon page mapping reuse

We're now almost entirely free of ->page_free() callbacks except for
that weird nouveau case. Could fixing that FIXME in
nouveau_dmem_page_free() also result in killing the ->page_free()
callback altogether? In the meantime I'm proposing a cleanup like this:

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ad8e4df1282b..4eae441f86c9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -337,13 +337,7 @@ static void pmem_release_disk(void *__pmem)
put_disk(pmem->disk);
}
-static void pmem_pagemap_page_free(struct page *page)
-{
- wake_up_var(&page->_refcount);
-}
-
static const struct dev_pagemap_ops fsdax_pagemap_ops = {
- .page_free = pmem_pagemap_page_free,
.kill = pmem_pagemap_kill,
.cleanup = pmem_pagemap_cleanup,
};
diff --git a/mm/memremap.c b/mm/memremap.c
index 03ccbdfeb697..157edb8f7cf8 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -419,12 +419,6 @@ void __put_devmap_managed_page(struct page *page)
* holds a reference on the page.
*/
if (count == 1) {
- /* Clear Active bit in case of parallel mark_page_accessed */
- __ClearPageActive(page);
- __ClearPageWaiters(page);
-
- mem_cgroup_uncharge(page);
-
/*
* When a device_private page is freed, the page->mapping field
* may still contain a (stale) mapping value. For example, the
@@ -446,10 +440,17 @@ void __put_devmap_managed_page(struct page *page)
* handled differently or not done at all, so there is no need
* to clear page->mapping.
*/
- if (is_device_private_page(page))
- page->mapping = NULL;
+ if (is_device_private_page(page)) {
+ /* Clear Active bit in case of parallel mark_page_accessed */
+ __ClearPageActive(page);
+ __ClearPageWaiters(page);
- page->pgmap->ops->page_free(page);
+ mem_cgroup_uncharge(page);
+
+ page->mapping = NULL;
+ page->pgmap->ops->page_free(page);
+ } else
+ wake_up_var(&page->_refcount);
} else if (!count)
__put_page(page);
}