Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap

From: Alexander Duyck
Date: Wed Oct 10 2018 - 12:39:11 EST


On 10/10/2018 2:58 AM, Michal Hocko wrote:
> On Tue 09-10-18 13:26:41, Alexander Duyck wrote:
> [...]
>> I would think with that being the case we still probably need the call to
>> __SetPageReserved to set the bit with the expectation that it will not be
>> cleared for device-pages since the pages are not onlined. Removing the call
>> to __SetPageReserved would probably introduce a number of regressions as
>> there are multiple spots that use the reserved bit to determine if a page
>> can be swapped out to disk, mapped as system memory, or migrated.
>
> PageReserved is meant to tell any potential pfn walkers that might get
> to this struct page to back off and not touch it. Even though
> ZONE_DEVICE doesn't online pages in traditional sense it makes those
> pages available for further use so the page reserved bit should be
> cleared.

So from what I can tell that isn't necessarily the case. Specifically, the pagemap types MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_PUBLIC are both special cases: the memory may not be accessible to the CPU, or it cannot be pinned because it has to remain evictable.

The specific case that Dan and Yi are referring to is the type MEMORY_DEVICE_FS_DAX. For that type I could probably look at not setting the reserved bit. Part of me wants to say that we should wait and clear the bit later, but that would just end up adding time back to initialization. At this point, though, I would consider the change a follow-up optimization rather than a fix, since it tailors things specifically for DAX versus the other ZONE_DEVICE types.
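
To make the idea concrete, here is a rough userspace sketch of what a per-type decision might look like. This is not kernel code and not a proposed patch: struct fake_page, devmap_wants_reserved() and init_device_page() are hypothetical names made up for illustration, and the enum only mirrors the pagemap types named above (the numeric values are an assumption).

```c
#include <stdbool.h>

/* Illustrative model only: mirrors the pagemap types discussed above. */
enum memory_type {
	MEMORY_DEVICE_PRIVATE = 1,	/* values are an assumption */
	MEMORY_DEVICE_PUBLIC,
	MEMORY_DEVICE_FS_DAX,
};

/* Hypothetical stand-in for struct page; only tracks the reserved bit. */
struct fake_page {
	bool reserved;
};

/*
 * Hypothetical helper: decide whether a device page should keep the
 * reserved bit. Only FS_DAX is treated as a candidate for skipping it;
 * every other type stays reserved so pfn walkers keep backing off.
 */
static bool devmap_wants_reserved(enum memory_type type)
{
	switch (type) {
	case MEMORY_DEVICE_FS_DAX:
		return false;
	case MEMORY_DEVICE_PRIVATE:
	case MEMORY_DEVICE_PUBLIC:
	default:
		return true;
	}
}

/* Set the bit once at init time instead of setting it and clearing it later. */
static void init_device_page(struct fake_page *page, enum memory_type type)
{
	page->reserved = devmap_wants_reserved(type);
}
```

Deciding at init time avoids a second pass over the pages to clear the bit, which is the extra initialization time mentioned above.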