Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap

From: Alexander Duyck
Date: Wed Oct 10 2018 - 11:28:07 EST




On 10/10/2018 5:52 AM, Yi Zhang wrote:
On 2018-10-09 at 14:19:32 -0700, Dan Williams wrote:
On Tue, Oct 9, 2018 at 1:34 PM Alexander Duyck
<alexander.h.duyck@xxxxxxxxxxxxxxx> wrote:

On 10/9/2018 11:04 AM, Dan Williams wrote:
On Tue, Oct 9, 2018 at 3:21 AM Yi Zhang <yi.z.zhang@xxxxxxxxxxxxxxx> wrote:
[..]
That comment is incorrect, device-pages are never onlined. So I think
we can just skip that call to __SetPageReserved() unless the memory
range is MEMORY_DEVICE_{PRIVATE,PUBLIC}.
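
For illustration only, the suggestion above could be modelled roughly as
follows. This is a userspace toy, not kernel code: the toy_* names and
the TOY_MEMORY_DEVICE_OTHER value are assumptions made up for this
sketch, and only the PRIVATE/PUBLIC distinction comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pgmap memory type. */
enum toy_memory_type {
    TOY_MEMORY_DEVICE_PRIVATE,
    TOY_MEMORY_DEVICE_PUBLIC,
    TOY_MEMORY_DEVICE_OTHER, /* assumption: any other type, e.g. a DAX range */
};

/* Hypothetical stand-in for struct page, reduced to the one flag at issue. */
struct toy_dev_page {
    bool reserved;
};

/*
 * Set the reserved flag only for PRIVATE/PUBLIC device memory, and leave
 * it clear for every other type.
 */
static void toy_init_device_page(struct toy_dev_page *page,
                                 enum toy_memory_type type)
{
    assert(page != NULL);
    page->reserved = (type == TOY_MEMORY_DEVICE_PRIVATE ||
                      type == TOY_MEMORY_DEVICE_PUBLIC);
}
```

Whether skipping the flag is actually safe is exactly what the rest of
the thread debates; the sketch only shows the shape of the check.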


When pages are "onlined" via __free_pages_boot_core their reserved bit
is cleared; that is the reason for the comment. According to its
description, the reserved bit is meant to indicate that the page cannot
be swapped out or moved.

...but ZONE_DEVICE pages are never onlined so I would expect
memmap_init_zone_device() to know that detail.

I would think with that being the case we still probably need the call
to __SetPageReserved to set the bit with the expectation that it will
not be cleared for device-pages since the pages are not onlined.
Removing the call to __SetPageReserved would probably introduce a number
of regressions as there are multiple spots that use the reserved bit to
determine if a page can be swapped out to disk, mapped as system memory,
or migrated.
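
As a rough illustration of the lifecycle described above, here is a
userspace toy model, not kernel code. All toy_* names are hypothetical;
the only behaviour taken from the thread is that the bit is set at init,
cleared on onlining, and consulted by callers that decide whether a page
may be swapped or migrated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the reserved flag. */
struct toy_page {
    bool reserved;
};

/* Models page init: every page starts out with the reserved bit set. */
static void toy_init_page(struct toy_page *page)
{
    assert(page != NULL);
    page->reserved = true;
}

/*
 * Models onlining (the __free_pages_boot_core step mentioned above):
 * the reserved bit is cleared when the page is released for general use.
 */
static void toy_online_page(struct toy_page *page)
{
    assert(page != NULL);
    page->reserved = false;
}

/* Models a caller that uses the bit to decide if a page may be moved. */
static bool toy_page_can_migrate(const struct toy_page *page)
{
    assert(page != NULL);
    return !page->reserved;
}
```

A system-memory page goes through both toy_init_page() and
toy_online_page(), so it ends up movable. A device page only sees
toy_init_page(), so the bit stays set unless something else clears it.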

Another thing: it seems page init and set-reserved have already been
done in the following path:

move_pfn_range_to_zone
  |-->memmap_init_zone
      |-->for_each_page_in_pfn
          |-->__init_single_page
              |-->SetPageReserved

Why haven't we removed this redundant initialization in memmap_init_zone?

Correct me if I missed something.

In this case it isn't redundant, as only the vmemmap pages are
initialized in memmap_init_zone now. None of the pages that are going
to be used as device pages are initialized until the call to
memmap_init_zone_device. What I did is split the initialization of the
pages into two parts so that we can initialize the device pages outside
of the hotplug lock.
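
A minimal sketch of that two-part split, as a userspace toy model rather
than the actual patch. The struct layout, the toy_* names, and the
hotplug_locked flag are assumptions for illustration; only the split
itself (vmemmap pages first, device pages later and outside the lock)
comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Records which phase, if any, initialized a given page. */
enum toy_init_phase { TOY_UNINIT = 0, TOY_INIT_LOCKED, TOY_INIT_DEFERRED };

/* Hypothetical description of one hotplugged ZONE_DEVICE range. */
struct toy_range {
    enum toy_init_phase *pages; /* one entry per page in the range */
    size_t nr_pages;            /* total pages in the range */
    size_t nr_vmemmap;          /* leading pages that back the memmap */
    bool hotplug_locked;        /* models holding the hotplug lock */
};

/* Part 1: models memmap_init_zone. Runs under the lock, vmemmap pages only. */
static void toy_memmap_init_zone(struct toy_range *r)
{
    assert(r->hotplug_locked);
    assert(r->nr_vmemmap <= r->nr_pages);
    for (size_t i = 0; i < r->nr_vmemmap; i++)
        r->pages[i] = TOY_INIT_LOCKED;
}

/* Part 2: models memmap_init_zone_device. Runs after the lock is dropped. */
static void toy_memmap_init_zone_device(struct toy_range *r)
{
    assert(!r->hotplug_locked);
    for (size_t i = r->nr_vmemmap; i < r->nr_pages; i++)
        r->pages[i] = TOY_INIT_DEFERRED;
}
```

In this model no page is initialized twice, which is why the first pass
is not redundant with the second.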


Right, this is what Yi is working on... the PageReserved flag is
problematic for KVM. Auditing those locations it seems as long as we
teach hibernation to avoid ZONE_DEVICE ranges we can safely not set
the reserved flag for DAX pages. What I'm trying to avoid is a local
KVM hack to check for DAX pages when the Reserved flag is not
otherwise needed.
Thanks Dan. Here is the patch link:

https://lore.kernel.org/lkml/cover.1536342881.git.yi.z.zhang@xxxxxxxxxxxxxxx

So it looks like your current logic just works around the bit, since
all it does is allow for reserved DAX pages.