Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

From: John Hubbard
Date: Fri May 26 2023 - 22:11:26 EST


On 5/26/23 11:50, David Hildenbrand wrote:
On 26.05.23 20:46, Matthew Wilcox wrote:
On Fri, May 26, 2023 at 06:46:15PM +0200, David Hildenbrand wrote:
On 26.05.23 18:44, Matthew Wilcox wrote:
On Fri, May 26, 2023 at 09:44:34AM -0600, Khalid Aziz wrote:
Oh, I think I found it!  pin_user_pages_remote() is called by
vaddr_get_pfns().  If these are the pages you're concerned about,
then the efficient way to do what you want is simply to call
folio_maybe_dma_pinned().  Far more efficient than the current mess
of total_mapcount().

vfio pinned pages triggered this change. Wouldn't checking refcounts against
mapcount provide a more generalized way of detecting non-migratable pages?

Well, you changed the comment to say that we were concerned about
long-term pins.  If we are, then folio_maybe_dma_pinned() is how to test
for long-term pins.  If we want to skip pages which are short-term pinned,
then we need to not change the comment, and keep using mapcount/refcount
differences.
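
The mapcount/refcount comparison mentioned above can be sketched in a small userspace model. This is only an illustration of the idea, not the kernel code: the struct, the function names and the `expected_extra` parameter are all hypothetical, and the real check in the compaction code may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two counters being compared. */
struct page_model {
    int refcount;   /* every reference: mappings, GUP references, pins, ... */
    int mapcount;   /* page-table mappings only */
};

/*
 * Each mapping holds one reference, so any reference beyond the mapcount
 * (and beyond the ones the caller knows it holds itself, expected_extra)
 * belongs to some other user. The comparison cannot tell whether that
 * user holds a short-term or a long-term reference.
 */
static bool has_unexplained_refs(const struct page_model *p, int expected_extra)
{
    return p->refcount - expected_extra > p->mapcount;
}

/* Usage: a page mapped three times, with and without one extra reference. */
void page_model_examples(void)
{
    struct page_model mapped_only = { .refcount = 3, .mapcount = 3 };
    struct page_model extra_ref   = { .refcount = 4, .mapcount = 3 };

    assert(!has_unexplained_refs(&mapped_only, 0));
    assert(has_unexplained_refs(&extra_ref, 0));
    /* If the caller itself took that extra reference, nothing is unexplained. */
    assert(!has_unexplained_refs(&extra_ref, 1));
}
```

In this model a plain FOLL_GET-style reference and a pin look identical, which is why the comparison skips more pages than a pin-only test would.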


folio_maybe_dma_pinned() is all about FOLL_PIN, not FOLL_LONGTERM.

But according to our documentation, FOLL_LONGTERM implies FOLL_PIN.

Yes. But folio_maybe_dma_pinned() will indicate both long-term pins and short-term pins. There really is no way to distinguish between the two, unfortunately.

Not yet, anyway. :)


Anyway, right now, the code skips any pages which are merely FOLL_GET,
so we'll skip fewer pages if we do only skip the FOLL_PIN ones,
regardless of whether we'd prefer to only skip the FOLL_LONGTERM ones.

folio_maybe_dma_pinned() would skip migrating any page that has more than
1024 references. (shared libraries?)

True, but maybe we should be skipping any page with that many mappings,
given how disruptive it is to the rest of the system to unmap a page
from >1024 processes.
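
The 1024 figure comes from how pins are counted on small (order-0) pages: a pin adds a large bias to the ordinary refcount instead of using a separate counter, so the "maybe pinned" test is a threshold check that can give false positives. A minimal userspace sketch follows, assuming a bias of 1024 as discussed in this thread; the struct and the `model_*` helpers are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Modeled on the bias of 1024 discussed above; an assumption of this sketch. */
#define MODEL_PIN_BIAS 1024u

/* Hypothetical stand-in for a small page with a single shared counter. */
struct small_page_model {
    unsigned int refcount;
};

/* An ordinary reference (for example one per mapping) adds 1. */
static void model_get(struct small_page_model *p)
{
    p->refcount += 1;
}

/* A pin adds the whole bias to the same counter. */
static void model_pin(struct small_page_model *p)
{
    p->refcount += MODEL_PIN_BIAS;
}

/* Threshold test: true for a real pin, but also for >= 1024 plain references. */
static bool model_maybe_pinned(const struct small_page_model *p)
{
    return p->refcount >= MODEL_PIN_BIAS;
}

/* Usage: one pinned page, and one heavily shared but unpinned page. */
void pin_bias_examples(void)
{
    struct small_page_model pinned = { .refcount = 1 };
    struct small_page_model shared = { .refcount = 0 };

    model_pin(&pinned);
    assert(model_maybe_pinned(&pinned));

    for (int i = 0; i < 1023; i++)
        model_get(&shared);
    assert(!model_maybe_pinned(&shared));   /* 1023 references: below the bias */

    model_get(&shared);
    assert(model_maybe_pinned(&shared));    /* 1024 references: false positive */
}
```

The second case is the shared-library scenario raised above: no pin exists, yet the threshold test reports the page as possibly pinned.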


So any user with 1024 processes can fragment physical memory? :/

Sorry, I'd like to minimize the usage of folio_maybe_dma_pinned().


I was actually thinking that we should avoid adding any more cases of
fragile mapcount and refcount comparisons, which then leads to
Matthew's approach here!


thanks,
--
John Hubbard
NVIDIA