Re: [PATCH RFC v2] mm: Add f_ops->populate()
From: Jarkko Sakkinen
Date: Mon Mar 07 2022 - 10:44:02 EST

On Mon, Mar 07, 2022 at 02:37:48PM +0000, Matthew Wilcox wrote:
> On Sun, Mar 06, 2022 at 03:41:54PM -0800, Dave Hansen wrote:
> > In short: page faults stink. The core kernel has lots of ways of
> > avoiding page faults like madvise(MADV_WILLNEED) or mmap(MAP_POPULATE).
> > But, those only work on normal RAM that the core mm manages.
> >
> > SGX is weird. SGX memory is managed outside the core mm. It doesn't
> > have a 'struct page' and get_user_pages() doesn't work on it. Its VMAs
> > are marked with VM_IO. So, none of the existing methods for avoiding
> > page faults work on SGX memory.
> >
> > This essentially helps extend existing "normal RAM" kernel ABIs to work
> > for avoiding faults for SGX too. SGX users want to enjoy all of the
> > benefits of a delayed allocation policy (better resource use,
> > overcommit, NUMA affinity) but without the cost of millions of faults.
>
> We have a mechanism for dynamically reducing the number of page faults
> already; it's just buried in the page cache code. You have vma->vm_file,
> which contains a file_ra_state. You can use this to track where
> recent faults have been and grow the size of the region you fault in
> per page fault. You don't have to (indeed probably don't want to) use
> the same algorithm as the page cache, but the _principle_ is the same --
> were recent speculative faults actually used; should we grow the number
> of pages actually faulted in, or is this a random sparse workload where
> we want to allocate individual pages.
>
> Don't rely on the user to ask. They don't know.

This sounds like a possibility. I'll need to study it properly first
though. Thank you for pointing this out.
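To check that I read the principle right, here is a rough userspace
sketch of it. It is not the page cache readahead algorithm and not SGX
driver code. The names (struct fault_window, fault_window_update(),
WIN_MIN, WIN_MAX) are made up for illustration, and the policy of
doubling on a sequential hit and resetting otherwise is only an
assumption:

```c
#define WIN_MIN 1UL	/* hypothetical: pages populated for a sparse fault */
#define WIN_MAX 32UL	/* hypothetical: cap on speculative population */

/*
 * Hypothetical per-mapping state, playing the role that file_ra_state
 * plays for the page cache.
 */
struct fault_window {
	unsigned long next;	/* first page index past the last populated range */
	unsigned long size;	/* pages populated by the last fault, 0 if none yet */
};

/*
 * Decide how many pages to populate for a fault at page index pgoff.
 *
 * A fault landing exactly at the end of the previously populated range
 * is taken as a hint that the speculative pages were consumed, so the
 * window doubles up to WIN_MAX. Any other fault is treated as a sparse
 * access and the window falls back to WIN_MIN.
 */
static unsigned long fault_window_update(struct fault_window *fw,
					 unsigned long pgoff)
{
	if (fw->size && pgoff == fw->next) {
		fw->size *= 2;
		if (fw->size > WIN_MAX)
			fw->size = WIN_MAX;
	} else {
		fw->size = WIN_MIN;
	}

	fw->next = pgoff + fw->size;
	return fw->size;
}
```

With zero-initialized state, faults at page indices 0, 1, 3 and 7 would
populate 1, 2, 4 and 8 pages, and a following fault at index 1000 would
drop back to a single page.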
BR, Jarkko