Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE

From: Michal Hocko
Date: Tue May 30 2017 - 06:39:39 EST


On Tue 30-05-17 13:19:22, Mike Rapoport wrote:
> On Tue, May 30, 2017 at 09:44:08AM +0200, Michal Hocko wrote:
> > On Wed 24-05-17 17:27:36, Mike Rapoport wrote:
> > > On Wed, May 24, 2017 at 01:18:00PM +0200, Michal Hocko wrote:
> > [...]
> > > > Why can't khugepaged simply skip over all VMAs which have userfault
> > > > regions registered? This sounds like a less error-prone approach to
> > > > me.
> > >
> > > khugepaged does skip over VMAs which have userfault regions registered. We
> > > could register the regions with userfaultfd before populating them to
> > > avoid collapses in the transition period.
> >
> > Why can't you register only post-copy regions and "manually" copy the
> > pre-copy parts?
>
> We can register only post-copy regions, but this will cause VMA
> fragmentation. Now we register the entire VMA with userfaultfd, no matter
> how many pages were dirtied there since the pre-dump. If we register only
> post-copy regions, we will split out the VMAs for those regions.

Is this really a problem, though?
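To put a number on the fragmentation being discussed, the sketch below counts how many VMAs a single anonymous VMA would become if only the post-copy page ranges were registered with userfaultfd. It assumes a simplified model: registration changes the VMA's flags, so every registered run and every unregistered gap is counted as its own VMA, touching ranges form one run, and merging with VMAs outside the original one is ignored. The type `page_range` and the function `count_vmas_after_partial_register` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: half-open range [start, end) of page indices within
 * one VMA, e.g. pages dirtied after the pre-dump. */
struct page_range {
    size_t start;
    size_t end;
};

/*
 * Simplified model of VMA splitting: count the VMAs that a VMA covering
 * pages [0, npages) turns into when only the ranges r[0..nr) are
 * registered with userfaultfd. Every registered run and every
 * unregistered gap counts as one VMA; touching ranges form a single
 * registered run. Merging with VMAs outside [0, npages) is ignored.
 * Ranges must be sorted, non-empty and non-overlapping.
 */
size_t count_vmas_after_partial_register(size_t npages,
                                         const struct page_range *r,
                                         size_t nr)
{
    size_t vmas = 0;
    size_t pos = 0; /* first page after the last processed range */

    for (size_t i = 0; i < nr; i++) {
        assert(r[i].start < r[i].end && r[i].end <= npages);
        assert(r[i].start >= pos);

        if (r[i].start > pos)
            vmas++;             /* unregistered gap before this range */
        if (i == 0 || r[i].start > pos)
            vmas++;             /* start of a new registered run */
        /* otherwise the range touches the previous one and extends its run */
        pos = r[i].end;
    }
    if (pos < npages)
        vmas++;                 /* unregistered tail after the last range */
    return vmas;
}
```

Under this model, three scattered dirty ranges such as pages [10,20), [100,101) and [300,310) in a 512-page VMA yield seven VMAs instead of one, each of which counts against the per-process `vm.max_map_count` limit; registering the whole VMA keeps it as one.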

> > > But then we'll have to populate these regions with
> > > UFFDIO_COPY, which adds quite a lot of overhead.
> >
> > How big is the performance impact?
>
> I don't have the numbers handy, but for each post-copy range it means that
> instead of memcpy() we will use ioctl(UFFDIO_COPY).
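For readers unfamiliar with the post-copy population path, here is a minimal sketch of what replacing a memcpy() with ioctl(UFFDIO_COPY) involves for a single page. It assumes Linux 4.3 or later headers (`<linux/userfaultfd.h>`, `SYS_userfaultfd`) and that the userfaultfd(2) syscall is permitted; it may be blocked by the `vm.unprivileged_userfaultfd` sysctl or a container seccomp profile, in which case the function just reports that. The function name `uffd_copy_one_page` and its return convention are hypothetical, and the sketch does not measure anything.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: populate one page of a userfaultfd-registered
 * anonymous mapping with ioctl(UFFDIO_COPY) instead of memcpy().
 * Returns 0 if the page was copied and verified, 1 if userfaultfd is not
 * available to this process (e.g. EPERM or ENOSYS), -1 on other errors.
 */
int uffd_copy_one_page(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    struct uffdio_register reg = { 0 };
    struct uffdio_copy copy = { 0 };
    int uffd = -1, ret = -1;

    char *src = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    char *dst = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED || dst == MAP_FAILED)
        goto out;
    memset(src, 0x5a, page);    /* stand-in for pre-dumped page contents */

    uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) {
        ret = 1;                /* userfaultfd not permitted or not built in */
        goto out;
    }
    /* API handshake, then register dst for missing-page faults. */
    if (ioctl(uffd, UFFDIO_API, &api) < 0)
        goto out;
    reg.range.start = (unsigned long)dst;
    reg.range.len = page;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
        goto out;

    /* One system call per range where a plain memcpy() would do. dst is
     * never touched before this point, so no fault is raised on it. */
    copy.dst = (unsigned long)dst;
    copy.src = (unsigned long)src;
    copy.len = page;
    copy.mode = 0;
    if (ioctl(uffd, UFFDIO_COPY, &copy) < 0 || copy.copy != (__s64)page)
        goto out;

    ret = memcmp(dst, src, page) == 0 ? 0 : -1;
out:
    if (uffd >= 0)
        close(uffd);
    if (dst != MAP_FAILED)
        munmap(dst, page);
    if (src != MAP_FAILED)
        munmap(src, page);
    return ret;
}
```

Compared with a memcpy() into already-mapped memory, this path adds the registration ioctls and one UFFDIO_COPY system call for every post-copy range; how much that costs for a real restore is the part that would need measuring.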

It would be good to measure that, though. You are proposing a new user
API and the THP API is quite convoluted already, so there had better be a
very good reason to add a new API. So far I can only see that it would
be more convenient to add another madvise command, and that is rather
insufficient justification IMHO. Also, do you expect anybody else to
use the new madvise? What would be the use case?
--
Michal Hocko
SUSE Labs