Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially

From: Hugh Dickins
Date: Mon Feb 24 2020 - 22:46:26 EST


On Tue, 14 Jan 2020, Yang Shi wrote:
> On 12/4/19 4:15 PM, Hugh Dickins wrote:
> > On Wed, 4 Dec 2019, Yang Shi wrote:
> >
> > > Currently when truncating shmem file, if the range is partial of THP
> > > (start or end is in the middle of THP), the pages actually will just get
> > > cleared rather than being freed unless the range cover the whole THP.
> > > Even though all the subpages are truncated (randomly or sequentially),
> > > the THP may still be kept in page cache. This might be fine for some
> > > usecases which prefer preserving THP.
> > >
> > > But, when doing balloon inflation in QEMU, QEMU actually does hole punch
> > > or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
> > > So, when using shmem THP as memory backend QEMU inflation actually
> > > doesn't
> > > work as expected since it doesn't free memory. But, the inflation
> > > usecase really needs get the memory freed. Anonymous THP will not get
> > > freed right away too but it will be freed eventually when all subpages
> > > are
> > > unmapped, but shmem THP would still stay in page cache.
> > >
> > > Split THP right away when doing partial hole punch, and if split fails
> > > just clear the page so that read to the hole punched area would return
> > > zero.
> > >
> > > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> > > Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > > Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> > > Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> > > ---
> > > v2: * Adopted the comment from Kirill.
> > > * Dropped fallocate mode flag, THP split is the default behavior.
> > > * Blended Huge's implementation with my v1 patch. TBH I'm not very
> > > keen to
> > > Hugh's find_get_entries() hack (basically neutral), but without
> > > that hack
> > Thanks for giving it a try. I'm not neutral about my find_get_entries()
> > hack: it surely had to go (without it, I'd have just pushed my own patch).
> > I've not noticed anything wrong with your patch, and it's in the right
> > direction, but I'm still not thrilled with it. I also remember that I
> > got the looping wrong in my first internal attempt (fixed in what I sent),
> > and need to be very sure of the try-again-versus-move-on-to-next conditions
> > before agreeing to anything. No rush, I'll come back to this in days or
> > month ahead: I'll try to find a less gotoey blend of yours and mine.
>
> Hi Hugh,
>
> Any update on this one?

I apologize for my dreadful unresponsiveness.

I've spent the last day trying to love yours, and considering how mine
might be improved; but repeatedly arrived at the conclusion that mine is
about as nice as we can manage, just needing more comment to dignify it.

I did willingly call my find_get_entries() stopping at PageTransCompound
a hack; but now think we should just document that behavior and accept it.
The contortions of your patch come from the need to release those 14 extra
unwanted references: much better not to get them in the first place.

Neither of us handle a failed split optimally, we treat every tail as an
opportunity to retry: which is good to recover from transient failures,
but probably excessive. And we both have to restart the pagevec after
each attempt, but at least I don't get 14 unwanted extras each time.

What of other find_get_entries() users and its pagevec_lookup_entries()
wrapper: does an argument to select this "stop at PageTransCompound"
behavior need to be added?

No. The pagevec_lookup_entries() calls from mm/truncate.c prefer the
new behavior - evicting the head from page cache removes all the tails
along with it, so getting the tails a waste of time there too, just as
it was in shmem_undo_range().

Whereas shmem_unlock_mapping() and shmem_seek_hole_data(), as they
stand, are not removing pages from cache, but actually wanting to plod
through the tails. So those two would be slowed a little, while the
pagevec collects 1 instead of 15 pages. However: if we care about those
two at all, it's clear that we should speed them up, by noticing the
PageTransCompound case and accelerating over it, instead of plodding
through the tails. Since we haven't bothered with that optimization
yet, I'm not very worried to slow them down a little until it's done.

I must take a look at where we stand with tmpfs 64-bit ino tomorrow,
then recomment my shmem_punch_compound() patch and post it properly,
probably day after. (Reviewing it, I see I have a "page->index +
HPAGE_PMD_NR <= end" test which needs correcting - I tend to live
in the past, before v4.13 doubled the 32-bit MAX_LFS_FILESIZE.)

I notice that this thread has veered off into QEMU ballooning
territory: which may indeed be important, but there's nothing at all
that I can contribute on that. I certainly do not want to slow down
anything important, but remain convinced that the correct filesystem
implementation for punching a hole is to punch a hole.

Hugh