Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially

From: Yang Shi
Date: Tue Feb 04 2020 - 18:27:36 EST




On 1/14/20 11:28 AM, Yang Shi wrote:


On 12/4/19 4:15 PM, Hugh Dickins wrote:
On Wed, 4 Dec 2019, Yang Shi wrote:

Currently, when truncating a shmem file, if the range covers only part of a THP
(start or end falls in the middle of the THP), the affected subpages just get
cleared rather than freed, unless the range covers the whole THP. Even when all
the subpages have been truncated (randomly or sequentially), the THP may still
be kept in the page cache. This might be fine for some use cases which prefer
preserving THP.

But, when doing balloon inflation in QEMU, QEMU actually does hole punch
or MADV_DONTNEED in base page size granularity if hugetlbfs is not used.
So, when using shmem THP as the memory backend, QEMU inflation doesn't
work as expected since it doesn't free memory. But the inflation
use case really needs the memory to be freed. Anonymous THP is not freed
right away either, but it will eventually be freed once all subpages are
unmapped; shmem THP, in contrast, would still stay in the page cache.

Split the THP right away when doing a partial hole punch, and if the split
fails just clear the pages so that reads from the hole-punched area
return zero.
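
To illustrate the scenario, a minimal userspace sketch (not part of the patch;
the file name, sizes, and the use of memfd are only assumptions for the example,
and whether shmem THP actually backs the region depends on the shmem_enabled
setting):

/* Hole punch a single base page out of a shmem THP, balloon-style. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE    (2UL << 20)         /* one 2MB huge page worth of shmem */
#define BASE_PAGE   4096UL

int main(void)
{
    /* memfd is shmem-backed; THP use depends on shmem_enabled */
    int fd = memfd_create("thp-punch-demo", 0);
    if (fd < 0 || ftruncate(fd, THP_SIZE) < 0) {
        perror("memfd_create/ftruncate");
        return 1;
    }

    char *p = mmap(NULL, THP_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(p, 0xaa, THP_SIZE);          /* fault the whole range in */

    /*
     * Punch one base page in the middle of the huge page.  Without this
     * patch the THP stays in the page cache and only the subpage is
     * cleared; with it, the THP is split and the punched page is freed.
     */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  BASE_PAGE, BASE_PAGE) < 0)
        perror("fallocate");

    munmap(p, THP_SIZE);
    close(fd);
    return 0;
}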

Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
---
v2: * Addressed the comments from Kirill.
    * Dropped the fallocate mode flag; THP split is now the default behavior.
    * Blended Hugh's implementation with my v1 patch. TBH I'm not very keen on
      Hugh's find_get_entries() hack (basically neutral), but without that hack
Thanks for giving it a try. I'm not neutral about my find_get_entries()
hack: it surely had to go (without it, I'd have just pushed my own patch).
I've not noticed anything wrong with your patch, and it's in the right
direction, but I'm still not thrilled with it. I also remember that I
got the looping wrong in my first internal attempt (fixed in what I sent),
and need to be very sure of the try-again-versus-move-on-to-next conditions
before agreeing to anything. No rush, I'll come back to this in the days or
months ahead: I'll try to find a less goto-ey blend of yours and mine.

Hi Hugh,

Any update on this one?

Thanks,
Yang

Hi Hugh,

Ping. Any comment on this? I really hope it can make v5.7.

Thanks,
Yang



Hugh

      we have to rely on pagevec_release() to release the extra pins and play with
      goto. This version does it that way. The patch is bigger than Hugh's due
      to extra comments to make the flow clear.

 mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 83 insertions(+), 37 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 220be9f..1ae0c7f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
     long nr_swaps_freed = 0;
     pgoff_t index;
     int i;
+    bool split = false;
+    struct page *page = NULL;
 
     if (lend == -1)
         end = -1;    /* unsigned, so actually very big */
 
     pagevec_init(&pvec);
     index = start;
+retry:
     while (index < end) {
         pvec.nr = find_get_entries(mapping, index,
             min(end - index, (pgoff_t)PAGEVEC_SIZE),
@@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
         if (!pvec.nr)
             break;
         for (i = 0; i < pagevec_count(&pvec); i++) {
-            struct page *page = pvec.pages[i];
+            split = false;
+            page = pvec.pages[i];
 
             index = indices[i];
             if (index >= end)
@@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
             if (!trylock_page(page))
                 continue;
 
-            if (PageTransTail(page)) {
-                /* Middle of THP: zero out the page */
-                clear_highpage(page);
-                unlock_page(page);
-                continue;
-            } else if (PageTransHuge(page)) {
-                if (index == round_down(end, HPAGE_PMD_NR)) {
+            if (PageTransCompound(page) && !unfalloc) {
+                if (PageHead(page) &&
+                    index != round_down(end, HPAGE_PMD_NR)) {
                     /*
-                     * Range ends in the middle of THP:
-                     * zero out the page
+                     * Fall through when punching whole
+                     * THP.
                      */
-                    clear_highpage(page);
-                    unlock_page(page);
-                    continue;
+                    index += HPAGE_PMD_NR - 1;
+                    i += HPAGE_PMD_NR - 1;
+                } else {
+                    /*
+                     * Split THP for any partial hole
+                     * punch.
+                     */
+                    get_page(page);
+                    split = true;
+                    goto split;
                 }
-                index += HPAGE_PMD_NR - 1;
-                i += HPAGE_PMD_NR - 1;
             }
 
             if (!unfalloc || !PageUptodate(page)) {
@@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
             }
             unlock_page(page);
         }
+split:
         pagevec_remove_exceptionals(&pvec);
         pagevec_release(&pvec);
         cond_resched();
+
+        if (split) {
+            /*
+             * The pagevec_release() released all extra pins
+             * from pagevec lookup. And we hold an extra pin
+             * and still have the page locked under us.
+             */
+            if (!split_huge_page(page)) {
+                unlock_page(page);
+                put_page(page);
+                /* Re-lookup page cache from current index */
+                goto retry;
+            }
+
+            /* Fail to split THP, move to next index */
+            unlock_page(page);
+            put_page(page);
+        }
+
         index++;
     }
 
@@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
         return;
 
     index = start;
+again:
     while (index < end) {
         cond_resched();
 
@@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
             continue;
         }
         for (i = 0; i < pagevec_count(&pvec); i++) {
-            struct page *page = pvec.pages[i];
+            split = false;
+            page = pvec.pages[i];
 
             index = indices[i];
             if (index >= end)
@@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 
             lock_page(page);
 
-            if (PageTransTail(page)) {
-                /* Middle of THP: zero out the page */
-                clear_highpage(page);
-                unlock_page(page);
-                /*
-                 * Partial thp truncate due 'start' in middle
-                 * of THP: don't need to look on these pages
-                 * again on !pvec.nr restart.
-                 */
-                if (index != round_down(end, HPAGE_PMD_NR))
-                    start++;
-                continue;
-            } else if (PageTransHuge(page)) {
-                if (index == round_down(end, HPAGE_PMD_NR)) {
+            if (PageTransCompound(page) && !unfalloc) {
+                if (PageHead(page) &&
+                    index != round_down(end, HPAGE_PMD_NR)) {
                     /*
-                     * Range ends in the middle of THP:
-                     * zero out the page
+                     * Fall through when punching whole
+                     * THP.
                      */
-                    clear_highpage(page);
-                    unlock_page(page);
-                    continue;
+                    index += HPAGE_PMD_NR - 1;
+                    i += HPAGE_PMD_NR - 1;
+                } else {
+                    /*
+                     * Split THP for any partial hole
+                     * punch.
+                     */
+                    get_page(page);
+                    split = true;
+                    goto rescan_split;
                 }
-                index += HPAGE_PMD_NR - 1;
-                i += HPAGE_PMD_NR - 1;
             }
 
             if (!unfalloc || !PageUptodate(page)) {
@@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
             }
             unlock_page(page);
         }
+rescan_split:
         pagevec_remove_exceptionals(&pvec);
         pagevec_release(&pvec);
+
+        if (split) {
+            /*
+             * The pagevec_release() released all extra pins
+             * from pagevec lookup. And we hold an extra pin
+             * and still have the page locked under us.
+             */
+            if (!split_huge_page(page)) {
+                unlock_page(page);
+                put_page(page);
+                /* Re-lookup page cache from current index */
+                goto again;
+            }
+
+            /*
+             * Split fail, clear the page then move to next
+             * index.
+             */
+            clear_highpage(page);
+
+            unlock_page(page);
+            put_page(page);
+        }
+
         index++;
     }
 
-- 
1.8.3.1