On Mon, Nov 25, 2019 at 10:24:38AM -0800, Yang Shi wrote:
Okay, but I'm not sure that documention alone will be enough. We need
On 11/25/19 1:36 AM, Kirill A. Shutemov wrote:
On Sat, Nov 23, 2019 at 09:05:32AM +0800, Yang Shi wrote:Yes, it doesn't. Will clarify this in the commit log.
Currently when truncating shmem file, if the range is partial of THPWe need to clarify interaction with khugepaged. This implementation
(start or end is in the middle of THP), the pages actually will just get
cleared rather than being freed unless the range cover the whole THP.
Even though all the subpages are truncated (randomly or sequentially),
the THP may still be kept in page cache. This might be fine for some
usecases which prefer preserving THP.
But, when doing balloon inflation in QEMU, QEMU actually does hole punch
or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
So, when using shmem THP as memory backend QEMU inflation actually doesn't
work as expected since it doesn't free memory. But, the inflation
usecase really needs get the memory freed. Anonymous THP will not get
freed right away too but it will be freed eventually when all subpages are
unmapped, but shmem THP would still stay in page cache.
To protect the usecases which may prefer preserving THP, introduce a
new fallocate mode: FALLOC_FL_SPLIT_HPAGE, which means spltting THP is
preferred behavior if truncating partial THP. This mode just makes
sense to tmpfs for the time being.
doesn't do anything to prevent khugepaged from collapsing the range back
to THP just after the split.
proper design.
Pin the page before pagevec_release() and avoid get_page_unless_zero().If I don't drop the pins the THP can't be split. And, there might be more@@ -976,8 +1022,31 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,Doing get_page_unless_zero() just after you've dropped the pin for the
}
unlock_page(page);
}
+rescan_split:
pagevec_remove_exceptionals(&pvec);
pagevec_release(&pvec);
+
+ if (split && PageTransCompound(page)) {
+ /* The THP may get freed under us */
+ if (!get_page_unless_zero(compound_head(page)))
+ goto rescan_out;
+
+ lock_page(page);
+
+ /*
+ * The extra pins from page cache lookup have been
+ * released by pagevec_release().
+ */
+ if (!split_huge_page(page)) {
+ unlock_page(page);
+ put_page(page);
+ /* Re-look up page cache from current index */
+ goto again;
+ }
+ unlock_page(page);
+ put_page(page);
+ }
+rescan_out:
index++;
}
page looks very suboptimal.
than one pins from find_get_entries() if I read the code correctly. For
example, truncate 8K length in the middle of THP, the THP's refcount would
get bumpped twice since two sub pages would be returned.
Current code is buggy. You need to check that the page is still belong to
the file after speculative lookup.