[PATCHv1, RFC 00/33] ext4: support of huge pages

From: Kirill A. Shutemov
Date: Mon Jul 25 2016 - 20:45:35 EST


Here's the first version of my patchset which intended to bring huge pages
to ext4. It's not yet ready for applying or serious use, but good enough
to show the approach.

The basics are the same as with tmpfs[1] which is in -mm tree and ext4
built on of it. The main difference is that we need to handle read out
from and write-back to backing storage.

Head page links buffers for whole huge page. Dirty/writeback tracking
happens on per-hugepage level.

We read out whole huge page at once. It required bumping BIO_MAX_PAGES to
512 to get it work on x86-64, which is hack. I'm not sure how to handle it
properly.

Readahead doesn't play with huge pages well too: 128k max readahead window,
assumption on page size, PageReadahead() to track hit/miss.
I've got it to allocate huge pages, but it doesn't provide any readahead as
such. I don't know how to do this right.

Unlike tmpfs, ext4 makes use of tags in radix-tree. The approach I used
for tmpfs -- 512 entries in radix-tree per-hugepages -- doesn't work well
if we want to have coherent view on tags. So the first 8 patches of the
patchset converts tmpfs to use multi-order entries in radix-tree.
The same infrastructure used for ext4.

Writeback works for simple cases, but xfstests manages to trigger BUG_ON()
eventually. That's what I work on currently. My understanding of writeback
process is still rather limited and any help would be appreciated.

For now I try to make xfstests run smoothly on filesystem with huge=always
and 4k block size. Once it will be done, I'll widen testing to 1k blocks,
encryption and bigalloc.

Any comments?

[1] http://lkml.kernel.org/r/1465222029-45942-1-git-send-email-kirill.shutemov@xxxxxxxxxxxxxxx

TODO:
- stabilize writeback;
- make ext4_move_extents() work with huge pages (split them?);
- check if memory reclaim process is adequate for huge pages with
backing storage (unnecessary split_huge_page() ?);
- handle shadow entries properly;
- encryption, 1k blocks, bigalloc, ...
Kirill A. Shutemov (27):
mm, shmem: swich huge tmpfs to multi-order radix-tree entries
Revert "radix-tree: implement radix_tree_maybe_preload_order()"
page-flags: relax page flag poliry for PG_error and PG_writeback
mm, rmap: account file thp pages
thp: allow splitting non-shmem file-backed THPs
truncate: make sure invalidate_mapping_pages() can discard huge pages
filemap: allocate huge page in page_cache_read(), if allowed
filemap: handle huge pages in do_generic_file_read()
filemap: allocate huge page in pagecache_get_page(), if allowed
filemap: handle huge pages in filemap_fdatawait_range()
HACK: readahead: alloc huge pages, if allowed
HACK: block: bump BIO_MAX_PAGES
mm: make write_cache_pages() work on huge pages
thp: introduce hpage_size() and hpage_mask()
fs: make block_read_full_page() be able to read huge page
fs: make block_write_{begin,end}() be able to handle huge pages
fs: make block_page_mkwrite() aware about huge pages
truncate: make truncate_inode_pages_range() aware about huge pages
ext4: make ext4_mpage_readpages() hugepage-aware
ext4: make ext4_writepage() work on huge pages
ext4: handle huge pages in ext4_page_mkwrite()
ext4: handle huge pages in __ext4_block_zero_page_range()
ext4: handle huge pages in ext4_da_write_end()
ext4: relax assert in ext4_da_page_release_reservation()
WIP: ext4: handle writeback with huge pages
mm, fs, ext4: expand use of page_mapping() and page_to_pgoff()
ext4, vfs: add huge= mount option

Matthew Wilcox (6):
tools: Add WARN_ON_ONCE
radix tree test suite: Allow GFP_ATOMIC allocations to fail
radix-tree: Add radix_tree_join
radix-tree: Add radix_tree_split
radix-tree: Add radix_tree_split_preload()
radix-tree: Handle multiorder entries being deleted by
replace_clear_tags

drivers/base/node.c | 6 +
fs/buffer.c | 89 ++++---
fs/ext4/ext4.h | 5 +
fs/ext4/inode.c | 106 +++++---
fs/ext4/page-io.c | 11 +-
fs/ext4/readpage.c | 38 ++-
fs/ext4/super.c | 19 ++
fs/proc/meminfo.c | 4 +
fs/proc/task_mmu.c | 5 +-
include/linux/bio.h | 2 +-
include/linux/buffer_head.h | 9 +-
include/linux/fs.h | 5 +
include/linux/huge_mm.h | 16 ++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 2 +
include/linux/page-flags.h | 8 +-
include/linux/pagemap.h | 22 +-
include/linux/radix-tree.h | 10 +-
lib/radix-tree.c | 357 ++++++++++++++++++--------
mm/filemap.c | 458 +++++++++++++++++++++++-----------
mm/huge_memory.c | 51 +++-
mm/khugepaged.c | 26 +-
mm/memory.c | 4 +-
mm/page-writeback.c | 19 +-
mm/page_alloc.c | 5 +
mm/readahead.c | 16 +-
mm/rmap.c | 12 +-
mm/shmem.c | 36 +--
mm/truncate.c | 106 +++++++-
mm/vmstat.c | 2 +
tools/include/asm/bug.h | 11 +
tools/testing/radix-tree/Makefile | 2 +-
tools/testing/radix-tree/linux.c | 7 +-
tools/testing/radix-tree/linux/bug.h | 2 +-
tools/testing/radix-tree/linux/gfp.h | 24 +-
tools/testing/radix-tree/linux/slab.h | 5 -
tools/testing/radix-tree/multiorder.c | 82 ++++++
tools/testing/radix-tree/test.h | 9 +
38 files changed, 1162 insertions(+), 430 deletions(-)

--
2.8.1