[PATCHv2 00/28] huge tmpfs implementation using compound pages
From: Kirill A. Shutemov
Date: Thu Feb 11 2016 - 09:31:42 EST
Here is my implementation of huge page support in tmpfs/shmem. It's more
or less complete. I'm comfortable enough with it to run it on my workstation,
and it hasn't crashed so far. :)
The main difference from Hugh's approach[1] is that I continue with
compound pages, whereas Hugh invents a new way to couple pages: team pages.
I believe the THP refcounting rework made team pages unnecessary: compound
pages are flexible enough to serve the needs of the page cache.
Many ideas and some patches were stolen from Hugh's patchset. Having this
patchset around was very helpful.
I will continue with code validation. I expect mlock to require some
more attention.
Please review and test the code.
Git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugetmpfs/v2
== Patchset overview ==
[01/28]
I posted this patch last night. I hit the bug during my
testing of huge tmpfs, but I think DAX has the same problem, so it
should be applied now.
[02-05/28]
These patches were also posted separately. They simplify the
split_huge_page() code at the cost of some speed. I'm not sure if they
should go upstream, but they make my life easier for now.
The patches were slightly adjusted to handle file pages too.
[06-11/28]
Rework the fault path and rmap to handle file PMDs. Unlike DAX with
vm_ops->pmd_fault, we don't need to ask the filesystem twice, first
for a huge page and then for a small one. If ->fault happens to return
a huge page and the VMA is suitable for mapping it as huge, we do
so.
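As a rough illustration of what "VMA is suitable" could mean here, below is
a minimal userspace sketch, not code from the patchset. The struct toy_vma
type, the can_map_huge() helper and the 4 KiB / 2 MiB sizes (as on x86-64)
are all assumptions made for the example. The idea is that a huge page
returned by ->fault can be mapped with a single PMD only if the PMD-aligned
range around the faulting address lies entirely inside the VMA and the VMA's
virtual addresses and file offsets are congruent modulo the huge page size.

```c
#include <stdbool.h>

/* Assumed sizes: 4 KiB base pages, 2 MiB PMD-sized huge pages (x86-64). */
#define PAGE_SHIFT      12
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))
#define HPAGE_PMD_NR    (1UL << (HPAGE_PMD_SHIFT - PAGE_SHIFT))

/* Hypothetical, stripped-down stand-in for struct vm_area_struct. */
struct toy_vma {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_end;	/* first address past the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Hypothetical helper: decide whether a fault at @addr may be served
 * by one PMD mapping, given whether ->fault returned a huge page.
 */
static bool can_map_huge(const struct toy_vma *vma, unsigned long addr,
			 bool page_is_huge)
{
	unsigned long haddr = addr & HPAGE_PMD_MASK;

	/* A small page can only be mapped with a PTE. */
	if (!page_is_huge)
		return false;

	/* The whole PMD-sized range must fit inside the VMA. */
	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
		return false;

	/*
	 * Virtual address and file offset must be aligned to each other,
	 * otherwise the huge page would straddle two PMD entries.
	 */
	if (((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff) & (HPAGE_PMD_NR - 1))
		return false;

	return true;
}
```

When the check fails, the fault path would fall back to mapping the
individual small page with a PTE, without calling into the filesystem again.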
[12-20/28]
Various preparations in the THP core for file pages.
[21-25/28]
Various preparations in the MM core for file pages.
[26-28/28]
And finally, bring huge pages into tmpfs/shmem.
Two of three patches came from Hugh's patchset. :)
[1] http://lkml.kernel.org/g/alpine.LSU.2.11.1502201941340.14414@xxxxxxxxxxxx
Hugh Dickins (2):
shmem: prepare huge=N mount option and /proc/sys/vm/shmem_huge
shmem: get_unmapped_area align huge page
Kirill A. Shutemov (26):
thp, dax: do not try to withdraw pgtable from non-anon VMA
rmap: introduce rmap_walk_locked()
rmap: extend try_to_unmap() to be usable by split_huge_page()
mm: make remove_migration_ptes() beyond mm/migration.c
thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers
mm: do not pass mm_struct into handle_mm_fault
mm: introduce fault_env
mm: postpone page table allocation until do_set_pte()
rmap: support file thp
mm: introduce do_set_pmd()
mm, rmap: account file thp pages
thp, vmstats: add counters for huge file pages
thp: support file pages in zap_huge_pmd()
thp: handle file pages in split_huge_pmd()
thp: handle file COW faults
thp: handle file pages in mremap()
thp: skip file huge pmd on copy_huge_pmd()
thp: prepare change_huge_pmd() for file thp
thp: run vma_adjust_trans_huge() outside i_mmap_rwsem
thp: file pages support for split_huge_page()
vmscan: split file huge pages before paging them out
page-flags: relax policy for PG_mappedtodisk and PG_reclaim
radix-tree: implement radix_tree_maybe_preload_order()
filemap: prepare find and delete operations for huge pages
truncate: handle file thp
shmem: add huge pages support
Documentation/filesystems/Locking | 10 +-
arch/alpha/mm/fault.c | 2 +-
arch/arc/mm/fault.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/avr32/mm/fault.c | 2 +-
arch/cris/mm/fault.c | 2 +-
arch/frv/mm/fault.c | 2 +-
arch/hexagon/mm/vm_fault.c | 2 +-
arch/ia64/mm/fault.c | 2 +-
arch/m32r/mm/fault.c | 2 +-
arch/m68k/mm/fault.c | 2 +-
arch/metag/mm/fault.c | 2 +-
arch/microblaze/mm/fault.c | 2 +-
arch/mips/mm/fault.c | 2 +-
arch/mn10300/mm/fault.c | 2 +-
arch/nios2/mm/fault.c | 2 +-
arch/openrisc/mm/fault.c | 2 +-
arch/parisc/mm/fault.c | 2 +-
arch/powerpc/mm/copro_fault.c | 2 +-
arch/powerpc/mm/fault.c | 2 +-
arch/s390/mm/fault.c | 2 +-
arch/score/mm/fault.c | 2 +-
arch/sh/mm/fault.c | 2 +-
arch/sparc/mm/fault_32.c | 4 +-
arch/sparc/mm/fault_64.c | 2 +-
arch/tile/mm/fault.c | 2 +-
arch/um/kernel/trap.c | 2 +-
arch/unicore32/mm/fault.c | 2 +-
arch/x86/mm/fault.c | 2 +-
arch/xtensa/mm/fault.c | 2 +-
drivers/base/node.c | 10 +-
drivers/char/mem.c | 24 ++
drivers/iommu/amd_iommu_v2.c | 2 +-
drivers/iommu/intel-svm.c | 2 +-
fs/proc/meminfo.c | 5 +-
fs/userfaultfd.c | 22 +-
include/linux/huge_mm.h | 29 +-
include/linux/mm.h | 33 +-
include/linux/mmzone.h | 3 +-
include/linux/page-flags.h | 6 +-
include/linux/radix-tree.h | 1 +
include/linux/rmap.h | 8 +-
include/linux/shmem_fs.h | 18 +-
include/linux/userfaultfd_k.h | 8 +-
include/linux/vm_event_item.h | 7 +
ipc/shm.c | 6 +-
kernel/sysctl.c | 12 +
lib/radix-tree.c | 70 +++-
mm/filemap.c | 220 +++++++----
mm/gup.c | 7 +-
mm/huge_memory.c | 714 ++++++++++++++--------------------
mm/internal.h | 20 +-
mm/ksm.c | 3 +-
mm/memory.c | 796 +++++++++++++++++++++-----------------
mm/mempolicy.c | 4 +-
mm/migrate.c | 17 +-
mm/mmap.c | 20 +-
mm/mremap.c | 22 +-
mm/nommu.c | 3 +-
mm/page-writeback.c | 1 +
mm/rmap.c | 125 ++++--
mm/shmem.c | 493 +++++++++++++++++++----
mm/swap.c | 2 +
mm/truncate.c | 22 +-
mm/util.c | 6 +
mm/vmscan.c | 15 +-
mm/vmstat.c | 3 +
68 files changed, 1727 insertions(+), 1104 deletions(-)
--
2.7.0