Re: [RFC v3] Support volatile range for anon vma
From: Minchan Kim
Date: Wed Dec 12 2012 - 03:15:03 EST
On Wed, Dec 12, 2012 at 02:43:49PM +0800, Wanpeng Li wrote:
> On Tue, Dec 11, 2012 at 11:41:04AM +0900, Minchan Kim wrote:
> >Sorry, resending with the compile error fixed. :(
> >
> >>From 0cfd3b65e4e90ab59abe8a337334414f92423cad Mon Sep 17 00:00:00 2001
> >From: Minchan Kim <minchan@xxxxxxxxxx>
> >Date: Tue, 11 Dec 2012 11:38:30 +0900
> >Subject: [RFC v3] Support volatile range for anon vma
> >
> >This is still [RFC v3] because it has only passed my simple test
> >with a tweaked TCMalloc.
> >
> >I hope for more input from user-space allocator people, and for tests
> >of this patch with their allocators, because it might need a change in
> >arena management design to get real value out of it.
> >
> >Changelog from v2
> >
> > * Remove madvise(addr, length, MADV_NOVOLATILE).
> > * Add a vmstat counter for the number of discarded volatile pages.
> > * Discard volatile pages without promotion in the reclaim path.
> >
> >This is based on v3.6.
> >
> >- What's the madvise(addr, length, MADV_VOLATILE)?
> >
> > It's a hint the user delivers to the kernel so that the kernel
> > can *discard* pages in the range at any time.
> >
> >- What happens if the user accesses a page (ie, virtual address)
> > that the kernel has discarded?
> >
> > The user sees zero-fill-on-demand pages, as with madvise(DONTNEED).
> >
> >- What happens if the user accesses a page (ie, virtual address)
> > that the kernel has NOT discarded?
> >
> > The user sees the old data, without any page fault.
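> >
> > For illustration, a minimal user-space sketch of the semantics above
> > (example code, not part of the patch; MADV_VOLATILE's value is the
> > one this patch defines in mman-common.h):
> >
> >	#include <stdio.h>
> >	#include <string.h>
> >	#include <sys/mman.h>
> >
> >	#define MADV_VOLATILE	5	/* from this patch */
> >
> >	int main(void)
> >	{
> >		size_t len = 64 * 4096;
> >		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
> >			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >
> >		if (p == MAP_FAILED)
> >			return 1;
> >		memset(p, 0xaa, len);
> >		madvise(p, len, MADV_VOLATILE);
> >		/*
> >		 * Under memory pressure the kernel may now discard any
> >		 * of these pages. On read-back, a page holds either the
> >		 * old 0xaa (not discarded) or zeroes (discarded and
> >		 * refaulted as zero-fill). The first fault also clears
> >		 * VM_VOLATILE on the vma.
> >		 */
> >		printf("%#x\n", (unsigned char)p[0]);
> >		return 0;
> >	}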
> >
> >- How is this different from madvise(DONTNEED)?
> >
> > System call semantics
> >
> > DONTNEED guarantees that the user always sees zero-fill pages
> > after calling madvise, while with VOLATILE the user may see
> > either zero-fill pages or the old data.
> >
> > Internal implementation
> >
> > madvise(DONTNEED) has to zap all mapped pages in the range, so
> > its overhead grows linearly with the number of mapped pages.
> > Worse, if the user later writes to a zapped page, a page fault +
> > page allocation + memset has to happen.
> >
> > madvise(VOLATILE) only marks a flag on the range (ie, the VMA).
> > It doesn't touch the pages at all, so the overhead of the system
> > call is very small. Under memory pressure, the VM can discard
> > pages in VMAs marked VOLATILE. If the user writes to an address
> > the VM has discarded, he sees a zero-fill page, so the cost is
> > the same as with DONTNEED; but if memory pressure wasn't severe,
> > the user sees the old data without the (page fault + page
> > allocation + memset).
> >
> > The VOLATILE mark is removed in the page fault handler when the
> > first page fault occurs in a marked vma, so subsequent page
> > faults follow the normal page fault path. That's why the user
> > doesn't need a madvise(MADV_NOVOLATILE) interface.
> >
> >- What's the benefit compared to DONTNEED?
> >
> > 1. The system call overhead is smaller because VOLATILE just sets
> > a flag on the VMA instead of zapping all the pages in the range.
> >
> > 2. It has a chance to eliminate the refault overhead (ex, page
> > fault + page allocation + memset(PAGE_SIZE)).
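> >
> > For illustration only (not from this patch), a user-space allocator
> > could simply swap the madvise(DONTNEED) it issues on a retired span
> > for VOLATILE, and do nothing at all on reuse:
> >
> >	/* On retiring a free span: O(1) hint, pages stay mapped.
> >	 * Assumes MADV_VOLATILE as in the example above. */
> >	static void span_retire(void *addr, size_t len)
> >	{
> >		madvise(addr, len, MADV_VOLATILE);	/* was MADV_DONTNEED */
> >	}
> >
> >	/*
> >	 * On reuse there is nothing to undo: the first touch either
> >	 * hits the old page (no fault at all) or faults in a zero
> >	 * page, and that fault clears VM_VOLATILE on the vma. The
> >	 * allocator only has to treat the span's old contents as
> >	 * undefined.
> >	 */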
> >
> >- Isn't there any drawback?
> >
> > DONTNEED doesn't need exclusive mmap_sem locking, so concurrent
> > page faults from other threads are allowed. But VOLATILE needs
> > exclusive mmap_sem, so other threads are blocked if they try to
> > access not-yet-mapped pages. That's why I designed
> > madvise(VOLATILE)'s overhead to be as small as possible.
> >
> > Another exclusive-mmap_sem concern is a page fault in a
> > VOLATILE-marked vma. We have to clear the vma's flag and merge
> > adjacent vmas, which needs exclusive mmap_sem. That can slow down
> > page fault handling and prevent concurrent page faults. But such
> > handling is needed only once, on the first page fault after we
> > mark the VMA VOLATILE, and only if memory pressure actually
> > discarded a page. So it shouldn't be common, and the benefit we
> > get from this feature should outweigh the loss.
> >
> >- What is this targeting?
> >
> > Firstly, user-space allocators like ptmalloc and tcmalloc, or the
> > heap management of virtual machines like Dalvik. It also comes in
> > handy for embedded systems which have no swap device and so can't
> > reclaim anonymous pages. By discarding instead of swapping, it can
> > be used on a non-swap system. For that, we have to age the anon
> > lru list even though we have no swap, because I don't want to
> > discard volatile pages with top priority when memory pressure
> > happens: volatile in this patch means "We don't need to swap this
> > out because the user can handle the data disappearing suddenly",
> > NOT "This is useless, so hurry up and reclaim it". So I want to
> > apply the same aging rule as for normal pages to them.
> >
> > Background aging of anonymous pages on a non-swap system would be
> > a trade-off for getting a good feature. We actually did it until
> > [1] was merged two years ago, and I believe the gain from this
> > patch will beat the cost of the anon lru aging overhead once
> > allocators start to use the madvise.
> > (This patch doesn't include background aging for the non-swap
> > case, but adding it is trivial if we decide to.)
> >
> >[1] 74e3f3c3, vmscan: prevent background aging of anon page in no swap system
> >
> >Cc: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
> >Cc: Arun Sharma <asharma@xxxxxx>
> >Cc: sanjay@xxxxxxxxxx
> >Cc: Paul Turner <pjt@xxxxxxxxxx>
> >CC: David Rientjes <rientjes@xxxxxxxxxx>
> >Cc: John Stultz <john.stultz@xxxxxxxxxx>
> >Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> >Cc: Christoph Lameter <cl@xxxxxxxxx>
> >Cc: Android Kernel Team <kernel-team@xxxxxxxxxxx>
> >Cc: Robert Love <rlove@xxxxxxxxxx>
> >Cc: Mel Gorman <mel@xxxxxxxxx>
> >Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> >Cc: Dave Hansen <dave@xxxxxxxxxxxxxxxxxx>
> >Cc: Rik van Riel <riel@xxxxxxxxxx>
> >Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> >Cc: Neil Brown <neilb@xxxxxxx>
> >Cc: Mike Hommey <mh@xxxxxxxxxxxx>
> >Cc: Taras Glek <tglek@xxxxxxxxxxx>
> >Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxx>
> >Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> >Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> >---
> > arch/x86/mm/fault.c | 2 +
> > include/asm-generic/mman-common.h | 6 ++
> > include/linux/mm.h | 7 ++-
> > include/linux/rmap.h | 20 ++++++
> > include/linux/vm_event_item.h | 2 +-
> > mm/madvise.c | 19 +++++-
> > mm/memory.c | 32 ++++++++++
> > mm/migrate.c | 6 +-
> > mm/rmap.c | 125 ++++++++++++++++++++++++++++++++++++-
> > mm/vmscan.c | 7 +++
> > mm/vmstat.c | 1 +
> > 11 files changed, 218 insertions(+), 9 deletions(-)
> >
> >diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> >index 76dcd9d..17c1c20 100644
> >--- a/arch/x86/mm/fault.c
> >+++ b/arch/x86/mm/fault.c
> >@@ -879,6 +879,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
> > }
> >
> > out_of_memory(regs, error_code, address);
> >+ } else if (fault & VM_FAULT_SIGSEG) {
> >+ bad_area(regs, error_code, address);
> > } else {
> > if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
> > VM_FAULT_HWPOISON_LARGE))
> >diff --git a/include/asm-generic/mman-common.h b/include/asm-generic/mman-common.h
> >index d030d2c..f07781e 100644
> >--- a/include/asm-generic/mman-common.h
> >+++ b/include/asm-generic/mman-common.h
> >@@ -34,6 +34,12 @@
> > #define MADV_SEQUENTIAL 2 /* expect sequential page references */
> > #define MADV_WILLNEED 3 /* will need these pages */
> > #define MADV_DONTNEED 4 /* don't need these pages */
> >+/*
> >+ * Unlike other flags, MADV_VOLATILE needs two locks.
> >+ * Changing the flag requires mmap_sem's write lock plus volatile_lock,
> >+ * while reading the flag requires only volatile_lock.
> >+ */
> >+#define MADV_VOLATILE 5 /* pages will disappear suddenly */
> >
> > /* common parameters: try to keep these consistent across architectures */
> > #define MADV_REMOVE 9 /* remove these pages & resources */
> >diff --git a/include/linux/mm.h b/include/linux/mm.h
> >index 311be90..89027b5 100644
> >--- a/include/linux/mm.h
> >+++ b/include/linux/mm.h
> >@@ -119,6 +119,7 @@ extern unsigned int kobjsize(const void *objp);
> > #define VM_SAO 0x20000000 /* Strong Access Ordering (powerpc) */
> > #define VM_PFN_AT_MMAP 0x40000000 /* PFNMAP vma that is fully mapped at mmap time */
> > #define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
> >+#define VM_VOLATILE 0x100000000 /* Pages in the vma can be discarded without swap */
> >
> > /* Bits set in the VMA until the stack is in its final location */
> > #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
> >@@ -143,7 +144,7 @@ extern unsigned int kobjsize(const void *objp);
> > * Special vmas that are non-mergable, non-mlock()able.
> > * Note: mm/huge_memory.c VM_NO_THP depends on this definition.
> > */
> >-#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)
> >+#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP | VM_VOLATILE)
> >
> > /*
> > * mapping from the currently active vm_flags protection bits (the
> >@@ -872,11 +873,11 @@ static inline int page_mapped(struct page *page)
> > #define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */
> > #define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */
> > #define VM_FAULT_RETRY 0x0400 /* ->fault blocked, must retry */
> >-
> >+#define VM_FAULT_SIGSEG 0x0800 /* there is no vma at the address */
> > #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
> >
> > #define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON | \
> >- VM_FAULT_HWPOISON_LARGE)
> >+ VM_FAULT_HWPOISON_LARGE | VM_FAULT_SIGSEG)
> >
> > /* Encode hstate index for a hwpoisoned large page */
> > #define VM_FAULT_SET_HINDEX(x) ((x) << 12)
> >diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> >index 3fce545..735d7a3 100644
> >--- a/include/linux/rmap.h
> >+++ b/include/linux/rmap.h
> >@@ -67,6 +67,9 @@ struct anon_vma_chain {
> > struct list_head same_anon_vma; /* locked by anon_vma->mutex */
> > };
> >
> >+void volatile_lock(struct vm_area_struct *vma);
> >+void volatile_unlock(struct vm_area_struct *vma);
> >+
> > #ifdef CONFIG_MMU
> > static inline void get_anon_vma(struct anon_vma *anon_vma)
> > {
> >@@ -170,6 +173,7 @@ enum ttu_flags {
> > TTU_IGNORE_MLOCK = (1 << 8), /* ignore mlock */
> > TTU_IGNORE_ACCESS = (1 << 9), /* don't age */
> > TTU_IGNORE_HWPOISON = (1 << 10),/* corrupted page is recoverable */
> >+ TTU_IGNORE_VOLATILE = (1 << 11),/* ignore volatile */
> > };
> > #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
> >
> >@@ -194,6 +198,21 @@ static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
> > return ptep;
> > }
> >
> >+pte_t *__page_check_volatile_address(struct page *, struct mm_struct *,
> >+ unsigned long, spinlock_t **);
> >+
> >+static inline pte_t *page_check_volatile_address(struct page *page,
> >+ struct mm_struct *mm,
> >+ unsigned long address,
> >+ spinlock_t **ptlp)
> >+{
> >+ pte_t *ptep;
> >+
> >+ __cond_lock(*ptlp, ptep = __page_check_volatile_address(page,
> >+ mm, address, ptlp));
> >+ return ptep;
> >+}
> >+
> > /*
> > * Used by swapoff to help locate where page is expected in vma.
> > */
> >@@ -257,5 +276,6 @@ static inline int page_mkclean(struct page *page)
> > #define SWAP_AGAIN 1
> > #define SWAP_FAIL 2
> > #define SWAP_MLOCK 3
> >+#define SWAP_DISCARD 4
> >
> > #endif /* _LINUX_RMAP_H */
> >diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> >index 57f7b10..3f9a40b 100644
> >--- a/include/linux/vm_event_item.h
> >+++ b/include/linux/vm_event_item.h
> >@@ -23,7 +23,7 @@
> >
> > enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> > FOR_ALL_ZONES(PGALLOC),
> >- PGFREE, PGACTIVATE, PGDEACTIVATE,
> >+ PGFREE, PGVOLATILE, PGACTIVATE, PGDEACTIVATE,
> > PGFAULT, PGMAJFAULT,
> > FOR_ALL_ZONES(PGREFILL),
> > FOR_ALL_ZONES(PGSTEAL_KSWAPD),
> >diff --git a/mm/madvise.c b/mm/madvise.c
> >index 14d260f..53a19d8 100644
> >--- a/mm/madvise.c
> >+++ b/mm/madvise.c
> >@@ -86,6 +86,13 @@ static long madvise_behavior(struct vm_area_struct * vma,
> > if (error)
> > goto out;
> > break;
> >+ case MADV_VOLATILE:
> >+ if (vma->vm_flags & VM_LOCKED) {
> >+ error = -EINVAL;
> >+ goto out;
> >+ }
> >+ new_flags |= VM_VOLATILE;
> >+ break;
> > }
> >
> > if (new_flags == vma->vm_flags) {
> >@@ -118,9 +125,13 @@ static long madvise_behavior(struct vm_area_struct * vma,
> > success:
> > /*
> > * vm_flags is protected by the mmap_sem held in write mode.
> >+ * In case of MADV_VOLATILE, we additionally need anon_vma_lock.
> > */
> >+ if (behavior == MADV_VOLATILE)
> >+ volatile_lock(vma);
> > vma->vm_flags = new_flags;
> >-
> >+ if (behavior == MADV_VOLATILE)
> >+ volatile_unlock(vma);
> > out:
> > if (error == -ENOMEM)
> > error = -EAGAIN;
> >@@ -310,6 +321,7 @@ madvise_behavior_valid(int behavior)
> > #endif
> > case MADV_DONTDUMP:
> > case MADV_DODUMP:
> >+ case MADV_VOLATILE:
> > return 1;
> >
> > default:
> >@@ -385,6 +397,10 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> > goto out;
> >- len = (len_in + ~PAGE_MASK) & PAGE_MASK;
> >
> >+ if (behavior != MADV_VOLATILE)
> >+ len = (len_in + ~PAGE_MASK) & PAGE_MASK;
> >+ else
> >+ len = len_in & PAGE_MASK;
> >+
> > /* Check to see whether len was rounded up from small -ve to zero */
> > if (len_in && !len)
> > goto out;
> >diff --git a/mm/memory.c b/mm/memory.c
> >index 5736170..b5e4996 100644
> >--- a/mm/memory.c
> >+++ b/mm/memory.c
> >@@ -57,6 +57,7 @@
> > #include <linux/swapops.h>
> > #include <linux/elf.h>
> > #include <linux/gfp.h>
> >+#include <linux/mempolicy.h>
> >
> > #include <asm/io.h>
> > #include <asm/pgalloc.h>
> >@@ -3446,6 +3447,37 @@ int handle_pte_fault(struct mm_struct *mm,
> > return do_linear_fault(mm, vma, address,
> > pte, pmd, flags, entry);
> > }
> >+ if (vma->vm_flags & VM_VOLATILE) {
> >+ struct vm_area_struct *prev;
> >+
> >+ up_read(&mm->mmap_sem);
> >+ down_write(&mm->mmap_sem);
> >+ vma = find_vma_prev(mm, address, &prev);
> >+
> >+ /* Someone unmapped the vma */
> >+ if (unlikely(!vma) || vma->vm_start > address) {
> >+ downgrade_write(&mm->mmap_sem);
> >+ return VM_FAULT_SIGSEG;
> >+ }
> >+ /* Unless someone else already handled it */
> >+ if (vma->vm_flags & VM_VOLATILE) {
> >+ /*
> >+ * From now on, we hold mmap_sem as
> >+ * exclusive.
> >+ */
> >+ volatile_lock(vma);
> >+ vma->vm_flags &= ~VM_VOLATILE;
> >+ volatile_unlock(vma);
> >+
> >+ vma_merge(mm, prev, vma->vm_start,
> >+ vma->vm_end, vma->vm_flags,
> >+ vma->anon_vma, vma->vm_file,
> >+ vma->vm_pgoff, vma_policy(vma));
> >+
> >+ }
> >+
> >+ downgrade_write(&mm->mmap_sem);
> >+ }
> > return do_anonymous_page(mm, vma, address,
> > pte, pmd, flags);
> > }
> >diff --git a/mm/migrate.c b/mm/migrate.c
> >index 77ed2d7..08b009c 100644
> >--- a/mm/migrate.c
> >+++ b/mm/migrate.c
> >@@ -800,7 +800,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> > }
> >
> > /* Establish migration ptes or remove ptes */
> >- try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> >+ try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|
> >+ TTU_IGNORE_ACCESS|TTU_IGNORE_VOLATILE);
> >
> > skip_unmap:
> > if (!page_mapped(page))
> >@@ -915,7 +916,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> > if (PageAnon(hpage))
> > anon_vma = page_get_anon_vma(hpage);
> >
> >- try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> >+ try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|
> >+ TTU_IGNORE_ACCESS|TTU_IGNORE_VOLATILE);
> >
> > if (!page_mapped(hpage))
> > rc = move_to_new_page(new_hpage, hpage, 1, mode);
> >diff --git a/mm/rmap.c b/mm/rmap.c
> >index 0f3b7cd..1a0ab2b 100644
> >--- a/mm/rmap.c
> >+++ b/mm/rmap.c
> >@@ -603,6 +603,57 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
> > return vma_address(page, vma);
> > }
> >
> >+pte_t *__page_check_volatile_address(struct page *page, struct mm_struct *mm,
> >+ unsigned long address, spinlock_t **ptlp)
> >+{
> >+ pgd_t *pgd;
> >+ pud_t *pud;
> >+ pmd_t *pmd;
> >+ pte_t *pte;
> >+ spinlock_t *ptl;
> >+
> >+ swp_entry_t entry = { .val = page_private(page) };
> >+
> >+ if (unlikely(PageHuge(page))) {
> >+ pte = huge_pte_offset(mm, address);
> >+ ptl = &mm->page_table_lock;
> >+ goto check;
> >+ }
> >+
> >+ pgd = pgd_offset(mm, address);
> >+ if (!pgd_present(*pgd))
> >+ return NULL;
> >+
> >+ pud = pud_offset(pgd, address);
> >+ if (!pud_present(*pud))
> >+ return NULL;
> >+
> >+ pmd = pmd_offset(pud, address);
> >+ if (!pmd_present(*pmd))
> >+ return NULL;
> >+ if (pmd_trans_huge(*pmd))
> >+ return NULL;
> >+
> >+ pte = pte_offset_map(pmd, address);
> >+ ptl = pte_lockptr(mm, pmd);
> >+check:
> >+ spin_lock(ptl);
> >+ if (PageAnon(page)) {
> >+ if (!pte_present(*pte) && entry.val ==
> >+ pte_to_swp_entry(*pte).val) {
> >+ *ptlp = ptl;
> >+ return pte;
> >+ }
> >+ } else {
> >+ if (pte_none(*pte)) {
> >+ *ptlp = ptl;
> >+ return pte;
> >+ }
> >+ }
> >+ pte_unmap_unlock(pte, ptl);
> >+ return NULL;
> >+}
> >+
> > /*
> > * Check that @page is mapped at @address into @mm.
> > *
> >@@ -1218,6 +1269,35 @@ out:
> > mem_cgroup_end_update_page_stat(page, &locked, &flags);
> > }
> >
> >+int try_to_zap_one(struct page *page, struct vm_area_struct *vma,
> >+ unsigned long address)
> >+{
> >+ struct mm_struct *mm = vma->vm_mm;
> >+ pte_t *pte;
> >+ pte_t pteval;
> >+ spinlock_t *ptl;
> >+
> >+ pte = page_check_volatile_address(page, mm, address, &ptl);
> >+ if (!pte)
> >+ return 0;
> >+
> >+ /* Nuke the page table entry. */
> >+ flush_cache_page(vma, address, page_to_pfn(page));
> >+ pteval = ptep_clear_flush(vma, address, pte);
> >+
> >+ if (PageAnon(page)) {
> >+ swp_entry_t entry = { .val = page_private(page) };
> >+ if (PageSwapCache(page)) {
> >+ dec_mm_counter(mm, MM_SWAPENTS);
> >+ swap_free(entry);
> >+ }
> >+ }
> >+
> >+ pte_unmap_unlock(pte, ptl);
> >+ mmu_notifier_invalidate_page(mm, address);
> >+ return 1;
> >+}
> >+
> > /*
> > * Subfunctions of try_to_unmap: try_to_unmap_one called
> > * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file.
> >@@ -1494,6 +1574,10 @@ static int try_to_unmap_anon(struct page *page, enum ttu_flags flags)
> > struct anon_vma *anon_vma;
> > struct anon_vma_chain *avc;
> > int ret = SWAP_AGAIN;
> >+ bool is_volatile = true;
> >+
> >+ if (flags & TTU_IGNORE_VOLATILE)
> >+ is_volatile = false;
> >
> > anon_vma = page_lock_anon_vma(page);
> > if (!anon_vma)
> >@@ -1512,17 +1596,40 @@ static int try_to_unmap_anon(struct page *page, enum ttu_flags flags)
> > * temporary VMAs until after exec() completes.
> > */
> > if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >- is_vma_temporary_stack(vma))
> >+ is_vma_temporary_stack(vma)) {
> >+ is_volatile = false;
> > continue;
> >+ }
> >
> > address = vma_address(page, vma);
> > if (address == -EFAULT)
> > continue;
> >+ /*
> >+ * A volatile page will only be purged if ALL vmas
> >+ * pointing to it are VM_VOLATILE.
> >+ */
> >+ if (!(vma->vm_flags & VM_VOLATILE))
> >+ is_volatile = false;
> >+
> > ret = try_to_unmap_one(page, vma, address, flags);
> > if (ret != SWAP_AGAIN || !page_mapped(page))
> > break;
> > }
> >
> >+ if (page_mapped(page) || is_volatile == false)
> >+ goto out;
> >+
> >+ list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
> >+ struct vm_area_struct *vma = avc->vma;
> >+ unsigned long address;
> >+
> >+ address = vma_address(page, vma);
> >+ try_to_zap_one(page, vma, address);
> >+ }
> >+ /* We're throwing this page out, so mark it clean */
> >+ ClearPageDirty(page);
> >+ ret = SWAP_DISCARD;
> >+out:
> > page_unlock_anon_vma(anon_vma);
> > return ret;
> > }
> >@@ -1651,6 +1758,7 @@ out:
> > * SWAP_AGAIN - we missed a mapping, try again later
> > * SWAP_FAIL - the page is unswappable
> > * SWAP_MLOCK - page is mlocked.
> >+ * SWAP_DISCARD - the page was volatile and has been discarded.
> > */
> > int try_to_unmap(struct page *page, enum ttu_flags flags)
> > {
> >@@ -1665,7 +1773,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> > ret = try_to_unmap_anon(page, flags);
> > else
> > ret = try_to_unmap_file(page, flags);
> >- if (ret != SWAP_MLOCK && !page_mapped(page))
> >+ if (ret != SWAP_MLOCK && !page_mapped(page) &&
> >+ ret != SWAP_DISCARD)
> > ret = SWAP_SUCCESS;
> > return ret;
> > }
> >@@ -1707,6 +1816,18 @@ void __put_anon_vma(struct anon_vma *anon_vma)
> > anon_vma_free(anon_vma);
> > }
> >
> >+void volatile_lock(struct vm_area_struct *vma)
> >+{
> >+ if (vma->anon_vma)
> >+ anon_vma_lock(vma->anon_vma);
> >+}
> >+
> >+void volatile_unlock(struct vm_area_struct *vma)
> >+{
> >+ if (vma->anon_vma)
> >+ anon_vma_unlock(vma->anon_vma);
> >+}
> >+
> > #ifdef CONFIG_MIGRATION
> > /*
> > * rmap_walk() and its helpers rmap_walk_anon() and rmap_walk_file():
> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >index 99b434b..4e463a4 100644
> >--- a/mm/vmscan.c
> >+++ b/mm/vmscan.c
> >@@ -630,6 +630,9 @@ static enum page_references page_check_references(struct page *page,
> > if (vm_flags & VM_LOCKED)
> > return PAGEREF_RECLAIM;
> >
> >+ if (vm_flags & VM_VOLATILE)
> >+ return PAGEREF_RECLAIM;
> >+
> > if (referenced_ptes) {
> > if (PageSwapBacked(page))
> > return PAGEREF_ACTIVATE;
> >@@ -789,6 +792,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > */
>
> Hi Minchan,
>
> IIUC, the anonymous page has already been added to the swap cache through
> add_to_swap called by shrink_page_list, but I can't figure out where you
> remove it from the swap cache.
I intended to free it in __remove_mapping.
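Roughly, the idea (an illustrative sketch only -- the actual vmscan hunk
was trimmed from the quote above) is that shrink_page_list() treats
SWAP_DISCARD like an already-clean, unmapped page, so it skips pageout()
and falls through to __remove_mapping(), which for a PageSwapCache page
does __delete_from_swap_cache() and frees the swap slot:

	switch (try_to_unmap(page, TTU_UNMAP)) {
	case SWAP_FAIL:
		goto activate_locked;
	case SWAP_AGAIN:
		goto keep_locked;
	case SWAP_MLOCK:
		goto cull_mlocked;
	case SWAP_DISCARD:
		/* Volatile page was discarded: don't write it back,
		 * just let __remove_mapping() drop it from the swap
		 * cache. (This label name is made up for the sketch.)
		 */
		goto discard_page;
	case SWAP_SUCCESS:
		; /* try to free the page below */
	}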
Thanks.
--
Kind regards,
Minchan Kim