[PATCH v18 16/32] mm/lru: introduce TestClearPageLRU

From: Alex Shi
Date: Mon Aug 24 2020 - 08:55:42 EST


Currently lru_lock still guards both lru list and page's lru bit, that's
ok. but if we want to use specific lruvec lock on the page, we need to
pin down the page's lruvec/memcg during locking. Just taking lruvec
lock first may be undermined by the page's memcg charge/migration. To
fix this problem, we could clear the lru bit out of locking and use
it as pin down action to block the page isolation in memcg changing.

So now a standard steps of page isolation is following:
1, get_page(); #pin the page avoid to be free
2, TestClearPageLRU(); #block other isolation like memcg change
3, spin_lock on lru_lock; #serialize lru list access
4, delete page from lru list;
The step 2 could be optimzed/replaced in scenarios which page is
unlikely be accessed or be moved between memcgs.

This patch start with the first part: TestClearPageLRU, which combines
PageLRU check and ClearPageLRU into a macro func TestClearPageLRU. This
function will be used as page isolation precondition to prevent other
isolations some where else. Then there are may !PageLRU page on lru
list, need to remove BUG() checking accordingly.

There 2 rules for lru bit now:
1, the lru bit still indicate if a page on lru list, just in some
temporary moment(isolating), the page may have no lru bit when
it's on lru list. but the page still must be on lru list when the
lru bit set.
2, have to remove lru bit before delete it from lru list.

Hugh Dickins pointed that when a page is in free path and no one is
possible to take it, non atomic lru bit clearing is better, like in
__page_cache_release and release_pages.
And no need get_page() before lru bit clear in isolate_lru_page,
since it '(1) Must be called with an elevated refcount on the page'.

As Andrew Morton mentioned this change would dirty cacheline for page
isn't on LRU. But the lost would be acceptable with Rong Chen
<rong.a.chen@xxxxxxxxx> report:
https://lkml.org/lkml/2020/3/4/173

Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Signed-off-by: Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: cgroups@xxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx
---
include/linux/page-flags.h | 1 +
mm/mlock.c | 3 +--
mm/swap.c | 5 ++---
mm/vmscan.c | 18 +++++++-----------
4 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6be1aa559b1e..9554ed1387dc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -326,6 +326,7 @@ static inline void page_init_poison(struct page *page, size_t size)
PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+ TESTCLEARFLAG(LRU, lru, PF_HEAD)
PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
PAGEFLAG(Workingset, workingset, PF_HEAD)
diff --git a/mm/mlock.c b/mm/mlock.c
index 93ca2bf30b4f..3762d9dd5b31 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -107,13 +107,12 @@ void mlock_vma_page(struct page *page)
*/
static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
{
- if (PageLRU(page)) {
+ if (TestClearPageLRU(page)) {
struct lruvec *lruvec;

lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
if (getpage)
get_page(page);
- ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_lru(page));
return true;
}
diff --git a/mm/swap.c b/mm/swap.c
index f80ccd6f3cb4..446ffe280809 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -83,10 +83,9 @@ static void __page_cache_release(struct page *page)
struct lruvec *lruvec;
unsigned long flags;

+ __ClearPageLRU(page);
spin_lock_irqsave(&pgdat->lru_lock, flags);
lruvec = mem_cgroup_page_lruvec(page, pgdat);
- VM_BUG_ON_PAGE(!PageLRU(page), page);
- __ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_off_lru(page));
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
}
@@ -880,9 +879,9 @@ void release_pages(struct page **pages, int nr)
spin_lock_irqsave(&locked_pgdat->lru_lock, flags);
}

- lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
VM_BUG_ON_PAGE(!PageLRU(page), page);
__ClearPageLRU(page);
+ lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
del_page_from_lru_list(page, lruvec, page_off_lru(page));
}

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7b7b36bd1448..1b3e0eeaad64 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1665,8 +1665,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
page = lru_to_page(src);
prefetchw_prev_lru_page(page, src, flags);

- VM_BUG_ON_PAGE(!PageLRU(page), page);
-
nr_pages = compound_nr(page);
total_scan += nr_pages;

@@ -1763,21 +1761,19 @@ int isolate_lru_page(struct page *page)
VM_BUG_ON_PAGE(!page_count(page), page);
WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");

- if (PageLRU(page)) {
+ if (TestClearPageLRU(page)) {
pg_data_t *pgdat = page_pgdat(page);
struct lruvec *lruvec;
+ int lru = page_lru(page);

- spin_lock_irq(&pgdat->lru_lock);
+ get_page(page);
lruvec = mem_cgroup_page_lruvec(page, pgdat);
- if (PageLRU(page)) {
- int lru = page_lru(page);
- get_page(page);
- ClearPageLRU(page);
- del_page_from_lru_list(page, lruvec, lru);
- ret = 0;
- }
+ spin_lock_irq(&pgdat->lru_lock);
+ del_page_from_lru_list(page, lruvec, lru);
spin_unlock_irq(&pgdat->lru_lock);
+ ret = 0;
}
+
return ret;
}

--
1.8.3.1