Re: [PATCH v9 06/14] mm: multi-gen LRU: minimal implementation

From: Huang, Ying
Date: Wed Mar 16 2022 - 01:55:30 EST


Hi, Yu,

Yu Zhao <yuzhao@xxxxxxxxxx> writes:

[snip]

>
> +static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
> +{
> +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> +	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> +
> +	if (!can_demote(pgdat->node_id, sc) &&
> +	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
> +		return 0;
> +
> +	return mem_cgroup_swappiness(memcg);
> +}
> +

We have tested v9 on a memory tiering system; with this change, demotion
now works even without swap devices configured. Thanks!

We also found that the demotion speed (page reclaim on the DRAM nodes)
is lower than with the original implementation. The workload itself is
just a memory-accessing micro-benchmark with a Gaussian access
distribution, run on a system with both DRAM and PMEM. Initially, quite
a few hot pages are placed in PMEM and quite a few cold pages are placed
in DRAM. The page placement optimization mechanism based on NUMA
balancing then tries to promote hot pages from the PMEM node to the DRAM
node. When the DRAM node is nearly full (reaches the high watermark),
kswapd on the DRAM node is woken up to demote (reclaim) some cold DRAM
pages to PMEM. Because quite a few pages on DRAM are very cold (not
accessed for at least several seconds), the benchmark performs better
when the demotion speed is higher. A sketch of the workload is shown
below.
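
For reference, the access pattern is roughly equivalent to the sketch
below (buffer size, sigma and access granularity are illustrative
values, not the exact parameters of our benchmark):

/*
 * Touch cache lines at offsets drawn from a Gaussian distribution, so a
 * hot core of pages is accessed frequently while the tails stay cold.
 * Build with: cc -O2 gauss_bench.c -lm
 */
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define BUF_SIZE	(64UL << 30)	/* larger than DRAM, spills into PMEM */
#define LINE		64UL
#define NR_LINES	(BUF_SIZE / LINE)

/* standard normal variate via the Box-Muller transform */
static double gauss(void)
{
	double u1 = 1.0 - drand48();	/* (0, 1], keeps log() finite */
	double u2 = drand48();

	return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

int main(void)
{
	char *buf = malloc(BUF_SIZE);
	volatile char sink;
	unsigned long i;

	if (!buf)
		return 1;
	memset(buf, 0, BUF_SIZE);	/* fault all pages in */

	for (i = 0; i < (1UL << 33); i++) {
		/* mean 0.5, sigma 1/16: the hot set sits in the middle */
		double x = 0.5 + gauss() / 16.0;

		if (x < 0.0 || x >= 1.0)
			continue;
		sink = buf[(size_t)(x * NR_LINES) * LINE];
	}
	return 0;
}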

Some data from /proc/vmstat and perf-profile follows.

From /proc/vmstat, it seems that far fewer pages are scanned and demoted
with MGLRU enabled. However, the pgdemote_kswapd / pgscan_kswapd ratio
is 5.22 times higher with MGLRU enabled than with MGLRU disabled. I
think this shows the value of direct page table scanning.
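
(For reference, that ratio is computed from the counters below:

  MGLRU enabled:  pgdemote_kswapd / pgscan_kswapd = 32701609 / 83582770   ~= 0.391
  MGLRU disabled: pgdemote_kswapd / pgscan_kswapd = 153796237 / 2055504891 ~= 0.0748

  0.391 / 0.0748 ~= 5.2)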

From perf-profile, the CPU cycles for kswapd are about the same (2.86%
in both profiles), but fewer pages are demoted (reclaimed) with MGLRU.
And it appears that the total page table scanning time of MGLRU is
longer, if we compare walk_page_range (1.97%, MGLRU enabled) with
page_referenced (0.54%, MGLRU disabled). Is this because we only demote
(reclaim) from the DRAM nodes, not from the PMEM nodes, and the Bloom
filter doesn't work well enough? One thing that may not be friendly to
the Bloom filter is that some virtual pages may change their resident
nodes because of demotion/promotion; the sketch below tries to make that
concern concrete.
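
To illustrate (this is a generic two-hash Bloom filter, NOT MGLRU's
actual data structure; the point is only that if filter state is keyed
by, or kept per, resident node, entries go stale once pages migrate):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOOM_BITS	(1UL << 15)

static uint64_t bloom[BLOOM_BITS / 64];

/* cheap hash mix of (address, node); illustrative only */
static uint64_t hash(uint64_t addr, int nid, uint64_t seed)
{
	uint64_t x = addr ^ ((uint64_t)nid << 56) ^ seed;

	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	return x % BLOOM_BITS;
}

static void bloom_set(uint64_t addr, int nid)
{
	uint64_t h1 = hash(addr, nid, 0), h2 = hash(addr, nid, 1);

	bloom[h1 / 64] |= 1ULL << (h1 % 64);
	bloom[h2 / 64] |= 1ULL << (h2 % 64);
}

static bool bloom_test(uint64_t addr, int nid)
{
	uint64_t h1 = hash(addr, nid, 0), h2 = hash(addr, nid, 1);

	return (bloom[h1 / 64] & (1ULL << (h1 % 64))) &&
	       (bloom[h2 / 64] & (1ULL << (h2 % 64)));
}

int main(void)
{
	bloom_set(0x7f0000001000ULL, 1);	/* recorded while on PMEM node 1 */

	/* after promotion to DRAM node 0, the lookup (almost surely) misses */
	printf("node 0: %d, node 1: %d\n",
	       bloom_test(0x7f0000001000ULL, 0),
	       bloom_test(0x7f0000001000ULL, 1));
	return 0;
}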

Could you teach me how to interpret these data for MGLRU? Or could you
point me to other/better data for MGLRU?

MGLRU disabled via: echo -n 0 > /sys/kernel/mm/lru_gen/enabled
--------------------------------------------------------------

/proc/vmstat:

pgactivate 1767172340
pgdeactivate 1740111896
pglazyfree 0
pgfault 583875828
pgmajfault 0
pglazyfreed 0
pgrefill 1740111896
pgreuse 22626572
pgsteal_kswapd 153796237
pgsteal_direct 1999
pgdemote_kswapd 153796237
pgdemote_direct 1999
pgscan_kswapd 2055504891
pgscan_direct 1999
pgscan_direct_throttle 0
pgscan_anon 2055356614
pgscan_file 150276
pgsteal_anon 153798203
pgsteal_file 33
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 82761
kswapd_inodesteal 0
kswapd_low_wmark_hit_quickly 2960
kswapd_high_wmark_hit_quickly 17732
pageoutrun 21583
pgrotated 0
drop_pagecache 0
drop_slab 0
oom_kill 0
numa_pte_updates 515994024
numa_huge_pte_updates 154
numa_hint_faults 498301236
numa_hint_faults_local 121109067
numa_pages_migrated 152650705
pgmigrate_success 307213704
pgmigrate_fail 39
thp_migration_success 93
thp_migration_fail 0
thp_migration_split 0

perf-profile:

kswapd.kthread.ret_from_fork: 2.86
balance_pgdat.kswapd.kthread.ret_from_fork: 2.86
shrink_node.balance_pgdat.kswapd.kthread.ret_from_fork: 2.85
shrink_lruvec.shrink_node.balance_pgdat.kswapd.kthread: 2.76
shrink_inactive_list.shrink_lruvec.shrink_node.balance_pgdat.kswapd: 1.9
shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node.balance_pgdat: 1.52
shrink_active_list.shrink_lruvec.shrink_node.balance_pgdat.kswapd: 0.85
migrate_pages.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node: 0.79
page_referenced.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node: 0.54


MGLRU enabled via: echo -n 7 > /sys/kernel/mm/lru_gen/enabled
-------------------------------------------------------------

/proc/vmstat:

pgactivate 47212585
pgdeactivate 0
pglazyfree 0
pgfault 580056521
pgmajfault 0
pglazyfreed 0
pgrefill 6911868880
pgreuse 25108929
pgsteal_kswapd 32701609
pgsteal_direct 0
pgdemote_kswapd 32701609
pgdemote_direct 0
pgscan_kswapd 83582770
pgscan_direct 0
pgscan_direct_throttle 0
pgscan_anon 83549777
pgscan_file 32993
pgsteal_anon 32701576
pgsteal_file 33
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 84829
kswapd_inodesteal 0
kswapd_low_wmark_hit_quickly 313
kswapd_high_wmark_hit_quickly 5262
pageoutrun 5895
pgrotated 0
drop_pagecache 0
drop_slab 0
oom_kill 0
numa_pte_updates 512084786
numa_huge_pte_updates 198
numa_hint_faults 494583387
numa_hint_faults_local 129411334
numa_pages_migrated 34165992
pgmigrate_success 67833977
pgmigrate_fail 7
thp_migration_success 135
thp_migration_fail 0
thp_migration_split 0

perf-profile:

kswapd.kthread.ret_from_fork: 2.86
balance_pgdat.kswapd.kthread.ret_from_fork: 2.86
lru_gen_age_node.balance_pgdat.kswapd.kthread.ret_from_fork: 1.97
walk_page_range.try_to_inc_max_seq.lru_gen_age_node.balance_pgdat.kswapd: 1.97
shrink_node.balance_pgdat.kswapd.kthread.ret_from_fork: 0.89
evict_folios.lru_gen_shrink_lruvec.shrink_lruvec.shrink_node.balance_pgdat: 0.89
scan_folios.evict_folios.lru_gen_shrink_lruvec.shrink_lruvec.shrink_node: 0.66

Best Regards,
Huang, Ying

[snip]