Re: mm: kswapd struggles reclaiming the pages on 64GB server

From: Michal Hocko
Date: Wed Aug 17 2016 - 07:43:29 EST


[CCing linux-mm and Johannes]

On Fri 12-08-16 21:52:20, Andriy Tkachuk wrote:
> Hi,
>
> our user-space application uses large amount of anon pages (private
> mapping of the large file, more than 64GB RAM available in the system)
> which are rarely accessible and are supposed to be swapped out.
> Instead, we see that most of these pages are kept in memory while the
> system suffers from the lack of free memory and overall performance
> (especially the disk I/O, vm.swappiness=100 does not help it). kswapd
> scans millions of pages per second but reclames hundreds per sec only.

I haven't looked at your numbers deeply but this smells like the long
standing problem/limitation we have. We are trying really hard to not
swap out and rather reclaim the page cache because the swap refault
tends to be more disruptive in many case. Not all, though, and trashing
like behavior you see is cetainly undesirable.

Johannes has been looking into that area recently. Have a look at
http://lkml.kernel.org/r/20160606194836.3624-1-hannes@xxxxxxxxxxx

> Here are the 5 secs interval snapshots of some counters:
>
> $ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact'
> proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
> proc-meminfo-0616-160539.txt Cached: 347936 kB
> proc-meminfo-0616-160549.txt Cached: 316316 kB
> proc-meminfo-0616-160559.txt Cached: 322264 kB
> proc-meminfo-0616-160539.txt SwapCached: 2853064 kB
> proc-meminfo-0616-160549.txt SwapCached: 2853168 kB
> proc-meminfo-0616-160559.txt SwapCached: 2853280 kB
> proc-vmstat-0616-160535.txt nr_active_anon 14508616
> proc-vmstat-0616-160545.txt nr_active_anon 14513725
> proc-vmstat-0616-160555.txt nr_active_anon 14515197
> proc-vmstat-0616-160535.txt nr_inactive_anon 747407
> proc-vmstat-0616-160545.txt nr_inactive_anon 744846
> proc-vmstat-0616-160555.txt nr_inactive_anon 744509
> proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
> proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
> proc-vmstat-0616-160535.txt pgactivate 246016824
> proc-vmstat-0616-160545.txt pgactivate 246033242
> proc-vmstat-0616-160555.txt pgactivate 246042064
> proc-vmstat-0616-160535.txt pgrefill_normal 22763262
> proc-vmstat-0616-160545.txt pgrefill_normal 22768020
> proc-vmstat-0616-160555.txt pgrefill_normal 22768178
> proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
> proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
> proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
> proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
> proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
> proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
> proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
>
> The pgrefill_normal and pgactivate counters show that only few
> hundreds/sec pages move from active to inactive and vice versa lists -
> that is comparable with what was reclaimed. So it looks like kswapd
> scans the pages from inactive list mostly in kind of a loop and does
> not even have a chance to look at the pages from the active list
> (where most of the application's anon pages are located).
>
> The kernel version: linux-3.10.0-229.14.1.el7.
>
> Any ideas? Would be be useful to change inactive_ratio dynamically in
> such a cases so that more pages could be moved from active to inactive
> list and get a chance to be reclaimed? (Note: when application is
> restarted - the problem disappears for a while (days) until the
> correspondent number of privately mapped pages are dirtied again.)
>
> Thank you,
> Andriy

--
Michal Hocko
SUSE Labs