Reclaim regression after 1c30844d2dfe
From: Ivan Babrou
Date: Fri Feb 07 2020 - 17:56:19 EST
This change from 5.5 times:
* https://github.com/torvalds/linux/commit/1c30844d2dfe
> mm: reclaim small amounts of memory when an external fragmentation event occurs
introduced undesired effects in our environment:
* NUMA with 2 x CPU
* 128GB of RAM
* THP disabled
* Upgraded from 4.19 to 5.4
Before the upgrade, free memory hovered around 1.4GB with no spikes.
After the upgrade, some machines decided that they needed a lot more
than that, with frequent spikes above 10GB, often on only a single
NUMA node.
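For anyone wanting to check for the same per-node imbalance, a rough
sketch that sums the "pages free" counters in /proc/zoneinfo per NUMA
node. The function name node_free_mb and its two optional arguments
(a saved copy of zoneinfo and the page size in KiB) are made up for
illustration and are not taken from the report above:

```shell
#!/bin/sh
# Hypothetical helper: print free memory per NUMA node, summed over
# all zones of that node.
#   $1  file to parse (default: /proc/zoneinfo)
#   $2  page size in KiB (default: taken from getconf PAGESIZE)
node_free_mb() {
    page_kb=${2:-$(( $(getconf PAGESIZE) / 1024 ))}
    awk -v page_kb="$page_kb" '
        # "Node 0, zone   Normal" starts a new zone; remember its node id.
        /^Node/ { node = $2; sub(/,/, "", node) }
        # "  pages free     12345" carries the free page count of the zone.
        $1 == "pages" && $2 == "free" { free[node] += $3 }
        END {
            for (n in free)
                printf "node %s: %d MB free\n", n, free[n] * page_kb / 1024
        }
    ' "${1:-/proc/zoneinfo}" | sort
}
```

Running node_free_mb under watch(1) or from a cron job should make
spikes confined to one node visible.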
We can see kswapd quite active in balance_pgdat (it didn't look like
it slept at all):
$ ps uax | fgrep kswapd
root 1850 23.0 0.0 0 0 ? R Jan30 1902:24 [kswapd0]
root 1851 1.8 0.0 0 0 ? S Jan30 152:16 [kswapd1]
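Accumulated CPU time in ps only shows the total since boot. As a
complement, a minimal sketch that compares two saved snapshots of
/proc/vmstat and prints how much the kswapd scan and steal counters
moved in between. The function name kswapd_delta is hypothetical, and
it assumes the pgscan_kswapd and pgsteal_kswapd counters are present
on the kernel in question:

```shell
#!/bin/sh
# Hypothetical helper: show how far the kswapd counters advanced
# between two snapshots of /proc/vmstat.
#   $1  earlier snapshot
#   $2  later snapshot
kswapd_delta() {
    awk '
        $1 ~ /^(pgscan|pgsteal)_kswapd$/ {
            if (NR == FNR)
                before[$1] = $2          # first file: remember the value
            else
                printf "%s: +%d\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Example use (commented out so that sourcing this file has no side effects):
#   cat /proc/vmstat > /tmp/vmstat.a; sleep 10; cat /proc/vmstat > /tmp/vmstat.b
#   kswapd_delta /tmp/vmstat.a /tmp/vmstat.b
```

Counters that keep climbing over every interval would be consistent
with kswapd not going back to sleep.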
This in turn massively increased pressure on the page cache, which did
not go well for services that depend on a quick response from a local
cache backed by solid storage.
Here's what it looked like when I zeroed vm.watermark_boost_factor:
* https://imgur.com/a/6IZWicU
IO subsided from 100% busy in page cache population at 300MB/s on a
single SATA drive down to under 100MB/s.
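Zeroing the knob amounts to writing 0 to
/proc/sys/vm/watermark_boost_factor as root, which is what
"sysctl -w vm.watermark_boost_factor=0" does. A small sketch that also
records the previous value (15000 is the kernel default) so the change
can be reverted. The function name set_boost_factor and the BOOST_KNOB
override are hypothetical, the latter existing only so the helper can
be dry-run against a scratch file:

```shell
#!/bin/sh
# Path of the sysctl knob. Overridable (hypothetical variable) so the
# helper can be tried on a scratch file without root.
BOOST_KNOB=${BOOST_KNOB:-/proc/sys/vm/watermark_boost_factor}

# Hypothetical helper: write a new boost factor and report the old one.
#   $1  new value, e.g. 0 to disable watermark boosting
set_boost_factor() {
    old=$(cat "$BOOST_KNOB") || return 1
    printf '%s\n' "$1" > "$BOOST_KNOB" || return 1
    echo "watermark_boost_factor: $old -> $1"
}

# Example use as root (commented out so that sourcing has no side effects):
#   set_boost_factor 0
```

The setting does not survive a reboot unless it is also added to the
system's sysctl configuration.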
This sort of regression doesn't seem like a good thing.