Fwd: [REGRESSION] [BISECTED] kswapd high CPU usage

From: Nalorokk
Date: Thu Jan 21 2016 - 09:37:42 EST


It appears that kernels newer than 4.1 have kswapd-related bug
resulting in high CPU usage. CPU 100% usage could last for several
minutes or several days, with CPU being busy entirely with serving
kswapd. It happens usually after server being mostly idle, sometimes
after days, sometimes after weeks of uptime. But the issue appears
much sooner if the machine is loaded with something like building a
kernel.

Here are the graphs of CPU load: first [1], second [2]. Perf top
output is here [3] as well.

To find the cause of this problem I've started with the fact that the
issue appeared after 4.1 kernel update. Then I performed longterm test
of 3.18, and discovered that 3.18 is unaffected by this bug. Then I
did some tests of 4.0 to confirm that this version behaves well too.

Then I performed git bisect from tag v4.0 to v4.1-rc1 and found exact
commits that seem to be reason of high CPU usage.

The first really "bad" commit is
79553da293d38d63097278de13e28a3b371f43c1. 2 previous commits cause
weird behavior as well resulting in kswapd consuming more CPU than
unaffected kernels, but not that much as the commit pointed above. I
believe those commits are related to the same mm tree merge.

I tried to add transparent_hugepage=never to kernel boot parameters,
but it did not change anything. Changing allocator to SLAB from SLUB
alters behavior and makes CPU load lower, but don't solve a problem at
all.

Here [4] is kernel bugzilla bugreport as well.

Ideas?

[1] http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png
[2] http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png
[3] http://pastebin.com/aRzTjb2x
[4] https://bugzilla.kernel.org/show_bug.cgi?id=110501