Re: Zram writeback feature unstable with heavy swap utilization - BUG: Bad page state in process...

From: Tino Lehnig
Date: Wed Jul 25 2018 - 11:12:18 EST


Hi,

On 07/25/2018 03:21 PM, Minchan Kim wrote:
> It would be very helpful if you could check more versions with git-bisect.

I started bisecting today, but my results are not conclusive yet. It is certain that the problem started with 4.15 though. I have not encountered the bug message in 4.15-rc1 so far, but the kvm processes always became unresponsive after hitting swap and could not be killed there. I saw the same behavior in rc2, rc3, and other builds in between, but the bad state bug would only trigger occasionally there. The behavior in 4.15.18 is the same as in newer kernels.

> I also want to reproduce it.
>
> Today, I downloaded a Windows ISO and booted it as a CD-ROM with my own
> compiled kernel on KVM, but I couldn't reproduce it.
> I also tested a heavy swap workload (kernel build with multiple CPUs
> on small memory), but I failed to reproduce it there, too.
>
> Could you please describe your method in more detail?

I found that running Windows in KVM really is the only reliable method, maybe because the zero pages are easily compressible. There is actually not a lot of disk utilization on the backing device when running this test.

My operating system is a minimal install of Debian 9. I took the kernel configuration from the default Debian kernel and built my own kernel with "make oldconfig" leaving all settings at their defaults. The only thing I changed in the configuration was enabling the zram writeback feature.
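
To double-check that a build really has the writeback feature enabled, the kernel config can be searched for CONFIG_ZRAM_WRITEBACK. Below is a minimal sketch of such a check; the helper name config_enabled and the sample config excerpt are made up for illustration and are not taken from my actual .config.

```python
import re


def config_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y or m in kernel .config text."""
    # Enabled options look like "CONFIG_FOO=y"; disabled ones appear as
    # "# CONFIG_FOO is not set" and therefore never match this pattern.
    pattern = re.compile(rf"^{re.escape(option)}=(y|m)$", re.MULTILINE)
    return pattern.search(config_text) is not None


# Hypothetical excerpt for illustration only (not the real config).
sample = """\
CONFIG_ZRAM=m
CONFIG_ZRAM_WRITEBACK=y
"""

print(config_enabled(sample, "CONFIG_ZRAM_WRITEBACK"))
```

On a real system the text would come from the .config in the build tree, or from /boot/config-$(uname -r) where the distribution installs it.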

All my tests were done on bare-metal hardware with Xeon processors and lots of RAM. I encounter the bug quite quickly, but it still takes several GBs of swap usage. Below is my /proc/meminfo with enough KVM instances running (3 in my case) to trigger the bug on my test machine.

I will also try to reproduce the problem on some different hardware next.

--

MemTotal: 264033384 kB
MemFree: 1232968 kB
MemAvailable: 0 kB
Buffers: 1152 kB
Cached: 5036 kB
SwapCached: 49200 kB
Active: 249955744 kB
Inactive: 5096148 kB
Active(anon): 249953396 kB
Inactive(anon): 5093084 kB
Active(file): 2348 kB
Inactive(file): 3064 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 1073741820 kB
SwapFree: 938603260 kB
Dirty: 68 kB
Writeback: 0 kB
AnonPages: 255007752 kB
Mapped: 4708 kB
Shmem: 1212 kB
Slab: 88500 kB
SReclaimable: 16096 kB
SUnreclaim: 72404 kB
KernelStack: 5040 kB
PageTables: 765560 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1205758512 kB
Committed_AS: 403586176 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 254799872 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 75136 kB
DirectMap2M: 10295296 kB
DirectMap1G: 260046848 kB
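
For reference, the swap usage at this point can be derived from the numbers above. The following is a minimal sketch that parses a few of the fields and computes the swap in use and the share of anonymous memory backed by transparent huge pages; the helper name parse_meminfo is made up for illustration, and the sample only repeats values from the dump.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # value in kB (or a count)
    return fields


# Values copied from the dump above.
sample = """\
MemTotal: 264033384 kB
SwapTotal: 1073741820 kB
SwapFree: 938603260 kB
AnonPages: 255007752 kB
AnonHugePages: 254799872 kB
"""

info = parse_meminfo(sample)
swap_used_kb = info["SwapTotal"] - info["SwapFree"]
thp_share = info["AnonHugePages"] / info["AnonPages"]

print(f"swap in use: {swap_used_kb / 2**20:.1f} GiB")
print(f"anon memory in huge pages: {thp_share:.1%}")
```

With these values, about 128.9 GiB of swap is in use, and roughly 99.9% of the anonymous memory is accounted as AnonHugePages.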

--
Kind regards,

Tino Lehnig