Re: Zram writeback feature unstable with heavy swap utilization - BUG: Bad page state in process...
From: Minchan Kim
Date: Thu Jul 26 2018 - 02:21:25 EST
On Thu, Jul 26, 2018 at 08:10:41AM +0200, Tino Lehnig wrote:
> Hi,
>
> On 07/26/2018 04:03 AM, Minchan Kim wrote:
> > One thing I could imagine is
> > [0bcac06f27d75, skip swapcache for swapin of synchronous device].
> > It was merged into v4.15. Could you check it by bisecting?
>
> Thanks, I will check that.
>
> > > My operating system is a minimal install of Debian 9. I took the kernel
> > > configuration from the default Debian kernel and built my own kernel with
> > > "make oldconfig" leaving all settings at their defaults. The only thing I
> > > changed in the configuration was enabling the zram writeback feature.
> >
> > You mean you changed the host kernel configuration?
> >
> > >
> > > All my tests were done on bare-metal hardware with Xeon processors and lots
> > > of RAM. I encounter the bug quite quickly, but it still takes several GBs of
> > > swap usage. Below is my /proc/meminfo with enough KVM instances running (3
> > > in my case) to trigger the bug on my test machine.
> >
> > Aha.. so you enabled the writeback feature on your bare-metal host machine and
> > ran KVM with Windows images as guests. So the PG_uptodate warning happens on
> > the host side, not in the guest? Right?
>
> Yes, I am only talking about the host kernel. Zram swap is set up on the
> host. I just used Windows guests to fill up the host RAM and force it into
> swap.
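
Just so we are on the same page, I assume the host-side setup is roughly
the following (device name and sizes are examples, not taken from your
report; it needs CONFIG_ZRAM_WRITEBACK=y):

  modprobe zram
  # the backing device must be attached before disksize is set
  echo /dev/sdb1 > /sys/block/zram0/backing_dev
  echo 32G > /sys/block/zram0/disksize
  mkswap /dev/zram0
  swapon /dev/zram0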
>
> > > I will also try to reproduce the problem on some different hardware next.
>
> Just to confirm, I was able to reproduce the problem on another machine
> running Ubuntu 18.04 with the Ubuntu stock kernel (4.15) and no
> modifications to the kernel configuration whatsoever. The host had 8 GB of
> RAM, 32 GB of swap with zram and a 32 GB SSD as backing device. I had to
> start only one Windows VM with "-m 32768" to trigger the bug.

That means you could reproduce it without the writeback feature?
If so, it would be more reasonable to check [0bcac06f27d75, skip swapcache
for swapin of synchronous device].
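
A bisect between v4.14 and v4.15 should land on it quickly if that commit
is the culprit. Roughly (the build and test steps are whatever you already
use):

  git bisect start
  git bisect bad v4.15
  git bisect good v4.14
  # build, boot, and run the swap workload on each kernel git checks out,
  # then mark the result until the first bad commit is reported:
  git bisect good    # or: git bisect bad
  git bisect reset

If it reverts cleanly, reverting 0bcac06f27d75 on top of v4.15 and
retesting would confirm it directly as well.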
Thanks. I will try it later today.
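
To reproduce it here, I will set up zram with a backing device as above and
start a guest sized to overcommit the host, something like this (the disk
image path is a placeholder):

  qemu-system-x86_64 -enable-kvm -m 32768 -hda windows.img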