Hello @all,
first of all, I sent this exact message to the LKML a few days ago, but
since I received no reaction, I thought this list might be a better
place for this problem -- or that I might at least reach the right
people to get this fixed or debugged. :-)
Recently I started seeing freezes while compiling bigger packages that
require a lot of memory (I use Gentoo).
The freezes took the following form: while in Xorg, the system would
just hang completely -- no magic sysrq keys, no mouse movement, nothing.
While in a terminal, one could still issue a magic sysrq command, but it
would only echo the command itself without executing it -- except for
the reboot command. So there was no way to get a backtrace, task states
or anything like that.
After debugging this further, it became clear that the system always
froze when it started hitting the encrypted swap. It worked absolutely
fine as soon as you took the encryption out of the picture.
My setup at that time was: an 8 GiB swap on S/W RAID5 for my 8 GiB of
physical RAM, with the swap encrypted using dm-crypt and
AES256-CBC-ESSIV.
To isolate the culprit, I then changed my setup to several swap
partitions directly on the physical disks, without a RAID in between.
This made no difference -- neither did switching ciphers and so forth.
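When comparing setups like the ones above, it helps to confirm which
swap areas are actually active and how much of each is in use. A minimal
sketch, assuming the usual five-column layout of /proc/swaps (Filename,
Type, Size, Used, Priority, sizes in KiB); the helper name
parse_proc_swaps is made up for illustration and is not part of any
existing tool:

```python
from typing import Dict, List, Union


def parse_proc_swaps(text: str) -> List[Dict[str, Union[str, int]]]:
    """Parse the text of /proc/swaps into one dict per active swap area.

    Assumes the usual layout: a header line followed by rows of
    Filename, Type, Size (KiB), Used (KiB), Priority.
    """
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete row, ignore it
        # Take the four numeric/type columns from the right so that a
        # file name containing extra tokens does not shift them.
        entries.append({
            "device": " ".join(fields[:-4]),
            "type": fields[-4],
            "size_kib": int(fields[-3]),
            "used_kib": int(fields[-2]),
            "priority": int(fields[-1]),
        })
    return entries


# Usage on a live system (hypothetical, adjust as needed):
#   with open("/proc/swaps") as fh:
#       for entry in parse_proc_swaps(fh.read()):
#           print(entry["device"], entry["used_kib"], "KiB used")
```

With dm-crypt, the encrypted swap typically shows up as a device-mapper
node (for example /dev/dm-0), which makes it easy to tell apart from a
plain partition in the output.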
Since this setup had worked for ages, I started looking into what had
changed in the preceding weeks and noticed I had done several kernel
upgrades. To make a long story short, here are my findings:
4.3.0, 4.4.0-final, 4.5-rc1 to 4.5-rc2:
No problems, except for the usual sluggishness with encrypted swap that
has been there since forever (it is as if the encryption had the highest
priority and took over the system, e.g. no input is accepted on a
different terminal while under high memory pressure, in contrast to
unencrypted swap, where this still works fine).
4.4.x, >= 4.5-rc3 (incl. 4.6-rcX and master):
The system freezes under memory pressure as soon as it starts swapping
out. 4.6 master is an exception: at first it still responds to magic
sysrq commands properly, but after some time it freezes hard as well.
I have not had the time to test all 4.3.x and 4.4.x releases, I am afraid.
What I can say though is that 4.4.6 is affected as well.
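For testing other kernel versions without a long compile run, the memory
pressure can in principle be generated synthetically. The following is
only a rough sketch of that idea, not the workload used for the findings
above (those came from compiling large packages); the function name
apply_memory_pressure and the 4096-byte page size are assumptions:

```python
from typing import List


def apply_memory_pressure(total_mib: int, chunk_mib: int = 64) -> List[bytearray]:
    """Allocate roughly total_mib MiB and touch every page.

    Writing one byte per page forces the kernel to back the allocation
    with real memory, so that exceeding the free RAM pushes pages out to
    swap. The caller must keep the returned list alive, otherwise the
    memory is released again.
    """
    page_size = 4096  # assumed page size in bytes
    chunks: List[bytearray] = []
    allocated = 0
    while allocated < total_mib:
        size = min(chunk_mib, total_mib - allocated)
        buf = bytearray(size * 1024 * 1024)
        for offset in range(0, len(buf), page_size):
            buf[offset] = 1  # dirty the page
        chunks.append(buf)
        allocated += size
    return chunks


# Usage (hypothetical): hold somewhat more than the physical RAM, e.g.
#   held = apply_memory_pressure(9 * 1024)
# and only do this on a test box, since an affected kernel may freeze.
```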
A git bisect between 4.5-rc2 and 4.5-rc3 led me to the following commit:
564e81a57f9788b1475127012e0fd44e9049e342 is the first bad commit
commit 564e81a57f9788b1475127012e0fd44e9049e342
Author: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Fri Feb 5 15:36:30 2016 -0800
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any
progress
In my opinion, this is not the real culprit but merely a trigger.
Reverting that commit on 4.5.1, for example, makes the encrypted swap
work flawlessly again (except for the usual system sluggishness).
Reverting it on 4.6 master@c3b46c73264b03000d1e18b22f5caf63332547c9
shows a different picture, though: the system freezes while the sysrq
keys still work, and it usually recovers after a while if the task that
triggered the swapping in the first place gets killed. While it hangs
there, it sometimes does a bit of swapping and sometimes does not --
whereas with the other kernels, swapping usually stops completely in
the "frozen" state.
I managed to get a bit more information out of 4.6 master, though, since
it sometimes recovers after quite some time, which lets me copy
backtraces and such to the disk. I have attached them.
I hope this helps in finding the real issue behind this. I am sorry I
could not provide more information, but this has been a rather
time-consuming task thus far. :-)
If there is anything else I can do to help or test, please let me know
and I will gladly do so.
Thanks in advance.
So long,
Matthias