System "hangs" on kernels > v4.4-rc1 - mpt2sas problem?

From: Luck, Tony
Date: Thu Jan 07 2016 - 17:55:59 EST


This one has been hard for me to bisect, partly because it just takes so long to reboot the servers that
have the problem, but also because there may be more than one thing going on. One thing that seems
pretty certain is that things are OK prior to:

commit d83763f4a6adb2f417c3288ee903982985ae949c
Merge: 9aa3d651a919 0a5149ba02bd
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Fri Nov 13 20:35:54 2015 -0800

    Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Attached is a serial log from a kernel built from a commit within that merge
(listed below) that shows the problem. Subsequent changes haven't fixed this:
4.4-rc8 hangs at the same point.

commit e0bd0874f2de21613e572669b2de1e4b0c3a97de
Author: sumit.saxena@xxxxxxxxxxxxx <sumit.saxena@xxxxxxxxxxxxx>
Date: Mon Aug 31 17:23:01 2015 +0530

    megaraid_sas: Increase timeout to 60 secs for abort frames during shutdown

Stuff is obviously going wrong by this point:
[ 16.100642] mpt2sas 0000:01:00.0: swiotlb buffer is full (sz: 398336 bytes)
[ 16.100662] swiotlb: coherent allocation failed for device 0000:01:00.0 size=398336
followed by a stack dump (see attachment)
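
For anyone checking their own serial logs for the same symptom, here is a minimal sketch that picks out the coherent-allocation failures. It assumes the messages have exactly the format of the two lines quoted above; the function name find_swiotlb_failures and the regular expression are illustrative only and are not part of any kernel tool.

```python
import re

# Matches lines of the form quoted above, e.g.:
# [ 16.100662] swiotlb: coherent allocation failed for device 0000:01:00.0 size=398336
# (The pattern is an assumption based on those two log lines only.)
COHERENT_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+swiotlb: coherent allocation failed "
    r"for device (?P<dev>\S+) size=(?P<size>\d+)"
)


def find_swiotlb_failures(lines):
    """Return (timestamp, device, size_in_bytes) for each matching log line."""
    hits = []
    for line in lines:
        m = COHERENT_FAIL.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("dev"), int(m.group("size"))))
    return hits


# The two lines from the report; only the second is a coherent-allocation failure.
sample = [
    "[ 16.100642] mpt2sas 0000:01:00.0: swiotlb buffer is full (sz: 398336 bytes)",
    "[ 16.100662] swiotlb: coherent allocation failed for device 0000:01:00.0 size=398336",
]

for ts, dev, size in find_swiotlb_failures(sample):
    print(f"{ts:.6f} {dev} {size} bytes ({size / 1024:.0f} KiB)")
```

Run against the two quoted lines, this reports a single failed allocation of 398336 bytes, which is 389 KiB.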

I've put "hangs" in quotes because the kernel isn't stuck: at the end of
the attached serial log you can see that the random: nonblocking pool
completed initialization 46 seconds after the other boot messages stopped.
So I think some application just failed to complete some I/O.
-Tony



Attachment: failmptsas.log