Re: RCU stall

From: Bart Van Assche
Date: Tue Mar 22 2016 - 20:09:07 EST


On 03/22/2016 01:45 PM, Paul E. McKenney wrote:
> You are getting a soft lockup as well as an RCU CPU stall warning, so
> it looks like something is taking a very long time in blk_done_softirq().
>
> You have multiple occurrences at different times, so it looks to be
> a long time as opposed to an infinite time. Are you perhaps doing
> something that would make a huge amount of work for blk_done_softirq()?
>
> See Documentation/RCU/stallwarn.txt in the kernel source tree for more
> info on how to debug this sort of thing.

Hello Paul,

None of the drivers involved in the test I ran contain RCU code that has been changed recently. The block and SCSI subsystems process I/O completions in softirq context, but until last week I had not seen any RCU lockup complaints when running an SRP test against a kernel with lockdep and several other kernel debugging options enabled. This is why I sent you an e-mail. After receiving your reply I read Documentation/RCU/stallwarn.txt, but it did not give me any clue about where to look for the root cause. Any further help would be appreciated.

Thanks,

Bart.