Doubt - facing a cpu softlockup during _raw_spin_unlock_irqrestore

From: Suraj Choudhari
Date: Sat Jan 28 2017 - 09:47:35 EST


Hello,

I have a few questions regarding a CPU soft lockup issue I am facing
with the 4.4 kernel on SLES 12.

Here are the details -
1) The thread which causes the soft lockup was releasing a spinlock.
The soft lockup happens while running '_raw_spin_unlock_irqrestore'.
The thread which causes the lockup kept getting rescheduled and
hit the soft lockup repeatedly.

2) At the time of the lockup, all the IO threads had been sleeping for
1 hour in the schedule() function, specifically while waiting in
get_request().


A few questions -

1) What may be the reason the IO threads were not scheduled for 1 hour,
while the thread causing the lockup was readily rescheduled?

2) What may be the reason a few IO threads were waiting in the
get_request() condition for more than 1 hour?

From the get_request() implementation, I could see that
__get_request() may fail with ENODEV or ENOMEM.
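
For reference, as far as I can tell these failures are not returned as
a plain integer; they come back encoded in the returned request pointer.
Below is a minimal userspace sketch of that convention, so that a raw
rq value can be decoded once it is captured. The helpers are
re-implemented here only for illustration (the kernel's own versions are
in include/linux/err.h), and rq_errno() is a hypothetical name of mine,
not a kernel function.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's MAX_ERRNO; the top 4095 values of the
 * address space are treated as encoded negative errno values. */
#define MAX_ERRNO 4095

/* Illustrative userspace copies of the kernel helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: returns 0 for an ordinary pointer (including a
 * null pointer), otherwise the positive errno encoded in the pointer. */
static inline int rq_errno(const void *rq)
{
    return IS_ERR(rq) ? (int)-PTR_ERR(rq) : 0;
}
```

With this, rq_errno(ERR_PTR(-ENOMEM)) evaluates to ENOMEM and
rq_errno(ERR_PTR(-ENODEV)) evaluates to ENODEV, while an ordinary
pointer gives 0.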

I tried to find the return value of __get_request() using the
SystemTap probe below, but it did not print the value of the request
pointer.

probe kernel.statement("get_request@block/blk-core.c:1246")
{
    printf("localvars1246:%s \n", $$locals);
}

output - localvars1246:is_sync=? wait={...} rl=? rq=?

So I could not figure out exactly why __get_request() may be failing
for the IO threads.

3) Any suggestions on how to fix such a soft lockup issue during
_raw_spin_unlock_irqrestore? [I was thinking of using 'cond_resched()'
in the thread facing the soft lockup.]


Thanks & Regards,
Suraj