Re: RT and Cascade interrupts

From: john cooper
Date: Wed Jun 01 2005 - 13:17:33 EST


Oleg Nesterov wrote:
This all is very unlikely of course, but it would be nice to verify
that kernel/timer.c is not the source of the problem.

The problem I am seeing is on a single CPU system.

John, if it is easy to reproduce the problem, could you please retest
with this patch?

I've done so and the second assert is generated
when running the test. So here we have a case
of RPC_TASK_HAS_TIMER set but the associated
timer->base == NULL. It would seem this could
easily be the scenario of executing in:

__run_timers()
timer->base = NULL;
rpc_run_timer()
task->tk_timeout_fn(task)
/* ksoftirqd preempted */
:
/* RPC client */
rpc_execute()
rpc_delete_timer()
del_timer() returns 0
BUG_ON(test_bit(RPC_TASK_HAS_TIMER,
&task->tk_runstate));
:
/* rpc_run_timer() resumes */
clear_bit(RPC_TASK_HAS_TIMER,
&task->tk_runstate);

I don't see how this would imply a kernel/timer.c
problem. It also appears this wouldn't be causing
the timer cascade corruption I've seen as the
end result is deleting an already dequeued timer
which is safe here.

BTW, if preemption is explicitly disabled in rpc_run_timer()
the BUG_ON assert is not generated nor (as reported
earlier) does the cascade corruption occur. I'm still
investigating.

-john



--- 2.6.12-rc5/net/sunrpc/sched.c~ Wed Jun 1 17:49:57 2005
+++ 2.6.12-rc5/net/sunrpc/sched.c Wed Jun 1 18:00:31 2005
@@ -137,8 +137,12 @@ rpc_delete_timer(struct rpc_task *task)
{
if (RPC_IS_QUEUED(task))
return;
- if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
- del_singleshot_timer_sync(&task->tk_timer);
+ if (test_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
+ if (del_singleshot_timer_sync(&task->tk_timer)) {
+ BUG_ON(!test_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate));
+ clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
+ } else
+ BUG_ON(test_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate));
dprintk("RPC: %4d deleting timer\n", task->tk_pid);
}
}



--
john.cooper@xxxxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/