Re: RT and Cascade interrupts

From: Trond Myklebust
Date: Wed Jun 01 2005 - 13:55:06 EST


On Wed, 01.06.2005 at 14:05 (-0400), john cooper wrote:
> I've done so and the second assert is generated
> when running the test. So here we have a case
> of RPC_TASK_HAS_TIMER set but the associated
> timer->base == NULL. It would seem this could
> easily be the scenario of executing in:
>
>     __run_timers()
>         timer->base = NULL;
>         rpc_run_timer()
>             task->tk_timeout_fn(task)
>             /* ksoftirqd preempted */
>                 :
>                 /* RPC client */
>                 rpc_execute()
>                     rpc_delete_timer()
>                         del_timer() returns 0
>                         BUG_ON(test_bit(RPC_TASK_HAS_TIMER,
>                                         &task->tk_runstate));
>                 :
>             /* rpc_run_timer() resumes */
>             clear_bit(RPC_TASK_HAS_TIMER,
>                       &task->tk_runstate);
>
> I don't see how this would imply a kernel/timer.c
> problem. It also appears this wouldn't be causing
> the timer cascade corruption I've seen as the
> end result is deleting an already dequeued timer
> which is safe here.

If timer->base==NULL, then del_timer() returns 0 (as you correctly state
above). What you appear to be missing is that this will trigger a call
to del_timer_sync() inside del_singleshot_timer_sync().

In other words, your scenario above should, if the timer code is working
according to spec, instead lead to:

    __run_timers()
        timer->base = NULL;
        rpc_run_timer()
            task->tk_timeout_fn(task)
            /* ksoftirqd preempted */
                :
                /* RPC client */
                rpc_execute()
                    rpc_delete_timer()
                        del_singleshot_timer_sync()
                            del_timer() returns 0
                            del_timer_sync()
                                /* wait until timer has
                                   completed */
            /* rpc_run_timer() resumes */
            clear_bit(RPC_TASK_HAS_TIMER,
                      &task->tk_runstate);
        ....
        set_running_timer(some_other_value);

                        BUG_ON(test_bit(RPC_TASK_HAS_TIMER,
                                        &task->tk_runstate));
                :
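
The difference between the two sequences can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not kernel code: struct task_model, resume_handler() and every *_model
function are hypothetical stand-ins, and "waiting" in del_timer_sync() is
modelled by simply letting the preempted handler run to completion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct task_model {
    bool timer_pending;    /* models timer->base != NULL */
    bool handler_running;  /* models base->running_timer == timer */
    bool has_timer;        /* models RPC_TASK_HAS_TIMER in tk_runstate */
};

/* Models the tail of rpc_run_timer() once the preempted ksoftirqd resumes. */
static void resume_handler(struct task_model *task)
{
    task->has_timer = false;        /* clear_bit(RPC_TASK_HAS_TIMER, ...) */
    task->handler_running = false;  /* set_running_timer(some_other_value) */
}

/* Models del_timer(): returns 1 only if the timer was still queued. */
static int del_timer_model(struct task_model *task)
{
    int was_pending = task->timer_pending;

    task->timer_pending = false;
    return was_pending;
}

/*
 * Models del_timer_sync(): dequeues like del_timer(), then waits for a
 * handler that is still in flight. The wait is modelled by running the
 * preempted handler until it reports completion.
 */
static int del_timer_sync_model(struct task_model *task)
{
    int ret = del_timer_model(task);

    while (task->handler_running)
        resume_handler(task);
    return ret;
}

/* Models del_singleshot_timer_sync(): fast path first, then the sync path. */
static int del_singleshot_timer_sync_model(struct task_model *task)
{
    int ret = del_timer_model(task);

    if (!ret)
        ret = del_timer_sync_model(task);
    return ret;
}

/* State at the point where ksoftirqd was preempted inside the handler. */
static struct task_model preempted_in_handler(void)
{
    struct task_model task = {
        .timer_pending = false,   /* __run_timers() set timer->base = NULL */
        .handler_running = true,  /* rpc_run_timer() has not finished */
        .has_timer = true,        /* bit not yet cleared by the handler */
    };

    return task;
}
```

Starting from preempted_in_handler(), a bare del_timer_model() returns 0
and leaves has_timer set, which is the state in which the BUG_ON would
fire. Going through del_singleshot_timer_sync_model() also returns 0, but
only after the handler has cleared has_timer, so the same check passes.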

Cheers,
Trond
