Re: RT and Cascade interrupts

From: john cooper
Date: Sat May 28 2005 - 12:51:11 EST


Oleg Nesterov wrote:

CPU_0 CPU_1

rpc_release_task(task)
__run_timers:
timer->base = NULL;
rpc_delete_timer()
if (timer->base)
// not taken
del_timer_sync();

mempool_free(task);
timer->function(task):
// task already freed/reused
__rpc_default_timer(task);

Ah ok, I was misreading the above. Now I see your point.
The race if hit loses the synchronization provided by
del_timer_sync() in my case. Thanks for being persistent.

The other possibility to fix the original problem within
the scope of the RPC code was to replace the bit state of
RPC_TASK_HAS_TIMER with a count so we don't obscure the
fact the timer was requeued during the preemption window.
This happens as rpc_run_timer() does an unconditional
clear_bit(RPC_TASK_HAS_TIMER,..) before returning.

Another would be to check timer->base/timer_pending()
in rpc_run_timer() and to omit doing a clear_bit()
if the timer is found to have been requeued.

The former has simpler locking requirements but requires
an rpc_task data structure modification. The latter does
not and is a bit more intuitive but needs additional
locking. Both appear to provide a solution though I
haven't yet attempted either.

Anyone with more ownership of this code care to comment
on a preferred fix?

-john


--
john.cooper@xxxxxxxxxxx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/