Re: /proc dcache deadlock in do_exit

From: Eric W. Biederman
Date: Tue Nov 27 2007 - 20:55:05 EST


Andrea Arcangeli <andrea@xxxxxxx> writes:

> On Tue, Nov 27, 2007 at 02:38:52PM -0800, Andrew Morton wrote:
>> I don't see why the schedule() will not return? Because the task has
>> PF_EXITING set? Doesn't TASK_DEAD do that?
>
> Ouch, I assumed you couldn't sleep safely anymore in release_task
> given it's the function that will free the task structure itself and
> there was no preempt related action anywhere close to it!
> delayed_put_task_struct can be called if a quiescent point is reached
> and any scheduling would exactly allow it to run (it requires quite a
> bit of a race, with local irq triggering a reschedule and the timer
> irq invoking the tasklet to run to free the task struct before do_exit
> finishes and all other cpus in quiescent state too).
>
> So a corollary question is how can it be safe to call
> preempt_disable() after call_rcu(delayed_put_task_struct)?


> Back in sles9 preempt_disable was implemented as
> _raw_write_unlock(&tasklist_lock) and it happened _before_
> release_task, and scheduling there wouldn't return because PF_DEAD was
> already set. If mainline can come back, it will crash for a different
> reason because the task struct is long gone by the time
> release_task+schedule() runs. Either ways, still a kernel crashing bug
> there is. Or is there some magic that prevents call_rcu + schedule to
> invoke the rcu callback?

I don't quite see where it comes from but there is another reference
on the task struct held by the scheduler That we don't drop until
finish_task_switch with exit_state == TASK_DEAD.

So since we have one additional reference the task_struct won't
get freed.

We don't set TASK_DEAD until much later.

> So you may need to apply this one too (this one is needed to fix the
> second bug, my previous patch is needed after applying this one):

No. We should be fine.

In fact it looks like this is just a sles9 issue and mainline is
fine, as we can safely schedule until just before the end of do_exit.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/