Re: [PATCH v2] softlockup: decouple hung tasks check from softlockup detection

From: Frederic Weisbecker
Date: Sat Jan 17 2009 - 09:07:47 EST


On Fri, Jan 16, 2009 at 08:13:30PM -0800, Mandeep Singh Baines wrote:
> Hi Frédéric,
>
> Frédéric Weisbecker (fweisbec@xxxxxxxxx) wrote:
> > > -	read_lock(&tasklist_lock);
> > > -	do_each_thread(g, t) {
> > > -		if (!--max_count)
> > > -			goto unlock;
> >
> >
> > Instead of having this arbitrary limit of tasks, why not just
> > check need_resched() and then schedule if needed?
> >
> > I know that sounds a bit racy, because you will have to release the
> > tasklist_lock, and a lot of things can happen in the task list before
> > you are scheduled again. But you can do a get_task_struct() on g and t
> > before your thread goes to sleep and then put them when it is woken up.
> > Perhaps some tasks will disappear or be appended to the list before g
> > and t, but that doesn't really matter: if they disappear, they didn't
> > lock up, and if they were appended, they are not cold enough to be
> > analyzed :-)
> >
> > This way you can drop the arbitrary user-supplied limit on the number of tasks.
> >
> > Frederic.
> >
>
> Would be nice to remove the limit. But I don't think get_task_struct()
> can be used to prevent a task from being unlinked from the task list. It
> only prevents the task_struct from being freed. So hung_task could end up
> holding a reference to an unlinked task after it returns from schedule().
>
> That doesn't mean what you are suggesting can't be implemented. Just means
> that the case of the held task being unlinked needs to be handled.
>
> Regards,
> Mandeep

Hmm, you're right.
Why not test 1024 tasks, then check need_resched()? If you sleep and
the task gets unlinked in the meantime (which is unlikely), that's not
a big deal: you will have a better chance on the next check :-)

I think this matters, since you are more likely to see a
soft-lockup if you have a lot of tasks.
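
To make the idea concrete, here is a rough userspace sketch, not kernel
code and not a proposed patch. Everything in it is hypothetical and only
models the scheme: struct task stands in for task_struct, the refcount
field for get_task_struct()/put_task_struct(), the linked flag for list
membership, and the yield callback for "need_resched() was set, so we
dropped tasklist_lock and scheduled". The batch size of 1024 is the number
suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH 1024	/* tasks scanned between reschedule checks */

/* Hypothetical userspace stand-in for task_struct. */
struct task {
	struct task *next;
	int refcount;	/* models get_task_struct()/put_task_struct() */
	bool linked;	/* false once removed from the task list */
	bool hung;	/* what the real check would compute */
};

/*
 * Hypothetical hook: returns true if the scanner slept. While it
 * "sleeps", the callback may unlink tasks, including the pinned one.
 */
typedef bool (*yield_fn)(struct task *pinned);

/* Link n tasks of arr into a list and reset their state. */
static void init_tasks(struct task *arr, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		arr[i].next = (i + 1 < n) ? &arr[i + 1] : NULL;
		arr[i].refcount = 0;
		arr[i].linked = true;
		arr[i].hung = false;
	}
}

/*
 * Scan the list in batches of BATCH tasks. Returns the number of hung
 * tasks seen. Sets *aborted if the pinned task was unlinked while we
 * slept; the caller simply retries on the next periodic check.
 */
static int scan_tasks(struct task *head, yield_fn maybe_yield, bool *aborted)
{
	int hung = 0;
	int budget = BATCH;

	assert(aborted != NULL);
	*aborted = false;

	for (struct task *t = head; t; t = t->next) {
		if (t->hung)
			hung++;
		if (--budget > 0)
			continue;
		budget = BATCH;

		t->refcount++;		/* pin: memory stays valid */
		/* the real code would drop tasklist_lock here ... */
		bool slept = maybe_yield(t);
		/* ... and retake it here */
		bool still_linked = t->linked;
		t->refcount--;		/* unpin */

		/* t->next cannot be trusted once t has left the list */
		if (slept && !still_linked) {
			*aborted = true;
			break;
		}
	}
	return hung;
}

/* Callback: no reschedule was needed. */
static bool never_yield(struct task *pinned)
{
	(void)pinned;
	return false;
}

/* Callback: we slept and the pinned task exited meanwhile. */
static bool yield_and_unlink(struct task *pinned)
{
	pinned->linked = false;
	return true;
}
```

The reference only keeps the memory valid, as Mandeep pointed out, so the
sketch re-checks membership after waking up and gives up on the current
pass if the pinned task is gone. How the real code would detect "unlinked"
is left open here.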
