Re: [BUG] long freezes on thinkpad t60

From: Ingo Molnar
Date: Mon Jun 18 2007 - 14:01:17 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> That code does:
>
> 	if (unlikely(p->array || task_running(rq, p))) {
>
> to decide if it needs to just unlock and repeat, but then to decide if
> it needs to *yield* it only uses *one* of those tests (namely
>
> 	preempted = !task_running(rq, p);
> 	..
> 	if (preempted)
> 		yield();
>
> and I think that's just broken. It basically says:
>
> - if the task is running, I will busy-loop on getting/releasing the
> task_rq_lock
>
> and that is the _real_ bug here.
>
> Trying to make the spinlocks do something else than what they do is
> just papering over the real bug. The real bug is that anybody who just
> busy-loops getting a lock is wasting resources so much that we should
> not be at all surprised that some multi-core or NUMA situations will
> get starvation.
>
> Blaming some random Core 2 hardware implementation issue that just
> makes it show up is wrong. It's a software bug, plain and simple.

yeah, agreed. wait_task_inactive() is butt-ugly, and Roland i think
found a way to get rid of it in utrace (but it's not implemented yet,
boggle) - but nevertheless this needs fixing for .22.

> So how about this diff? The diff looks big, but the *code* is actually
> simpler and shorter, I just added tons of comments, which is what
> blows it up.
>
> The new *code* looks like this:
>
> repeat:
> 	/* Unlocked, optimistic looping! */
> 	rq = task_rq(p);
> 	while (task_running(rq, p))
> 		cpu_relax();

ok. Do we have a guarantee that cpu_relax() also acts as an smp_rmb()?
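
to spell out the distinction i'm asking about, here is a rough userspace
sketch in C11 atomics, not kernel code. it assumes the worst case, namely
that cpu_relax() only guarantees a compiler barrier. task_is_running,
task_state, relax_compiler_only() and spin_then_read() are all made-up
names for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag standing in for what task_running() reads. */
static atomic_bool task_is_running;
/* Hypothetical data the other side writes before clearing the flag. */
static int task_state;

/* Compiler-only barrier: assumed here to be all cpu_relax() guarantees. */
static inline void relax_compiler_only(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/* Spin without ordering, then order the later read explicitly. */
static int spin_then_read(void)
{
	while (atomic_load_explicit(&task_is_running, memory_order_relaxed))
		relax_compiler_only();

	/* Plays the role an smp_rmb() would play after the unlocked loop. */
	atomic_thread_fence(memory_order_acquire);
	return task_state;
}
```

in the proposed code the locked re-check that follows the loop would
provide the ordering anyway, so this probably only matters if someone
later relies on the unlocked loop alone.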

>
> 	/* Get the *real* values */
> 	rq = task_rq_lock(p, &flags);
> 	running = task_running(rq, p);
> 	array = p->array;
> 	task_rq_unlock(rq, &flags);
>
> 	/* Check them.. */
> 	if (unlikely(running)) {
> 		cpu_relax();
> 		goto repeat;
> 	}
>
> 	if (unlikely(array)) {
> 		yield();
> 		goto repeat;
> 	}

hm, this might still go into a non-nice busy loop on SMP: one cpu runs
the strace, another one runs two tasks, one of which is runnable but not
currently running (the one we are waiting for). In that case we'd call
yield() on this CPU in a loop (and likely won't pull that task over from
that CPU). And yield() itself is a high-frequency rq-lock touching thing
too, just a bit heavier than the other path in the wait function.
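
to make the shape of that loop concrete, here is a rough userspace
sketch, not kernel code. struct fake_task, fake_lock(), fake_unlock()
and wait_inactive() are made-up names; an atomic_flag spinlock stands in
for the rq lock and sched_yield() stands in for both cpu_relax() and
yield(). it counts how many times the lock is taken, which is the cost
i'm worried about:

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler state of one task. */
struct fake_task {
	atomic_flag lock;	/* plays the role of the rq lock */
	atomic_bool running;	/* task_running() analogue */
	atomic_bool queued;	/* "p->array != NULL" analogue */
};

static void fake_task_init(struct fake_task *t, bool running, bool queued)
{
	atomic_flag_clear(&t->lock);
	atomic_init(&t->running, running);
	atomic_init(&t->queued, queued);
}

static void fake_lock(struct fake_task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		sched_yield();
}

static void fake_unlock(struct fake_task *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Returns how many times the lock was taken before the task went idle. */
static unsigned long wait_inactive(struct fake_task *t)
{
	unsigned long lock_trips = 0;
	bool running, queued;

	for (;;) {
		/* Unlocked, optimistic wait while the task is on a CPU. */
		while (atomic_load_explicit(&t->running, memory_order_acquire))
			sched_yield();

		/* Take a consistent snapshot under the lock. */
		fake_lock(t);
		running = atomic_load(&t->running);
		queued = atomic_load(&t->queued);
		fake_unlock(t);
		lock_trips++;

		if (running)
			continue;
		if (queued) {
			/* Still queued: yield, then take the lock again. */
			sched_yield();
			continue;
		}
		return lock_trips;
	}
}
```

as long as some other thread leaves queued set, lock_trips grows by one
on every pass through the yield path, which is the lock traffic
described above.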

> Hmm? Untested, I know. Maybe I overlooked something. But even the
> generated assembly code looks fine (much better than it looked
> before!)

it certainly looks better and cleaner than what we had before!

Ingo