Re: PATCH? hrtimer_wakeup: fix a theoretical race wrt rt_mutex_slowlock()

From: Linus Torvalds
Date: Sun Nov 05 2006 - 17:30:13 EST




On Sun, 5 Nov 2006, Oleg Nesterov wrote:
>
> When task->array != NULL, try_to_wake_up() just goes to "out_running" and sets
> task->state = TASK_RUNNING.
>
> In that case hrtimer_wakeup() does:
>
> timeout->task = NULL; <----- [1]
>
> spin_lock(runqueues->lock);
>
> task->state = TASK_RUNNING; <----- [2]
>
> from Documentation/memory-barriers.txt
>
> Memory operations that occur before a LOCK operation may appear to
> happen after it completes.
>
> This means that [2] may be completed before [1], and

Yes. On x86 (and x86-64) you'll never see this, because writes are always
seen in order regardless, and in addition, the spin_lock is actually
totally serializing anyway. On most other architectures, the spin_lock
will serialize all the writes too, but it's not guaranteed, so in theory
you're right. I suspect no actual architecture will do this, but hey,
when talking memory ordering, safe is a lot better than sorry.

That said, since "task->state" in only tested _inside_ the runqueue lock,
there is no race that I can see. Since we've gotten the runqueue lock in
order to even check task-state, the processor that _sets_ task state must
not only have done the "spin_lock()", it must also have done the
"spin_unlock()", and _that_ will not allow either the timeout or the task
state to haev leaked out from under it (because that would imply that the
critical region leaked out too).

So I don't think the race exists anyway - the schedule() will return
immediately (because it will see TASK_RUNNING), and we'll just retry.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/