Re: [PATCH] iommu/amd: Fix amd_iommu_free_device()
From: Linus Torvalds
Date: Tue Feb 03 2015 - 14:23:50 EST
On Tue, Feb 3, 2015 at 9:34 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
>
> Hmm, have you seen spurious wakeups happening? The wakeup only comes
> from put_device_state() and only when the reference count goes to zero.
Even if that were true - and it isn't - the patch in question making
it simpler and more robust, removing more lines than it adds.
Maybe the callers have other events pending that might wake that
process up? Are you going to guarantee that no caller ever might be
doing their own wakeup thing?
In fact, even if you do *(not* have anything in your call chain that
has other wait queues, active, it's still wrong to do the
single-wakeup thing, because we've also traditionally had cases where
we do directed wakeups of threads
Basically, you should always assume you can get spurious wakeups due
to multiple independent wait-queues, the same way you should always
assume you can get spurious hardware interrupts due to shared
interrupt lines among multiple idnependent devices.
Because there are also non-wait-queue wakeup events, although they are
relatively rare. The obvious one that people are aware of is signals,
which only affects TASK_INTERRUPTIBLE - or TASK_KILLABLE - waits, but
there can actually be spurious wakeups even for TASK_UNINTERRUPTIBLE
cases.
In particular, there are things like "wake_up_process()" which
generally doesn't wake up random tasks, but we do have cases where we
use it locklessly (taking a reference to a task). See for example
"kernel/locking/rwsem-add.c", which uses this model for waking up a
process *without* necessarily holding a lock, which can result in the
process actually being woken up *after* it already started running and
doing something else.
The __rwsem_do_wake() thing does
smp_mb();
waiter->task = NULL;
wake_up_process(tsk);
put_task_struct(tsk);
to basically say "I can wake up the process even after it has released
the "waiter" structure, using the "waiter->task" thing as a release.
The waiter, in turn, waits for waiter.task to become NULL, but what
this all means is that you can actually get a "delayed wakeup" from
having done a successful down_read() some time ago.
All of these are really really unlikely to hit you, but basically you
should *never* assume anything about "single wakeup".
The linux wakeup model has always been that there can be multiple
sources of wakeup events, and the proper way to wait for something is
generally a variation of
prepare_to_wait(..);
for (;;) {
set_task_state(..);
.. test for condition and break out ..
schedule()
}
finish_wait()
although obviously these days we *heavily* favor the "wait_event()"
interfaces that encapsulate something that looks like that loop and
makes for a much simpler programming model. We should basically never
favor the open-coded version, because it's so easy to get it wrong.
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/