Re: [PATCH] driver core: fix shutdown races with probe/remove

From: Alan Stern
Date: Wed Jun 06 2012 - 10:44:03 EST


On Wed, 6 Jun 2012, Ming Lei wrote:

> On Wed, Jun 6, 2012 at 1:09 AM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> >> After device_shutdown() has been called, the whole system will enter power down
> >> or reset later, so it doesn't matter if who should manage the device.
> >
> > Maybe.  But there might be quite some time between the shutdown call
> > and the eventual power-off or reboot.
>
> Between device_shutdown and machine_halt, there is only syscore_shutdown left
> to complete, so no much time is involved.

Not much time from the user's point of view, but computers can do a lot
in a short time.

> > On the whole, it might be easier just to hold the device lock during
> > the shutdown call.
>
> Yes, it is easier, but it is not efficient because there are very less
> drivers which implemented .shutdown callback. Suppose there are some
> drivers which have no .shutdown but are holding device lock, extra
> time will be consumed in the lock acquiring, which may slow down 'power
> down'.

The main reasons for holding the device lock are probe and remove
callbacks. In your patch, synchronize_srcu has to wait for those too.
So there won't be much extra time.

> >> Because the global variable of device_shutdown_started is not changed inside
> >> the function, the compiler may put the above line after the line of
> >> 'dev->driver = drv',
> >
> > No, it can't do that.  If the compiler moved this read farther down,
> > and as a result the CPU wrote to dev->driver when it shouldn't have,
> > the result would be incorrect (i.e., different from what would have
> > happened if the statement hadn't been moved).  Hence the compiler is
> > prohibited from making such an optimization.
5B5B>
> There are some similar examples(check on global variable is moved) with
> ACCESS_ONCE usage in Documentation/atomic_ops.txt. (from line 88)

Most of those examples involve repeatedly reading a global variable.
In your case there is no repetition, and you are writing rather than
reading.

Furthermore, I think some of those examples go a little too far.
Here's an extract from the file:

-------------------------------------------------------------------
For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed:

if (a)
b = 9;
else
b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following:

b = 42;
if (a)
b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero. To prevent
the compiler from doing this, write something like:

if (a)
ACCESS_ONCE(b) = 9;
else
ACCESS_ONCE(b) = 42;
-------------------------------------------------------------------

That just seems wrong. By the same reasoning, the compiler is within
its rights to transform either the original code or the code using
ACCESS_ONCE into:

b = 999;
if (a)
b = 9;
else
b = 42;

and again, other code would be confused. The simple fact is that
SMP-safe code is not likely to be produced by a compiler that assumes
everything is single-threaded.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/