kexec: device shutdown vs. remove

From: Benjamin Herrenschmidt
Date: Sat Jul 23 2016 - 16:52:17 EST


Hi !

This is something of a recurring issue. I suspect my previous attempts
to raise it on lkml were just drowned in the noise. Eric, we had a quick
discussion about this a while back but I don't think we reached a
conclusion.

A bit of context: on OpenPOWER machines, we have a Linux-based
bootloader, so we rely heavily on kexec to boot distro kernels, and
this has been causing us grief, mostly in the device driver space.

Device drivers need to be quiesced before kexec. More specifically,
the device *hardware* needs that, i.e. we want DMAs to stop and the
device to be put into a state where it can reliably be picked up by the
driver in the new kernel.

Today, kexec calls device_shutdown() to achieve that. I argue that this
is the wrong thing to do and instead we should do something that causes
the various drivers' ->remove() functions to be called (whether that
implies actually unbinding the driver or not).

I believe we do this for historical reasons, as ->remove() used to
depend on CONFIG_HOTPLUG while ->shutdown() was always around, but that
is no longer the case.

The most visible issue with ->shutdown() that we encounter is that a lot
of drivers simply don't implement it.

The *real* issue however is that it's the wrong thing to do anyway. It
is a callback intended to be invoked when the machine is about to be
shut down. As such, not only is it very much optional (and rarely
implemented), but it can also (and will in some cases) power bits of
hardware off, which is not what you want if a new driver will try to
pick up the pieces.
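
To make the two failure modes concrete, here is a rough user-space
model. It is not kernel code: struct model_drv, struct model_dev and
every function name below are made up for illustration, and the real
driver core is considerably more involved. It only shows what happens
when one walks the device list calling ->shutdown() versus ->remove().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; NOT the kernel's struct device_driver. */
struct model_dev;

struct model_drv {
    void (*remove)(struct model_dev *dev);   /* usually implemented */
    void (*shutdown)(struct model_dev *dev); /* often NULL */
};

struct model_dev {
    const struct model_drv *drv;
    bool dma_active;
    bool powered;
};

/* Stops DMA but leaves the hardware powered for the next driver. */
void quiesce_remove(struct model_dev *dev)
{
    dev->dma_active = false;
}

/* A shutdown written with a real power-off in mind: also cuts power. */
void poweroff_shutdown(struct model_dev *dev)
{
    dev->dma_active = false;
    dev->powered = false;
}

/*
 * Model of the current kexec path: call ->shutdown() where it exists.
 * Returns the number of devices still doing DMA afterwards.
 */
size_t model_shutdown_all(struct model_dev *devs, size_t n)
{
    size_t still_dma = 0;

    assert(devs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].drv->shutdown)
            devs[i].drv->shutdown(&devs[i]);
        if (devs[i].dma_active)
            still_dma++;
    }
    return still_dma;
}

/*
 * Model of the proposed path: call ->remove() where it exists.
 * Returns the number of devices still doing DMA afterwards.
 */
size_t model_remove_all(struct model_dev *devs, size_t n)
{
    size_t still_dma = 0;

    assert(devs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].drv->remove)
            devs[i].drv->remove(&devs[i]);
        if (devs[i].dma_active)
            still_dma++;
    }
    return still_dma;
}
```

In this model, a device whose driver has no ->shutdown() is left with
DMA running after model_shutdown_all(), and a device whose driver's
->shutdown() powers things off is handed to the next kernel switched
off. model_remove_all() leaves both quiesced and powered.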

Arguably, the most correct semantics are provided by ->remove() since
that corresponds to removing a driver and binding a new one to the
device, i.e. the same flow as doing rmmod/insmod of a new driver.

In practice, we observe that a lot more drivers implement ->remove(). A
few were "fixed" to have ->shutdown() for kexec's sake over time, but in
many cases it's a duplication of ->remove() (ugh...).
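
The duplication typically looks something like the sketch below. Again
this is a user-space toy, not taken from any real driver: struct
toy_nic and the toy_nic_*() functions are hypothetical. The point is
only that the ->shutdown() added for kexec ends up doing the same
quiesce work that ->remove() already did.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, for illustration only. */
struct toy_nic {
    bool dma_active;
    bool irq_enabled;
    bool bound; /* true while a driver is attached */
};

/* The quiesce work that the next kernel's driver depends on. */
void toy_nic_quiesce(struct toy_nic *nic)
{
    assert(nic != NULL);
    nic->dma_active = false;
    nic->irq_enabled = false;
}

/* ->remove() analogue: quiesce the hardware, then detach the driver. */
void toy_nic_remove(struct toy_nic *nic)
{
    toy_nic_quiesce(nic);
    nic->bound = false;
}

/*
 * ->shutdown() analogue bolted on for kexec: it repeats the quiesce
 * step from ->remove() and adds nothing of its own.
 */
void toy_nic_shutdown(struct toy_nic *nic)
{
    toy_nic_quiesce(nic);
}
```

As far as the hardware is concerned, both entry points leave the toy
device in the same state, which is why having kexec go through the
->remove() path would make the extra callback redundant here.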

So I would like to discuss this or at least get feedback and an overall
agreement. I can provide patches to test fairly soon.

Cheers,
Ben.