On Thu, 30 Sep 2021 00:48:55 +0300
Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
On 9/29/2021 7:14 PM, Jason Gunthorpe wrote:The _STOP state has a specific meaning regardless of whether userspace
On Wed, Sep 29, 2021 at 06:28:44PM +0300, Max Gurtovoy wrote:Alex said that there is a problem of compatibility.
vfio-pci provides no way to observe the migration state. It isn'tSo you have a device that's actively modifying its internal state,I don't see how is it different from vfio-pci situation.
performing I/O, including DMA (thereby dirtying VM memory), all while
in the _STOP state? And you don't see this as a problem?
"000b"
I migration SW is not involved, nobody will read this migration state.
reads the device state value. I think what you're suggesting is that
the device reports itself as _STOP'd but it's actually _RUNNING. Is
that the compatibility workaround, create a self inconsistency?
We cannot impose on userspace to move a device from _STOP to _RUNNING
simply because the device supports the migration region, nor should we
report a device state that is inconsistent with the actual device state.
_RESUMING to _SAVING seems like a good way to test round trip migrationI guess we can do it for this reason. There is no functional problem norMaybe we need to rename STOP state. We can call it READY or LIVE orIt was a poor choice to use 000b as stop, but it doesn't really
NON_MIGRATION_STATE.
matter. The mlx5 driver should just pre-init this readable to running.
compatibility issue here as was mentioned.
But still we need the kernel to track transitions. We don't want to
allow moving from RESUMING to SAVING state for example. How this
transition can be allowed ?
In this case we need to fail the request from the migration SW...
without running the device to modify the state. Potentially it's a
means to update a saved device migration data stream to a newer format
using an intermediate driver version.
If a driver is written such that it simply sees clearing the _RESUME
bit as an indicator to de-serialize the data stream to the device, and
setting the _SAVING flag as an indicator to re-serialize that data
stream from the device, then this is just a means to make use of
existing data paths.
The uAPI specifies a means for drivers to reject a state change, but
that risks failing to support a transition which might find mainstream
use cases. I don't think common code should be responsible for
filtering out viable transitions. Thanks,
Alex