Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity

From: Jason Gunthorpe
Date: Thu Sep 30 2021 - 10:47:58 EST


On Thu, Sep 30, 2021 at 12:34:19PM +0300, Max Gurtovoy wrote:

> > When we add the migration extension this cannot change, so after
> > open_device() the device should be operational.
>
> if it's waiting for incoming migration blob, it is not running.

It cannot be waiting for a migration blob after open_device, that is
not backwards compatible.

Just prior to open_device the vfio pci layer will generate an FLR to
the function, so we expect that after open_device the device is in a
fresh-from-reset, fully running state.

> > The reported state in the migration region should accurately reflect
> > what the device is currently doing. If the device is operational then
> > it must report running, not stopped.
>
> STOP in migration meaning.

As Alex and I have said several times, STOP means the internal state is
not allowed to change.

> > driver will see RESUMING toggle off so it will trigger a
> > de-serialization
>
> You mean stop serialization ?

No, I mean it will take all the migration data that has been uploaded
through the migration region and de-serialize it into active device
state.

> > driver will see SAVING toggled on so it will serialize the new state
> > (either the pre-copy state or the post-copy state depending on the
> > running bit)
>
> lets leave the bits and how you implement the state numbering aside.

You've missed the point. This isn't an FSM. It is a series of three
control bits, and we have assigned logical meaning to their combinations.

The algorithm I gave is a control-centric algorithm, not a
state-centric one, and it matches the direction Alex thought this was
being designed for.
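
To make the control-centric view concrete, here is a minimal sketch,
not the actual driver code. It looks only at which bits toggled and
covers just the two reactions quoted in this mail: RESUMING toggling
off triggers de-serialization, and SAVING toggling on triggers
serialization, with the running bit selecting which flavour. The bit
positions are defined locally and are assumed to mirror the uAPI; the
function and action names are hypothetical.

```c
#include <stdint.h>

/* Control bits, defined locally so the sketch builds standalone.
 * Assumption: these positions mirror the migration region's
 * device_state bits. */
#define DEV_STATE_RUNNING   (1u << 0)
#define DEV_STATE_SAVING    (1u << 1)
#define DEV_STATE_RESUMING  (1u << 2)

/* Hypothetical actions a driver could take, returned as a bitmask. */
#define ACT_DESERIALIZE        (1u << 0) /* load uploaded migration data */
#define ACT_SERIALIZE_RUNNING  (1u << 1) /* serialize while running */
#define ACT_SERIALIZE_STOPPED  (1u << 2) /* serialize while not running */

/* React to the bits that changed between 'old' and 'next'.
 * No enumeration of named states or transition table is involved. */
static unsigned int react_to_control_bits(uint32_t old, uint32_t next)
{
	uint32_t toggled = old ^ next;
	unsigned int actions = 0;

	/* RESUMING toggled off: de-serialize what was uploaded. */
	if ((toggled & DEV_STATE_RESUMING) && !(next & DEV_STATE_RESUMING))
		actions |= ACT_DESERIALIZE;

	/* SAVING toggled on: serialize; the running bit picks the kind. */
	if ((toggled & DEV_STATE_SAVING) && (next & DEV_STATE_SAVING)) {
		if (next & DEV_STATE_RUNNING)
			actions |= ACT_SERIALIZE_RUNNING;
		else
			actions |= ACT_SERIALIZE_STOPPED;
	}

	return actions;
}
```

A real driver would also have to react to the running bit itself and
reject invalid combinations; both are left out of the sketch.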

> If you finish resuming you can move to a new state (that we should add) =>
> RESUMED.

It is not a state machine. Once you stop pretending this is
implementing an FSM, Alex's position makes perfect sense.

Jason