Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity
From: Jason Gunthorpe
Date: Thu Sep 30 2021 - 13:01:51 EST
On Thu, Sep 30, 2021 at 07:51:22PM +0300, Max Gurtovoy wrote:
>
> On 9/30/2021 7:24 PM, Jason Gunthorpe wrote:
> > On Thu, Sep 30, 2021 at 06:32:07PM +0300, Max Gurtovoy wrote:
> > > > Just prior to open_device the vfio-pci layer will issue an FLR to
> > > > the function, so we expect that after open_device the device is in
> > > > a fresh-from-reset, fully running state.
> > > Does running also mean that the device has no clue about its internal
> > > state? Or does running mean unfrozen and unquiesced?
> > The device just got FLR'd and it should be in a clean state and
> > operating. Think of it as the VM booting for the first time.
>
> During the resume phase in the dst, the VM is paused, not booting.
> The migration SW is waiting to receive memory and state from the src. The
> device will resume from the exact point where it was in the src.
>
> It's exactly "000b => Device Stopped, not saving or resuming"
For this case qemu should open the VFIO device and immediately issue a
command to move it to resuming. The kernel cannot know at open_device
time which case userspace intends. For backwards compatibility we
assume userspace is going to boot a fresh VM.
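A minimal sketch of that default, assuming a driver that tracks the state in a plain field. The bit values are the ones used in this thread (001b running, 010b saving, 100b resuming); the names `DEV_*`, `struct mig_dev` and the helpers are hypothetical and are neither the vfio uapi nor any driver's code. After open the device reports running, and a migration destination has to request resuming explicitly:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the device_state bits discussed in this thread. */
#define DEV_STOP     0x0u /* 000b: stopped, not saving or resuming */
#define DEV_RUNNING  0x1u /* 001b */
#define DEV_SAVING   0x2u /* 010b */
#define DEV_RESUMING 0x4u /* 100b */
#define DEV_MASK     0x7u

/* Hypothetical per-device bookkeeping. */
struct mig_dev {
	unsigned int state;
};

/* Only the three low bits exist, and RESUMING may not be combined
 * with any other bit. */
static bool mig_state_valid(unsigned int state)
{
	if (state & ~DEV_MASK)
		return false;
	return !(state & DEV_RESUMING) || state == DEV_RESUMING;
}

/* open_device: the function was just FLR'd and the kernel cannot tell
 * what userspace intends, so assume a fresh boot and report running. */
static void mig_dev_open(struct mig_dev *d)
{
	d->state = DEV_RUNNING;
}

/* A userspace write of device_state; returns false if it is rejected
 * and leaves the current state untouched. */
static bool mig_dev_set_state(struct mig_dev *d, unsigned int new_state)
{
	assert(mig_state_valid(d->state)); /* current state must be sane */
	if (!mig_state_valid(new_state))
		return false;
	d->state = new_state;
	return true;
}
```

A migration destination would call `mig_dev_open()` and then immediately `mig_dev_set_state(d, DEV_RESUMING)` before any guest activity, while a plain VM boot simply leaves the device in 001b.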
> Well, this is your design for the driver implementation. Nothing prevents
> other drivers from starting to deserialize device state into the device
> while the RESUMING bit is set.
It is a logical model. Devices can stream the migration data directly
into the internal state if they like. It just creates more conditions
where they have to report an error state.
> So if we moved from 100b to 010b somehow, one should deserialize its buffer
> into the device, and then serialize it to the migration region again?
Yes.
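The logical model behind that answer can be sketched as follows, assuming a driver that accumulates the migration data written during RESUMING in a buffer and only loads it into the device when RESUMING is cleared. Everything here (`struct mig_dev`, the fixed-size buffers, the helpers) is hypothetical. Leaving 100b deserializes the buffer into the device, which is the step where a real device could fail and have to report an error state; entering 010b serializes the stopped device back into the migration region:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEV_SAVING   0x2u /* 010b */
#define DEV_RESUMING 0x4u /* 100b */
#define MIG_BUF_SZ   64

/* Hypothetical device: "hw" stands in for the internal device state,
 * "mig" for the data held in the migration region. */
struct mig_dev {
	unsigned int state;
	unsigned char hw[MIG_BUF_SZ];
	unsigned char mig[MIG_BUF_SZ];
	size_t mig_len;
};

/* Userspace writes migration data while in RESUMING. */
static bool mig_write(struct mig_dev *d, const void *data, size_t len)
{
	if (d->state != DEV_RESUMING || len > MIG_BUF_SZ)
		return false;
	memcpy(d->mig, data, len);
	d->mig_len = len;
	return true;
}

static void mig_set_state(struct mig_dev *d, unsigned int new_state)
{
	assert(d->mig_len <= MIG_BUF_SZ);

	/* Leaving RESUMING: deserialize the accumulated buffer into the
	 * device. A real device may reject the data here and would then
	 * have to report an error state; that path is not modelled. */
	if ((d->state & DEV_RESUMING) && !(new_state & DEV_RESUMING)) {
		memset(d->hw, 0, sizeof(d->hw));
		memcpy(d->hw, d->mig, d->mig_len);
	}

	/* Entering SAVING: serialize the device state back into the
	 * migration region for userspace to read. */
	if (!(d->state & DEV_SAVING) && (new_state & DEV_SAVING)) {
		memcpy(d->mig, d->hw, sizeof(d->hw));
		d->mig_len = sizeof(d->hw);
	}

	d->state = new_state;
}
```

Whatever was written during 100b comes back out in 010b, having passed through the device's internal state on the way.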
> I guess it's doable since the device is frozen and quiesced. But moving from
> 100b to 011b is not possible, right?
Why not?
100b to 011b is no different from going indirectly 100b -> 001b -> 011b;
the time spent in 001b is negligible.
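One way to see why the direct 100b -> 011b transition costs nothing extra is to write the driver's transition handler as an ordered sequence of per-bit steps. In the hypothetical sketch below (the step letters and the `struct trace` log are illustrative only), each step fires when its bit edge is crossed, so the direct transition and the 100b -> 001b -> 011b path perform the same actions in the same order:

```c
#include <assert.h>
#include <string.h>

#define DEV_RUNNING  0x1u /* 001b */
#define DEV_SAVING   0x2u /* 010b */
#define DEV_RESUMING 0x4u /* 100b */

/* Hypothetical action log, one letter per step the driver performs:
 *   L = load the resumed data into the device (RESUMING cleared)
 *   R = let the device run                     (RUNNING set)
 *   S = start the saving (pre-copy) phase      (SAVING set) */
struct trace {
	char log[16];
	size_t n;
};

static void emit(struct trace *t, char c)
{
	assert(t->n + 1 < sizeof(t->log));
	t->log[t->n++] = c;
	t->log[t->n] = '\0';
}

/* Apply one transition as an ordered sequence of per-bit steps and
 * return the new state. */
static unsigned int transition(struct trace *t, unsigned int from,
			       unsigned int to)
{
	if ((from & DEV_RESUMING) && !(to & DEV_RESUMING))
		emit(t, 'L');
	if (!(from & DEV_RUNNING) && (to & DEV_RUNNING))
		emit(t, 'R');
	if (!(from & DEV_SAVING) && (to & DEV_SAVING))
		emit(t, 'S');
	return to;
}
```

Both paths log "LRS"; the only difference is that the indirect path returns to userspace between the R and S steps, which is the negligible time spent in 001b.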
Jason