Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration

From: Jason Gunthorpe
Date: Tue Mar 01 2022 - 19:03:40 EST


On Tue, Mar 01, 2022 at 03:44:31PM -0700, Alex Williamson wrote:
> On Tue, 1 Mar 2022 16:39:38 -0400
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > On Tue, Mar 01, 2022 at 12:30:47PM -0700, Alex Williamson wrote:
> > > Wouldn't it make more sense if initial-bytes started at QM_MATCH_SIZE
> > > and dirty-bytes was always sizeof(vf_data) - QM_MATCH_SIZE? ie. QEMU
> > > would know that it has sizeof(vf_data) - QM_MATCH_SIZE remaining even
> > > while it's getting ENOMSG after reading QM_MATCH_SIZE bytes of data.
> >
> > The purpose of this ioctl is to help userspace guess when moving on to
> > STOP_COPY is a good idea ie when the device has done almost all the
> > work it is going to be able to do in PRE_COPY. ENOMSG is a similar
> > indicator.
> >
> > I expect all devices to have some additional STOP_COPY trailer_data in
> > addition to their PRE_COPY initial_data and dirty_data
> >
> > There is a choice to make if we report the trailer_data during
> > PRE_COPY or not. As this is all estimates, it doesn't matter unless
> > the trailer_data is very big.
> >
> > Having all devices trend toward a 0 dirty_bytes to say they are are
> > done all the pre-copy they can do makes sense from an API
> > perspective. If one device trends toward 10MB due to a big
> > trailer_data and one trends toward 0 bytes, how will qemu consistently
> > decide when best to trigger STOP_COPY? It makes the API less useful.
> >
> > So, I would not include trailer_data in the dirty_bytes.
>
> That assumes that it's possible to keep up with the device dirty
> rate.

It keeps options open so we have this choice someday.

We already see that implementations are using vCPU throttling as part
of their migration strategy, and we are seriously looking at DMA
throttling. It is not a big leap to imagine that
internal-state-dirtying throttling will happne someday.

With throttling iterations would ratchet up the throttle until they
reach an absolute small amount of dirty then cut over to STOP_COPY

> It seems like a better approach for userspace would be to look at how
> dirty_bytes is trending.

It may be biw, but this approach doesn't care if the trailing_bytes
are included or not, so lets leave them out and preserve the other
operating model.

> If we exclude STOP_COPY trailing data from the VFIO_DEVICE_MIG_PRECOPY
> ioctl, it seems even more of a disconnect that when we enter the
> STOP_COPY state, suddenly we start getting new data out of a PRECOPY
> ioctl.

Why? That amounts can go up at any time, how does it matter if it goes
up after STOP_COPY or instantly before?

> BTW, "VFIO_DEVICE" should be reserved for ioctls and data structures
> relative to the device FD, appending it with _MIG is too subtle for me.
> This is also a GET operation for INFO, so I'd think for consistency
> with the existing vfio uAPI we'd name this something like
> VFIO_MIG_GET_PRECOPY_INFO where the structure might be named
> vfio_precopy_info.

Sure

> So if we don't think this is the right approach for STOP_COPY, then why
> are we pushing that it has any purpose outside of PRECOPY or might be
> implemented by a non-PRECOPY driver for use in STOP_COPY?

It is just simpler and more consistent to implement the math under
this ioctl in all cases then to try and artificially restrict it.

But I don't have a use case for it, so lets block it if you prefer.

Shameerali will you make these adjustments to the PRE_COPY patch?

Thanks,
Jason