Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration
From: Jason Gunthorpe
Date: Tue Mar 01 2022 - 08:15:42 EST
On Mon, Feb 28, 2022 at 09:41:10PM -0700, Alex Williamson wrote:
> > + * returning readable. ENOMSG may not be returned in STOP_COPY. Support
> > + * for this ioctl is required when VFIO_MIGRATION_PRE_COPY is set.
>
> This entire ioctl on the data_fd seems a bit strange given the previous
> fuss about how difficult it is for a driver to estimate their migration
> data size. Now drivers are forced to provide those estimates even if
> they only intend to use PRE_COPY as an early compatibility test?
Well, yes. PRE_COPY is designed to be general, not just to serve for
compatability. Qemu needs data to understand how the internal dirty
accumulation in the device is progressing. So everything has to
provide estimates, and at least for acc this is trivial.
> Obviously it's trivial for the acc driver that doesn't support dirty
> tracking and only has a fixed size migration structure, but it seems to
> contradict your earlier statements.
mlx5 knows exactly this data size once it completes entering
STOP_COPY, it has a migf->total_size just like acc, so no problem to
generate this ioctl. We just don't have a use case for it and qemu
would never call it, so trying not to add dead things to the kernel.
Are you are talking about the prior discussion about getting this data
before reaching STOP_COPY?
> For instance, can mlx5 implement a PRE_COPY solely for compatibility
> testing or is it blocked by an inability to provide data estimates
> for this ioctl?
I expect it can, it works very similar to acc. It just doesn't match
where we are planning for compatability. mlx5 has a more dynamic
compatability requirement, it needs to be exposed to orchestration not
hidden in pre_copy. acc looks like it is static, so 'have acc' is
enough info for orchestration.
> Now if we propose that this ioctl is useful during the STOP_COPY phase,
> how does a non-PRE_COPY driver opt-in to that beneficial use case?
Just implement it - userspace will learn if the driver supports it on
the first ioctl = ENOTTY means no support.
> Do we later add a different, optional ioctl for non-PRE_COPY and
> then require userspace to support two different methods of getting
> remaining data estimates for a device in STOP_COPY?
I wouldn't add a new ioctl unless we discover a new requirement when
an implementation is made.
> If our primary goal is to simplify the FSM, I'm actually a little
> surprised we support the PRE_COPY* -> STOP_COPY transition directly
> versus passing through STOP.
A FSM should not have a hidden 'memory' of its past states.
STOP->STOP_COPY should always do exactly the same thing, regardless of
how the FSM reached STOP
The goal is to make the driver easy to implement, which has been done
by mapping the FSM arcs onto the logical actions a driver needs to
take.
We can't avoid that the driver does different things on
PRE_COPY->STOP_COPY vs STOP->STOP_COPY, so the input arcs should
reflect these situations directly. Notice the drivers are not deciding
what to do in these arcs based on hidden internal state.
> It seems this exists due to our policy that we can only generate one
> data_fd as a result of any sequence of state transitions, but I
> think there might also be an option to achieve similar if the
> PRE_COPY* states are skipped if they aren't the ultimate end state
> of the arc.
Are you talking about the other path? The FSM today requires drivers
to implement RUNNING->STOP_COPY - the FSM could also do:
RUNNING->PRE_COPY->STOP_COPY
And just carry forward the data_fd. This would eliminate one arc from
the driver at the cost of always having to turn on internal dirty
tracking/etc.
> I'm sure that raises questions about how we correlate a
> PRE_COPY* session to a STOP_COPY session though, but this PRE_COPY*
> specific but ongoing usage in STOP_COPY ioctl seems ad-hoc.
I do not think it is "pre_copy specific" - the ioctl returns the
estimated length of the data_fd, this is always a valid concept.
Jason