Re: [PATCH v1 00/23] s390/vfio-ap: Implement live guest migration of guests using AP devices

From: Alex Williamson

Date: Tue Mar 31 2026 - 13:46:40 EST

On Tue, 31 Mar 2026 08:07:06 -0400
Anthony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:

> On 3/30/26 12:27 PM, Alex Williamson wrote:
> > On Wed, 25 Mar 2026 17:00:47 -0400
> > Anthony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
> >
> >> This patch series implements live guest migration of a guest to which AP
> >> devices have been passed through. To better comprehend this design, one has
> >> to understand that the VFIO AP mediated device is not used to provide userspace
> >> with direct access to a device, as is the case with other devices that use
> >> the VFIO framework to pass them through to a guest. The sole purpose of the
> >> VFIO AP mediated device is to manage an AP configuration for a guest. An AP
> >> configuration consists of the AP adapter IDs (APID), AP queue
> >> indexes (APQI) and domain numbers of the control domains to which a guest
> >> will be granted access. Once the VFIO AP mediated device is attached to a
> >> guest, its AP configuration is set by the vfio_ap device driver. Once set,
> >> all access to the AP devices is handled by the s390 Interpretive Execution
> >> facility; in other words, the vfio_ap device driver plays no role in
> >> providing direct access to the AP devices in the guest's AP configuration.
> >>
> >> The only role that the vfio_ap device driver plays in the migration
> >> process is to verify that the AP configuration for the source guest is
> >> compatible with the AP configuration of the destination guest.
> >> Incompatibility will result in a live guest migration failure.
> >> In order to be compatible, the following requirements must be met:
> >>
> >> 1. The destination guest will be started with the same QEMU command line
> >> as the source guest, so the mediated device supplying the AP
> >> configuration on both guests must have the same name (UUID).
> > AFAIK, same UUID is not a requirement for out-of-tree mdev drivers
> > supporting migration. You're really concerned more with the
> > configuration/composition of the mdev device, so requiring the same
> > UUID seems a bit arbitrary.

Combining replies:

On Tue, 31 Mar 2026 07:17:08 -0400
Anthony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
>
> As stated above, the destination guest will be started with the same
> QEMU command line as the source guest. Within that command line
> will be a '-device' parameter like the following:
>
> -device '{"driver":"vfio-ap","id":"hostdev0","sysfsdev":"/sys/bus/mdev/devices/62177883-f1bb-47f0-914d-32a22e3a8804"}'
>
> Note that sysfsdev is the path to the mdev named
> 62177883-f1bb-47f0-914d-32a22e3a8804;
> therefore, the mdev with that name must exist on the destination host or
> the migration will fail with the following error:
>
> error: device not found: mediated device
> '62177883-f1bb-47f0-914d-32a22e3a8804' not found

Then this is a requirement of your tooling, not a kernel requirement, not
something the kernel should care about. QEMU matches devices by their
virtual bus path, not the sysfsdev or host attributes. In the case of
VF migration with vfio-pci variant drivers we cannot require that the
source and target devices exist at the same bus address. Ideally the
pre-copy data from the source device to the target will include relevant
configuration information to validate that the source and target are
compatible, regardless of the uuid.
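As a sketch of what such a UUID-independent check could look like, the code below validates the compatibility requirements listed in the cover letter (same number of APQNs, every source APQN present on the destination, matching per-queue capabilities) using only configuration data the source might send ahead of the device state. The structures, the fixed MAX_APQNS limit, the opaque capability word and the APID/APQI packing in make_apqn() are all hypothetical illustrations, not the vfio_ap driver's actual migration format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit and layout; not the vfio_ap migration format. */
#define MAX_APQNS 16

struct ap_queue_desc {
	uint16_t apqn;    /* assumed packing: APID in high byte, APQI in low byte */
	uint32_t hw_caps; /* opaque per-queue capability word (hypothetical) */
};

struct ap_config_summary {
	size_t nr_queues;
	struct ap_queue_desc queues[MAX_APQNS];
};

/* Hypothetical helper: combine an adapter ID and a queue index into an APQN. */
static uint16_t make_apqn(uint8_t apid, uint8_t apqi)
{
	return (uint16_t)((apid << 8) | apqi);
}

/* Return the queue with the given APQN, or NULL if the config lacks it. */
static const struct ap_queue_desc *
find_queue(const struct ap_config_summary *cfg, uint16_t apqn)
{
	for (size_t i = 0; i < cfg->nr_queues; i++)
		if (cfg->queues[i].apqn == apqn)
			return &cfg->queues[i];
	return NULL;
}

/* Compatibility is decided from configuration contents only; no UUID. */
static bool ap_config_compatible(const struct ap_config_summary *src,
				 const struct ap_config_summary *dst)
{
	assert(src->nr_queues <= MAX_APQNS && dst->nr_queues <= MAX_APQNS);

	if (src->nr_queues != dst->nr_queues)
		return false;

	for (size_t i = 0; i < src->nr_queues; i++) {
		const struct ap_queue_desc *q = find_queue(dst, src->queues[i].apqn);

		/* Every source APQN must exist on the target with equal caps. */
		if (!q || q->hw_caps != src->queues[i].hw_caps)
			return false;
	}
	return true;
}
```

Nothing in the check consults the mdev UUID, so a destination mdev with a different name passes as long as its configuration matches.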

> >> 2. The AP configuration assigned via the VFIO AP mediated device on both
> >> guests must be compatible. As such, each AP configuration must meet
> >> the following requirements:
> >>
> >> * Both guests must have the same number of APQNs
> >>
> >> * Each APQN assigned to the source guest must also be assigned to the
> >> destination guest
> >>
> >> * Each APQN assigned to both guests must reference an AP queue with the
> >> same hardware capabilities
> > Why isn't this sufficient vs also requiring the same UUID?
>
> I explained why in my previous response.

See above; userspace tooling requirements don't imply a kernel
requirement.

> >> Note: There is a forthcoming consumer of this series, a QEMU patch
> >> series entitled:
> >> 'hw/vfio/ap: Implement live guest migration of guests using AP
> >> devices'
> >>
> >> This design also adds a use case for enabling and disabling
> >> migration of guests to which AP devices have been passed through. To
> >> facilitate this, a new read/write sysfs 'migratable' attribute is added to
> >> the mediated device. This attribute specifies whether the vfio device is
> >> migratable (1) or not (0). When the value of this attribute is changed, the
> >> vfio_ap device driver will signal an eventfd to userspace. It is up to
> >> userspace to respond to the change by enabling or disabling migration of
> >> the guest to which the mediated device is attached. The operation will be
> >> rejected with a 'Device or resource busy' message if a migration is in
> >> progress.
> > This seems inherently racy. What happens if the device becomes
> > unmigratable while it's being migrated?
> >
> > If userspace is deciding that the device is now unmigratable, why does
> > it need to signal this through the kernel driver rather than with the
> > userspace orchestration agent? The entire path seems unnecessary.
>
> I am not familiar with what a userspace orchestration agent is, so
> I can't address that. Can you please describe how that would work?

Something in userspace, perhaps libvirt, is managing the VM. It needs
to coordinate with a counterpart on another host managing and
configuring the target VM to accept the migration data stream. There's
likely also another entity that's responsible for deciding this
migration should occur and where to place the target. All of this is
what I'm referring to as orchestration.

Consider the proposed path: a userspace agent writes a sysfs attribute
to mark the device as becoming non-migratable, the host driver signals
userspace (QEMU) through an eventfd, and QEMU then polls an INFO ioctl to
decide whether to block migration. That is over-engineered and at the
wrong layer compared with some sort of RPC to the managing process, or
even to QEMU, to block migrations.

Additionally, if the device itself actually becomes non-migratable, it
can simply fail any migration state transition other than returning to
RUNNING and can generate errors in the data stream if it needs to abort
an in-progress migration.

> Maybe it would help to provide the reason for this. For certain types
> of crypto operations, a master key must be configured for the crypto
> card domain being used. This master key must be synchronized
> between the source and destination crypto devices so that in-flight
> crypto operations can be completed during migration. If these master
> keys must be changed, migration must be blocked until the master
> key changes can be synchronized between the source and destination
> system(s).

This sounds like a userspace orchestration problem, not a kernel
problem. It might be a valid choice to use the mechanisms I outline
above to abort an in-progress migration if a new master key is
configured during migration, but it's not the kernel's job to provide
a synchronization point for this.
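One way to picture the abort option: a hypothetical hook invoked when a new master key is configured marks any open migration stream as failed, so the next read of migration data returns an error and the migration aborts. The alternative, refusing the key change while a migration is running, is modelled as well. Every name here (model_stream, model_key_changed(), model_try_key_change(), model_read_chunk()) is hypothetical, and -EIO/-EBUSY are illustrative error choices.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device view of an open migration data stream. */
struct model_stream {
	bool migrating;              /* a migration data stream is open */
	bool aborted;                /* stream can no longer be trusted */
	unsigned int key_generation; /* bumped on every master key change */
};

/* Option 1: accept the key change but fail the in-progress migration. */
static void model_key_changed(struct model_stream *s)
{
	s->key_generation++;
	if (s->migrating)
		s->aborted = true;
}

/* Option 2: refuse the key change while a migration is running. */
static int model_try_key_change(struct model_stream *s)
{
	if (s->migrating)
		return -EBUSY;
	s->key_generation++;
	return 0;
}

/* Produce the next chunk of migration data, or an error once aborted. */
static long model_read_chunk(const struct model_stream *s, char *buf, size_t len)
{
	assert(buf != NULL);

	if (s->aborted)
		return -EIO;      /* userspace sees a failed read and aborts */
	memset(buf, 0, len);      /* placeholder for real device state */
	return (long)len;
}
```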

> >> Userspace must also have a means for retrieving the value of the sysfs
> >> 'migratable' attribute when the guest is started to initialize whether it
> >> can be migrated. For this, the VFIO_DEVICE_GET_INFO ioctl is used. The
> >> struct vfio_device_info object passed to the ioctl will be extended with a
> >> capability specifying the vfio device attributes. One of the attributes
> >> will contain the value of the mediated device's 'migratable' attribute.
> > This is just broken, it's redundant to our current device feature
> > mechanism for exposing migration support. If you want the capability
> > to create unmigratable devices statically, can't that be encompassed
> > within the definition of the mdev type? Dynamic migration support just
> > seems like it's involving the kernel in orchestration it shouldn't be a
> > part of.
>
> So, it appears you are suggesting the creation of a new mdev type
> for unmigratable crypto devices. I don't see the value in that.
> As I stated above, there is a valid reason for wanting to prevent
> migration while master key synchronization is taking place.

Then prevent it in userspace.

> If this feature violates the implicit rules of vfio device migration,
> then so be it. Maybe we have to figure out another way to ensure
> migration is not initiated during master key synchronization.

If there's a software entity that has the ability to write to sysfs to
declare that a device is not currently available for migration, give it
the ability to notify whatever entity is coordinating and scheduling,
ie. orchestrating, the migration rather than creating a channel through
the device. Combine that with using the existing mechanisms to abort a
migration if its already in progress.
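A sketch of moving the gate to the orchestration layer: the key management tooling asks the orchestrator to block migrations for a VM, and the orchestrator refuses to start a migration while any block is outstanding. The vm_migration_gate type and its functions are hypothetical and single-threaded; a real orchestrator would serialize these calls and expose them through whatever RPC it already provides, falling back on the device's ability to abort if a key change races with a migration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical orchestrator-side bookkeeping for one VM; no vfio involved. */
struct vm_migration_gate {
	unsigned int blockers; /* outstanding "do not migrate" requests */
	bool migrating;
};

/* Key tooling requests a block; fails if a migration is already running. */
static bool gate_block(struct vm_migration_gate *g)
{
	if (g->migrating)
		return false; /* caller must wait, or abort the migration first */
	g->blockers++;
	return true;
}

/* Key tooling releases a block it previously acquired. */
static void gate_unblock(struct vm_migration_gate *g)
{
	assert(g->blockers > 0);
	g->blockers--;
}

/* Orchestrator checks the gate before scheduling a migration. */
static bool gate_start_migration(struct vm_migration_gate *g)
{
	if (g->blockers > 0 || g->migrating)
		return false;
	g->migrating = true;
	return true;
}

static void gate_finish_migration(struct vm_migration_gate *g)
{
	g->migrating = false;
}
```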

> If we can't find an acceptable means to do this programmatically,
> then maybe it will come down to a matter of documenting the
> need to ensure migration is not initiated while master key
> synchronization is taking place. This would put the onus on the
> various system administrators responsible for host, guest and
> master key administration to communicate out of band to
> ensure they are all on the same page with regard to migration.
>
> It would be preferable to be able to do this with a userspace
> interface, so any suggestions would be greatly appreciated.

A userspace interface can still exist, I just don't find this path
through the driver to the VM acceptable with this justification.
Mechanisms already exist for the device to refuse a state transition or
generate an error for a migration already in progress. IMHO, it would
be acceptable for the device to block a key change if the migration is
already in progress. If the key change cannot be represented in the
migration data stream, then it's up to the orchestration of the
migration to make sure they stay synchronized, but I don't see that
the vfio uAPI needs to be involved. Thanks,

Alex