Re: [PATCH v4 3/4] vfio-pci/zdev: Add VFIO FMB device features
From: Omar Elghoul
Date: Mon Jun 22 2026 - 17:30:43 EST
On 6/22/26 4:22 PM, Alex Williamson wrote:
On Fri, 12 Jun 2026 14:10:47 -0400
Omar Elghoul <oelghoul@xxxxxxxxxxxxx> wrote:
+int vfio_pci_zdev_feature_fmb_read(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ struct zpci_dev *zdev;
+ struct vfio_device_feature_zpci_fmb_read fmb_read;
+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET, sizeof(fmb_read));
+ if (ret != 1)
+ return ret;
+
+ zdev = to_zpci(vdev->pdev);
+ if (!zdev)
+ return -ENODEV;
+
+ guard(mutex)(&zdev->fmb_lock);
+
+ if (!zdev->fmb)
+ return -ENOMSG;
+ if (copy_from_user(&fmb_read, arg, sizeof(fmb_read)))
No need to do this or the test below under mutex.
+ return -EFAULT;
+ if (!fmb_read.data)
+ return -EINVAL;
+
+ if (copy_to_user((struct zpci_fmb __user *) fmb_read.data, zdev->fmb, zdev->fmb_length))
The v3 comment noted we could do this, but really keeping the buffer
and doing the copy_to_user after dropping the mutex is really the
better option. Sashiko also notes[1] this as a high severity issue.
Should also use a u64_to_user_ptr() on the user data pointer.
[1]https://sashiko.dev/#/message/20260612182854.97E641F000E9%40smtp.kernel.org
Acked.
+ return -EFAULT;
+
+ return 0;
+}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5de618a3a5ee..97e0f857fe4f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,35 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
These next feature indexes are in contention, so we need to think about
how this gets merged; the whole thing through the vfio tree, the s390
bits through s390 tree with a branch exposed for me to merge to vfio
before applying this change, or the whole series applied to a clean
branch and merged into both the s390 and vfio next branches. The first
two options give me the most leniency in adjusting feature indexes
based on what's already been merged at the time.
I'm flexible on the merging. Carrying the assumption that the VFIO patch
depends on the s390 patches (after I make the amendments below), I
believe option 2 would work best, but I'm happy to work with whichever
approach you prefer.
+/**
+ * Upon VFIO_DEVICE_FEATURE_SET, enable or disable FMB for the VFIO zPCI device.
+ *
+ * enabled is treated as a bool, so any non-zero value evaluates to true. This
+ * feature fails on attempt to double enable/disable.
Same inconsistency noted on patch 2, it seems that it starts out
enabled.
Acked, will amend it back to allowing the re-enable and appropriately
document the behavior.
+ *
+ * Returns: 0 on success, -1 and errno set appropriately on error.
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_FMB_ENABLE 13
+
+struct vfio_device_feature_zpci_fmb_enable {
+ __u8 enabled;
+};
+
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET, provide FMB passthrough for VFIO zPCI devices.
+ *
+ * The user-provided buffer must be at least fmb_length large, where fmb_length
+ * is reported in VFIO_DEVICE_INFO_CAP_ZPCI_BASE.
Is there a spec reference for the opaque data blob provided, or at
least reference to kernel structure documenting the layout defined by
some firmware reference?
The structure (struct zpci_fmb) is defined in the kernel under
arch/s390/include/asm/pci.h.
+ *
+ * Returns: 0 on success, -1 and errno set appropriately on error. errno==ENOMSG
+ * when the FMB is not enabled.
This sounds like a user sequencing error, so should it simply be EINVAL
or ENXIO? ENOMSG almost sounds like we're tracking the samples field
to make sure the user hasn't already read the current sample. Along
The user should be allowed to read the same sample more than once. I
also think ENOMSG makes sense here because its semantics are reasonable
(i.e., there's no FMB "message" to read because the FMB is disabled.)
That said ENXIO probably makes for a good second choice here, more so
than EINVAL.
those same lines, should this document some mechanism for dealing with
torn samples since we might be relaying the sample structure mid-update?
There is an "update-in-progress" bit in the structure, but I don't think
that's within the scope of the uAPI since we're already treating it like
an opaque blob.
Thanks.
Thanks,
Alex