Re: [PATCH v8 2/3] vfio/ism: Implement vfio_pci driver for ISM devices
From: Niklas Schnelle
Date: Mon Mar 30 2026 - 14:19:22 EST
On Mon, 2026-03-30 at 09:36 -0600, Alex Williamson wrote:
> On Fri, 27 Mar 2026 15:53:30 +0100
> Niklas Schnelle <schnelle@xxxxxxxxxxxxx> wrote:
>
> > On Thu, 2026-03-26 at 12:05 -0700, Farhan Ali wrote:
> > > On 3/26/2026 6:03 AM, Niklas Schnelle wrote:
> > > > On Wed, 2026-03-25 at 14:31 +0100, Julian Ruess wrote:
> > > > > Add a vfio_pci variant driver for the s390-specific Internal Shared
> > > > > Memory (ISM) devices used for inter-VM communication.
> > > > >
> > > > > This enables the development of vfio-pci-based user space drivers for
> > > > > ISM devices.
> > > > >
> > > > > On s390, kernel primitives such as ioread() and iowrite() are switched
> > > > > over from function-handle-based PCI load/stores instructions to PCI
> > > > > memory-I/O (MIO) loads/stores when these are available and not
> > > > > explicitly disabled. Since these instructions cannot be used with ISM
> > > > > devices, ensure that classic function-handle-based PCI instructions are
> > > > > used instead.
> > > > >
> > > > > The driver is still required even when MIO instructions are disabled, as
> > > > > the ISM device relies on the PCI store block (PCISTB) instruction to
> > > > > perform write operations.
> > > > >
> > > > > Stores are not fragmented, therefore one ioctl corresponds to exactly
> > > > > one PCISTB instruction. User space must ensure to not write more than
> > > > > 4096 bytes at once to an ISM BAR which is the maximum payload of the
> > > > > PCISTB instruction.
> > > > >
> > > > > Reviewed-by: Alexandra Winter <wintera@xxxxxxxxxxxxx>
> > > > > Reviewed-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> > > > > Signed-off-by: Julian Ruess <julianr@xxxxxxxxxxxxx>
> > > > > ---
> > > > > drivers/vfio/pci/Kconfig | 2 +
> > > > > drivers/vfio/pci/Makefile | 2 +
> > > > > drivers/vfio/pci/ism/Kconfig | 10 ++
> > > > > drivers/vfio/pci/ism/Makefile | 3 +
> > > > > drivers/vfio/pci/ism/main.c | 408 ++++++++++++++++++++++++++++++++++++++++++
> > > > > 5 files changed, 425 insertions(+)
> > > > >
> > > > --- snip ---
> > > > > +
> > > > > +static int ism_vfio_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
> > > > > + struct vfio_region_info *info,
> > > > > + struct vfio_info_cap *caps)
> > > > > +{
> > > > > + struct vfio_pci_core_device *vdev =
> > > > > + container_of(core_vdev, struct vfio_pci_core_device, vdev);
> > > > > + struct pci_dev *pdev = vdev->pdev;
> > > > > +
> > > > > + switch (info->index) {
> > > > > + case VFIO_PCI_CONFIG_REGION_INDEX:
> > > > > + info->offset = ISM_VFIO_PCI_INDEX_TO_OFFSET(info->index);
> > > > > + info->size = pdev->cfg_size;
> > > > > + info->flags = VFIO_REGION_INFO_FLAG_READ |
> > > > > + VFIO_REGION_INFO_FLAG_WRITE;
> > > > > + break;
> > > > > + case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
> > > > > + info->offset = ISM_VFIO_PCI_INDEX_TO_OFFSET(info->index);
> > > > > + info->size = pci_resource_len(pdev, info->index);
> > > > > + if (!info->size) {
> > > > > + info->flags = 0;
> > > > > + break;
> > > > > + }
> > > > > + info->flags = VFIO_REGION_INFO_FLAG_READ |
> > > > > + VFIO_REGION_INFO_FLAG_WRITE;
> > > > > + break;
> > > > > + default:
> > > > > + info->offset = 0;
> > > > > + info->size = 0;
> > > > > + info->flags = 0;
> > > > > + return -EINVAL;
> > > > Thinking more about this, the above default handling actually breaks
> > > > additional regions such as the one added by Farhan for the error
> > > > events. I think this needs to instead call the generic
> > > > vfio_pci_ioctl_get_region_info() for other regions.
> > >
> > > Note for error events we are using a VFIO device feature and not a
> > > region. At this point we actually don't have additional regions. I do
> > > agree that if we were to add a zpci specific region we would need to
> > > update here and ism_vfio_pci_rw() to allow region read/write ops. This
> > > ISM driver would also need to support all the possible callbacks in
> > > struct vfio_pci_regops.
> > >
> > > Given that we don't have any additional regions yet, I am conflicted if
> > > it makes sense to add all the code to support additional regions without
> > > a good way to test it. Maybe its something we can defer when if we add
> > > additional regions? Thanks
> > >
> > > Farhan
> >
> > @Alex, we experimented a bit and it turns out with the custom
> > ISM_VFIO_PCI_OFFSET_SHIFT it becomes quite tricky to reliably call into
> > common vfio-pci code for handling other regions as that again relies on
> > the normal VFIO_PCI_OFFSET_SHIFT.
> >
> > So we came up with two options:
> >
> > 1) vfio-pci-ism will only support this basic set of regions. It seems
> > to us that new extensions like Farhan's error events should use device
> > features anyway, so this to us seems like an acceptable limitation and
> > would essentially mean we take this patch as is.
> >
> > 2) We revisit the original idea of adjusting VFIO_PCI_OFFSET_SHIFT
> > globally but do so only for CONFIG_64BIT so as not to break 32 bit
> > platforms. This would simplify vfio-pci-ism and cause less code
> > duplication but also affects everyone else.
> >
> > Either option works for us, so I'm hoping for your input.
>
> There's risk involved with changing the default shift. The fear is
> there's userspace drivers that hard code the shift. DPDK was even such
> a user at one point, iirc. Maybe it's ok to break such users, maybe
> there are actually no such users left and it's all FUD at this point.
> Either way, I have a hard time justifying that risk for a single,
> obscure S390 device.
>
> Rather than using CONFIG_64BIT as a basis for using a different
> default, should we start out with a more narrow scope of CONFIG_S390?
>
> It's likely inevitable that we'll hit this wall in general, but maybe
> S390 can be the "crash test dummy" to prove to userspace that they
> really need to use the uAPI rather than hard coded values.
I fear we'd not be a very good crash test dummy / canary in the coal
mine. We don't have DPDK support, nor a good selection of devices.
Also, apart from us developers almost our entire user base is on slow
moving enterprise distributions running critical but also slowly
updated enterprise workloads. If we want to do this long term maybe a
better approach would be a specific experimental/sanitizing Kconfig
option? As a user of a fast moving rolling distribution on my private
systems I'm all for trying things out in the right environments and
maybe such a distro would be willing to play canary ;)
>
> Farhan's series is a bit of a straw-man since it doesn't actually
> convey the error codes via regions, but I can certainly agree that
> using a unique region shift becomes an ongoing burden relative to
> vfio-pci-core helpers. I can accept an S390 specific (for now) change
> to the region spacing if that's the most sensible path. Thanks,
>
> Alex
Yes, as Farhan noted this was a false alarm. We did actually have a
patch in internal review that did use an extra region but we already
decided that this should instead take the same approach as Farhan's
work.
So overall I think I think we should go with the first option.
We don't know of any extra regions we'll have to support and we can
document that this is not supported for vfio-pci-ism. Then if we do
need it we can still revisit option 2.
Thanks,
Niklas