Re: [PATCH v6 2/3] vfio/ism: Implement vfio_pci driver for ISM devices
From: Julian Ruess
Date: Fri Mar 20 2026 - 09:39:43 EST
On Fri Mar 20, 2026 at 1:54 PM CET, Niklas Schnelle wrote:
> On Thu, 2026-03-19 at 16:42 +0100, Julian Ruess wrote:
>> Add a vfio_pci variant driver for the s390-specific Internal Shared
>> Memory (ISM) devices used for inter-VM communication.
>>
>> This enables the development of vfio-pci-based user space drivers for
>> ISM devices.
>>
>> On s390, kernel primitives such as ioread() and iowrite() are switched
>> over from function handle based PCI load/stores instructions to PCI
>> memory-I/O (MIO) loads/stores when these are available and not
>> explicitly disabled. Since these instructions cannot be used with ISM
>> devices, ensure that classic function handle-based PCI instructions are
>> used instead.
>>
>> The driver is still required even when MIO instructions are disabled, as
>> the ISM device relies on the PCI store block (PCISTB) instruction to
>> perform write operations.
>>
>> Stores are not fragmented, therefore one ioctl corresponds to exactly
>> one PCISTB instruction. User space must ensure to not write more than
>> 4096 bytes at once to an ISM BAR which is the maximum payload of the
>> PCISTB instruction.
>>
>> Signed-off-by: Julian Ruess <julianr@xxxxxxxxxxxxx>
>> ---
>> drivers/vfio/pci/Kconfig | 2 +
>> drivers/vfio/pci/Makefile | 2 +
>> drivers/vfio/pci/ism/Kconfig | 10 ++
>> drivers/vfio/pci/ism/Makefile | 3 +
>> drivers/vfio/pci/ism/main.c | 400 ++++++++++++++++++++++++++++++++++++++++++
>> 5 files changed, 417 insertions(+)
>>
>
> As a general note, ISM devices do not support any low power/sleep modes
> which is why there is no PM handling done.
>
>> +
>> +/*
>> + * ism_vfio_pci_do_io_w()
>> + *
>> + * Ensure that the PCI store block (PCISTB) instruction is used as required by the
>> + * ISM device. The ISM device also uses a 256 TiB BAR 0 for write operations,
>> + * which requires a 48bit region address space (ISM_VFIO_PCI_OFFSET_SHIFT).
>> + */
>> +static ssize_t ism_vfio_pci_do_io_w(struct vfio_pci_core_device *vdev,
>> + char __user *buf, loff_t off, size_t count,
>> + int bar)
>> +{
>> + struct zpci_dev *zdev = to_zpci(vdev->pdev);
>> + struct ism_vfio_pci_core_device *ivpcd;
>> + ssize_t ret;
>> + void *data;
>> + u64 req;
>> +
>> + if (count > zdev->maxstbl)
>> + return -EINVAL;
>> + if (((off % PAGE_SIZE) + count) > PAGE_SIZE)
>> + return -EINVAL;
>> +
>> + ivpcd = container_of(vdev, struct ism_vfio_pci_core_device,
>> + core_device);
>> + data = kmem_cache_alloc(ivpcd->store_block_cache, GFP_KERNEL);
>> + if (!data)
>> + return -ENOMEM;
>> +
>> + if (copy_from_user(data, buf, count)) {
>> + ret = -EFAULT;
>> + goto out_free;
>> + }
>> +
>> + req = ZPCI_CREATE_REQ(READ_ONCE(zdev->fh), bar, count);
>> + ret = __zpci_store_block(data, req, off);
>
> Note: There is a Sashiko finding that PCI Store Block needs 8 byte
> alignment and we don't ensure that off is 8 byte aligned here. While
> generally true the ISM device has relaxed alignment rules and this
> requirement does not apply. That said we should set PAGE_SIZE alignment
> on the kmem_cache such that data is guaranteed PAGE_SIZE aligned and
> the page crossing check works correctly.
>
>> + if (ret)
>> + goto out_free;
>> +
>> + ret = count;
>> +
>> +out_free:
>> + kmem_cache_free(ivpcd->store_block_cache, data);
>> + return ret;
>> +}
>> +
> --- snip ---
>
>> +
>> +static int ism_vfio_pci_init_dev(struct vfio_device *core_vdev)
>> +{
>> + struct zpci_dev *zdev = to_zpci(to_pci_dev(core_vdev->dev));
>> + struct ism_vfio_pci_core_device *ivpcd;
>> + char cache_name[20];
>> + int ret;
>> +
>> + ivpcd = container_of(core_vdev, struct ism_vfio_pci_core_device,
>> + core_device.vdev);
>> +
>> + snprintf(cache_name, sizeof(cache_name), "ism_sb_fid_%08x", zdev->fid);
>> + ivpcd->store_block_cache =
>> + kmem_cache_create(cache_name, zdev->maxstbl, 0, 0, NULL);
>
> Sashiko notes here that the cache_name is stack allocated and says that
> kmem_cache_create() doesn't copy it. Looking at
> __kmem_cache_create_args() however this seems to be not true as that
> does krstrdup_const() on the passed-in name.
I think we should change this to:
ivpcd->store_block_cache = kmem_cache_create(
cache_name, zdev->maxstbl, PAGE_SIZE,
(SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT | SLAB_PANIC), NULL);
The alignment ensures that we do not cross the integral boundary, and as
Sashiko said, memcg accounting is also useful here.
-- snip --