Re: [RFC PATCH] vfio/pci: add PCIe TPH to device feature ioctl
From: Alex Williamson
Date: Mon Mar 03 2025 - 22:13:15 EST
On Mon, 3 Mar 2025 04:20:24 +0000
Wathsala Wathawana Vithanage <wathsala.vithanage@xxxxxxx> wrote:
> Hi Alex,
>
> > > Linux v6.13 introduced the PCIe TLP Processing Hints (TPH) feature for
> > > direct cache injection. As described in the relevant patch set [1],
> > > direct cache injection in supported hardware allows optimal platform
> > > resource utilization for specific requests on the PCIe bus. This
> > > feature is currently available only for kernel device drivers.
> > > However, user space applications, especially those whose performance
> > > is sensitive to the latency of inbound writes as seen by a CPU core,
> > > may benefit from using this information (E.g., DPDK cache stashing RFC
> > > [2] or an HPC application running in a VM).
> > >
> > > This patch enables configuring of TPH from the user space via
> > > VFIO_DEVICE_FEATURE IOCLT. It provides an interface to user space
> > > drivers and VMMs to enable/disable the TPH feature on PCIe devices and
> > > set steering tags in MSI-X or steering-tag table entries using
> > > VFIO_DEVICE_FEATURE_SET flag or read steering tags from the kernel
> > > using VFIO_DEVICE_FEATURE_GET to operate in device-specific mode.
> >
> > Unless I'm missing it, the RFC in [2] doesn't make use of this proposal. Is there
> > published code anywhere that does use this interface?
> >
> > > [1]
> > > lore.kernel.org/linux-pci/20241002165954.128085-1-wei.huang2@xxxxxxx
> > > [2]
> > > inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@xxxxxxx
> > >
>
> The DPDK RFC in question is on hold for now because there is no way to get steering-tags
> from the user space.
> Consequently, I had to stop working on [2] and start working on the kernel to get VFIO ready
> for it. This was discussed in a DPDK community call sometime back.
> https://mails.dpdk.org/archives/dev/2024-October/305408.html
I think the userspace and kernel support need to be co-developed, we
don't want to be adding kernel code and uAPIs that don't have users.
> > > Signed-off-by: Wathsala Vithanage <wathsala.vithanage@xxxxxxx>
> > > ---
> > > drivers/vfio/pci/vfio_pci_core.c | 163 +++++++++++++++++++++++++++++++
> > > include/uapi/linux/vfio.h | 68 +++++++++++++
> > > 2 files changed, 231 insertions(+)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c
> > > b/drivers/vfio/pci/vfio_pci_core.c
> > > index 586e49efb81b..d6dd0495b08b 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -29,6 +29,7 @@
> > > #include <linux/nospec.h>
> > > #include <linux/sched/mm.h>
> > > #include <linux/iommufd.h>
> > > +#include <linux/pci-tph.h>
> > > #if IS_ENABLED(CONFIG_EEH)
> > > #include <asm/eeh.h>
> > > #endif
> > > @@ -1510,6 +1511,165 @@ static int vfio_pci_core_feature_token(struct
> > vfio_device *device, u32 flags,
> > > return 0;
> > > }
> > >
> > > +static ssize_t vfio_pci_tph_uinfo_dup(struct vfio_pci_tph *tph,
> > > + void __user *arg, size_t argsz,
> > > + struct vfio_pci_tph_info **info) {
> > > + size_t minsz;
> > > +
> > > + if (tph->count > VFIO_TPH_INFO_MAX)
> > > + return -EINVAL;
> > > + if (!tph->count)
> > > + return 0;
> > > +
> > > + minsz = tph->count * sizeof(struct vfio_pci_tph_info);
> > > + if (minsz < argsz)
> > > + return -EINVAL;
> > > +
> > > + *info = memdup_user(arg, minsz);
> > > + if (IS_ERR(info))
> > > + return PTR_ERR(info);
> > > +
> > > + return minsz;
> > > +}
> > > +
> > > +static int vfio_pci_feature_tph_st_op(struct vfio_pci_core_device *vdev,
> > > + struct vfio_pci_tph *tph,
> > > + void __user *arg, size_t argsz) {
> > > + int i, mtype, err = 0;
> > > + u32 cpu_uid;
> > > + struct vfio_pci_tph_info *info = NULL;
> > > + ssize_t data_size = vfio_pci_tph_uinfo_dup(tph, arg, argsz, &info);
> > > +
> > > + if (data_size <= 0)
> > > + return data_size;
> > > +
> > > + for (i = 0; i < tph->count; i++) {
> > > + if (!(info[i].cpu_id < nr_cpu_ids && cpu_present(info[i].cpu_id)))
> > > +{
> >
> > Not intuitive logic, imo. This could easily be:
> >
> > if (info[i].cpu_id >= nr_cpu_ids || !cpu_present(info[i].cpu_id))
> >
>
> Agree, looks more intuitive.
>
> > > + info[i].err = -EINVAL;
> > > + continue;
> > > + }
> > > + cpu_uid = topology_core_id(info[i].cpu_id);
> > > + mtype = (info[i].flags & VFIO_TPH_MEM_TYPE_MASK) >>
> > > + VFIO_TPH_MEM_TYPE_SHIFT;
> > > +
> > > + /* processing hints are always ignored */
> > > + info[i].ph_ignore = 1;
> > > +
> > > + info[i].err = pcie_tph_get_cpu_st(vdev->pdev, mtype, cpu_uid,
> > > + &info[i].st);
> > > + if (info[i].err)
> > > + continue;
> > > +
> > > + if (tph->flags & VFIO_DEVICE_FEATURE_TPH_SET_ST) {
> > > + info[i].err = pcie_tph_set_st_entry(vdev->pdev,
> > > + info[i].index,
> > > + info[i].st);
> > > + }
> > > + }
> > > +
> > > + if (copy_to_user(arg, info, data_size))
> > > + err = -EFAULT;
> > > +
> > > + kfree(info);
> > > + return err;
> > > +}
> > > +
> > > +
> > > +static int vfio_pci_feature_tph_enable(struct vfio_pci_core_device *vdev,
> > > + struct vfio_pci_tph *arg)
> > > +{
> > > + int mode = arg->flags & VFIO_TPH_ST_MODE_MASK;
> > > +
> > > + switch (mode) {
> > > + case VFIO_TPH_ST_NS_MODE:
> > > + return pcie_enable_tph(vdev->pdev, PCI_TPH_ST_NS_MODE);
> > > +
> > > + case VFIO_TPH_ST_IV_MODE:
> > > + return pcie_enable_tph(vdev->pdev, PCI_TPH_ST_IV_MODE);
> > > +
> > > + case VFIO_TPH_ST_DS_MODE:
> > > + return pcie_enable_tph(vdev->pdev, PCI_TPH_ST_DS_MODE);
> > > +
> > > + default:
> > > + return -EINVAL;
> > > + }
> > > +
> > > +}
> > > +
> > > +static int vfio_pci_feature_tph_disable(struct vfio_pci_core_device
> > > +*vdev) {
> > > + pcie_disable_tph(vdev->pdev);
> > > + return 0;
> > > +}
> > > +
> > > +static int vfio_pci_feature_tph_prepare(struct vfio_pci_tph __user *arg,
> > > + size_t argsz, u32 flags,
> > > + struct vfio_pci_tph *tph)
> > > +{
> > > + u32 op;
> > > + int err = vfio_check_feature(flags, argsz,
> > > + VFIO_DEVICE_FEATURE_SET |
> > > + VFIO_DEVICE_FEATURE_GET,
> > > + sizeof(struct vfio_pci_tph));
> > > + if (err != 1)
> > > + return err;
> >
> > We don't seem to account for a host booted with pci=notph.
> >
>
> pcie_enable_tph() looks for pci=notph and fails if it's set.
> If pcie_enable_tph() fails, pcie_tph_get_cpu_st() and
> pcie_tph_set_st_entry() fail too.
>
> Is it required to check pci=notph here as well?
Does it make sense to probe affirmatively for a feature that can't be
enabled? I think I previously also overlooked that we never actually
check that the device supports TPH at probe time.
> > > +
> > > + if (copy_from_user(tph, arg, sizeof(struct vfio_pci_tph)))
> > > + return -EFAULT;
> > > +
> > > + op = tph->flags & VFIO_DEVICE_FEATURE_TPH_OP_MASK;
> > > +
> > > + switch (op) {
> > > + case VFIO_DEVICE_FEATURE_TPH_ENABLE:
> > > + case VFIO_DEVICE_FEATURE_TPH_DISABLE:
> > > + case VFIO_DEVICE_FEATURE_TPH_SET_ST:
> > > + return (flags & VFIO_DEVICE_FEATURE_SET) ? 0 : -EINVAL;
> > > +
> > > + case VFIO_DEVICE_FEATURE_TPH_GET_ST:
> > > + return (flags & VFIO_DEVICE_FEATURE_GET) ? 0 : -EINVAL;
> >
> > This is a convoluted mangling of an ioctl into a vfio feature.
> >
>
> Do you prefer all four operations to fold under a single ioctl? Or split them
> such that enabling and disabling TPH remains a VFIO SET feature while
> SET_ST and GET_ST moved to a regular ioctl?
Splitting the functionality between a feature and a new ioctl doesn't
make any sense. As I noted, I can imagine this interface being two
related features where one allows the TPH state to be set enabled or
disabled and the other allows setting steering tags.
> > > +
> > > + default:
> > > + return -EINVAL;
> > > + }
> > > +}
> > > +
> > > +static int vfio_pci_core_feature_tph(struct vfio_device *device, u32 flags,
> > > + struct vfio_pci_tph __user *arg,
> > > + size_t argsz)
> > > +{
> > > + u32 op;
> > > + struct vfio_pci_tph tph;
> > > + void __user *uinfo;
> > > + size_t infosz;
> > > + struct vfio_pci_core_device *vdev =
> > > + container_of(device, struct vfio_pci_core_device, vdev);
> > > + int err = vfio_pci_feature_tph_prepare(arg, argsz, flags, &tph);
> > > +
> > > + if (err)
> > > + return err;
> > > +
> > > + op = tph.flags & VFIO_DEVICE_FEATURE_TPH_OP_MASK;
> > > +
> > > + switch (op) {
> > > + case VFIO_DEVICE_FEATURE_TPH_ENABLE:
> > > + return vfio_pci_feature_tph_enable(vdev, &tph);
> > > +
> > > + case VFIO_DEVICE_FEATURE_TPH_DISABLE:
> > > + return vfio_pci_feature_tph_disable(vdev);
> > > +
> > > + case VFIO_DEVICE_FEATURE_TPH_GET_ST:
> > > + case VFIO_DEVICE_FEATURE_TPH_SET_ST:
> > > + uinfo = (u8 *)(arg) + offsetof(struct vfio_pci_tph, info);
> > > + infosz = argsz - sizeof(struct vfio_pci_tph);
> > > + return vfio_pci_feature_tph_st_op(vdev, &tph, uinfo, infosz);
> >
> > This is effectively encoding a regular ioctl as a feature. See below.
> >
>
> Addressed this in the above comment.
>
> > > +
> > > + default:
> > > + return -EINVAL;
> > > + }
> > > +}
> > > +
> > > int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> > > void __user *arg, size_t argsz)
> > > {
> > > @@ -1523,6 +1683,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device
> > *device, u32 flags,
> > > return vfio_pci_core_pm_exit(device, flags, arg, argsz);
> > > case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
> > > return vfio_pci_core_feature_token(device, flags, arg, argsz);
> > > + case VFIO_DEVICE_FEATURE_PCI_TPH:
> > > + return vfio_pci_core_feature_tph(device, flags,
> > > + arg, argsz);
> > > default:
> > > return -ENOTTY;
> > > }
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index c8dbf8219c4f..608d57dfe279 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -1458,6 +1458,74 @@ struct vfio_device_feature_bus_master { };
> > > #define VFIO_DEVICE_FEATURE_BUS_MASTER 10
> > >
> > > +/*
> > > + * Upon VFIO_DEVICE_FEATURE_SET, enable or disable PCIe TPH or set
> > > +steering tags
> > > + * on the device. Data provided when setting this feature is a __u32
> > > +with the
> > > + * following flags. VFIO_DEVICE_FEATURE_TPH_ENABLE enables PCIe TPH
> > > +in
> > > + * no-steering-tag, interrupt-vector, or device-specific mode when
> > > +feature flags
> > > + * VFIO_TPH_ST_NS_MODE, VFIO_TPH_ST_IV_MODE, and
> > VFIO_TPH_ST_DS_MODE
> > > +are set
> > > + * respectively.
> > > + * VFIO_DEVICE_FEATURE_TPH_DISABLE disables PCIe TPH on the device.
> > > + * VFIO_DEVICE_FEATURE_TPH_SET_ST set steering tags on a device at an
> > > +index in
> > > + * MSI-X or ST-table depending on the VFIO_TPH_ST_x_MODE flag used
> > > +and device
> > > + * capabilities. The caller can set multiple steering tags by passing
> > > +an array
> > > + * of vfio_pci_tph_info objects containing cpu_id, cache_level, and
> > > + * MSI-X/ST-table index. The caller can also set the intended memory
> > > +type and
> > > + * the processing hint by setting VFIO_TPH_MEM_TYPE_x and
> > > +VFIO_TPH_HINT_x flags,
> > > + * respectively. The return value for each vfio_pci_tph_info object
> > > +is stored in
> > > + * err, with the steering-tag set on the device and the ph_ignore
> > > +status bit
> > > + * resulting from the steering-tag lookup operation. If err < 0, the
> > > +values
> > > + * stored in the st and ph_ignore fields should be considered invalid.
> > > + *
> >
> > Sorry, I don't understand ph_ignore as described here. It's only ever set to 1. I
> > guess we assume the incoming state is zero. But what does it mean?
> >
>
> Agree, it's confusing and it has something to do with the current implementation
> of TPH in the Kernel.
> PCIe firmware specification states root-port ACPI _DSM as the single source of
> truth for steering-tags. Its job is to maintain a table of steering-tags indexed by
> a tuple of CPU/Container UID, Cache-id and a processing-hint>. Each tuple is mapped
> to two steering-tags for persistent or volatile memory (either or both depending on
> the platform).
> A caller looking to acquire a steering tag for a cache closer to a CPU will pass above
> tuple in the format defined in the PCIe firmware specification.
> But PCIe spec also allows implementing TPH without processing-hints, in such cases
> the _DSM is supposed to set ph_ignore bit in the return structure.
> Current TPH implementation in the Linux kernel ignores some of these details,
> it sets cache-id and the processing-hint in the above tuple to zeros. It also ignores the
> ph_ignore bit return by the _DSM as well.
> However, I wrote this RFC to match the PCI firmware specification, therefore it sets
> ph_ignore bit to 1 to inform the caller that it is ignored so that caller can make an
> informed decision to stick with the default (bidirectional hint) or exit with error.
> This is why the cache_level is ignored as well, ideally cache levels (0 for L1D, 1 for L2D,
> Etc.) should be converted to a PPTT cache_id and passed into the _DSM.
> I'm working on a separate PCI patch to fix above issues.
>
> > > + * Upon VFIO_DEVICE_FEATURE_GET, return steering tags to the caller.
> > > + * VFIO_DEVICE_FEATURE_TPH_GET_ST returns steering tags to the caller.
> > > + * The return values per vfio_pci_tph_info object are stored in the
> > > + st,
> > > + * ph_ignore, and err fields.
> >
> > Why do we need to provide an interface to return the steering tags set by the
> > user? Seems like this could be a write-only, SET, interface.
> >
>
> VFIO_DEVICE_FEATURE_TPH_GET_ST calls the _DSM to read the steering-tag
> for a vector of tuples of a cpu-id, cache-id, processing-hint. It is for devices that operate
> in device specific mode where they don't set steering-tags in the MSI-X or ST tables but
> in another construct like a queue context accessible from the user/kernel space.
> For instance, this is what will be used by DPDK poll mode drivers as [2] fleshes
> out.
>
> So, VFIO_DEVICE_FEATURE_TPH_GET_ST doesn't return steering-tags set by the
> user. It's there to read the platform database of steering-tags which is an ACPI _DSM
> bound each PCIe root port.
>
> > > + */
> > > +struct vfio_pci_tph_info {
> >
> > This seems more of an _entry than an _info. Note that vfio has various INFO
> > ioctls which make this nomenclature confusing.
> >
>
> It is, it's more of a struct that describes the request.
> Perhaps vfio_pci_tph_request_descriptor, but that's too long.
> Suggestions are welcome.
>
>
> > > + /* in */
> > > + __u32 cpu_id;
> >
> > The logical ID?
> >
> Yes, this is logical ID.
>
> > > + __u32 cache_level;
> >
> > Zero based? 1-based? How many cache levels are there? What's valid here?
> >
> It's Zero based. This is currently ignored due to limitations in the kernel TPH.
> One way to validate would be iterating through the topology to find it, decrementing
> a copy_of_cache_level as we move further away from the cpu_id until the value
> reaches zero. If loop terminates before copy_of_cache_level reaching zero, -EINVAL.
> Is that a good approach?
Sounds like a try and hope for the best interface. How would userspace
realistically infer the cache levels? I don't think we want vfio
describing that.
> > > + __u8 flags;
> > > +#define VFIO_TPH_MEM_TYPE_MASK 0x1
> > > +#define VFIO_TPH_MEM_TYPE_SHIFT 0
> > > +#define VFIO_TPH_MEM_TYPE_VMEM 0 /* Request
> > volatile memory ST */
> > > +#define VFIO_TPH_MEM_TYPE_PMEM 1 /* Request
> > persistent memory ST */
> >
> > Is there a relation to the cache_level here? Spec references here and below,
> > assuming those are relevant to these values.
> >
> There is no relation to cache level. Return from the _DSM has PMEM and VMEM
> sections. If either of those are not supported a valid bit is set to 0 in the return.
>
> PCIe spec has four processing-hints, VFIO_TPH_HINT_* are there to set them
> in the flags field. Hints are forced to VFIO_TPH_HINT_BIDIR by current TPH
> implementation as I described above.
>
> > > +
> > > +#define VFIO_TPH_HINT_MASK 0x3
> > > +#define VFIO_TPH_HINT_SHIFT 1
> > > +#define VFIO_TPH_HINT_BIDIR 0
> > > +#define VFIO_TPH_HINT_REQSTR (1 << VFIO_TPH_HINT_SHIFT)
> > > +#define VFIO_TPH_HINT_TARGET (2 << VFIO_TPH_HINT_SHIFT)
> > > +#define VFIO_TPH_HINT_TARGET_PRIO (3 << VFIO_TPH_HINT_SHIFT)
> >
> > There needs to be a __u8 padding in here somewhere or flags extended to
> > __u16.
> >
> > > + __u16 index; /* MSI-X/ST-table index to set ST */
> > > + /* out */
> > > + __u16 st; /* Steering-Tag */
> >
> > Sorry if I'm just not familiar with TPH, but why do we need to return the ST?
> > Doesn't hardware make use of the ST index and the physical value gets applied
> > automatically in HW?
> >
>
> Device specific mode requires this. For instance, intel E810 NIC can set
> Steering-tags in queue context which is in user-space when used with DPDK.
> For such use cases we must return steering-tags read from the _DSM.
> Going back to DPDK RFC mentioned here, if TPH gets enabled in VFIO, the E810
> poll mode driver in the DPDK will ask the kernel a steering-tag for a combination
> of a cpu-id, a cache-level and a processing-hint. In response the kernel will
> invoke the ACPI _DSM for the root port of the device via pcie_tph_get_cpu_st()
> and return the steering-tag back to the user. E810 driver will set the returned tag
> in appropriate queue context.
>
> For cases where steering-tag is not required in user-space (VMMs), caller asks the
> kernel to set steering-tag that corresponds the cpu-id, cache-level, and PH
> combination at a given MSI-X vector entry or ST table in config space. For such
> cases too, the kernel will read the steering-tag from the _DSM and set the tag
> in the corresponding MSI-X or ST table entry.
We need to consider that there are vfio-pci variant drivers that
support migration and make use of the vfio-pci-core feature interface.
TPH programming appears to be very non-migration friendly, the
attributes are very specific to the current host system. Should
migration and TPH be mutually exclusive? This may be a factor in the
decision to use a feature or ioctl.
> > > + __u8 ph_ignore; /* Processing hint was ignored by */
> >
> > Padding/alignment, same as above. "ignored by..." By what? Is that an error?
> >
>
> An error. "Processing hint was ignored by the platform"
>
> > > + __s32 err; /* Error on getting/setting Steering-
> > Tag*/
> > > +};
> >
> > Generally we'd expect a system call either works or fails. Why do we need per
> > entry error report? Can't we validate and prepare the entire operation before
> > setting any of it into the device? Ultimately we're just writing hints to config
> > space or MSI-X table space, so the write operation itself is not likely to be the
> > point of failure.
> >
>
> Write to MSI-X won't fail but reading the steering-tag from the _DSM for an
> invalid <cpu_id, cach_id, ph_hint> combo can fail in both GET_ST and SET_ST
> cases.
> We can change this to an all or nothing interface, such that success if all the
> entries are valid and otherwise if at least one is invalid. But in that case the
> caller may not know which entries were invalid. If you think that's ok, I can
> change it.
It's not exactly uncommon for an ioctl to fail for a single bad arg
among many. I think it's harder for userspace to deal with recovering
from a partially implemented call.
> > > +
> > > +struct vfio_pci_tph {
> > > + __u32 argsz; /* Size of vfio_pci_tph and info[] */
> > > + __u32 flags;
> > > +#define VFIO_DEVICE_FEATURE_TPH_OP_MASK 0x7
> > > +#define VFIO_DEVICE_FEATURE_TPH_OP_SHIFT 3
> > > +#define VFIO_DEVICE_FEATURE_TPH_ENABLE 0 /*
> > Enable TPH on device */
> > > +#define VFIO_DEVICE_FEATURE_TPH_DISABLE 1 /* Disable TPH
> > on device */
> > > +#define VFIO_DEVICE_FEATURE_TPH_GET_ST 2 /* Get
> > steering-tags */
> > > +#define VFIO_DEVICE_FEATURE_TPH_SET_ST 4 /* Set
> > steering-rags */
> >
> > s/rags/tags/
> >
> > vfio device features already have GET and SET as part of their base structure, why
> > are they duplicated here? It really seems like there are two separate features
> > here, one that allows enabling with a given mode or disable, and another that
> > allows writing specific steering tags. Both seem like they could be SET-only
> > features. It's also not clear to me that there's a significant advantage to providing
> > an array of steering tags. Isn't updating STs an infrequent operation and likely
> > bound to at most 2K tags in the case of MSI-X?
> >
>
> VFIO_DEVICE_FEATURE_TPH_ENABLE and VFIO_DEVICE_FEATURE_TPH_DISABLE
> are SET features.
> Since, VFIO_DEVICE_FEATURE_TPH_SET_ST requires writing to ST table, so that too
> falls under SET features.
> Therefore, these flags are only valid when VFIO_DEVICE_FEATURE_SET flag is set.
> The only GET features use case here is VFIO_DEVICE_FEATURE_TPH_GET_ST that
> reads the steering-tags from the _DSM and returns it back to caller. Only valid
> with VFIO_DEVICE_FEATURE_GET flag.
> These are checked in vfio_pci_feature_tph_prepare().
>
> As I mentioned earlier, does it make sense to leave enable/disable under VFIO
> Feature and move GET_ST and SET_ST to a separate ioctl?
Splitting TPH between a feature and new ioctls doesn't seem like a
logical interface.
> There isn't much rational to providing an array of tuples other than following
> the format VFIO_DEVICE_SET_IRQS that sets eventfds.
>
> > > +
> > > +#define VFIO_TPH_ST_MODE_MASK (0x3 <<
> > VFIO_DEVICE_FEATURE_TPH_OP_SHIFT)
> > > +#define VFIO_TPH_ST_NS_MODE (0 <<
> > VFIO_DEVICE_FEATURE_TPH_OP_SHIFT)
> > > +#define VFIO_TPH_ST_IV_MODE (1 <<
> > VFIO_DEVICE_FEATURE_TPH_OP_SHIFT)
> > > +#define VFIO_TPH_ST_DS_MODE (2 <<
> > VFIO_DEVICE_FEATURE_TPH_OP_SHIFT)
> > > + __u32 count; /* Number of entries in info[]
> > */
> > > + struct vfio_pci_tph_info info[];
> > > +#define VFIO_TPH_INFO_MAX 64 /* Max entries allowed
> > in info[] */
> >
> > This seems to match the limit if the table is located in extended config space, but
> > it's not particularly relevant if the table is located in MSI-X space. Why is this a
> > good limit?
>
> Yes, for MSI-X the caller may have to invoke the ioctl multiple times if more
> than 64 entries need to be updated.
>
> >
> > Also, try to keep these all in 80 column. Thanks,
> >
>
> I will, try to keep these under 80 columns.
>
> Thanks Alex, for your input. Let me know what you think about moving SET_ST
> and GET_ST to an ioctl and keep ENABLE/DISABLE TPH as a vfio feature.
It seems like kind of a hodgepodge to split the interface like that,
imo. Sorry for a less that complete reply, I'm on PTO this week.
Thanks,
Alex