Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

From: Reinette Chatre
Date: Wed Feb 07 2024 - 18:30:38 EST


Hi Alex,

On 2/6/2024 3:19 PM, Alex Williamson wrote:
> On Tue, 6 Feb 2024 14:22:04 -0800
> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>> On 2/6/2024 2:03 PM, Alex Williamson wrote:
>>> On Tue, 6 Feb 2024 13:46:37 -0800
>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>>>> On 2/5/2024 2:35 PM, Alex Williamson wrote:
>>>>> On Thu, 1 Feb 2024 20:57:09 -0800
>>>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>>>>
>>>> ..
>>>>
>>>>>> @@ -715,13 +724,13 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
>>>>>> if (is_intx(vdev))
>>>>>> return vfio_irq_set_block(vdev, start, count, fds, index);
>>>>>>
>>>>>> - ret = vfio_intx_enable(vdev);
>>>>>> + ret = vfio_intx_enable(vdev, start, count, index);
>>>>>
>>>>> Please trace what happens when a user calls SET_IRQS to setup a trigger
>>>>> eventfd with start = 0, count = 1, followed by any other combination of
>>>>> start and count values once is_intx() is true. vfio_intx_enable()
>>>>> cannot be the only place we bounds check the user, all of the INTx
>>>>> callbacks should be an error or nop if vector != 0. Thanks,
>>>>>
>>>>
>>>> Thank you very much for catching this. I plan to add the vector
>>>> check to the device_name() and request_interrupt() callbacks. I do
>>>> not think it is necessary to add the vector check to disable() since
>>>> it does not operate on a range and from what I can tell it depends on
>>>> a successful enable() that already contains the vector check. Similar,
>>>> free_interrupt() requires a successful request_interrupt() (that will
>>>> have vector check in next version).
>>>> send_eventfd() requires a valid interrupt context that is only
>>>> possible if enable() or request_interrupt() succeeded.
>>>
>>> Sounds reasonable.
>>>
>>>> If user space creates an eventfd with start = 0 and count = 1
>>>> and then attempts to trigger the eventfd using another combination then
>>>> the changes in this series will result in a nop while the current
>>>> implementation will result in -EINVAL. Is this acceptable?
>>>
>>> I think by nop, you mean the ioctl returns success. Was the call a
>>> success? Thanks,
>>
>> Yes, I mean the ioctl returns success without taking any
>> action (nop).
>>
>> It is not obvious to me how to interpret "success" because from what I
>> understand current INTx and MSI/MSI-x are behaving differently when
>> considering this flow. If I understand correctly, INTx will return
>> an error if user space attempts to trigger an eventfd that has not
>> been set up while MSI and MSI-x will return 0.
>>
>> I can restore existing INTx behavior by adding more logic and a return
>> code to the send_eventfd() callback so that the different interrupt types
>> can maintain their existing behavior.
>
> Ah yes, I see the dilemma now. INTx always checked start/count were
> valid but MSI/X plowed through regardless, and with this series we've
> standardized the loop around the MSI/X flow.
>
> Tricky, but probably doesn't really matter. Unless we break someone.
>
> I can ignore that INTx can be masked and signaling a masked vector
> doesn't do anything, but signaling an unconfigured vector feels like an
> error condition and trying to create verbiage in the uAPI header to
> weasel out of that error and unconditionally return success makes me
> cringe.
>
> What if we did this:
>
> 	uint8_t *bools = data;
> 	...
> 	for (i = start; i < start + count; i++) {
> 		if ((flags & VFIO_IRQ_SET_DATA_NONE) ||
> 		    ((flags & VFIO_IRQ_SET_DATA_BOOL) && bools[i - start])) {
> 			ctx = vfio_irq_ctx_get(vdev, i);
> 			if (!ctx || !ctx->trigger)
> 				return -EINVAL;
> 			intr_ops[index].send_eventfd(vdev, ctx);
> 		}
> 	}
>

This looks good. Thank you very much. Will do.

I studied the code more and have one more observation related to this portion
of the flow:
From what I can tell this change makes the INTx code more robust. If I
understand the current implementation correctly, it seems possible to enable
INTx without having an interrupt allocated. In this case the interrupt context
(ctx) will exist but ctx->trigger will be NULL. The current
vfio_pci_set_intx_trigger()->vfio_send_intx_eventfd() path only checks whether
ctx is valid. It looks like it may call eventfd_signal(NULL), where the
pointer is dereferenced.

If this is correct then I think a separate fix that can easily be
backported may be needed. Something like:

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 237beac83809..17ec46d8ab29 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -92,7 +92,7 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 		struct vfio_pci_irq_ctx *ctx;

 		ctx = vfio_irq_ctx_get(vdev, 0);
-		if (WARN_ON_ONCE(!ctx))
+		if (WARN_ON_ONCE(!ctx || !ctx->trigger))
 			return;
 		eventfd_signal(ctx->trigger);
 	}
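
To make the concern concrete, here is a small user-space model of the guard,
not kernel code. All names ending in _model are hypothetical stand-ins, and
eventfd_signal_model() is only assumed to dereference its argument the way I
describe above for eventfd_signal():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct eventfd_model {
	int signalled;
};

struct irq_ctx_model {
	struct eventfd_model *trigger;	/* NULL until a trigger is set up */
};

/* Assumed to dereference its argument unconditionally. */
static void eventfd_signal_model(struct eventfd_model *efd)
{
	efd->signalled++;
}

/*
 * Model of the guarded send path: skip signalling when either the
 * context or its trigger is missing. Returns 1 if a signal was sent
 * and 0 if it was skipped.
 */
static int send_intx_eventfd_model(struct irq_ctx_model *ctx)
{
	if (!ctx || !ctx->trigger)
		return 0;
	eventfd_signal_model(ctx->trigger);
	return 1;
}

/* Exercises the three cases: no context, no trigger, valid trigger. */
static void demo_intx_guard(void)
{
	struct eventfd_model efd = { 0 };
	struct irq_ctx_model no_trigger = { NULL };
	struct irq_ctx_model with_trigger = { &efd };

	assert(send_intx_eventfd_model(NULL) == 0);
	assert(send_intx_eventfd_model(&no_trigger) == 0);
	assert(send_intx_eventfd_model(&with_trigger) == 1);
	assert(efd.signalled == 1);
}
```

Without the trigger check, the no_trigger case would reach
eventfd_signal_model(NULL) in this model.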

> And we note the behavior change for MSI/X in the commit log and if
> someone shouts that we broke them, we can make that an -errno or
> continue based on is_intx(). Sound ok? Thanks,

I'll be sure to highlight the impact on MSI/MSI-X. Please expect this
in the final patch "vfio/pci: Remove duplicate interrupt management flow",
though, since that is where the different flows are merged.

I am not familiar with how all user space interacts with this flow and if/how
this may break things. I did look at the QEMU code and was not able to find
where it intentionally triggers MSI/MSI-X interrupts; I could only find it
for INTx.

If this does break things I would like to also consider moving the
different behavior into the interrupt type's respective send_eventfd()
callback instead of adding interrupt type specific code (like
is_intx()) into the shared flow.
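
A rough user-space sketch of what I have in mind, purely as an illustration.
Every name ending in _model is hypothetical and the return-code convention is
an assumption, not what the series implements today: each interrupt type's
send_eventfd() reports its own result and the shared loop only propagates it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical interrupt context: has_trigger models ctx->trigger != NULL. */
struct irq_ctx_model {
	int has_trigger;
	int signalled;
};

/* Hypothetical per-type ops; send_eventfd() returns 0 or a negative errno. */
struct intr_ops_model {
	int (*send_eventfd)(struct irq_ctx_model *ctx);
};

/* INTx keeps its existing behavior: an unconfigured vector is an error. */
static int intx_send_eventfd_model(struct irq_ctx_model *ctx)
{
	if (!ctx || !ctx->has_trigger)
		return -EINVAL;
	ctx->signalled++;
	return 0;
}

/* MSI/MSI-X keeps its existing behavior: unconfigured vectors are skipped. */
static int msi_send_eventfd_model(struct irq_ctx_model *ctx)
{
	if (ctx && ctx->has_trigger)
		ctx->signalled++;
	return 0;
}

enum { INTX_MODEL, MSI_MODEL };

static const struct intr_ops_model intr_ops_model[] = {
	[INTX_MODEL] = { .send_eventfd = intx_send_eventfd_model },
	[MSI_MODEL]  = { .send_eventfd = msi_send_eventfd_model },
};

/* Shared flow: no interrupt type specific checks, only the callback. */
static int trigger_range_model(int index, struct irq_ctx_model *ctxs,
			       size_t nctx, size_t start, size_t count)
{
	size_t i;
	int ret;

	for (i = start; i < start + count; i++) {
		struct irq_ctx_model *ctx = i < nctx ? &ctxs[i] : NULL;

		ret = intr_ops_model[index].send_eventfd(ctx);
		if (ret)
			return ret;
	}
	return 0;
}
```

With this shape the shared loop stays identical for all interrupt types and
the INTx versus MSI/MSI-X difference lives only in the two callbacks.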

Thank you.

Reinette