Re: [PATCH V2 7/8] vfio/pci: Support dynamic MSI-x

From: Reinette Chatre
Date: Mon Apr 03 2023 - 18:51:07 EST


Hi Alex,

On 4/3/2023 1:22 PM, Alex Williamson wrote:
> On Mon, 3 Apr 2023 10:31:23 -0700
> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>
>> Hi Alex,
>>
>> On 3/31/2023 3:24 PM, Alex Williamson wrote:
>>> On Fri, 31 Mar 2023 10:49:16 -0700
>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>>>> On 3/30/2023 3:42 PM, Alex Williamson wrote:
>>>>> On Thu, 30 Mar 2023 16:40:50 -0600
>>>>> Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On Tue, 28 Mar 2023 14:53:34 -0700
>>>>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>>>>>>
>>
>> ...
>>
>>>>>>> + msix_map.index = vector;
>>>>>>> + msix_map.virq = irq;
>>>>>>> + pci_msix_free_irq(pdev, msix_map);
>>>>>>> + }
>>>>>>> + vfio_pci_memory_unlock_and_restore(vdev, cmd);
>>>>>>> out_put_eventfd_ctx:
>>>>>>> eventfd_ctx_put(trigger);
>>>>>>> out_free_name:
>>>>>>> kfree(ctx->name);
>>>>>>> ctx->name = NULL;
>>>>>>> +out_free_ctx:
>>>>>>> + if (allow_dyn_alloc && new_ctx)
>>>>>>> + vfio_irq_ctx_free(vdev, ctx, vector);
>>>>>>> return ret;
>>>>>>> }
>>>>>>>
>>>>>>
>>>>>> Do we really need the new_ctx test in the above cases? Thanks,
>>>>
>>>> new_ctx is not required for correctness but instead is used to keep
>>>> the code symmetric.
>>>> Specifically, if the user enables MSI-X without providing triggers and
>>>> then later assign triggers then an error path without new_ctx would unwind
>>>> more than done in this function, it would free the context that
>>>> was allocated within vfio_msi_enable().
>>>
>>> Seems like we already have that asymmetry, if a trigger is unset we'll
>>> free the ctx allocated by vfio_msi_enable(). Tracking which are
>>
>> Apologies, but could you please elaborate on where the asymmetry is? I am
>> not able to see a flow in this solution where the ctx allocated by
>> vfio_msi_enable() is freed if the trigger is unset.
>
> The user first calls SET_IRQS to enable MSI-X with some number of
> vectors with (potentially) an eventfd for each vector. The user later
> calls SET_IRQS passing a -1 eventfd for one or more of the vectors with
> an eventfd initialized in the prior step. Given that we find the ctx,
> the ctx has a trigger, and assuming dynamic allocation is supported, the
> ctx is freed and vfio_msi_set_vector_signal() returns w/o allocating a
> new ctx. We've de-allocated both the irq and context initialized from
> vfio_msi_enable().

This is correct. The comment I responded to was in regards to an unset
trigger. The flow you describe is when a trigger is set. Not that
it changes your point though, which is that vfio_msi_set_vector_signal()
frees memory allocated by vfio_msi_enable(). This is clear to me. This
is intended behavior. My concern is/was with the error path where a function
failing may not be expected to change state, you address that concern below.

>>> allocated where is unnecessarily complex, how about a policy that
>>
>> I do not see this as tracking where allocations are made. Instead I
>> see it as containing/compartmentalizing state changes with the goal of
>> making the code easier to understand and maintain. Specifically, new_ctx
>> is used so that if vfio_msi_set_vector_signal() fails, the state
>> before and after vfio_msi_set_vector_signal() will be the same.
>
> That's not really possible given how we teardown the existing ctx
> before configuring the new one and unwind to disable contexts in
> vfio_msi_set_block()

Very unlikely indeed. I agree.

>> I do agree that it makes vfio_msi_set_vector_signal() more complex
>> and I can remove new_ctx if you find that this is unnecessary after
>> considering the motivations behind its use.
>
> If the goal is to allow the user to swap one eventfd for another, where
> the result will always be the new eventfd on success or the old eventfd
> on error, I don't see that this code does that, or that we've ever
> attempted to make such a guarantee. If the ioctl errors, I think the
> eventfds are generally deconfigured. We certainly have the unwind code
> that we discussed earlier that deconfigures all the vectors previously
> touched in the loop (which seems to be another path where we could
> de-allocate from the set of initial ctxs).

Thank you for your patience in hearing and addressing my concerns. I plan
to remove new_ctx in the next version.

>>> devices supporting vdev->has_dyn_msix only ever have active contexts
>>> allocated? Thanks,
>>
>> What do you see as an "active context"? A policy that is currently enforced
>> is that an allocated context always has an allocated interrupt associated
>> with it. I do not see how this could be expanded to also require an
>> enabled interrupt because interrupt enabling requires a trigger that
>> may not be available.
>
> A context is essentially meant to track a trigger, ie. an eventfd
> provided by the user. In the static case all the irqs are necessarily
> pre-allocated, therefore we had no reason to consider a dynamic array
> for the contexts. However, a given context is really only "active" if
> it has a trigger, otherwise it's just a placeholder. When the
> placeholder is filled by an eventfd, the pre-allocated irq is enabled.

I see.

>
> This proposal seems to be a hybrid approach, pre-allocating some
> initial set of irqs and contexts and expecting the differentiation to
> occur only when new vectors are added, though we have some disagreement
> about this per above. Unfortunately I don't see an API to enable MSI-X
> without some vectors, so some pre-allocation of irqs seems to be
> required regardless.

Right. pci_alloc_irq_vectors() or equivalent continues to be needed to
enable MSI-X. Even so, it does seem possible (within vfio_msi_enable())
to just allocate one vector using pci_alloc_irq_vectors()
and then immediately free it using pci_msix_free_irq(). What do you think?
If I understand correctly this can be done without allocating any context
and leave MSI-X enabled without any interrupts allocated. This could be a
way to accomplish the "active context" policy for dynamic allocation.
This is not a policy that can be applied broadly to interrupt contexts though
because MSI and non-dynamic MSI-X could still have contexts with allocated
interrupts without eventfd.

> But if non-active contexts were only placeholders in the pre-dynamic
> world and we now manage them via a dynamic array, why is there any
> pre-allocation of contexts without knowing the nature of the eventfd to
> fill it? We could have more commonality between cases if contexts are
> always dynamically allocated, which might simplify differentiation of
> the has_dyn_msix cases largely to wrappers allocating and freeing irqs.
> Thanks,

Thank you very much for your guidance. I will digest this some more and
see how wrappers could be used. In the mean time while trying to think how
to unify this code I do think there is an issue in this patch in that
the get_cached_msi_msg()/pci_write_msi_msg()
should not be in an else branch.

Specifically, I think it needs to be:
if (msix) {
if (irq == -EINVAL) {
/* dynamically allocate interrupt */
}
get_cached_msi_msg(irq, &msg);
pci_write_msi_msg(irq, &msg);
}


Thank you very much

Reinette