Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism
From: Jason Wang
Date: Tue Mar 28 2023 - 02:08:53 EST
On Tue, Mar 28, 2023 at 12:05 PM Yongji Xie <xieyongji@xxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 28, 2023 at 11:44 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie <xieyongji@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 28, 2023 at 11:14 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie <xieyongji@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji <xieyongji@xxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > To support interrupt affinity spreading mechanism,
> > > > > > > this makes use of group_cpus_evenly() to create
> > > > > > > an irq callback affinity mask for each virtqueue
> > > > > > > of vdpa device. Then we will unify set_vq_affinity
> > > > > > > callback to pass the affinity to the vdpa device driver.
> > > > > > >
> > > > > > > Signed-off-by: Xie Yongji <xieyongji@xxxxxxxxxxxxx>
> > > > > >
> > > > > > > Thinking hard about all the logic, I think I've found something interesting.
> > > > > >
> > > > > > Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
> > > > > > pass irq_affinity to transport specific find_vqs(). This seems a
> > > > > > > layer violation since the driver has no knowledge of
> > > > > >
> > > > > > 1) whether or not the callback is based on an IRQ
> > > > > > > 2) whether or not the device is a PCI device (the details are hidden by
> > > > > > > the transport driver)
> > > > > > 3) how many vectors could be used by a device
> > > > > >
> > > > > > > This means the driver can't actually pass a real affinity mask, so the
> > > > > > > commit in fact passes a zeroed irq_affinity structure as a hint, so that
> > > > > > > the PCI layer can build a default affinity that groups CPUs evenly
> > > > > > > based on the number of MSI-X vectors (the core logic is
> > > > > > > group_cpus_evenly()). I think we should fix this by replacing the
> > > > > > > irq_affinity structure with
> > > > > >
> > > > > > 1) a boolean like auto_cb_spreading
> > > > > >
> > > > > > or
> > > > > >
> > > > > > 2) queue to cpu mapping
> > > > > >
> > > > >
> > > > > But only the driver knows which queues are used in the control path
> > > > > which don't need the automatic irq affinity assignment.
> > > >
> > > > > Is this knowledge provided by the transport driver now?
> > > >
> > >
> > > > This knowledge is provided by the device driver rather than the transport driver.
> > >
> > > E.g. virtio-scsi uses:
> > >
> > > > /* vq0 is the control queue, vq1 is the event queue */
> > > > struct irq_affinity desc = { .pre_vectors = 2 };
> >
> > > Ok, but it only works as a hint, it's not a real affinity. As replied,
> > > we can pass an array of booleans in this case, so the transport driver
> > > knows it doesn't need to use automatic affinity for the first two
> > > queues.
> >
>
> But we don't know whether we would use other fields in structure
> irq_affinity in the future. So a full set should be better?
Good point. So the issue is calc_sets(), and we would probably need it
if a virtio driver ever needs more than one set of vectors to be
spread. Technically, we could have a virtio-level abstraction for
this, but I agree it's probably not worth bothering with now.
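
To make the pre_vectors hint concrete, here is a rough userspace sketch,
not kernel code. spread_cpus_over_vqs() is a hypothetical name, and the
contiguous even split is only a stand-in for group_cpus_evenly(), which
is NUMA-aware and operates on cpumasks. It also ignores post_vectors and
multiple sets (calc_sets()).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the spreading done through
 * group_cpus_evenly(). cpu_to_vq[cpu] receives the index of the
 * virtqueue whose callback should be handled on that CPU.
 *
 * The first pre_vectors queues (e.g. control and event queues) are
 * skipped, so they get no CPUs from the automatic spreading.
 */
static void spread_cpus_over_vqs(size_t nr_cpus, size_t nr_vqs,
                                 size_t pre_vectors, size_t *cpu_to_vq)
{
    /* At least one queue must remain after the excluded ones. */
    assert(nr_vqs > pre_vectors);

    size_t nr_spread = nr_vqs - pre_vectors;

    /* Contiguous even split of CPU numbers over the remaining queues. */
    for (size_t cpu = 0; cpu < nr_cpus; cpu++)
        cpu_to_vq[cpu] = pre_vectors + cpu * nr_spread / nr_cpus;
}
```

With 8 CPUs, 6 queues and pre_vectors = 2, CPUs 0-1 map to vq2, 2-3 to
vq3, 4-5 to vq4 and 6-7 to vq5, while vq0 and vq1 are left alone.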
>
> > >
> > > > > E.g. virtio-blk uses:
> > > >
> > > > struct irq_affinity desc = { 0, };
> > > >
> > > > > At least we can tell the transport driver which vq requires automatic
> > > > > irq affinity.
> > > >
> > >
> > > I think that is what the current implementation does.
> > >
> > > > > So I think the
> > > > > irq_affinity structure can only be created by device drivers and
> > > > > passed to the virtio-pci/virtio-vdpa driver.
> > > >
> > > > > This might not be easy since the driver doesn't even know how many
> > > > > interrupts will be used by the transport driver, so it can't build the
> > > > > actual affinity structure.
> > > >
> > >
> > > The actual affinity mask is built by the transport driver,
> >
> > For PCI yes, it talks directly to the IRQ subsystems.
> >
> > > device
> > > driver only passes a hint on which queues don't need the automatic irq
> > > affinity assignment.
> >
> > > But not for virtio-vDPA since the IRQ needs to be dealt with by the
> > > parent driver. In our case the parent is VDUSE, which doesn't need
> > > IRQs at all; a queue to cpu mapping is sufficient.
> >
>
> The device driver doesn't know whether it is bound to virtio-pci or
> virtio-vdpa. So it should pass a full set needed by the automatic irq
> affinity assignment instead of a subset. Then virtio-vdpa can choose
> to pass a queue to cpu mapping to VDUSE, which is what we do now (use
> set_vq_affinity()).
Yes, so basically there are two ways:
1) automatic IRQ management: the driver passes affd to find_vqs() and
the affinity is determined by the transport (e.g. vDPA).
2) affinity under the control of the driver: it needs to use
set_vq_affinity() and also has to deal with CPU hotplug.
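
To illustrate the extra work in 2), here is a rough userspace model, not
kernel code. rebalance_after_hotplug() and the arrays are hypothetical;
the move-to-next-online-CPU policy is just one possible choice. A real
driver would follow each move with a set_vq_affinity() call from its
hotplug callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of driver-managed affinity: vq_cpu[i] is the CPU
 * the driver last chose for queue i. After a hotplug event, any queue
 * whose CPU went offline is moved to the next online CPU, wrapping
 * around. Queues stay put if no CPU is online.
 *
 * Returns the number of queues that were moved.
 */
static size_t rebalance_after_hotplug(size_t *vq_cpu, size_t nr_vqs,
                                      const bool *cpu_online, size_t nr_cpus)
{
    size_t moved = 0;

    for (size_t i = 0; i < nr_vqs; i++) {
        assert(vq_cpu[i] < nr_cpus);

        if (cpu_online[vq_cpu[i]])
            continue; /* current CPU is still usable */

        /* Probe the other CPUs in order, starting after the old one. */
        for (size_t step = 1; step < nr_cpus; step++) {
            size_t cand = (vq_cpu[i] + step) % nr_cpus;

            if (cpu_online[cand]) {
                vq_cpu[i] = cand;
                moved++;
                break;
            }
        }
    }
    return moved;
}
```

With 1), none of this lives in the driver: the transport owns the
mapping and is the one that has to keep it valid.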
Thanks
>
> Thanks,
> Yongji
>