Re: [PATCH 5/6] vdpa: add get_vq_num_unchangeable callback in vdpa_config_ops

From: Jason Wang
Date: Sun Sep 12 2021 - 23:13:38 EST


On Mon, Sep 13, 2021 at 10:59 AM Wu Zongyong
<wuzongyong@xxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Sep 13, 2021 at 09:43:40AM +0800, Jason Wang wrote:
> > On Fri, Sep 10, 2021 at 11:11 PM Cindy Lu <lulu@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 10, 2021 at 5:20 PM Wu Zongyong
> > > <wuzongyong@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Sep 10, 2021 at 04:25:18PM +0800, Cindy Lu wrote:
> > > > >
> > > > > On Fri, Sep 10, 2021 at 3:33 PM Wu Zongyong
> > > > > <wuzongyong@xxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Sep 10, 2021 at 09:45:53AM +0800, Jason Wang wrote:
> > > > > > > On Thu, Sep 9, 2021 at 5:57 PM Wu Zongyong <wuzongyong@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Thu, Sep 09, 2021 at 05:28:26PM +0800, Jason Wang wrote:
> > > > > > > > > On Thu, Sep 9, 2021 at 4:02 PM Wu Zongyong <wuzongyong@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Sep 09, 2021 at 10:55:03AM +0800, Jason Wang wrote:
> > > > > > > > > > > On Wed, Sep 8, 2021 at 8:23 PM Wu Zongyong <wuzongyong@xxxxxxxxxxxxxxxxx> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > This new callback is used to indicate whether the vring size can be
> > > > > > > > > > > > changed or not. It is useful when a legacy virtio pci device is used as
> > > > > > > > > > > > the vdpa device, because the specification gives no way to negotiate
> > > > > > > > > > > > the vring num.
> > > > > > > > > > >
> > > > > > > > > > > So I'm not sure it's worth bothering. E.g. what if we just fail
> > > > > > > > > > > VHOST_SET_VRING_NUM if the value doesn't match what the hardware has?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > I think we should not call VHOST_SET_VRING_NUM in that case.
> > > > > > > > > >
> > > > > > > > > > If the hardware reports that the virtqueue size cannot be changed, we
> > > > > > > > > > should first call VHOST_GET_VRING_NUM to get the static virtqueue size,
> > > > > > > > > > then allocate memory of that size for the virtqueues, and finally write
> > > > > > > > > > the address to the hardware.
> > > > > > > > > >
> > > > > > > > > > For QEMU, we will ignore the properties rx/tx_queue_size and just get the
> > > > > > > > > > size from the hardware if this new callback returns true.
> > > > > > > > >
> > > > > > > > > This will break live migration. My understanding is that we can
> > > > > > > > > advertise those capability/limitation via the netlink management
> > > > > > > > > protocol then management layer can choose to use the correct queue
> > > > > > > > > size.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > I agree, it is a good idea.
> > > > > > > > BTW, can we also advertise the mac address of the network device? I found
> > > > > > > > that the mac address generated by libvirt or qemu breaks the network
> > > > > > > > datapath if I don't specify the right mac explicitly in the XML or on the
> > > > > > > > qemu command line.
> > > > > > >
> > > > > > > We never saw this before, AFAIK when vhost-vdpa is used, currently
> > > > > > > qemu will probably ignore the mac address set via command line since
> > > > > > > the config space is read from the device instead of qemu itself?
> > > > > > >
> > > > > >
> > > > > > I saw the code below in qemu:
> > > > > >
> > > > > > static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> > > > > > {
> > > > > >     ...
> > > > > >     if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > > >         struct virtio_net_config netcfg = {};
> > > > > >         memcpy(&netcfg.mac, &n->nic_conf.macaddr, ETH_ALEN);
> > > > > >         vhost_net_set_config(get_vhost_net(nc->peer),
> > > > > >             (uint8_t *)&netcfg, 0, ETH_ALEN, VHOST_SET_CONFIG_TYPE_MASTER);
> > > > > >     }
> > > > > >     ...
> > > > > > }
> > > > > >
> > > > > > This writes the mac address set via cmdline into the vdpa device config,
> > > > > > and then the guest reads it back.
> > > > > > If I remove this code, it behaves as you said.
> > > > > >
> > > > > >
> > > > > Hi Zongyong
> > > > > I think this code only works when qemu gets an all-zero mac address from
> > > > > the hardware; you can find more information in the function
> > > > > virtio_net_get_config.
> > > >
> > > > It depends on how vdpa_config_ops->set_config is implemented.
> > > > For mlx5, the set_config callback does nothing. But for virtio-pci, the
> > > > set_config callback writes the config register of the vdpa device, so qemu
> > > > writes the mac set via cmdline to the hardware, and the mac the guest
> > > > reads back is the value qemu has just written.
> > > >
> > > So here comes a question: which MAC address has higher priority,
> > > the MAC address in hardware or the MAC address from the cmdline?
> > > If both of these two MAC addresses exist, which should we use?
> > > I have checked the spec, but I'm not sure if the bit VIRTIO_NET_F_MAC is
> > > the right one?
> >
> > I think so, if VIRTIO_NET_F_MAC is set, qemu can override the mac otherwise not.
> >
> The spec says:
> "driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it. If the driver
> negotiates the VIRTIO_NET_F_MAC feature, the driver MUST set the physical address
> of the NIC to mac. Otherwise, it SHOULD use a locally-administered MAC address."
>
> If I understand correctly, you mean qemu actually CANNOT override the mac
> that the device provides?

It seems not. If VIRTIO_NET_F_MAC is not negotiated, mac is not valid in
the config space:

"The mac address field always exists (though is only valid if
VIRTIO_NET_F_MAC is set)"
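
In code, that rule amounts to testing the feature bit before trusting the
field. A rough sketch only, not taken from QEMU or the kernel: the helper
name config_mac_get() and the way the feature word and config bytes are
passed in are made up for illustration. The bit number (5) and mac being
the first field of the net config come from the spec.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_F_MAC 5   /* feature bit number from the virtio spec */
#define ETH_ALEN 6           /* length of an Ethernet address in bytes */

/*
 * Hypothetical helper: copy the mac out of the device config only when
 * the field is valid, i.e. when VIRTIO_NET_F_MAC is set in 'features'.
 * Returns false and leaves 'mac' untouched otherwise.
 */
static bool config_mac_get(uint64_t features, const uint8_t *config,
                           uint8_t mac[ETH_ALEN])
{
    if (!(features & (1ULL << VIRTIO_NET_F_MAC)))
        return false;               /* field exists but is not valid */

    memcpy(mac, config, ETH_ALEN);  /* mac is the first config field */
    return true;
}
```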

So I think the right approach is:

- if mac is not specified in the cli, QEMU doesn't need to override the mac
- if mac is specified in the cli and VIRTIO_NET_F_MAC is supported,
  QEMU can override the mac
- if mac is specified in the cli and VIRTIO_NET_F_MAC is not
  supported, we need to fail the launch
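
The three cases above could be sketched roughly like this. It is only an
illustration of the decision, not existing QEMU code: the enum, the
function name mac_policy() and its two boolean inputs are all made up
here, and how QEMU would actually learn those two facts is left open.

```c
#include <stdbool.h>

/* Hypothetical result of the check; not an existing QEMU type. */
enum mac_action {
    MAC_KEEP_DEVICE,   /* leave the mac in the device config untouched */
    MAC_OVERRIDE,      /* write the cli mac into the device config */
    MAC_FAIL_LAUNCH,   /* refuse to start the VM */
};

/*
 * cli_mac_set:   a mac was given on the command line (assumed input)
 * dev_has_f_mac: the device supports VIRTIO_NET_F_MAC (assumed input)
 */
static enum mac_action mac_policy(bool cli_mac_set, bool dev_has_f_mac)
{
    if (!cli_mac_set)
        return MAC_KEEP_DEVICE;

    return dev_has_f_mac ? MAC_OVERRIDE : MAC_FAIL_LAUNCH;
}
```

Keeping the decision in one small function like this would make it easy
to call before the existing set_config path is reached.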

Note that we're working on extending the netlink management API to set
the mac address during vDPA instance provisioning. The management layer
can then get the correct mac address and set it via the cli. AFAIK,
Cindy's patch is a workaround for when netlink doesn't support setting
the mac address.

Thanks

> > Thanks
> >
> > > If yes, I will post a qemu patch that adds a check for this bit before
> > > we set the mac in the hardware.
> > > https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html
> > >
> > > Thanks
> > > cindy
> > > > > > > Thanks
> > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > What do you think?
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Wu Zongyong <wuzongyong@xxxxxxxxxxxxxxxxx>
> > > > > > > > > > > > ---
> > > > > > > > > > > > drivers/vhost/vdpa.c | 19 +++++++++++++++++++
> > > > > > > > > > > > drivers/virtio/virtio_vdpa.c | 5 ++++-
> > > > > > > > > > > > include/linux/vdpa.h | 4 ++++
> > > > > > > > > > > > include/uapi/linux/vhost.h | 2 ++
> > > > > > > > > > > > 4 files changed, 29 insertions(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > > > > > > > > index 9479f7f79217..2204d27d1e5d 100644
> > > > > > > > > > > > --- a/drivers/vhost/vdpa.c
> > > > > > > > > > > > +++ b/drivers/vhost/vdpa.c
> > > > > > > > > > > > @@ -350,6 +350,22 @@ static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user *argp)
> > > > > > > > > > > > return 0;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > +static long vhost_vdpa_get_vring_num_unchangeable(struct vhost_vdpa *v,
> > > > > > > > > > > > + u32 __user *argp)
> > > > > > > > > > > > +{
> > > > > > > > > > > > + struct vdpa_device *vdpa = v->vdpa;
> > > > > > > > > > > > + const struct vdpa_config_ops *ops = vdpa->config;
> > > > > > > > > > > > + bool unchangeable = false;
> > > > > > > > > > > > +
> > > > > > > > > > > > + if (ops->get_vq_num_unchangeable)
> > > > > > > > > > > > + unchangeable = ops->get_vq_num_unchangeable(vdpa);
> > > > > > > > > > > > +
> > > > > > > > > > > > + if (copy_to_user(argp, &unchangeable, sizeof(unchangeable)))
> > > > > > > > > > > > + return -EFAULT;
> > > > > > > > > > > > +
> > > > > > > > > > > > + return 0;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> > > > > > > > > > > > void __user *argp)
> > > > > > > > > > > > {
> > > > > > > > > > > > @@ -487,6 +503,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> > > > > > > > > > > > case VHOST_VDPA_GET_IOVA_RANGE:
> > > > > > > > > > > > r = vhost_vdpa_get_iova_range(v, argp);
> > > > > > > > > > > > break;
> > > > > > > > > > > > + case VHOST_VDPA_GET_VRING_NUM_UNCHANGEABLE:
> > > > > > > > > > > > + r = vhost_vdpa_get_vring_num_unchangeable(v, argp);
> > > > > > > > > > > > + break;
> > > > > > > > > > > > default:
> > > > > > > > > > > > r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> > > > > > > > > > > > if (r == -ENOIOCTLCMD)
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> > > > > > > > > > > > index 72eaef2caeb1..afb47465307a 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_vdpa.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_vdpa.c
> > > > > > > > > > > > @@ -146,6 +146,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
> > > > > > > > > > > > struct vdpa_vq_state state = {0};
> > > > > > > > > > > > unsigned long flags;
> > > > > > > > > > > > u32 align, num;
> > > > > > > > > > > > + bool may_reduce_num = true;
> > > > > > > > > > > > int err;
> > > > > > > > > > > >
> > > > > > > > > > > > if (!name)
> > > > > > > > > > > > @@ -171,8 +172,10 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
> > > > > > > > > > > >
> > > > > > > > > > > > /* Create the vring */
> > > > > > > > > > > > align = ops->get_vq_align(vdpa);
> > > > > > > > > > > > + if (ops->get_vq_num_unchangeable)
> > > > > > > > > > > > + may_reduce_num = !ops->get_vq_num_unchangeable(vdpa);
> > > > > > > > > > > > vq = vring_create_virtqueue(index, num, align, vdev,
> > > > > > > > > > > > - true, true, ctx,
> > > > > > > > > > > > + true, may_reduce_num, ctx,
> > > > > > > > > > > > virtio_vdpa_notify, callback, name);
> > > > > > > > > > > > if (!vq) {
> > > > > > > > > > > > err = -ENOMEM;
> > > > > > > > > > > > diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> > > > > > > > > > > > index 35648c11e312..f809b7ada00d 100644
> > > > > > > > > > > > --- a/include/linux/vdpa.h
> > > > > > > > > > > > +++ b/include/linux/vdpa.h
> > > > > > > > > > > > @@ -195,6 +195,9 @@ struct vdpa_iova_range {
> > > > > > > > > > > > * @vdev: vdpa device
> > > > > > > > > > > > * Returns the iova range supported by
> > > > > > > > > > > > * the device.
> > > > > > > > > > > > + * @get_vq_num_unchangeable: Check if size of virtqueue is unchangeable (optional)
> > > > > > > > > > > > + * @vdev: vdpa device
> > > > > > > > > > > > + * Returns boolean: unchangeable (true) or not (false)
> > > > > > > > > > > > * @set_map: Set device memory mapping (optional)
> > > > > > > > > > > > * Needed for device that using device
> > > > > > > > > > > > * specific DMA translation (on-chip IOMMU)
> > > > > > > > > > > > @@ -262,6 +265,7 @@ struct vdpa_config_ops {
> > > > > > > > > > > > const void *buf, unsigned int len);
> > > > > > > > > > > > u32 (*get_generation)(struct vdpa_device *vdev);
> > > > > > > > > > > > struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);
> > > > > > > > > > > > + bool (*get_vq_num_unchangeable)(struct vdpa_device *vdev);
> > > > > > > > > > > >
> > > > > > > > > > > > /* DMA ops */
> > > > > > > > > > > > int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
> > > > > > > > > > > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > > > > > > > > > > index c998860d7bbc..184f1f7f8498 100644
> > > > > > > > > > > > --- a/include/uapi/linux/vhost.h
> > > > > > > > > > > > +++ b/include/uapi/linux/vhost.h
> > > > > > > > > > > > @@ -150,4 +150,6 @@
> > > > > > > > > > > > /* Get the valid iova range */
> > > > > > > > > > > > #define VHOST_VDPA_GET_IOVA_RANGE _IOR(VHOST_VIRTIO, 0x78, \
> > > > > > > > > > > > struct vhost_vdpa_iova_range)
> > > > > > > > > > > > +/* Check if the vring size can be changed */
> > > > > > > > > > > > +#define VHOST_VDPA_GET_VRING_NUM_UNCHANGEABLE _IOR(VHOST_VIRTIO, 0X79, bool)
> > > > > > > > > > > > #endif
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.31.1
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
>