Re: [PATCH net-next v3 3/3] virtio-net: synchronize operstate with admin state on up/down

From: Jason Wang
Date: Sun Jul 28 2024 - 22:00:38 EST


On Fri, Jul 26, 2024 at 3:24 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>
> On Wed, Jul 10, 2024 at 11:03:42AM +0800, Jason Wang wrote:
> > On Tue, Jul 9, 2024 at 9:28 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 09, 2024 at 04:02:14PM +0800, Jason Wang wrote:
> > > > This patch synchronize operstate with admin state per RFC2863.
> > > >
> > > > This is done by trying to toggle the carrier upon open/close and
> > > > synchronize with the config change work. This allows propagate status
> > > > correctly to stacked devices like:
> > > >
> > > > ip link add link enp0s3 macvlan0 type macvlan
> > > > ip link set link enp0s3 down
> > > > ip link show
> > > >
> > > > Before this patch:
> > > >
> > > > 3: enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
> > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > ......
> > > > 5: macvlan0@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > > >
> > > > After this patch:
> > > >
> > > > 3: enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
> > > > link/ether 00:00:05:00:00:09 brd ff:ff:ff:ff:ff:ff
> > > > ...
> > > > 5: macvlan0@enp0s3: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> > > > link/ether b2:a9:c5:04:da:53 brd ff:ff:ff:ff:ff:ff
> > >
> > > I think that the commit log is confusing. It seems to say that
> > > the issue fixed is synchronizing state with hardware
> > > config change.
> > > But your example does not show any
> > > hardware change. Isn't this example really just
> > > a side effect of setting carrier off on close?
> >
> > The main goal for this patch is to make virtio-net follow RFC2863. The
> > main thing that is missed is to synchronize the operstate with admin
> > state, if we do this, we get several good results, one of the obvious
> > one is to allow virtio-net to propagate status to the upper layer, for
> > example if the admin state of the lower virtio-net is down it should
> > be propagated to the macvlan on top, so I give the example of using a
> > stacked device. I'm not we had others but the commit log is probably
> > too small to say all of it.
> >
> > >
> > >
> > > > Cc: Venkat Venkatsubra <venkat.x.venkatsubra@xxxxxxxxxx>
> > > > Cc: Gia-Khanh Nguyen <gia-khanh.nguyen@xxxxxxxxxx>
> > > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> > >
> > > Yes but this just forces lots of re-reads of config on each
> > > open/close for no good reason.
> >
> > Does it really harm? Technically the link status could be changed
> > several times when the admin state is down as well.
>
> It's a bunch of extra vmexits on each VM boot, yes.

A new version is ready, will be posted to net-next soon.

>
> > > Config interrupt is handled in core, you can read once
> > > on probe and then handle config changes.
> >
> > Per RFC2863, the code tries to avoid dealing with any operstate change
> > via config space read when the admin state is down.
>
> what exactly in RFC2863 are you referring to? This?
> (2) if ifAdminStatus is down, then ifOperStatus will normally also
> be down (or notPresent) i.e., there is not (necessarily) a
> fault condition on the interface.

Yes.

> So basically, just call virtio_config_driver_disable on close,
> and then config interrupt will not trigger.
> Why is that not enough?

So when close (admin down), we need to call netif_carrier_off() to
make (oper status down).

Thanks

>
>
> > >
> > >
> > >
> > >
> > >
> > > > ---
> > > > drivers/net/virtio_net.c | 64 ++++++++++++++++++++++++----------------
> > > > 1 file changed, 38 insertions(+), 26 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 0b4747e81464..e6626ba25b29 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2476,6 +2476,25 @@ static void virtnet_cancel_dim(struct virtnet_info *vi, struct dim *dim)
> > > > net_dim_work_cancel(dim);
> > > > }
> > > >
> > > > +static void virtnet_update_settings(struct virtnet_info *vi)
> > > > +{
> > > > + u32 speed;
> > > > + u8 duplex;
> > > > +
> > > > + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > > + return;
> > > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> > > > +
> > > > + if (ethtool_validate_speed(speed))
> > > > + vi->speed = speed;
> > > > +
> > > > + virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> > > > +
> > > > + if (ethtool_validate_duplex(duplex))
> > > > + vi->duplex = duplex;
> > > > +}
> > > > +
> > > > static int virtnet_open(struct net_device *dev)
> > > > {
> > > > struct virtnet_info *vi = netdev_priv(dev);
> > > > @@ -2494,6 +2513,18 @@ static int virtnet_open(struct net_device *dev)
> > > > goto err_enable_qp;
> > > > }
> > > >
> > > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > > + virtio_config_driver_enable(vi->vdev);
> > > > + /* Do not schedule the config change work as the
> > > > + * config change notification might have been disabled
> > > > + * by the virtio core. */
> > >
> > > I don't get why you need this.
> > > If the notification was disabled it will just trigger later.
> > > This is exactly why using core is a good idea.
> >
> > So we need a read here (this seems to be not rare for most modern
> > hardware NICs) because we don't know if the link status is changed or
> > not and it is not guaranteed by virtio_config_driver_enable() since it
> > only works when there's a pending config change. Another thing is that
> > the device is being freezed, so the virtio core may prevent the device
> > from accessing the device.
> >
> > So using virtio_config_changed() will guaranteed that:
> >
> > 1) if the device is not being freezed, it will read the config space soon
> > 2) if the device is being freezed, the read of the config space will
> > be delayed to resume
> >
> > >
> > >
> > > > + virtio_config_changed(vi->vdev);
> > > > + } else {
> > > > + vi->status = VIRTIO_NET_S_LINK_UP;
> > > > + virtnet_update_settings(vi);
> > >
> > >
> > > And why do we need this here I don't get at all.
> >
> > See above, because doing this on a probe is racy and buggy: The
> > opersate is up even if the adminstate is not, this is conflict with
> > RFC2863:
> >
> > "
> > If ifAdminStatus is down(2) then ifOperStatus
> > should be down(2)
> > "
> >
> > >
> > > > + netif_carrier_on(dev);
> > > > + }
> > >
> > >
> > >
> > > > +
> > > > return 0;
> > > >
> > > > err_enable_qp:
> > > > @@ -2936,12 +2967,19 @@ static int virtnet_close(struct net_device *dev)
> > > > disable_delayed_refill(vi);
> > > > /* Make sure refill_work doesn't re-enable napi! */
> > > > cancel_delayed_work_sync(&vi->refill);
> > > > + /* Make sure config notification doesn't schedule config work */
> > > > + virtio_config_driver_disable(vi->vdev);
> > > > + /* Make sure status updating is cancelled */
> > > > + cancel_work_sync(&vi->config_work);
> > > >
> > > > for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > virtnet_disable_queue_pair(vi, i);
> > > > virtnet_cancel_dim(vi, &vi->rq[i].dim);
> > > > }
> > > >
> > > > + vi->status &= ~VIRTIO_NET_S_LINK_UP;
> > > > + netif_carrier_off(dev);
> > > > +
> > > > return 0;
> > > > }
> > > >
> > > > @@ -4640,25 +4678,6 @@ static void virtnet_init_settings(struct net_device *dev)
> > > > vi->duplex = DUPLEX_UNKNOWN;
> > > > }
> > > >
> > > > -static void virtnet_update_settings(struct virtnet_info *vi)
> > > > -{
> > > > - u32 speed;
> > > > - u8 duplex;
> > > > -
> > > > - if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_SPEED_DUPLEX))
> > > > - return;
> > > > -
> > > > - virtio_cread_le(vi->vdev, struct virtio_net_config, speed, &speed);
> > > > -
> > > > - if (ethtool_validate_speed(speed))
> > > > - vi->speed = speed;
> > > > -
> > > > - virtio_cread_le(vi->vdev, struct virtio_net_config, duplex, &duplex);
> > > > -
> > > > - if (ethtool_validate_duplex(duplex))
> > > > - vi->duplex = duplex;
> > > > -}
> > > > -
> > > > static u32 virtnet_get_rxfh_key_size(struct net_device *dev)
> > > > {
> > > > return ((struct virtnet_info *)netdev_priv(dev))->rss_key_size;
> > > > @@ -6000,13 +6019,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > > /* Assume link up if device can't report link status,
> > > > otherwise get link status from config. */
> > > > netif_carrier_off(dev);
> > > > - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> > > > - schedule_work(&vi->config_work);
> > > > - } else {
> > > > - vi->status = VIRTIO_NET_S_LINK_UP;
> > > > - virtnet_update_settings(vi);
> > > > - netif_carrier_on(dev);
> > > > - }
> > >
> > >
> > > Here it all made sense - we were reading config for the 1st time.
> >
> > See above.
> >
> > Thanks
> >
> > >
> > >
> > > > for (i = 0; i < ARRAY_SIZE(guest_offloads); i++)
> > > > if (virtio_has_feature(vi->vdev, guest_offloads[i]))
> > > > --
> > > > 2.31.1
> > >
> > >
>