Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

From: Jakub Kicinski
Date: Mon May 27 2024 - 12:31:47 EST


On Mon, 27 May 2024 19:10:55 +0300 Ido Schimmel wrote:
> On Wed, May 22, 2024 at 07:22:12AM -0700, Jakub Kicinski wrote:
> > On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:
> > > The event should match the below:
> > > event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
> > >
> > > Then iterate over the list to look for work that matches the dev and portid.
> > > The socket doesn’t close until the work is done in that case.
> >
> > Okay, good, yes. I think you can use one of the callbacks I mentioned
> > below to achieve the same thing with less complexity than the notifier.
>
> Danielle already has a POC with the notifier and it's not that
> complicated. I wasn't aware of the netlink notifier, but we found it
> when we tried to understand how other netlink families get notified
> about a socket being closed.
>
> Which advantages do you see in the sock_priv_destroy() approach? Are you
> against the notifier approach?

Notifier is not incorrect, but I worry it will result in more code,
basically duplicating what genl_sk_priv_*() already does. Perhaps you
managed to code it up very neatly; if so, feel free to send v6 and we
can discuss further if needed.
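For illustration, the notifier check Danielle describes could be modelled roughly as below. This is a minimal userspace sketch, not kernel code. EV_URELEASE and PROTO_GENERIC are stand-ins for NETLINK_URELEASE and NETLINK_GENERIC, struct release_info is a hypothetical stand-in for what the release notifier receives, and struct flash_work with its list is assumed bookkeeping. It matches only on port ID and follows the outcome agreed later in the thread: stop notifications rather than cancel the work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for NETLINK_URELEASE and NETLINK_GENERIC (values illustrative). */
enum { EV_URELEASE = 1, EV_OTHER = 2 };
enum { PROTO_GENERIC = 16, PROTO_OTHER = 0 };

/* Hypothetical stand-in for what the release notifier is told. */
struct release_info {
	int protocol;
	uint32_t portid;
};

/* Hypothetical pending flashing work item, kept on a singly linked list. */
struct flash_work {
	int ifindex;		/* stands in for the device being flashed */
	uint32_t portid;	/* socket that requested the flash */
	bool notify;		/* still send unicast notifications? */
	struct flash_work *next;
};

/*
 * Notifier-style handler: ignore everything except the release of a
 * generic netlink socket, then stop notifications for every work item that
 * socket owns. The work itself keeps running. Returns the number of items
 * that were switched off.
 */
static int flash_release_notify(struct flash_work *head, int event,
				const struct release_info *info)
{
	int matched = 0;

	assert(info != NULL);
	if (event != EV_URELEASE || info->protocol != PROTO_GENERIC)
		return 0;

	for (struct flash_work *w = head; w; w = w->next) {
		if (w->portid == info->portid && w->notify) {
			w->notify = false;
			matched++;
		}
	}
	return matched;
}
```

Note that the handler has to walk every pending item on each socket release, which is part of the extra code the notifier approach brings.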

> > > > Easiest way to "notice" the socket got closed would probably be to add
> > > > some info to genl_sk_priv_*(). ->sock_priv_destroy() will get called.
> > > > But you can also get a close notification in the family ->unbind
> > > > callback.
>
> Isn't the unbind callback only for multicast (whereas we are using
> unicast)?

True, should work in practice, I think. But sock_priv is much better.
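A rough userspace model of the sock_priv idea, assuming the family keeps a hypothetical struct sock_priv per socket that points at the flash it started. sock_priv_create() and sock_priv_destroy_cb() are illustrative names standing in for the family's per-socket init and ->sock_priv_destroy() callbacks; flash_start() and flash_finish() are hypothetical as well. Because the destructor already knows its work item, no search over a list of pending work is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sock_priv;

/* Hypothetical flashing work item; only notification state is modelled. */
struct flash_work {
	bool notify;			/* still send unicast notifications? */
	struct sock_priv *owner;	/* socket that started it, if still open */
};

/* Hypothetical per-socket private data: the flash this socket started. */
struct sock_priv {
	struct flash_work *work;
};

/* Stand-in for the per-socket init callback. */
static struct sock_priv *sock_priv_create(void)
{
	return calloc(1, sizeof(struct sock_priv));
}

/* Link a socket and the work item it started, in both directions. */
static void flash_start(struct sock_priv *priv, struct flash_work *w)
{
	w->notify = true;
	w->owner = priv;
	priv->work = w;
}

/* Work completed first: unlink so the socket never touches it later. */
static void flash_finish(struct flash_work *w)
{
	if (w->owner)
		w->owner->work = NULL;
	w->owner = NULL;
}

/*
 * Stand-in for ->sock_priv_destroy(): the socket is going away, so tell
 * its work item (if any) to stop notifying, then free the private data.
 * No locking here; real code must serialize this against the work item.
 */
static void sock_priv_destroy_cb(struct sock_priv *priv)
{
	assert(priv != NULL);
	if (priv->work) {
		priv->work->notify = false;
		priv->work->owner = NULL;
	}
	free(priv);
}
```

The two-way link is the design choice that matters: whichever side ends first clears the pointer the other side would otherwise follow.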

> > > Is there a scenario that we hit this event and won't intend to cancel the work?
> >
> > I think it's up to us. I don't see any legit reason for user space to
> > intentionally cancel the flashing. So the only option is that user space
> > is either buggy or has crashed, and the socket got closed before
> > flashing finished. Right?
>
> We don't think that closing the socket / killing the process mid
> flashing is a legitimate scenario. We looked into it in order to avoid
> sending unicast notifications to a socket that did not ask for them but
> gets them because it was bound to the port ID that was used by the old
> socket.
>
> I agree that we don't need to cancel the work and can simply have the
> work item stop sending notifications. User space will get an error if it
> tries to flash a module that is already being flashed in the background.
> WDYT?

SGTM!