Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable

From: Leon Romanovsky
Date: Wed Aug 11 2021 - 10:01:33 EST


On Wed, Aug 11, 2021 at 06:27:32AM -0700, Jakub Kicinski wrote:
> On Wed, 11 Aug 2021 09:10:49 +0300 Leon Romanovsky wrote:
> > On Tue, Aug 10, 2021 at 04:53:18PM -0700, Jakub Kicinski wrote:
> > > On Tue, 10 Aug 2021 16:37:30 +0300 Leon Romanovsky wrote:
> > > > This series prepares code to remove devlink_reload_enable/_disable API
> > > > and in order to do, we move all devlink_register() calls to be right
> > > > before devlink_reload_enable().
> > > >
> > > > The best place for such a call should be right before exiting from
> > > > the probe().
> > > >
> > > > This is done because devlink_register() opens devlink netlink to the
> > > > users and gives them a venue to issue commands before initialization
> > > > is finished.
> > > >
> > > > 1. Some drivers were aware of such "functionality" and tried to protect
> > > > themselves with extra locks, state machines and devlink_reload_enable().
> > > > Let's assume that it worked for them, but I'm personally skeptical about
> > > > it.
> > > >
> > > > 2. Some drivers copied that pattern, but without locks and state
> > > > machines. That protected them from reload flows, but not from any _set_
> > > > routines.
> > > >
> > > > 3. And all other drivers simply didn't understand the implications of early
> > > > devlink_register() and can be seen as "broken".
> > >
> > > What are those implications for drivers which don't implement reload?
> > > Depending on which parts of devlink the drivers implement there may well
> > > be nothing to worry about.
> > >
> > > Plus devlink instances start out with reload disabled. Could you please
> > > take a step back and explain why these changes are needed.
> >
> > The problem is that devlink_register() adds new devlink instance to the
> > list of visible devlinks (devlink_list). It means that all devlink_*_dumpit()
> > will try to access devices during their initialization, before they are ready.
> >
> > The more troublesome case is that devlink_list is iterated in the
> > devlink_get_from_attrs() and it is used in devlink_nl_pre_doit(). The
> > latter function will return to the caller that new devlink is valid and
> > such caller will be able to proceed to *_set_doit() functions.
> >
> > Just as an example:
> > * user sends netlink message
> > * devlink_nl_cmd_eswitch_set_doit()
> > * ops->eswitch_mode_set()
> > * Are you sure that all drivers protected here?
> > I remind that driver is in the middle of its probe().
> >
> > Someone can argue that drivers and devlink are protected from anything
> > harmful with their global (devlink_mutex and devlink->lock) and internal
> > (device->lock, e.t.c.) locks. However it is impossible to prove for all
> > drivers and prone to errors.
> >
> > Reload enable/disable gives false impression that the problem exists in
> > that flow only, which is not true.
> >
> > devlink_reload_enable() is a duct tape because reload flows much easier
> > to hit.
>
> Right :/
>
> > > > In this series, we focus on items #1 and #2.
> > > >
> > > > Please share your opinion if I should change ALL other drivers to make
> > > > sure that devlink_register() is the last command or leave them in an
> > > > as-is state.
> > >
> > > Can you please share the output of devlink monitor and ip monitor link
> > > before and after? The modified drivers will not register ports before
> > > they register the devlink instance itself.
> >
> > Not really, they will register but won't be accessible from the user space.
> > The only difference is the location of "[dev,new] ..." notification.
>
> Is that because of mlx5's use of auxdev, or locking? I don't see
> anything that should prevent the port notification from coming out.

And it is ok, kernel can (and does) send notifications, because we left
devlink_ops assignment to be in devlink_alloc(). It ensures that all
flows that worked before will continue to work without too much changes.

>
> I think the notifications need to get straightened out, we can't notify
> about sub-objects until the object is registered, since they are
> inaccessible.

I'm not sure about that. You present the case where kernel and user
space races against each other and historically kernel doesn't protect
from such flows.

For example, you can randomly remove and add kernel modules. At some
point of time, you will get "missing symbols errors", just because
one module tries to load and it depends on already removed one.

We must protect kernel and this is what I do. User shouldn't access
devlink instance before he sees "dev name" notification.

Of course, we can move various iterators to devlink_register(), but it
will make code much complex, because we have objects that can be
registered at any time (IMHO. trap is one of them) and I will need to
implement notification logic that separate objects that were created
before devlink_register and after.

Thanks