Re: [Intel-gfx] [PATCH 1/2] module: add a function to add module references

From: Lucas De Marchi
Date: Fri Apr 29 2022 - 11:53:27 EST


On Fri, Apr 29, 2022 at 11:23:51AM +0100, Mauro Carvalho Chehab wrote:
On Fri, 29 Apr 2022 12:10:07 +0200
Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

On Fri, Apr 29, 2022 at 10:15:03AM +0100, Mauro Carvalho Chehab wrote:
> Hi Greg,
>
> On Fri, 29 Apr 2022 10:30:33 +0200
> Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Apr 29, 2022 at 09:07:57AM +0100, Mauro Carvalho Chehab wrote:
> > > Hi Daniel,
> > >
> > > On Fri, 29 Apr 2022 09:54:10 +0200
> > > Daniel Vetter <daniel@xxxxxxxx> wrote:
> > >
> > > > On Fri, Apr 29, 2022 at 07:31:15AM +0100, Mauro Carvalho Chehab wrote:
> > > > > Sometimes, device drivers are bound using indirect references,
> > > > > which is not visible when looking at /proc/modules or lsmod.
> > > > >
> > > > > Add a function to allow setting up module references for such
> > > > > cases.
> > > > >
> > > > > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab@xxxxxxxxxx>
> > > >
> > > > This sounds like duct tape at the wrong level. We should have a
> > > > device_link connecting these devices, and maybe device_link internally
> > > > needs to make sure the respective driver modules stay around for long
> > > > enough too. But open-coding this all over the place into every driver that
> > > > has some kind of cross-driver dependency sounds terrible.
> > > >
> > > > Or maybe the bug is that the snd driver keeps accessing the hw/component
> > > > side when that is just plain gone. IIRC there are still fundamental issues
> > > > there on the sound side of things, which people have tried to paper over
> > > > with timeouts and stuff like that in the past instead of enforcing a hard
> > > > link between the snd and i915 side.
> > >
> > > I agree with you that the device link between snd-hda and the DRM driver
> > > should properly handle unbinding in both directions. This is something
> > > that requires further discussion with ALSA and DRM people, and we should
> > > keep working on it.
> > >
> > > Yet the binding between those drivers does exist but, unlike other
> > > similar inter-driver bindings that are properly reported by lsmod, this
> > > one is invisible to userspace.
> > >
> > > What this series does is to make such binding visible. As simple as that.
> >
> > It also increases the reference count, and creates a user/kernel API
> > with the symlinks, right? Will the reference count increase now prevent
> > the modules from being unloaded?
> >
> > This feels like a very "weak" link between modules that should not be
> > needed if reference counting is implemented properly (so that things are
> > cleaned up in the correct order.)
>
> The refcount increment exists even without this patch, as
> hda_component_master_bind() in sound/hda/hdac_component.c uses
> try_module_get() when it creates the device link.

Ok, then why shouldn't try_module_get() be creating this link instead of
having to manually do it this way again? You don't want to have to go
around and add this call to all users of that function, right?

Works for me, but this is not an entirely trivial change, as the new
try_module_get() function will require two parameters instead of one:

- the module to be referenced;
- the module which will reference it.

In trivial cases, the second one will be THIS_MODULE, but, in the specific
case of snd_hda, the binding is done via an ancillary routine under
snd_hda_core, while the actual binding happens in snd_hda_intel.

Ok, we could add a __try_module_get() (or whatever other name that
would properly express what it does) with two parameters, and then
define try_module_get() as:

#define try_module_get(mod) __try_module_get(mod, THIS_MODULE)
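
A minimal userspace sketch of the two-parameter idea described above. This is
not kernel code: struct module here is a stand-in, try_module_get_by() is a
hypothetical name standing in for the proposed __try_module_get(), and the
fixed-size holder array is an assumption made only to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOLDERS 8

/* Userspace stand-in for the kernel's struct module (not the real layout). */
struct module {
    const char *name;
    int refcnt;                                 /* references taken so far */
    const struct module *holders[MAX_HOLDERS];  /* who took them */
    size_t nholders;
};

/* Stand-in for THIS_MODULE: the module the calling code belongs to. */
static struct module this_module = { .name = "snd_hda_intel" };
#define THIS_MODULE (&this_module)

/*
 * Hypothetical two-parameter variant: take a reference on `mod` and
 * record `holder` as the module responsible for it, so that a tool
 * listing modules could show who is holding what.
 */
static bool try_module_get_by(struct module *mod, const struct module *holder)
{
    if (mod == NULL || holder == NULL || mod->nholders == MAX_HOLDERS)
        return false;
    mod->refcnt++;
    mod->holders[mod->nholders++] = holder;
    return true;
}

/* The one-parameter form keeps its signature and defaults the holder. */
#define try_module_get(mod) try_module_get_by((mod), THIS_MODULE)
```

In this sketch, a helper living in one module but acting on behalf of another
(the snd_hda_core / snd_hda_intel case) would call the two-parameter form
directly and pass the real holder instead of relying on THIS_MODULE.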

I agree that this should be done at this level rather than open coding it
in every driver. The main improvement here, regardless of the
snd-hda-intel issue, is to properly annotate what is holding a module.

Right now we have 1) symbol module dependencies; 2) kernel references;
3) userspace references, with (2) and (3) being invisible to the user
from lsmod's point of view. Handling this any time try_module_get() is
called would make (2) visible to lsmod.

Paired with fixes to the (unreleased) kmod 30[1], this allows `modprobe
-r --remove-holders <module>` to also try removing the holders before
removing the module itself.
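
To illustrate the ordering only: a small userspace sketch of "remove the
holders first, then the module". It is not kmod's implementation; struct mod,
remove_with_holders() and the recorded order array are all hypothetical names
invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOLDERS 8

/* Hypothetical model of a loaded module and the modules holding it. */
struct mod {
    const char *name;
    bool loaded;
    struct mod *holders[MAX_HOLDERS];
    size_t nholders;
};

/*
 * Remove every loaded holder of `m` (recursively), then `m` itself.
 * Names are appended to `order` in removal order; `n` is the number of
 * entries already recorded and `cap` the capacity of `order`.
 * Returns the updated number of recorded entries.
 */
static size_t remove_with_holders(struct mod *m, const char **order,
                                  size_t n, size_t cap)
{
    if (!m->loaded)
        return n;
    m->loaded = false;  /* mark first so a cyclic graph cannot recurse forever */
    for (size_t i = 0; i < m->nholders; i++)
        n = remove_with_holders(m->holders[i], order, n, cap);
    if (n < cap)
        order[n++] = m->name;  /* recorded only after all holders are gone */
    return n;
}
```

With i915 held by snd_hda_intel, calling this on i915 records snd_hda_intel
first and i915 second, which is the order the holders need to go away in for
the final removal to have a chance of succeeding.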

thanks
Lucas De Marchi

[1] https://lore.kernel.org/linux-modules/20220329090912.geymr6xk7taq4rtq@xxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#t



Would that work for you?

Regards,
Mauro