Re: [PATCH] libkmod-module: add support for a patient module removal option

From: Luis Chamberlain
Date: Fri Aug 06 2021 - 17:45:31 EST


On Wed, Aug 04, 2021 at 11:47:20AM -0700, Lucas De Marchi wrote:
> On Wed, Aug 04, 2021 at 10:58:59AM -0700, Luis Chamberlain wrote:
> > On Tue, Aug 03, 2021 at 01:24:17PM -0700, Luis Chamberlain wrote:
> > > + diff --git a/libkmod/libkmod-module.c b/libkmod/libkmod-module.c
> > <-- snip -->
> > > + ERR(mod->ctx, "%s refcnt is %ld waiting for it to become 0\n", mod->name, refcnt);
> >
> > OK after running many tests with this I think we need to just expand
> > this so that the error message only applies when -v is passed to
> > modprobe, otherwise we get the print message every time, and using
> > INFO() doesn't cut it, given the next priority level available to
> > the library is LOG_INFO (6) and when modprobe -v is passed we set the
> > log level to LOG_NOTICE (5), so we need a new NOTICE(). I'll send a v2
> > with that included as a separate patch.
>
> Or maybe move the sleep to modprobe instead of doing it in the
> library?

We have a few callers which need to impement the wait, and so having
a library call is one way to share code. I realized this while first
open coding this in each call site and realizing I was just
re-inventing the wheel.

> The sleep(1) seems like an arbitrary number to be introduced
> in the lib.

Indeed, the ideal thing here is to introduce then a timeout, and have
a default setting, which code can override.

> Since kernfs is pollable,
> maybe we could rather introduce an API to
> return the pid in which the application has to wait for

The kernel does not have information to provide userspace any
information for how long it should wait. I mean, in a test case I am
using its an lvm pvremove and for some reason it seems the removal
is asynchronously putting the module refcnt (even though I do not believe
DM_DEFERRED_REMOVE is ever set by default), and I had fixed the kernel's
own asynchronous removal of the request_queue a little while back so
I don't think that's the issue either.

If you mean to consider adding *more* information to the kernel, as
a new feature, I'm afraid any PID triggering a bump in the refcnt is
ephemeral, and so I don't think it would help. It would only be useful
if userspace *knew* it would trigger a recnt bump, and that seems
likely a promise we would probaby want to break any time. Unless we
want to get into the business of granting userspace the ability to
bump refcnt's with new structural information it promises is legit.
Then userspace can query for this. However that still seems overkill,
I can't see a need for it yet.

> and then the
> application can use whatever it wants to poll, including controlling a
> timeout.

A dynamic timeout seems sensible indeed.

> I'm saying this because sleep(1) may be all fine for modprobe, but for
> other applications using libkmod it may not play well with their mainloops
> (and they may want to control both granularity of the sleep and a max
> timeout).

Indeed. A default timeout set into the context seem to be the way to go,
which callers can override. Then modprobe -t <ms_wait> can use this caller to
update the ctx timeout to a setting.

I'll post a v2 with these changes.

Luis