Re: [bisected] Re: Module removal-related regression?

From: Dmitry Torokhov
Date: Mon Sep 11 2017 - 14:29:38 EST


On Mon, Sep 11, 2017 at 08:23:32AM -0700, Greg Kroah-Hartman wrote:
> On Sun, Sep 10, 2017 at 02:22:22PM -0700, Dmitry Torokhov wrote:
> > On Sun, Sep 10, 2017 at 12:13 PM, Jakub Kicinski <kubakici@xxxxx> wrote:
> > > On Sun, 10 Sep 2017 21:09:08 +0200, Greg Kroah-Hartman wrote:
> > >> On Sun, Sep 10, 2017 at 11:12:17AM -0700, Dmitry Torokhov wrote:
> > >> > On September 10, 2017 11:00:10 AM PDT, Jakub Kicinski <kubakici@xxxxx> wrote:
> > >> > >On Sun, 10 Sep 2017 09:21:11 -0700, Dmitry Torokhov wrote:
> > >> > >> On Sun, Sep 10, 2017 at 12:03:38AM +0200, Jakub Kicinski wrote:
> > >> > >> > On Sat, 09 Sep 2017 13:59:25 -0700, Dmitry Torokhov wrote:
> > >> > >> > > On September 9, 2017 1:17:26 PM PDT, Jakub Kicinski
> > >> > ><kubakici@xxxxx> wrote:
> > >> > >> > > >On Sat, 9 Sep 2017 12:55:51 -0700, Dmitry Torokhov wrote:
> > >> > >> > > >> On Sat, Sep 9, 2017 at 12:27 PM, Jakub Kicinski
> > >> > ><kubakici@xxxxx>
> > >> > >> > > >wrote:
> > >> > >> > > >> > On Sat, 9 Sep 2017 19:41:21 +0200, Jakub Kicinski wrote:
> > >> > >
> > >> > >> > > >> >> Hi!
> > >> > >> > > >> >>
> > >> > >> > > >> >> I'm having trouble with modules on linux/master. rmmod
> > >> > >succeeds
> > >> > >> > > >but the
> > >> > >> > > >> >> module is still loaded and the refcount goes to 1:
> > >> > >> > > >> >>
> > >> > >> > > >> >> #rmmod nfp; insmod ./src/nfp.ko nfp_pf_netdev=0 ; \
> > >> > >> > > >> >> /opt/netronome/bin/nfp-hwinfo -n 2 assembly.partno \
> > >> > >> > > >> >> lsmod | grep nfp; \
> > >> > >> > > >> >> rmmod nfp; \
> > >> > >> > > >> >> lsmod | grep nfp
> > >> > >> > > >> >> nfp 249856 0
> > >> > >> > > >> >> nfp 200704 1
> > >> > >> > > >> >>
> > >> > >> > > >> >> If I rmmod again the module will be actually unloaded. The
> > >> > >user
> > >> > >> > > >space
> > >> > >> > > >> >> is mostly Ubuntu 14.04. Has anyone seen this? I'm trying
> > >> > >to
> > >> > >> > > >bisect
> > >> > >> > > >> >> now...
> > >> > >> > > >> >
> > >> > >> > > >> > Got 'em!
> > >> > >> > > >> >
> > >> > >> > > >> > commit 1455cf8dbfd06aa7651dcfccbadb7a093944ca65 (HEAD,
> > >> > >> > > >refs/bisect/bad)
> > >> > >> > > >> > Author: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>
> > >> > >> > > >> > Date: Wed Jul 19 17:24:30 2017 -0700
> > >> > >> > > >> >
> > >> > >> > > >> > driver core: emit uevents when device is bound to a
> > >> > >driver
> > >> > >> > > >>
> > >> > >> > > >> Does it happen with all modules or only nfp one?
> > >> > >> > > >>
> > >> > >> > > >> It seems to work here:
> > >> > >> > > >>
> > >> > >> > > >> dtor@dtor-glaptop3:~ $ lsmod | grep psmouse
> > >> > >> > > >> psmouse 135168 0
> > >> > >> > > >> dtor@dtor-glaptop3:~ $ sudo rmmod psmouse
> > >> > >> > > >> dtor@dtor-glaptop3:~ $ lsmod | grep psmouse
> > >> > >> > > >> dtor@dtor-glaptop3:~ $ sudo modprobe psmouse
> > >> > >> > > >
> > >> > >> > > >It looks like the driver is actually reloaded. The driver used
> > >> > >to
> > >> > >> > > >return EPROBE_DEFER, but I think it doesn't any more (rebuilding
> > >> > >the
> > >> > >> > > >kernel to test that right now).
> > >> > >> > > >
> > >> > >> > > >Could the uevent on unbind tickle Ubuntu 14.04's udev or somehow
> > >> > >> > > >else cause the driver to be loaded again?
> > >> > >> > >
> > >> > >> > > It depends on how silly the udev rules are, but yes, this can
> > >> > >definitely happen.
> > >> > >> >
> > >> > >> > I confirmed the driver doesn't use EPROBE_DEFER any more:
> > >> > >> >
> > >> > >> > $ grep -nrI EPROBE_DEFER drivers/net/ethernet/netronome/
> > >> > >> > $
> > >> > >>
> > >> > >> Not sure why you bring the deferrals here, they have nothing to do
> > >> > >with
> > >> > >> module removal. Also, deferrals are rarely issued by the leaf driver,
> > >> > >and
> > >> > >> more often by providers of resources (GPIO, regulator, interrupt,
> > >> > >etc).
> > >> > >
> > >> > >Yes, it's unusual, but this driver used to do it. Which is exactly why
> > >> > >I brought it up. Turns out it was irrelevant :)
> > >> > >
> > >> > >> > I tested without any udev rules in /etc/udev/, just the standard
> > >> > >distro
> > >> > >> > ones. Same thing.
> > >> > >>
> > >> > >> Right, so this is the default udev rule:
> > >> > >>
> > >> > >> /lib/udev/rules.d/80-drivers.rules:
> > >> > >>
> > >> > >> # do not edit this file, it will be overwritten on update
> > >> > >>
> > >> > >> ACTION=="remove", GOTO="drivers_end"
> > >> > >>
> > >> > >> ENV{MODALIAS}=="?*", RUN{builtin}="kmod load $env{MODALIAS}"
> > >>
> > >> So if the new uevents do not have the MODALIAS line in them, then they
> > >> will not trigger this? Dmitry, can you see if that would fix this
> > >> problem without having to fix everyone's old versions of udev/systemd?
> >
> > Unfortunately MODALIAS= is being added by individual subsystems having
> > their subsystem specific format. Unless you'd be OK with
> > kobject_uevent_env() poking into the generated environment and zapping
> > MODALIAS= environment variables for KOBJ_BIND/KOBJ_UNBIND actions.
>
> Hm, any reason why it should be sending these values for those uevents?
> I guess it's not worth hacking around in the lower levels just for this,
> to work around crazy userspace stuff.
>
> > I'm still going to submit correction for the rule to systemd folks.
>
> Yes please.
>
> > > Perhaps another option is dropping the unbind event? From the commit
> > > message it seems like only bind is really needed ATM. Do events have
> > > to be symmetrical?
> >
> > While you are absolutely right that bind is the most important one,
> > I'd be hesitant removing unbind even though we do not have concrete
> > use case for it yet. The bind operation complements unbind, so having
> > bind uevent but not unbind "feels weird".
>
> We might want to disable it for a year or so for people to catch up with
> a newer version of udev/systemd, and then turn it back on?

That is an option, but maybe we could carry the patch below for a year or
two instead?

Jakub, can you try and see if that works for you?
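
If it helps with testing, the filtering idea can be tried outside the
kernel. The following is only a rough userspace sketch, not the patch
itself: struct sketch_env, sketch_add_var() and sketch_zap_modalias() are
made-up names, and it models only the table of "KEY=value" pointers, not
the rest of struct kobj_uevent_env.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the uevent environment: pointer table only. */
#define SKETCH_MAX_VARS 32

struct sketch_env {
	const char *envp[SKETCH_MAX_VARS];
	int envp_idx;		/* number of entries currently in use */
};

/* Append one "KEY=value" string; returns 0 on success, -1 if the table is full. */
static int sketch_add_var(struct sketch_env *env, const char *var)
{
	if (env->envp_idx >= SKETCH_MAX_VARS)
		return -1;
	env->envp[env->envp_idx++] = var;
	return 0;
}

/* Drop every entry whose key is MODALIAS, keeping the others in order. */
static void sketch_zap_modalias(struct sketch_env *env)
{
	static const char prefix[] = "MODALIAS=";
	int i = 0;

	while (i < env->envp_idx) {
		/* Prefix match: the value after '=' differs per subsystem. */
		if (strncmp(env->envp[i], prefix, sizeof(prefix) - 1) != 0) {
			i++;
			continue;
		}

		/* Shift the tail (entries i+1 .. envp_idx-1) down by one slot. */
		memmove(&env->envp[i], &env->envp[i + 1],
			sizeof(env->envp[0]) * (size_t)(env->envp_idx - 1 - i));
		env->envp_idx--;
	}
}
```

Note that the index is not advanced after a removal, so two MODALIAS
entries in a row are both caught, and the move length is counted from the
removed slot rather than from the start of the table.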

--
Dmitry

driver core: suppress sending MODALIAS in UNBIND uevents

From: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>

The current udev rules cause modules to be loaded on all device events save
for "remove". With the introduction of KOBJ_BIND/KOBJ_UNBIND this causes
issues, as driver modules that have devices bound to their drivers get
immediately reloaded, and it appears to the user that module unloading does
not work.

The standard udev matching rule is the following:

ENV{MODALIAS}=="?*", RUN{builtin}+="kmod load $env{MODALIAS}"

Given that MODALIAS data is not terribly useful for the UNBIND event, let's
zap it from the generated uevent environment until userspace is updated
with a corrected udev rule that only loads modules on the "add" event.

Reported-by: Jakub Kicinski <kubakici@xxxxx>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>
---
lib/kobject_uevent.c | 47 +++++++++++++++++++++++++++++++++++++++--------
1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index e590523ea476..e5ccec526def 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -294,6 +294,24 @@ static void cleanup_uevent_env(struct subprocess_info *info)
 }
 #endif
 
+static void zap_modalias_env(struct kobj_uevent_env *env)
+{
+	int i;
+
+	for (i = 0; i < env->envp_idx;) {
+		if (strncmp(env->envp[i], "MODALIAS=", 9)) {
+			i++;
+			continue;
+		}
+
+		if (i != env->envp_idx - 1)
+			memmove(&env->envp[i], &env->envp[i + 1],
+				sizeof(env->envp[i]) * (env->envp_idx - 1 - i));
+
+		env->envp_idx--;
+	}
+}
+
+
/**
* kobject_uevent_env - send an uevent with environmental data
*
@@ -409,16 +427,29 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		}
 	}
 
-	/*
-	 * Mark "add" and "remove" events in the object to ensure proper
-	 * events to userspace during automatic cleanup. If the object did
-	 * send an "add" event, "remove" will automatically generated by
-	 * the core, if not already done by the caller.
-	 */
-	if (action == KOBJ_ADD)
+	switch (action) {
+	case KOBJ_ADD:
+		/*
+		 * Mark "add" event so we can make sure we deliver "remove"
+		 * event to userspace during automatic cleanup. If
+		 * the object did send an "add" event, "remove" will
+		 * automatically be generated by the core, if not already done
+		 * by the caller.
+		 */
 		kobj->state_add_uevent_sent = 1;
-	else if (action == KOBJ_REMOVE)
+		break;
+
+	case KOBJ_REMOVE:
 		kobj->state_remove_uevent_sent = 1;
+		break;
+
+	case KOBJ_UNBIND:
+		zap_modalias_env(env);
+		break;
+
+	default:
+		break;
+	}
 
 	mutex_lock(&uevent_sock_mutex);
 	/* we will send an event, so request a new sequence number */