Re: [PATCH 2/2] module: fix bne2 "gave up waiting for init of module libcrc32c"

From: Brandon Philips
Date: Tue Jun 01 2010 - 22:11:32 EST


On 16:51 Tue 01 Jun 2010, Linus Torvalds wrote:
> On Tue, 1 Jun 2010, Brandon Philips wrote:
> > When I tested a Kernel with Rusty's modules branch pulled onto
> > 2.6.35-rc1 I got duplicate sysfs path errors:
> Hmm. Yeah, the module_mutex used to be held across the whole "find -> add"
> state, but isn't any more.

Right.

> > To fix this we need to make sure that we only have one copy of a
> > module going through load_module at a time. Patch suggestion
> > follows which boots without errors. I am sure there is a better
> > way to do what it does ;)
>
> I think Rusty may have made the lock a bit _too_ finegrained there,
> and didn't add it to some places that needed it. It looks, for
> example, like PATCH 1/2 actually drops the lock in places where it's
> needed ("find_module()" is documented to need it, but now
> load_module() didn't hold it at all when it did the find_module()).

Right, I noticed that too and held the lock in the patch I sent.

> Rather than adding a new "module_loading" list, I think we should be
> able to just use the existing "modules" list, and just fix up the
> locking a bit.
>
> In fact, maybe we could just move the "look up existing module" a
> bit later - optimistically assuming that the module doesn't exist,
> and then just undoing the work if it turns out that we were wrong,
> just before adding ourselves to the list.
>
> A patch something like the appended (TOTALLY UNTESTED!)

FWIW, I tried this same idea initially and it breaks because the
kobject EEXIST is coming from mod_sysfs_init(), which runs further
up in load_module(), well before the list_add_rcu().
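
To make the ordering problem concrete, here is a toy userspace model
(not kernel code). The names toy_sysfs_init() and toy_find_and_add()
are made up for illustration; they only stand in for the early sysfs
setup and the late "find, then add" step. The point it tries to show
is that a second loader of the same name collides in the early step
and never reaches the late check:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_NAMES 8

/* Toy registry: a fixed-size set of names that rejects duplicates. */
struct registry {
	const char *names[MAX_NAMES];
	int count;
};

static int registry_has(const struct registry *r, const char *name)
{
	for (int i = 0; i < r->count; i++)
		if (strcmp(r->names[i], name) == 0)
			return 1;
	return 0;
}

static int registry_add(struct registry *r, const char *name)
{
	if (registry_has(r, name))
		return -EEXIST;
	assert(r->count < MAX_NAMES);
	r->names[r->count++] = name;
	return 0;
}

/* Hypothetical stand-ins for the sysfs namespace and the modules list. */
static struct registry toy_sysfs;
static struct registry toy_modules;

/* Early step of a toy load: happens long before the list check. */
static int toy_sysfs_init(const char *name)
{
	return registry_add(&toy_sysfs, name);
}

/* Late step of a toy load: "find, then add" on the modules list. */
static int toy_find_and_add(const char *name)
{
	return registry_add(&toy_modules, name);
}
```

If two loaders both run toy_sysfs_init("libcrc32c") before either
reaches toy_find_and_add(), the second one already gets -EEXIST from
the early step, so moving the duplicate check later does not help.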

I also tried the obvious variation of moving the list_add_rcu() up
to where the find_module() call is, but got:

[ 5.495549] sd 0:0:1:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 5.496931] ehci_hcd: Unknown symbol usb_hcd_resume_root_hub (err 0)
[ 5.497002] ehci_hcd: Unknown symbol usb_hcd_pci_probe (err 0)
[ 5.497070] ehci_hcd: Unknown symbol usb_hcd_unlink_urb_from_ep (err 0)

Feeling a bit like GoldiLocks I gave up and sent the modules_loading
patch to illustrate the issue. :)
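
For readers without the patch at hand, the general shape of "only one
copy of a module in load_module at a time" can be sketched in
userspace roughly as below. This is an assumption-laden illustration,
not the actual patch: toy_begin_load(), toy_end_load() and the fixed
size table are hypothetical, and a pthread mutex stands in for
module_mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define MAX_INFLIGHT 8

/* Hypothetical stand-in for module_mutex. */
static pthread_mutex_t toy_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Names currently somewhere between "start of load" and "done". */
static const char *toy_loading[MAX_INFLIGHT];

/* Claim a name before any per-module setup; -EEXIST if already claimed. */
static int toy_begin_load(const char *name)
{
	int free_slot = -1, err = 0;

	assert(name != NULL);
	pthread_mutex_lock(&toy_mutex);
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (!toy_loading[i]) {
			if (free_slot < 0)
				free_slot = i;
		} else if (strcmp(toy_loading[i], name) == 0) {
			err = -EEXIST;
		}
	}
	if (!err) {
		if (free_slot < 0)
			err = -ENOMEM;	/* table full */
		else
			toy_loading[free_slot] = name;
	}
	pthread_mutex_unlock(&toy_mutex);
	return err;
}

/* Release the claim once the load has finished or failed. */
static void toy_end_load(const char *name)
{
	pthread_mutex_lock(&toy_mutex);
	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (toy_loading[i] && strcmp(toy_loading[i], name) == 0) {
			toy_loading[i] = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&toy_mutex);
}
```

Because the claim is taken before any other setup, a second loader of
the same name is turned away before it can touch anything shared,
while loads of different names still proceed independently.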

I will keep working out all the interdependencies to see if I can get
something to boot without something like the modules_loading list.

Cheers,

Brandon

> diff --git a/kernel/module.c b/kernel/module.c
> index a1f46a5..21f7ffa 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2198,11 +2198,6 @@ static noinline struct module *load_module(void __user *umod,
>  		goto free_mod;
>  	}
>
> -	if (find_module(mod->name)) {
> -		err = -EEXIST;
> -		goto free_mod;
> -	}
> -
>  	mod->state = MODULE_STATE_COMING;
>
>  	/* Allow arches to frob section contents and sizes. */
> @@ -2486,6 +2481,13 @@ static noinline struct module *load_module(void __user *umod,
>  	 * The mutex protects against concurrent writers.
>  	 */
>  	mutex_lock(&module_mutex);
> +
> +	if (find_module(mod->name)) {
> +		err = -EEXIST;
> +		/* This will also unlock the mutex */
> +		goto already_exists;
> +	}
> +
>  	list_add_rcu(&mod->list, &modules);
>  	mutex_unlock(&module_mutex);
>
> @@ -2511,6 +2513,7 @@ static noinline struct module *load_module(void __user *umod,
>  	mutex_lock(&module_mutex);
>  	/* Unlink carefully: kallsyms could be walking list. */
>  	list_del_rcu(&mod->list);
> + already_exists:
>  	mutex_unlock(&module_mutex);
>  	synchronize_sched();
>  	module_arch_cleanup(mod);