Re: Steve's crashing 1.3 machine, cured?

Bjorn Ekwall (bj0rn@blox.se)
Wed, 17 Apr 1996 00:39:14 +0200 (MET DST)


Alexey Kuznetsov wrote:
> Linus Torvalds (torvalds@cs.helsinki.FI) wrote:
[about Steve's Oopses]
> : Sounds like a bad kerneld/module interaction.

The stack trace from the Oops looks _very_ strange!
I don't believe that vmalloc _ever_ calls sys_init_module!!!

If you look at the call stack from the bottom up, it looks as if
the Oops happens when sys_init_module calls the init_module
function in the module, but the stack trace seems to have been mangled.

If we can rule out any mishaps with the "latest version"(?) of
__generic_memcpy_tofs, a (slightly) possible explanation could be that
the module has been unloaded (by kerneld) and that a reference to
unallocated memory was being made. I still really don't believe that
sys_init_module has _anything_ to do with the Oops...

But, with kerneld's frequent (re-)loading and unloading of modules we
will see effects of lacking MOD_INC_USE_COUNT in some modules, the hard
way...
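
To illustrate what "the hard way" means, here is a userspace toy (not
kernel code). It assumes only that an unload is refused while the use
count is non-zero; struct toy_module, the toy_* functions and the
pin_module switch are all made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a loaded module; not the kernel's real structure. */
struct toy_module {
    int use_count;  /* number of current users of the module */
    int loaded;     /* 1 while the module's memory is still allocated */
};

/*
 * A driver "open". pin_module is a hypothetical switch: 1 models a driver
 * that bumps the use count (the job of MOD_INC_USE_COUNT), 0 models a
 * driver that forgot to.
 */
int toy_open(struct toy_module *m, int pin_module)
{
    assert(m != NULL);
    if (!m->loaded)
        return -ENODEV;
    if (pin_module)
        m->use_count++;
    return 0;
}

/* The matching "release": drops the count only if open took it. */
void toy_release(struct toy_module *m, int pinned)
{
    assert(m != NULL);
    if (pinned && m->use_count > 0)
        m->use_count--;
}

/* What an automatic unloader would attempt: refuse while in use. */
int toy_try_unload(struct toy_module *m)
{
    assert(m != NULL);
    if (!m->loaded)
        return -ENOENT;
    if (m->use_count > 0)
        return -EBUSY;
    m->loaded = 0;  /* memory is gone; any later reference is a bug */
    return 0;
}
```

With pin_module set, toy_try_unload() returns -EBUSY until toy_release()
is called. Without it, the unload succeeds while the device is still
open, and the next access touches freed memory.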

> : Maybe the sound module was
> : busy being loaded when kerneld started loading it again, or something
> : like this. Björn, is there anything that protects against that kind of
> : "re-entrancy" problem?
>
> No, current module stuff does no locking and does not even try to resolve
> any race conditions. I encountered this problem very long ago.

Well, kerneld doesn't protect from this, but insmod and the kernel do...

If a module is loaded, or is in the process of being loaded, there is a
check in linux/kernel/module.c that looks for an existing module
with the same name.
This check is made in the very first step of loading a module, i.e.
sys_create_module(), where the name of the module-to-be is also stored.
So, there _is_ a locking mechanism for handling "re-entrant" module loads.
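
The idea can be sketched as a single-threaded userspace toy. This is not
the code in module.c; the table layout, the state names and the toy_*
functions are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES  16
#define MOD_NAME_LEN 64

/* Hypothetical states; the kernel's real bookkeeping differs. */
enum mod_state { MOD_FREE = 0, MOD_UNINITIALIZED, MOD_RUNNING };

struct mod_entry {
    char name[MOD_NAME_LEN];
    enum mod_state state;
};

/* Zero-initialized, so every slot starts out as MOD_FREE. */
static struct mod_entry mod_table[MAX_MODULES];

/* Find an entry by name, whether fully loaded or still being loaded. */
static struct mod_entry *find_module(const char *name)
{
    for (size_t i = 0; i < MAX_MODULES; i++)
        if (mod_table[i].state != MOD_FREE &&
            strcmp(mod_table[i].name, name) == 0)
            return &mod_table[i];
    return NULL;
}

/* Step 1 of a load: reserve the name, or fail if it is already taken. */
int toy_create_module(const char *name)
{
    assert(name != NULL);
    if (strlen(name) >= MOD_NAME_LEN)
        return -ENAMETOOLONG;
    if (find_module(name) != NULL)
        return -EEXIST;             /* a second loader is turned away here */
    for (size_t i = 0; i < MAX_MODULES; i++) {
        if (mod_table[i].state == MOD_FREE) {
            strcpy(mod_table[i].name, name);  /* length checked above */
            mod_table[i].state = MOD_UNINITIALIZED;
            return 0;
        }
    }
    return -ENOMEM;
}

/* Step 2 of a load: only valid for a name reserved in step 1. */
int toy_init_module(const char *name)
{
    assert(name != NULL);
    struct mod_entry *m = find_module(name);
    if (m == NULL)
        return -ENOENT;
    if (m->state != MOD_UNINITIALIZED)
        return -EBUSY;
    m->state = MOD_RUNNING;
    return 0;
}
```

Because the name is stored in step 1, a second toy_create_module() for
the same name fails both while the first load is still in progress and
after it has completed.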

The only "duplicate" check in kerneld is for "request-route" for the
same IP address.

There shouldn't be yet another check in kerneld for a thing that the
kernel itself already checks for.

> E.g. when amd starts, it mounts a lot of NFS file systems,
> and it starts happily in only 50% of cases, if NFS is a module.
> Moreover, I will eat my hat, if someone is able to make kerneld
> truly reliable without a complete rewrite of module.c.

Do you favour any particular seasoning, or do you eat the hat
"au naturel"? :-)

>
> It was the main reason why I rewrote all the module stuff
> from scratch. Unfortunately, it is orthogonal to the standard
> module implementation.

Definitely...
The symbol versioning got lost, along with the support for different
module sets that are now implemented in /lib/modules.
You have also removed the other uses for kerneld in the process,
and put a suboptimal insmod in kernel space. It doesn't belong there...

There _are_ a couple of things left to do with kerneld, especially with
handling return messages to requests generated during interrupts,
so you don't have to hurry to select _what_ hat to use just yet... :-)

Bjorn <bj0rn@blox.se>