SMP race with module loading/unloading

Zack Weinberg (zack@rabi.columbia.edu)
Mon, 14 Dec 1998 11:30:09 -0500


I've found what I believe is a general problem with automatic module loading
and SMP.

Take a module, any module. If it doesn't spend all its time under the
kernel lock, the race will be easier to trigger. Find a program that
exercises the module's code but leaves it unreferenced afterward.

On an SMP machine, with a kernel compiled for automatic module loading:

- in one VT, run the test program over and over again. It helps if you have
several copies going at once. Like this:

while :; do ./prog & ./prog & ./prog & wait; done

- in another VT, run "rmmod -a" in a tight loop.

while :; do rmmod -a; done

Leave both these running for some time - ten minutes seems to do it. After
that time, you'll have a stuck module (bogus use count) or an oops, or both.

What's going on? A combination of two things. First, MOD_{INC,DEC}_USE_COUNT
don't use atomic operations. If two threads hit one at the same time, the
use count will be corrupted. Second, if a thread enters a module while the
use count is zero, and it doesn't hold the kernel lock, delete_module() can
pull the module out from under the thread.
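
To make the first problem concrete, here is a small userspace sketch that
replays, in a single thread, one bad interleaving of two non-atomic updates.
It is only an illustration: use_count here is a hypothetical stand-in
variable, not the kernel's actual field, and the "CPUs" are just local
variables holding what each processor would have loaded.

```c
#include <assert.h>

/* Hypothetical stand-in for a module's use count; not the kernel's field. */
static long use_count;

/* Two CPUs enter the module at once: both load, then both store. */
long replay_lost_increment(void)
{
    long cpu0, cpu1;

    use_count = 0;
    cpu0 = use_count;       /* CPU 0 loads 0 */
    cpu1 = use_count;       /* CPU 1 loads 0 before CPU 0 has stored */
    use_count = cpu0 + 1;   /* CPU 0 stores 1 */
    use_count = cpu1 + 1;   /* CPU 1 stores 1, overwriting CPU 0's update */
    return use_count;       /* 1, although two threads are now inside */
}

/* Two CPUs leave the module at once: the same overlap loses a decrement. */
long replay_lost_decrement(void)
{
    long cpu0, cpu1;

    use_count = 2;
    cpu0 = use_count;       /* CPU 0 loads 2 */
    cpu1 = use_count;       /* CPU 1 loads 2 before CPU 0 has stored */
    use_count = cpu0 - 1;   /* CPU 0 stores 1 */
    use_count = cpu1 - 1;   /* CPU 1 stores 1, overwriting CPU 0's update */
    return use_count;       /* 1, although nobody is left inside */
}

/* Both replays end with a count that is off by one. */
void check_replay(void)
{
    assert(replay_lost_increment() == 1);
    assert(replay_lost_decrement() == 1);
}
```

A lost decrement leaves the count above zero forever, which would look like
the stuck module; a lost increment lets the count reach zero while a thread
is still inside, which would fit the oops.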

The upshot is that you have to hold the kernel lock before you enter module
code, and you can't drop it while the use count is zero. You also can't
modify or examine the use count without the lock. This isn't just
inefficient; in some cases it's impossible. For example, if a module needs
the mm semaphore, it has to acquire that before it gets the kernel lock.

It would be fairly straightforward to fix this, but the changes involved
would affect a lot of code. MOD_INC_USE_COUNT and MOD_DEC_USE_COUNT need to
use atomic_inc and atomic_dec. There also needs to be a 'module lock' which
controls the module syscalls and nothing else.
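
As a rough userspace sketch of the shape of that fix (not a kernel patch),
the following uses C11 <stdatomic.h> in place of the kernel's atomic_inc and
atomic_dec. Everything named here is hypothetical: struct fake_module, the
mod_*_use_count helpers, fake_delete_module(), and the spin flag standing in
for the 'module lock'.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a module descriptor. */
struct fake_module {
    atomic_int  use_count;     /* only touched with atomic operations */
    atomic_flag syscall_lock;  /* stands in for the proposed 'module lock' */
    bool        loaded;        /* only touched with syscall_lock held */
};

void fake_module_init(struct fake_module *m)
{
    atomic_init(&m->use_count, 0);
    atomic_flag_clear(&m->syscall_lock);
    m->loaded = true;
}

/* Indivisible read-modify-write: no window between the load and the store. */
void mod_inc_use_count(struct fake_module *m)
{
    atomic_fetch_add(&m->use_count, 1);
}

void mod_dec_use_count(struct fake_module *m)
{
    int old = atomic_fetch_sub(&m->use_count, 1);

    assert(old > 0);  /* an unbalanced decrement is a caller bug */
}

/*
 * Model of delete_module(): serialised against other module syscalls by
 * the module lock only, without involving any global kernel lock.
 * Returns true if the module was removed.
 */
bool fake_delete_module(struct fake_module *m)
{
    bool removed = false;

    while (atomic_flag_test_and_set(&m->syscall_lock))
        ;  /* spin until the module lock is free */

    if (m->loaded && atomic_load(&m->use_count) == 0) {
        m->loaded = false;
        removed = true;
    }

    atomic_flag_clear(&m->syscall_lock);
    return removed;
}
```

With this, fake_delete_module() refuses to remove a module whose count is
nonzero and succeeds once the count drops back to zero. The sketch only
covers the counter and the syscall serialisation; a caller would still have
to bump the count before entering module code for the second race to close.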

If it weren't a nasty bug, I'd say leave it till 2.3, but I'm not sure we can
afford to do that.

zw
