Re: More linux-2.6.9 module problems

From: linux-os
Date: Tue Nov 09 2004 - 17:19:42 EST


On Tue, 9 Nov 2004, Mike Waychison wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

linux-os wrote:
On Tue, 9 Nov 2004, Mike Waychison wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

linux-os wrote:


I have a memory-test procedure that tests
memory on a board, accessed via the PCI bus.
There is a lot of RAM and it's bank-switched
into some 64k windows so it takes a lot of
time to test, about 60 seconds.

This is in a module, therefore inside the kernel.
When it is invoked via an ioctl() call, the
kernel is frozen for the whole test-time. The
test procedure does not use any spin-locks nor
does it even use any semaphores. It just does a
bunch of read/write operations over the PCI/Bus.

I thought that I could enable the preemptible-
kernel option and the machine would then respond
normally. Not so. Even with 4 CPUs, when one
ioctl() is busy in the kernel, nothing else
happens until its done. Even keyboard activity
is gone, no Caps Lock and no Num Lock, no `ping`
response over the network. However, the machine
comes back to life when the memory-test is done.

This is kernel version 2.6.9. Is it possible that
somebody left on the BKL when calling a module
ioctl() on this version? If not, what do I do
to be able to execute a time-consuming procedure
from inside the kernel? Do I break it up into
sections and execute schedule() periodically
(temporary work-around --works)??


The BKL has always been grabbed across ioctls. Drop the lock when you
enter your f_op->ioctl call and grab it again open completion.


Hmmm. I get 'scheduling while atomic' screaming across the screen!
There are no atomic operations in my ioctl functions so I don't
know what its complaining about. I think I shouldn't have tried
to do anything with BKL because I (my task) doesn't own it.


'Scheduling while atomic' means you called some function that may
schedule itself out while you are holding a spinlock. Note that the BKL
is not a regular spinlock, and scheduling is allowed while holding it.

Please see
http://james.bond.edu.au/courses/inft73626@033/Assigs/Papers/kernel_locking_techniques.html
by Robert Love, the section titled "The Big Kernel Lock"

Something else is wrong with your code.

Not quite. Something is wrong with the e100 network driver used in
2.6.9. When I do:

int ioctl(,,,,)
{
int ret;
unlock_kernel();
ret = original_ioctl(...);
lock_kernel();
return ret;
}
In my driver, completely unrelated to the network.... It's
something in the e100 network driver that the kernel's
complaining about. If I shut down the network and remove
the network driver module I don't have any problems while
enabling BKL. Everything runs fine.

The code that runs is:

/*
* Copyright(c) 2004 Analogic Corporation
*
* This program may be distributed under the GNU Public License
* version 2, as published by the Free Software Foundation, Inc.,
* 59 Temple Place, Suite 330 Boston, MA, 02111.
*
* File ram_test.c Created 10-MAY-2001 Richard B. Johnson
*/

#include <linux/kernel.h>

/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* The following are in file rwcheck.S
*/
extern void xorlw(volatile void *men, size_t wrd, size_t len);
extern void fill_rnd(volatile void *men, size_t len);
extern unsigned char *check_rnd(volatile void *men, size_t len);
extern void set_seed(int);

/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* This tests RAM to make sure it is read/writable, and uniquely-
* addressable i.e., working.
* If the RAM is not working, this returns the address of the
* first failing location, otherwise it returns NULL.
*/

#define SEED 0x12345678

unsigned char *testram(volatile void *mem, size_t len)
{
len /= sizeof(size_t);
set_seed(SEED);
fill_rnd(mem, len);
xorlw(mem, 0x55555555, len);
xorlw(mem, 0xaaaaaaaa, len);
xorlw(mem, 0xa5555555, len);
xorlw(mem, 0x5a555555, len);
xorlw(mem, 0x55a55555, len);
xorlw(mem, 0x555a5555, len);
xorlw(mem, 0x5555a555, len);
xorlw(mem, 0x55555a55, len);
xorlw(mem, 0x555555a5, len);
xorlw(mem, 0x5555555a, len);
xorlw(mem, 0x5aaaaaaa, len);
xorlw(mem, 0xa5aaaaaa, len);
xorlw(mem, 0xaa5aaaaa, len);
xorlw(mem, 0xaaa5aaaa, len);
xorlw(mem, 0xaaaa5aaa, len);
xorlw(mem, 0xaaaaa5aa, len);
xorlw(mem, 0xaaaaaa5a, len);
xorlw(mem, 0xaaaaaaa5, len);
xorlw(mem, 0xaa55aa55, len);
xorlw(mem, 0x55aa55aa, len);
xorlw(mem, 0xaa55aa55, len);
xorlw(mem, 0x55aa55aa, len);
xorlw(mem, 0xaaaaaaaa, len);
xorlw(mem, 0x5aaaaaaa, len);
xorlw(mem, 0xa5aaaaaa, len);
xorlw(mem, 0xaa5aaaaa, len);
xorlw(mem, 0xaaa5aaaa, len);
xorlw(mem, 0xaaaa5aaa, len);
xorlw(mem, 0xaaaaa5aa, len);
xorlw(mem, 0xaaaaaa5a, len);
xorlw(mem, 0xaaaaaaa5, len);
xorlw(mem, 0xa5555555, len);
xorlw(mem, 0x5a555555, len);
xorlw(mem, 0x55a55555, len);
xorlw(mem, 0x555a5555, len);
xorlw(mem, 0x5555a555, len);
xorlw(mem, 0x55555a55, len);
xorlw(mem, 0x555555a5, len);
xorlw(mem, 0x5555555a, len);
xorlw(mem, 0x55555555, len);
set_seed(SEED);
return check_rnd(mem, len);
}

The 60 seconds is a very long time to not have a responsive machine.
Once I removed the BKL, the machine was responsive as long as I
removed the network driver. There must be something in that network
driver that is timing-sensitive and I just ticked it off.

I will try a 3-COM board in a few minutes. The 'real' target machines
don't use either of these so it might just be a non-event although
the maintainer of the e100 should know that I've got an interesting
test platform if he's got a patch!


Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/