bug fix to kerneld: Multiple concurrent load requests

Jacques Gelinas (jack@solucorp.qc.ca)
Wed, 3 Jul 1996 00:35:44 -0400 (EDT)


I have a small fix (also sent to bjorn) that plugs an interesting bug in
kerneld. I am including the patch against modules-2.0.0 and a very small
test program that triggers the bug.

The bug is that if two processes try to open a device at the same time and
kerneld has to load the module, two separate load requests are sent
to kerneld. The first one succeeds. The other fails because
/sbin/insmod fails, noting that the requested module is already loaded.
One process therefore receives an error message stating that the
device does not exist.

The included patch fixes that: kerneld now notes that there is a pending
request for the same module. With this patch, both requesters receive the
same answer (positive or negative) and happily continue their "open"
processing. One poster on this list claims that he managed to load the
same module twice. There is probably a race in the "insmod" processing.
This patch does not address that directly.
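
To show the idea outside of kerneld, here is a minimal standalone sketch of
the pending-request lookup. It is not the patch itself: struct job is cut
down to the three fields used here, REQUEST_MODULE and NO_PENDING are made-up
constants, and find_pending_load() is a hypothetical name. Unlike the patch,
it returns on the first match and skips the new job itself, in case the new
job is already linked into the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for kerneld's real structures (assumptions). */
#define REQUEST_MODULE 2    /* made-up value, not the real KERNELD_REQUEST_MODULE */
#define NO_PENDING (-1)     /* no load of this module is in progress */

struct job {
    struct job *next;
    long mtype;             /* request type */
    int pid;                /* pid of the child handling the request */
    char text[64];          /* requested module name */
};

/* Return the pid of a pending load for the same module name, or NO_PENDING. */
static int find_pending_load(const struct job *head, const struct job *newjob)
{
    const struct job *job;

    assert(newjob != NULL);
    for (job = head; job != NULL; job = job->next) {
        if (job != newjob
            && job->mtype == REQUEST_MODULE
            && strcmp(job->text, newjob->text) == 0)
            return job->pid;
    }
    return NO_PENDING;
}
```

A caller would fork a new loader only when the result is NO_PENDING, and
otherwise attach the new job to the returned pid so that it gets the same
return code as the original request.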

The fix should hide that race in most cases. One unlikely scenario remains:

1-module A is requested
2-kerneld calls "/sbin/modprobe A"

at the same time

3-module B is requested
4-kerneld calls "/sbin/modprobe B"

kerneld only knows that both modules should be loaded, since they have
different names.

The following situations may exist:

-A and B are the same module (they are aliases for the same module). This
is unlikely as the kernel issues requests using very specific names.
-A needs B to operate (a stack of modules).
This is also unlikely as the kernel does not request sub-modules. For
example, the kernel may request the eth0 module, which is, say, the ne2000
module, which needs the 8390 module. The kernel does not ask for the 8390
module directly.

These possibilities do exist anyway, and the fix for them has to be done in
insmod. When insmod is called from kerneld (it already knows that), it should
not fail if the module is already loaded.
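
The behaviour I have in mind for insmod can be sketched as below. This is
only an illustration of the policy, not insmod code: is_loaded() and
load_exit_status() are hypothetical names, the list of loaded modules is
passed in as a plain array, and the load itself is not modelled (a module
that is not yet present is simply assumed to load fine).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `name` appears in the list of loaded modules. */
static bool is_loaded(const char *name, const char *const loaded[], size_t n)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(loaded[i], name) == 0)
            return true;
    }
    return false;
}

/*
 * Hypothetical policy: the exit status the loader would report.
 * 0 means success, 1 means failure.
 */
static int load_exit_status(const char *name, const char *const loaded[],
                            size_t n, bool from_kerneld)
{
    if (is_loaded(name, loaded, n)) {
        /* Already there: fine for kerneld, still an error for a user. */
        return from_kerneld ? 0 : 1;
    }
    return 0;   /* assumption: the load itself succeeds */
}
```

With "8390" already loaded, a request coming from kerneld would get 0, while
the same request typed by hand would still get the usual failure.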

---------------the patch -----
*** modules-2.0.0/kerneld/kerneld.c Mon Jun 10 18:37:38 1996
--- modules-2.0.0.new/kerneld/kerneld.c Tue Jul 2 23:50:56 1996
***************
*** 327,332 ****
--- 327,354 ----


/*
+ Check if a request for a module is pending.
+ If it is, then we just returned the pid of the original
+ request.
+
+ When this request will return, all "job" with the same pid
+ will get the result. newjob->pid will be set to JOB_DONE
+ and kerneld will send back the return code to all caller.
+ */
+ static int check_pending_load (struct job *newjob)
+ {
+ int pid = -1;
+ struct job *job;
+ for (job = job_head; job; job = job->next) {
+ if (job->msg.mtype == KERNELD_REQUEST_MODULE
+ && strcmp(job->msg.text,newjob->msg.text)==0){
+ pid = job->pid;
+ }
+ }
+ return pid;
+ }
+
+ /*
* Execute the requested kerneld task, return pid or -errno.
* If pid is set to 0 then no process has been spawned.
*
***************
*** 397,404 ****
* keep an auto-loaded module at least this long
*/
alarm(delay);
!
! if ((pid = fork()) == 0) {
close(0);close(1);close(2);
dup(dev_null); dup(dev_null); dup(dev_null);
do_putenv(newjob);
--- 419,427 ----
* keep an auto-loaded module at least this long
*/
alarm(delay);
! pid = check_pending_load (newjob);
! if (pid == -1
! && (pid = fork()) == 0) {
close(0);close(1);close(2);
dup(dev_null); dup(dev_null); dup(dev_null);
do_putenv(newjob);

----------The small test program-----------
/*
    This program triggers a double load of the floppy module.
    It was written to replicate the problem and to prove that the problem
    went away.
*/
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>

static void doread(void)
{
    int fd = open ("/dev/fd0",O_RDONLY);
    if (fd != -1){
        char buf[1000];
        int n = read (fd,buf,sizeof buf);
        printf ("n = %d\n",n);
        close (fd);
    }else{
        printf ("Can't open %d(%s)\n",errno,strerror(errno));
    }
}

int main (void)
{
    /* Parent and child both open the device at nearly the same time */
    if (fork()==0){
        doread();
        fflush (stdout);    /* _exit() does not flush stdio buffers */
        _exit (0);
    }else{
        doread();
        wait (NULL);        /* reap the child before exiting */
    }
    return 0;
}

With the patch, this program should print "n = 1000" twice. With the old
kerneld, one process will fail. You must have your floppy driver as a
module though :-)

--------------------------------------------------------
Jacques Gelinas (jacques@solucorp.qc.ca)
Linuxconf: The ultimate administration system for Linux.
sunsite.unc.edu:/pub/Linux/system/Admin/linuxconf-...