Re: modutils-970118 (timeout in kerneld)

Jacques Gelinas (jack@solucorp.qc.ca)
Thu, 23 Jan 1997 14:39:09 -0500 (EST)


On Thu, 23 Jan 1997, Marcin Dalecki wrote:

>
> Kevin Lentin wrote:
> > Cool idea. Maybe we could have an option in the config file which could
> > specify a different wait time for different modules so people with slow
> > devices could set the timeout on those modules to 2 minutes, for instance.
>
> And once again this shows how broken the kerneld implementation currently
> is ;-). There are two timeouts in the kernel for it. Once a HARD wired
> timeout of one minute for the kerneld responding to an autoload request
> and secondly a hard wired timeout of several seconds for the kerneld to
> try unloading of modules marked for deletion. The second timeout may be
> made dynamic relatively easily. But until then we really need conf_file.c
> in the libutils.a (OK. Richard it will not take too long before I'm done
> with it.)

These are not timeouts in the protocol. They are a simplification. The
original kerneld design (done by Bjorn and me) attempted to unload a
module upon kernel request. In fact, those calls are still available in
the kernel. It was the task of a module to request its own unloading.

This was overly complicated because each part of the kernel had to decide
by itself whether it should unload itself. While this was straightforward
in the normal case, where a kernel resource had been used successfully, it
became messy when a kernel resource was trying to achieve something
(mounting a filesystem, say) and failing. There were many places to hook
the request for unloading. Ultimately, because immediate unloading was not
such a good idea (each time a module's usage count drops to 0 you wipe it,
and 2 seconds later you have to reload it), a new request was designed to
tell kerneld that it was allowed to unload the module but not forced to do
so.

This last attempt was also overly complex, forcing all kinds of very small
modifications here and there in the kernel (kerneld supports loading many
different things into the kernel, from filesystems to network devices).

Finally we went for the simple one-minute cycle, where kerneld tries to
unload everything that has a usage count of 0. To provide some degree of
fairness, we introduced two pieces of information in the kernel. One is
the usage count and the other is a flag telling that the module has been
used during the last kerneld cycle. This gives each module at least a full
minute (the kerneld cycle is one minute, changeable on the command line)
of rest before being unloaded. This is not a timeout in the protocol to
hide some races, just a way to avoid excessive loading activity.
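
A minimal sketch of that cycle in C may make the fairness rule easier to
follow. It is not the kernel or kerneld code: struct mod, mod_get(),
mod_put() and sweep() are names made up for this illustration, and the
only assumption is the rule described above (a usage count plus a "used
since the last cycle" flag).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a loaded module; not the real kernel structure. */
struct mod {
    const char *name;
    int usecount;   /* number of current users */
    int used;       /* set when the module was used since the last sweep */
    int loaded;     /* 1 while the module is resident */
};

/* Take a reference and remember that the module was used. */
void mod_get(struct mod *m)
{
    m->usecount++;
    m->used = 1;
}

/* Drop a reference. */
void mod_put(struct mod *m)
{
    assert(m->usecount > 0);
    m->usecount--;
}

/*
 * One kerneld cycle: unload every idle module that was not used since the
 * previous cycle. A module that was used gets its flag cleared and is
 * spared until the next cycle. Returns the number of modules unloaded.
 */
int sweep(struct mod *mods, size_t n)
{
    int unloaded = 0;

    for (size_t i = 0; i < n; i++) {
        struct mod *m = &mods[i];

        if (!m->loaded || m->usecount > 0)
            continue;           /* not resident, or still in use */
        if (m->used) {
            m->used = 0;        /* grant one more cycle of rest */
            continue;
        }
        m->loaded = 0;          /* idle for a whole cycle: unload */
        unloaded++;
    }
    return unloaded;
}
```

With this model, a module that is used and released just before a sweep
survives that sweep and is only removed by the following one, which is
where the "at least a full minute of rest" comes from.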

There were two situations which were causing problems, though.

-When loading two modules which are related:

-You load the isofs module.
-You load your CD-ROM module. It takes too long to load/probe
and the first module is unloaded (after a minute).

While the fix for that situation should be in the kernel, the simpler fix
was to disallow unloading of modules while one is currently loading. This
has been in kerneld since release 2.1.8.
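
The idea behind that fix can be sketched as a simple guard. This is only
an illustration and not the code that went into kerneld: the counter and
the load_begin(), load_end() and unload_allowed() names are hypothetical.

```c
#include <assert.h>

/* Hypothetical count of module loads currently in progress. */
static int loads_in_progress;

/* Called when a module load starts. */
void load_begin(void)
{
    loads_in_progress++;
}

/* Called when a module load finishes, successfully or not. */
void load_end(void)
{
    assert(loads_in_progress > 0);
    loads_in_progress--;
}

/* The periodic unload pass runs only when no load is under way. */
int unload_allowed(void)
{
    return loads_in_progress == 0;
}
```

In the isofs/CD-ROM case above, the slow probe of the CD-ROM module keeps
the counter above zero, so the unload pass is skipped and isofs stays
loaded until the probe is over.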

-Another case is with the PPP daemon. Typically, the module is loaded and
then the chat command is started (to dial out on the modem). Sometimes
this chat takes too long and the module is unloaded.

Again, this should be fixed in the kernel. Either the PPP module is in use
or it is not. Maybe it is a bug (an unawareness, I would say) in pppd.

We are not talking about timeouts in the protocol at all here. We are
talking about delays to smooth out loading/unloading activity. Some work
could be done in the kernel to manage module usage counts more properly.

The major problem with that is that the kernel's module concept does not
show up in its various structures. Once a module is loaded, the rest of
the kernel can't tell whether something is a module or not. For example,
in fs/*.c, we are manipulating a structure which describes what a
filesystem can or cannot do. It does not tell us that the filesystem is
managed by a module. From this structure, we can't talk (I guess we
should be able to) to the module which provided it.

Here is an example, from fs/super.c, in do_mount() and below it.

To mount a filesystem, we need

-the filesystem type to be loaded
-From the filesystem description structure, we use the
read_super() function to load the basic information about
the filesystem on disk.
-This read_super() may take a very long time to complete if
the device is itself a module and takes a long time to probe
(more than a minute in some special CD-ROM setups).
-This means that the filesystem module may be unloaded in the
meantime.

So we need to put some locking (increasing the usage count) around the
read_super() call. The solution is to put a "struct module" pointer in
various kernel structures (such as struct filesystem).
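
A rough sketch of what this could look like, in C. Everything here is an
assumption made for illustration: struct module_ref stands in for "struct
module", struct fs_type is a cut-down filesystem description, read_super()
has a simplified signature, and mount_fs() and demo_read_super() are
made-up names. It is not the code in fs/super.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "struct module": only the usage count. */
struct module_ref {
    int usecount;
};

/* Cut-down filesystem description with a pointer back to its module. */
struct fs_type {
    const char *name;
    int (*read_super)(const char *dev);  /* simplified signature */
    struct module_ref *owner;            /* NULL if built into the kernel */
};

/* Pin the owning module for the whole duration of read_super(). */
int mount_fs(struct fs_type *fs, const char *dev)
{
    int err;

    if (fs->owner)
        fs->owner->usecount++;   /* module cannot be unloaded from here */

    err = fs->read_super(dev);   /* may block for a long device probe */

    if (fs->owner) {
        assert(fs->owner->usecount > 0);
        fs->owner->usecount--;   /* release the temporary reference */
    }
    return err;
}

/* Demo only: a read_super() that records the count it sees mid-call. */
struct module_ref isofs_module = { 0 };
int seen_during_probe = -1;

int demo_read_super(const char *dev)
{
    (void)dev;
    seen_during_probe = isofs_module.usecount;
    return 0;
}
```

While read_super() is running, the usage count is above zero, so the
periodic kerneld cycle would leave the filesystem module alone however
long the probe takes. How a successful mount keeps its own reference
afterwards is outside this sketch.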

Comments are welcome!

--------------------------------------------------------
Jacques Gelinas (jacques@solucorp.qc.ca)
Linuxconf: The ultimate administration system for Linux.
see http://www.solucorp.qc.ca/linuxconf