I'm here...
Let me just tell you that you are missing a lot if you're not using kerneld :-)
Well, at least there is no need for any drastic measures (yet).
The problems are kerneld related, for sure, but they are not due to _kerneld_
per se; they come from its interaction with the "kerneld-type" arpd daemon.
(The problem is more noticeable since "login" makes some inet calls via libc
as well, if I remember correctly, and a failed arp will make it sad...)
1. Always make sure that all kerneld-type daemons are started before you
   do anything that depends on them, like using an a.out executable on
a kernel with binfmt_aout compiled as a module.
There are some suggestions in the file "rc.hints" in the latest
"modules-1.x.y". You can find the latest snapshot on my web-page:
<http://www.pi.se/blox/modules/>.
I will make an official release of the current snapshot as soon as
I can find the time...
2. If you see the message "Ouch, kerneld timed out, message failed" then
you haven't started the kerneld daemon that the message was intended for.
   This is handled in the kernel by telling the kernel-level process that
   the message failed; the "Ouch" message is printed as a notification to
   the sysadmin. It is _not_ fatal by itself, but the kernel-level
   process has to live (gracefully) with the failure (most do...).
Fix: Start the kerneld-type daemon; in this case arpd.
3. If you see the message "Ouch, kerneld: msgsnd wants to sleep at interrupt!"
then the total space allocated for the message queue has become exhausted.
   This might be caused by messages that are too large, or by not fetching
   the answers that the kernel-level process has requested from user space.
   You should be able to verify this by running "ipcs" whenever you have seen
this message. You should see a "used bytes" size around 16k, which is
the maximum space allowed for IPC messages "in transit" between user
space and kernel space (MSGMNB in <linux/msg.h>).
Quick (and temporary) fix: Compile the "kdstat" utility in the kerneld
source directory and do a "kdstat flush". This will make the kernel
release all "stale" messages in the IPC message queue.
   Slightly dirty (and unlikely to be the real cure) fix: If the problem is
   due to messages that are too large, increase MSGMNB in <linux/msg.h>.
Better fix: Make sure that the user level kerneld-type daemon doesn't
send messages back to the kernel unless requested.
Also: Make sure that the kernel level process handles failed messages
gracefully (i.e. when the return code from the kerneld call is -1).
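On the kernel side, living "gracefully" with a failed request just means
treating -1 as an ordinary outcome that has a fallback, never as fatal.
Here is a minimal user-space sketch of that pattern. The names
ask_user_space() (a stand-in for the kerneld call) and resolve_addr() (a
stand-in requestor) are hypothetical, and the fallback chosen here (use a
cached value if there is one, otherwise report "unresolved") is only an
assumption about what a sensible requestor might do.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kerneld call: returns 0 and fills *answer
 * on success, or -1 if the message failed (daemon not started, timeout,
 * queue full, called at interrupt time). Controlled by a flag here. */
static int daemon_available = 0;

static int ask_user_space(unsigned long request, unsigned long *answer)
{
    if (!daemon_available)
        return -1;
    *answer = request ^ 0xffUL;   /* dummy "answer" for the sketch */
    return 0;
}

/* Hypothetical requestor: prefer the daemon's answer, but fall back to a
 * cached value instead of treating a failed message as fatal. */
int resolve_addr(unsigned long addr, unsigned long cached, int have_cache,
                 unsigned long *result)
{
    unsigned long answer;

    if (ask_user_space(addr, &answer) == 0) {
        *result = answer;
        return 0;
    }
    /* The request failed (return code -1): expected, not fatal. */
    if (have_cache) {
        *result = cached;
        return 0;
    }
    fprintf(stderr, "resolve_addr: no answer for %lu, left unresolved\n", addr);
    return -1;
}
```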
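To get a feel for the numbers "ipcs" shows, and for what "flushing stale
messages" amounts to, here is a user-space sketch using the standard SysV
calls msgctl(IPC_STAT) and msgrcv() with IPC_NOWAIT. The helper names
queue_stats() and drain_queue() are made up for this example, and this is
_not_ how "kdstat flush" works (kdstat asks the kernel side to release its
messages). Only try the drain on a private test queue; running it against
the live kerneld queue would steal messages from the daemons.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical helper: number of messages on the queue, and the queue's
 * byte limit (msg_qbytes, which starts out at MSGMNB). */
int queue_stats(int qid, unsigned long *nmsgs, unsigned long *limit)
{
    struct msqid_ds ds;

    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;
    *nmsgs = (unsigned long) ds.msg_qnum;
    *limit = (unsigned long) ds.msg_qbytes;
    return 0;
}

/* Hypothetical helper: discard every message currently on the queue
 * without blocking. Returns the number removed, or -1 on error. */
long drain_queue(int qid)
{
    struct {
        long mtype;
        char mtext[8192];
    } buf;
    long removed = 0;

    for (;;) {
        /* msgtyp 0: take the first message of any type.
         * MSG_NOERROR: truncate oversized messages instead of failing. */
        if (msgrcv(qid, &buf, sizeof buf.mtext, 0,
                   IPC_NOWAIT | MSG_NOERROR) == -1) {
            if (errno == ENOMSG)
                return removed;   /* queue is empty */
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        removed++;
    }
}
```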
There _is_ one thing I'd like to add to kerneld (in the kernel):
If kerneld_send is called during an interrupt, with an expectation of an
answer from user space, the current behaviour is to _not_ wait for an
answer, since this would generate a call to "interruptible_sleep_on".
Instead, -1 is returned if the requestor had indicated that it wanted
an answer from user space.
I would like to have a mechanism available that _could_ receive such a
message and then handle whatever the original kernel level requestor
intended to do with the answer from user space.
I.e. some kind of kernel internal "callback"... or a kernel thread?
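Just to illustrate the idea, here is a user-space sketch of the current
rule next to a callback-style alternative. Everything in it is
hypothetical: kd_request_at_interrupt(), kd_deliver(), the callback type
and the single pending slot are invented for this example, and nothing
like it exists in kerneld. Without a callback, an interrupt-time request
that wants an answer fails with -1 (today's behaviour); with one, the
request is parked and the callback runs later, from process context,
when the answer from user space arrives.

```c
#include <stddef.h>

/* Hypothetical callback: gets the request id, the answer from user
 * space, and the requestor's private context. */
typedef void (*kd_callback)(long id, long answer, void *ctx);

/* A single pending slot keeps the sketch short; a real design would
 * need a list and proper locking. */
static struct {
    long id;
    kd_callback cb;
    void *ctx;
} pending;

static long next_id = 1;

/* Models an interrupt-time request that wants an answer. We must not
 * sleep here, so without a callback fail with -1 (current behaviour);
 * with one, park the request and return its id. */
long kd_request_at_interrupt(kd_callback cb, void *ctx)
{
    if (cb == NULL || pending.cb != NULL)
        return -1;
    pending.id = next_id++;
    pending.cb = cb;
    pending.ctx = ctx;
    return pending.id;
}

/* Called later, from process context, when the reply for `id` comes in.
 * Runs the parked callback; returns -1 if nobody is waiting for `id`. */
int kd_deliver(long id, long answer)
{
    kd_callback cb = pending.cb;
    void *ctx = pending.ctx;

    if (cb == NULL || pending.id != id)
        return -1;
    pending.cb = NULL;            /* free the slot before calling out */
    cb(id, answer, ctx);
    return 0;
}
```

Whether the deferred work should run from something like this or from a
kernel thread is exactly the open question above; the sketch only shows
the bookkeeping either approach would need.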
Well, I won't implement anything like this before 2.1, I promise :-)
Cheers,
Bjorn
P.S. Hint to Jonathan: Make sure that arpd _doesn't_ send a message
back to the kernel unless the "id" field in the request is non-zero.
A zero in the "id" field means that there is no-one around waiting for
that specific answer.
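In code, the hint above boils down to a single check in the daemon's
request loop. A minimal sketch, assuming a request that carries an "id"
field; the struct layouts, the other fields and handle_request() are
hypothetical and not arpd's actual definitions, and the reply itself
would be sent back to the kernel with msgsnd().

```c
#include <string.h>

/* Hypothetical layouts; only the "id" semantics (0 = nobody is waiting
 * for the answer) come from the hint above. */
struct kd_req {
    long id;
    unsigned long ip;
};

struct kd_reply {
    long id;
    unsigned char hw[6];
};

/* Process one request. Returns 1 and fills *reply if a reply should be
 * sent back to the kernel, 0 if the request must not be answered. */
int handle_request(const struct kd_req *req, const unsigned char hw[6],
                   struct kd_reply *reply)
{
    /* ... update the daemon's own tables from req here ... */

    if (req->id == 0)
        return 0;   /* nobody is waiting: a reply would just clog the queue */

    /* Echo the id back (assumed to be how the answer finds its waiter). */
    reply->id = req->id;
    memcpy(reply->hw, hw, sizeof reply->hw);
    return 1;
}
```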