> Before we jump to any conclusions can people seeing kerneld type problems also
> try running kernels with SKB debugging on (edit include/linux/skbuff.h) and
> built from clean. Im still not convinced all the memory tramples are out of
> the net code, and while route to %p is born dead is ok in itself a storm of
> them seems a little odd too.
>
It is a whole cluster of bugs. I cannot fix them myself, only because
I have no way to test the fixes. I believe that Bjorn and Johnathan
should do it.
The first problem is a wrong kerneld interface.
kerneld_send decides whether the call should be atomic or not
on the basis of intr_count.
arp.c (and possibly other callers of kerneld; Bjorn, please check it!)
assumes that it DOES NOT SLEEP!
Workaround: intr_count++ ... intr_count-- around kerneld_send.
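
To make the workaround concrete, here is a rough user-space sketch. It is
not the kernel code: mock_kerneld_send and arp_send_nosleep are hypothetical
names, the real kerneld_send takes different arguments, and intr_count here
is just a local stand-in for the kernel counter. It only models the decision
described above (atomic if intr_count is non-zero).

```c
#include <stdio.h>

/* Local stand-in for the kernel's interrupt nesting counter. */
static unsigned long intr_count = 0;

/* Records which path the last mock call took (1 = atomic). */
static int last_call_was_atomic = 0;

/*
 * HYPOTHETICAL stand-in for kerneld_send(). The real function has a
 * different signature; this one only models "atomic iff intr_count != 0".
 */
static int mock_kerneld_send(const char *msg)
{
    last_call_was_atomic = (intr_count != 0);
    printf("%s: %s\n", msg, last_call_was_atomic ? "atomic" : "may sleep");
    return 0;
}

/*
 * The workaround: raise intr_count around the call so that the callee
 * chooses the atomic path, then restore it.
 */
static int arp_send_nosleep(const char *msg)
{
    int ret;

    intr_count++;
    ret = mock_kerneld_send(msg);
    intr_count--;
    return ret;
}
```

Calling mock_kerneld_send directly from process context takes the "may sleep"
path; calling it through arp_send_nosleep takes the atomic one and leaves
intr_count where it was.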
The second problem: the message
printk("Ouch, kerneld:msgsnd wants to sleep at interrupt!\n");
can be partially solved just by commenting it out 8)
Really, the arp code should detect when arpd is not running
and fall back to normal operation mode.
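
One possible shape of such a fallback, as a user-space sketch only. All
names here (arpd_present, mock_arpd_send, arp_resolve) are hypothetical and
do not exist in arp.c; the assumption is simply that a failed send to the
daemon can be noticed and remembered.

```c
#include <stdio.h>

/* Which path a lookup ended up on. */
enum arp_path { ARP_VIA_ARPD, ARP_IN_KERNEL };

/* Assume the daemon is present until a send to it fails. */
static int arpd_present = 1;

/* HYPOTHETICAL send: 0 on success, -1 if nobody is listening. */
static int mock_arpd_send(int daemon_alive)
{
    return daemon_alive ? 0 : -1;
}

/*
 * Try the daemon first. On failure, remember that it is gone and use
 * the normal in-kernel path from then on. A real version would also
 * need some way to notice that arpd has been started again.
 */
static enum arp_path arp_resolve(int daemon_alive)
{
    if (arpd_present && mock_arpd_send(daemon_alive) == 0)
        return ARP_VIA_ARPD;

    if (arpd_present)
        printf("arpd not responding, falling back\n");
    arpd_present = 0;
    return ARP_IN_KERNEL;
}
```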
The third problem: maybe I missed something, but
I do not see why kerneld_send is callable from an interrupt!
It is apparently NOT REENTRANT!!!
Bjorn! Please look at it. If I am right, it will result in random
kernel crashes.
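
For illustration, this is the general kind of failure meant by "not
reentrant". It is a generic user-space sketch and NOT the actual
kerneld_send code: the static buffer, the simulate_irq flag and the
function name are all invented, and the "interrupt" is faked by a nested
call in the middle of the function.

```c
#include <string.h>

/* Static state shared by every caller: the source of the trouble. */
static char shared_buf[32];

/* When set, one fake "interrupt" fires in the middle of the next call. */
static int simulate_irq = 0;

/* HYPOTHETICAL non-reentrant routine: stages data in a static buffer. */
static void nonreentrant_send(const char *msg, char *out, size_t outlen)
{
    /* Step 1: stage the message in the shared buffer. */
    strncpy(shared_buf, msg, sizeof(shared_buf) - 1);
    shared_buf[sizeof(shared_buf) - 1] = '\0';

    /* Fake interrupt: the same routine is entered again right here. */
    if (simulate_irq) {
        char irq_out[32];

        simulate_irq = 0;
        nonreentrant_send("from-irq", irq_out, sizeof(irq_out));
    }

    /* Step 2: consume the buffer, which may have been overwritten. */
    strncpy(out, shared_buf, outlen - 1);
    out[outlen - 1] = '\0';
}
```

Without the fake interrupt the caller gets its own message back. With it,
the outer call silently delivers the interrupt's data instead of its own.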
Johnathan wrote:
> Something/one has changed the
> behaviour of ARP in the past 10-15 releases and it hasn't been stable since,
> in particular on machines with more than one interface.
The only change: arp now looks at "gratuitous" updates,
which significantly increases the load on arpd. That was the only change,
and it should not affect the arpd logic, but it certainly will amplify
the effect of the bugs listed above.
Alexey Kuznetsov.