Re: Kernel panic: Route cache, RCU, possibly FIB trie.

From: Jesper Dangaard Brouer
Date: Thu Mar 23 2006 - 10:38:47 EST

On Tue, 21 Mar 2006, David S. Miller wrote:

> From: Jesper Dangaard Brouer <hawk@xxxxxxx>
> Date: Tue, 21 Mar 2006 15:51:34 +0100 (CET)
>
> > You guessed right... I did enable IP_ROUTE_MULTIPATH_CACHED, I have
> > now disabled it and equal multi path routing in general
>
> It is almost certainly the cause of your crashes, that code
> is still extremely raw and that's why it is listed as "EXPERIMENTAL".

It seems you are right :-) (and I'll be more careful about using experimental code in production in the future). The machine has now been running for 34 hours without crashing. The strange thing is that I'm running the same kernel on 30 other (similar) machines, which have not crashed. (I suspect the specific traffic load pattern influences this.)

BUT I think I have noticed another problem in the garbage collection code (route.c) that causes the garbage collector to (almost) never collect.

This is caused by the value "ip_rt_max_size" (/proc/sys/net/ipv4/route/max_size) being set too large. It is set to 16 times the gc_thresh value (whose size depends on the memory size). In the garbage collection function (rt_garbage_collect), garbage collection requests are ignored (counted as gc_ignored) if the number of entries is below "ip_rt_max_size".
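To illustrate the check described above, here is a minimal user-space sketch in C. It is not the kernel's rt_garbage_collect(); the struct and function names (rt_cache_model, model_gc_request) are hypothetical, and it models only the entry-count comparison mentioned here. The real function may test further conditions.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the check; not kernel code. */
struct rt_cache_model {
    unsigned int entries;      /* current number of cached routes */
    unsigned int max_size;     /* stands in for ip_rt_max_size */
    unsigned long gc_ignored;  /* stands in for the gc_ignored counter */
};

/*
 * Returns true if a garbage collection pass would run,
 * false if the request is ignored because the cache is
 * still below max_size.
 */
static bool model_gc_request(struct rt_cache_model *c)
{
    if (c->entries < c->max_size) {
        c->gc_ignored++;   /* request dropped, nothing collected */
        return false;
    }
    return true;
}
```

With max_size far above the number of entries the cache ever reaches, every call in this model takes the "ignored" branch.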

With 1 GB of memory, gc_thresh=65536, and 65536 times 16 is 1048576. This means that we only start to garbage collect when there are more than 1 million entries. This seems wrong... (The reason the cache does not grow this large is the 600-second periodic flushes.)
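The arithmetic can be written out as a small C helper. This is only a sketch: the function name model_max_size is hypothetical, and the factor of 16 is taken from the description in this mail rather than from the kernel source.

```c
/* Assumption: max_size is derived as 16 * gc_thresh, as described above. */
#define MODEL_MAX_SIZE_FACTOR 16u

/* Hypothetical helper: entry count below which GC requests are ignored. */
static unsigned int model_max_size(unsigned int gc_thresh)
{
    return MODEL_MAX_SIZE_FACTOR * gc_thresh;
}
```

For gc_thresh=65536 this gives 1048576, i.e. the cache could hold 16 times the gc_thresh value before a collection is no longer ignored.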

Jesper Brouer

Cand. scient datalog
Dept. of Computer Science, University of Copenhagen

grep . /proc/sys/net/ipv4/route/*
grep: /proc/sys/net/ipv4/route/flush: Operation not permitted

