On Sat, Jan 28, 2006 at 01:51:23PM -0500, Lee Revell wrote:
> On Sat, 2006-01-28 at 13:00 -0500, Lee Revell wrote:
> > OK, now we are making progress.
>
> I spoke too soon, it's not fixed:
>
> preemption latency trace v1.1.5 on 2.6.16-rc1
> --------------------------------------------------------------------
> latency: 4183 us, #3676/3676, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0)
> -----------------
> evolutio-2877 0d.s. 97us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 98us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 99us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 100us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 101us : local_bh_enable (rt_run_flush)
> [ etc ]
> evolutio-2877 0d.s. 4079us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 4080us : local_bh_enable (rt_run_flush)
I am not sure if I am interpreting the latency trace correctly,
but it seems that the problem you are seeing now is different
from the one you were seeing earlier.
In one of your earlier traces, I saw -
<idle>-0 0d.s. 182us : dst_destroy (dst_rcu_free)
<idle>-0 0d.s. 183us : ipv4_dst_destroy (dst_destroy)
[ etc - zillions of dst_rcu_free()s deleted ]
<idle>-0 0d.s. 13403us : dst_rcu_free (__rcu_process_callbacks)
<idle>-0 0d.s. 13403us : dst_destroy (dst_rcu_free)
This points to a latency increase caused by lots and lots of
RCU callbacks doing dst_rcu_free(). Do you still see those?
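For reference, the shape of the code behind that earlier trace is roughly
the following - a simplified sketch from memory of 2.6-era net/ipv4/route.c,
not a verbatim copy, so the exact field and helper names may differ:

static void dst_rcu_free(struct rcu_head *head)
{
	struct dst_entry *dst =
		container_of(head, struct dst_entry, rcu_head);

	/* one dst freed per callback - these are the repeated
	 * dst_destroy()/ipv4_dst_destroy() lines in the old trace */
	dst_destroy(dst);
}

static inline void rt_free(struct rtable *rt)
{
	/* every route cache entry being flushed queues one RCU
	 * callback; they all get run back to back from
	 * __rcu_process_callbacks() in softirq context */
	call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
}

So a flush of N cache entries means N callbacks executed in one RCU batch,
which would explain the long stretch of dst_rcu_free()s in the old trace.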
Your new trace shows that we are held up in rt_run_flush(). I guess we
need to investigate why we spend so much time in rt_run_flush() -
whether it is because of a big routing table or because of all the
lock acquisitions.
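For context, rt_run_flush() basically walks the whole route cache hash
table, unhooking each chain under its per-bucket lock and rt_free()ing
every entry. Roughly (again a simplified sketch from memory of the 2.6
code, not the exact source - rt_hash_mask, rt_hash_table and
rt_hash_lock_addr() are the names as I remember them):

static void rt_run_flush(unsigned long dummy)
{
	int i;
	struct rtable *rth, *next;

	for (i = rt_hash_mask; i >= 0; i--) {
		/* one spin_lock_bh()/spin_unlock_bh() pair per bucket -
		 * the spin_unlock_bh() is the local_bh_enable() that
		 * shows up thousands of times in your new trace */
		spin_lock_bh(rt_hash_lock_addr(i));
		rth = rt_hash_table[i].chain;
		if (rth)
			rt_hash_table[i].chain = NULL;
		spin_unlock_bh(rt_hash_lock_addr(i));

		for (; rth; rth = next) {
			next = rth->u.rt_next;
			rt_free(rth);	/* queues dst_rcu_free() via call_rcu_bh() */
		}
	}
}

So the time spent there scales with the number of hash buckets (one
lock/unlock per bucket) plus the number of cached routes (one rt_free()
each), which would fit both of your traces.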