Re: RCU latency regression in 2.6.16-rc1

From: Dipankar Sarma
Date: Sat Jan 28 2006 - 14:32:57 EST


On Sat, Jan 28, 2006 at 01:51:23PM -0500, Lee Revell wrote:
> On Sat, 2006-01-28 at 13:00 -0500, Lee Revell wrote:
> > OK, now we are making progress.
>
> I spoke too soon, it's not fixed:
>
> preemption latency trace v1.1.5 on 2.6.16-rc1
> --------------------------------------------------------------------
> latency: 4183 us, #3676/3676, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0)
> -----------------
> evolutio-2877 0d.s. 97us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 98us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 99us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 100us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 101us : local_bh_enable (rt_run_flush)
>
> [ etc ]
>
> evolutio-2877 0d.s. 4079us : local_bh_enable (rt_run_flush)
> evolutio-2877 0d.s. 4080us : local_bh_enable (rt_run_flush)

I am not sure I am interpreting the latency trace right,
but it looks like the problem you are seeing now is different
from the one you were seeing earlier.

In one of your earlier traces, I saw -

<idle>-0 0d.s. 182us : dst_destroy (dst_rcu_free)
<idle>-0 0d.s. 183us : ipv4_dst_destroy (dst_destroy)

[ etc - zillions of dst_rcu_free()s deleted ]

<idle>-0 0d.s. 13403us : dst_rcu_free (__rcu_process_callbacks)
<idle>-0 0d.s. 13403us : dst_destroy (dst_rcu_free)

This points to a latency increase caused by lots and lots of
RCU callbacks doing dst_rcu_free(). Do you still see those?
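
For context, here is a rough sketch of that freeing path as I remember
it from the 2.6.16-era sources (names follow your trace, details may
differ from the actual code). Each flushed route-cache entry is handed
to RCU, so a big flush queues one dst_rcu_free() callback per entry and
they all run back to back in one softirq:

	/*
	 * Rough sketch, simplified from the 2.6.16-era sources;
	 * not the exact code.
	 */
	static void dst_rcu_free(struct rcu_head *head)
	{
		struct dst_entry *dst =
			container_of(head, struct dst_entry, rcu_head);

		dst_destroy(dst);	/* -> ipv4_dst_destroy(), as in the trace */
	}

	static void rt_free(struct rtable *rt)
	{
		/* one callback per flushed entry; a big flush means a long
		 * list for __rcu_process_callbacks() to chew through */
		call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free);
	}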

Your new trace shows that we are held up in rt_run_flush().
I guess we need to investigate why we spend so much time in rt_run_flush() -
whether it is because of a big route table or because of the lock acquisitions.
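
For reference, this is roughly what rt_run_flush() does in this kernel
(a sketch from memory of 2.6.16's net/ipv4/route.c, so identifiers like
rt_hash_lock_addr() and rt_hash_table[] are approximate): it walks every
hash bucket, taking and releasing the bucket lock with BHs disabled each
time, and queues every entry for RCU freeing. That would account for both
the wall of local_bh_enable() hits in your new trace and the flood of
dst_rcu_free() callbacks in the old one:

	/* Approximate shape of rt_run_flush(); not the exact 2.6.16 source. */
	static void rt_run_flush(unsigned long dummy)
	{
		int i;
		struct rtable *rth, *next;

		for (i = rt_hash_mask; i >= 0; i--) {
			spin_lock_bh(rt_hash_lock_addr(i));
			rth = rt_hash_table[i].chain;
			if (rth)
				rt_hash_table[i].chain = NULL;
			spin_unlock_bh(rt_hash_lock_addr(i));	/* local_bh_enable(),
								   once per bucket */

			for (; rth; rth = next) {
				next = rth->u.rt_next;
				rt_free(rth);	/* queues dst_rcu_free(), as above */
			}
		}
	}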

Thanks
Dipankar