Re: [PATCH] procfs: Improve Scaling in proc

From: Nathan Zimmer
Date: Thu Oct 18 2012 - 15:53:54 EST

On 10/18/2012 02:46 AM, Eric Dumazet wrote:
On Wed, 2012-10-17 at 15:25 -0500, Nathan Zimmer wrote:
I am currently tracking a hot lock reported by a customer on a large
(512-core) system. I am running 3.7.0-rc1, but the issue looks like it has
been this way for a very long time.
The offending lock is proc_dir_entry->pde_unload_lock.

This patch replaces the lock with RCU. It is a refresh of what was
originally suggested by Eric Dumazet; I rebased it onto 3.7.

Supporting numbers (lower is better), from the test I posted earlier:
cpuinfo   baseline   RCU
tasks     read-sec   read-sec
1         0.0141     0.0141
2         0.0140     0.0142
4         0.0140     0.0141
8         0.0145     0.0140
16        0.0553     0.0168
32        0.1688     0.0549
64        0.5017     0.1690
128       1.7005     0.5038
256       5.2513     2.0804
512       8.0529     3.0162

Cc: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx>
Signed-off-by: Nathan Zimmer <nzimmer@xxxxxxx>

Hmm, this patch had several issues and I have not yet had time to work on a
new version. I probably won't have time in the near future.

Paul sent me some comments about it; I hope he doesn't mind that I copy them
here, in case you want to polish the patch.

Thanks !

I'll try to polish this up and resend it.
And any comments are most welcome.

On Wed, 2012-10-03 at 10:56 -0700, Paul E. McKenney wrote:
Finally getting back to this... :-/

Why not set the initial value of the reference counter to 1
(rather than zero), continue acquiring with atomic_inc(), but
use atomic_dec_and_test() to decrement? Put a completion in
the data structure, so if the atomic_dec_and_test() indicates that
the counter is now zero, do a complete().

Then to free the object, remove it from the data structure, do a
synchronize_rcu(), do an atomic_dec_and_test() to remove the initial
value, again doing a complete() if the counter is now zero. Then do
a wait_for_completion().

This would get rid of the polling loop.

So, what am I missing here? ;-)

Thanx, Paul
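
For anyone following along, the scheme Paul describes could be sketched
roughly as below. This is a userspace C11 approximation and not the kernel
patch: struct entry and the entry_*() helpers are hypothetical names, the
completion is a pthread-based stand-in for the kernel's struct completion,
and the unlink and synchronize_rcu() steps are only marked by comments
because they have no direct equivalent here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical object protected by the scheme. */
struct entry {
	atomic_int users;          /* starts at 1, not 0 */
	struct completion unused;  /* fires when users reaches 0 */
};

static void entry_init(struct entry *e)
{
	atomic_init(&e->users, 1);
	init_completion(&e->unused);
}

/* Acquire: the kernel version would be atomic_inc() under rcu_read_lock(). */
static void entry_get(struct entry *e)
{
	atomic_fetch_add(&e->users, 1);
}

/* Release: mirrors atomic_dec_and_test() followed by complete(). */
static void entry_put(struct entry *e)
{
	int old = atomic_fetch_sub(&e->users, 1);

	assert(old > 0);
	if (old == 1)
		complete(&e->unused);
}

/* Teardown: no polling loop, just one blocking wait. */
static void entry_retire(struct entry *e)
{
	/* 1. Remove the entry from the data structure (not modelled). */
	/* 2. synchronize_rcu() would go here so no new reader can find it. */
	entry_put(e);                     /* 3. drop the initial reference */
	wait_for_completion(&e->unused);  /* 4. sleep until the last user leaves */
}
```

Whoever takes the counter to zero, the last reader or the teardown path
itself, is the one that calls complete(), so the waiter does not need to
poll the counter.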
