Re: [PATCH] : struct dentry : place d_hash close to d_parent andd_name to speedup lookups

From: Eric Dumazet
Date: Thu Sep 01 2005 - 14:53:22 EST


Dipankar Sarma a écrit :
On Thu, Sep 01, 2005 at 05:48:28PM +0200, Eric Dumazet wrote:

dentry cache uses sophisticated RCU technology (and prefetching if available) but touches 2 cache lines per dentry during hlist lookup.

This patch moves d_hash in the same cache line than d_parent and d_name fields so that :

1) One cache line is needed instead of two.
2) the hlist_for_each_rcu() prefetching has a chance to bring all the needed data in advance, not only the part that includes d_hash.next.

I also changed one old comment that was wrong for 64bits.

A further optimisation would be to separate dentry in two parts, one that is mostly read, and one writen (d_count/d_lock) to avoid false sharing on SMP/NUMA but this would need different field placement depending on 32bits or 64bits platform.


Do you have performance numbers that show the benefits ? In the
past, I did try some optimizations like this but found no demonstrable
benefits. If it ain't broken .....


You mean... a microbenchmark ? :)

Well, the dentry cache hash table is so large that it's difficult to see the benefit, because chain length average is < 1 on most setups.
I suspect the sizing strategy of this hash table was made a long time ago...
Reducing the number of cache lines touched per dentry in lookup could let us shrink the hash table and save some ram.

Facts now :
------------

On a 512MB ia32 machine, the default sizing of the hash table gives 131072 slots (0.5 MB of ram). So it's quite hard to populate the cache without eating all ram.

To see a benefit, I had to force a smaller hash size (appending in kernel params "dhash_entries=8191")

On this 512MB PIII 866MHz machine (L1_CACHE_SIZE=32), with 563887 files or directories the results are :

Remount the disk with noatime to make sure no disk access will be done

# mount -o remount /dev/hda2 / -o noatime

Populate the caches (dentry and disk buffers for directories)
# find / -name foofoo >/dev/null

# grep dentry_cache /proc/slabinfo
dentry_cache 131104 131370 136 29 1 : tunables 120 60 0 : slabdata 4530 4530 0

So average chain length is now 16

then time new searches using data in cache
time / -name foofoo >/dev/null

# time find / -name foofoo

real 0m3.077s
user 0m1.724s
sys 0m1.344s
# time find / -name foofoo

real 0m3.077s
user 0m1.728s
sys 0m1.348s
# time find / -name foofoo

real 0m3.084s
user 0m1.784s
sys 0m1.276s
# time find / -name foofoo

real 0m3.108s
user 0m1.716s
sys 0m1.348s
# time find / -name foofoo

real 0m3.061s
user 0m1.672s
sys 0m1.384s
# time find / -name foofoo

real 0m3.067s
user 0m1.644s
sys 0m1.424s
# time find / -name foofoo

real 0m3.061s
user 0m1.804s
sys 0m1.252s

kernel 2.6.13 build time
------------------------
mount -o remount /dev/hda2 /
make clean
time make bzImage
real 12m21.755s
user 11m5.070s
sys 1m0.756s


------------------------------------------------------------------

Results with a patched kernel


# time find / -name foofoo

real 0m3.017s
user 0m1.700s
sys 0m1.316s
# time find / -name foofoo

real 0m3.018s
user 0m1.592s
sys 0m1.424s
# time find / -name foofoo

real 0m3.017s
user 0m1.624s
sys 0m1.396s
# time find / -name foofoo

real 0m3.019s
user 0m1.680s
sys 0m1.340s
# time find / -name foofoo

real 0m3.021s
user 0m1.676s
sys 0m1.344s
# time find / -name foofoo

real 0m3.022s
user 0m1.644s
sys 0m1.376s
# time find / -name foofoo

real 0m3.021s
user 0m1.620s
sys 0m1.404s

kernel 2.6.13 build time
------------------------
mount -o remount /dev/hda2 /
make clean
time make bzImage
real 12m8.814s
user 11m2.129s
sys 0m52.187s

Conclusions:
-----------

1) There is a clear benefit, even without a microbenchmark :)

2) As sizeof(struct dentry)=136 on ia32 UP (140 on SMP), and dentry slab is not using SLAB_HWCACHE_ALIGN, I suspect lot of dentries are not aligned on a cache line boundary, so maybe reducing DNAME_INLINE_LEN_MIN (or increasing it) so that sizeof(struct dentry)%L1_CACHE_BYTES = {0 or (L1_CACHE_BYTES/2)} would even give better results.

Thank you

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/