Re: [RFC PATCH linux 2/2] fs/proc: use a hash table for the directory entries

From: Nicolas Dichtel
Date: Fri Oct 03 2014 - 09:09:37 EST


Le 02/10/2014 20:01, Eric W. Biederman a Ãcrit :
Nicolas Dichtel <nicolas.dichtel@xxxxxxxxx> writes:

From: Thierry Herbelot <thierry.herbelot@xxxxxxxxx>

The current implementation for the directories in /proc is using a single
linked list. This is slow when handling directories with large numbers of
entries (eg netdevice-related entries when lots of tunnels are opened).

This patch enables multiple linked lists. A hash based on the entry name is
used to select the linked list for one given entry.

The speed creation of netdevices is faster as shorter linked lists must be
scanned when adding a new netdevice.

Is the directory of primary concern /proc/net/dev/snmp6 ?
Yes.


Unless I have configured my networking stack weird by mistake that
is the only directory under /proc/net that grows when we add an
interface.

I just want to make certain I am seeing the same things that you are
seeing.

I feel silly for overlooking this directory when the rest of the
scalability work was done.

Here are some numbers:

dummy30000.batch contains 30 000 times 'link add type dummy'.

Before the patch:
time ip -b dummy30000.batch
real 2m32.221s
user 0m0.380s
sys 2m30.610s

After the patch:
time ip -b dummy30000.batch
real 1m57.190s
user 0m0.350s
sys 1m56.120s

The single 'subdir' list head is replaced by a subdir hash table. The subdir
hash buckets are only allocated for directories. The number of hash buckets
is a compile-time parameter.

That looks like a nice speed up. A couple of things.

With sysfs and sysctl when faced this class of challenge we used an
rbtree instead of a hash table. That should use less memory and scale
better.

I am concerned about a fixed sized hash table moving the location where
we fall off a cliff but not removing the cliff itself.

I suppose it would be possible to use the new fancy resizable hash
tables but previous work on sysctl and sysfs suggests that we don't look
up these entries sufficiently to require a hash table. We just need a
data structure that doesn't fall over at scale, and the rbtrees seem to
do that very nicely.
Ok, I will have a look at it.



Thank you,
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/