Re: [PATCH 00/69] faster tree-based sysctl implementation

From: Eric W. Biederman
Date: Sun May 01 2011 - 23:06:57 EST


Lucian Adrian Grijincu <lucian.grijincu@xxxxxxxxx> writes:

> On Mon, May 2, 2011 at 12:49 AM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>>>> Since we are touching most if not all of the sysctl registrations this
>>>> would also be a good time to pass a path string instead of the weird
>>>> ctl_path data structure. ÂWe needed ctl_path when we had both binary
>>>> and proc paths to worry about but we no longer have that concern.
>>>
>>>
>>> I still find good use for it in the next patches (some optimisations).
>>> Getting rid of it makes some things more difficult:
>>> - I wouldn't like to parse strings into path components at registeration
>>
>> I don't expect '/' being more difficult to deal with than an array. ÂIn
>> general I expect a single string to be more space efficient and easier
>> for human comprehension.
>
>
> We also use the string from ctl_path as a name for the sysctl
> directory. We would need to either:
> * strdup part of the string for each directory, remember to kfree
> * replace '/' with '\0' in the given string (meaning it can't be put
> in a read-only zone)

If we are only registering leaves, we can just deal with the tail of the
path and point just past the final /. There should be no need to
duplicate anything.

> Also I make use of the ctl_path to add some optimisations that deal
> with the case when there are very many known-to-be-uniquely-named
> sub-directories like for /proc/sys/net/ipv4/conf/DEVICE. IXIACOM which
> sponsored this work has usecases where they need to create 10^3..10^5
> virtual network devices and these optimisations really add up for that
> many interfaces.

I am convinced the places where we have network devices in the path are
indeed the pain points for scaling.

My gut feel is that we should use a balanced binary tree instead of a
doubly linked list for the directories. The space cost of a tree
is just an extra color member instead of two pointers.

For 100000 entries your target of a binary tree should only be 17
entries tall. Maybe twice that for if the tree is an rbtree. 17 or
even 33 should be a small enough value log(N) to keep the cost from
being painful. And using a binary tree means fewer special cases
overall.


A binary tree is faster than your special case for lookup. Which means
it solves the case of actually using the sysctl entries as well as the
case for creating them.

Furthermore to we also need to change sysfs because it also has
directories that will contain all 100000 of the network devices,
and I don't expect simply skipping the duplicate check is going to
fly in sysfs.

We could do something besides a data structure without a logN
insert/remove/lookup cost complexity. But I think we need numbers
to show that won't scale. So far all we have are numbers that show
a linked list doesn't scale.

> For details about the optimisation see patches:
> 61/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133694 and
> 62/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133711
>
>
> I will make another function that would take a string, parse it up,
> create a ctl_path array and register it, but I'd really like to keep
> ctl_path both in the implementation and as a means to register a
> table.

Using a string path certainly isn't critical at this point. But so
far I don't see practical down sides.

>>> - users of the register_sysctl_paths would need to create strings with
>>> dynamic components (for example "net/conf/%s/" - where %s is a
>>> netdevice-name or "kernel/sched_domain/%s/%s" with cpu-name and
>>> domain-name).
>>
>> This is a good point.
>>
>> In the normal proc implementation this is solved by being able to
>> pass the equivalent of a ctl_table_header into the registration
>> function, which allows the use of relative paths in the registration
>> function.
>>
>> In the examples you have given relative paths should also work for
>> sysctl.
>
>
> Hmm, I don't think we're on the same channel here. I don't understand
> what you're trying to say
> - normal proc implementation?
> - the equivalent of a ctl_table_header?
> - relative paths?

I was looking at effectively other virtual filesystems that have
had similar problems and talking about other solutions used.

In particular I was referring to create_proc_entry. It takes
a path and an optional parent directory.

> I was saying that if we are to *replace* the ctl_path based mechanism
> with a string denoting the path, then some other registrants will need
> to allocate memory for those strings because the paths they register
> are computed at runtime. Then I gave two distinct examples where this
> is done. In both of those cases, ctl_path saves us from allocating a
> string before allocation, only to chop it then back to pieces in the
> __register function.

And I was saying if that string was treated as a relative path. We
could have:

struct ctl_table_header *register_sysctl_path(struct ctl_table_header *parent,
const char *path,
struct ctl_table *table);

The optional parent parameter would save us from the pain of having to
even place the sysctl entry in a ctl_path. __register_sysctl_paths
already has a very similar interface.

Eric


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/