/proc object tree: /proc/binary not needed, just tree lookup()

Clayton Weaver (cgweav@eskimo.com)
Sun, 1 Nov 1998 06:13:41 -0800 (PST)


Update:

A few modifications to the first sketch of the MIB-like object tree
idea for /proc. The tree doesn't need to be a visible file in
/proc/binary, and in fact we don't need /proc/binary at all. The
endian-version file can be /proc/meta.

In memory, the links between levels in the object tree are
normal pointer indirection and array indexes, but in a file
each link would be an offset-length tuple.
The offset-length tuples can't stay hard-coded in user-space
software, because of variable length strings, variable sets
of enabled options, added tree branches and leaf data items
in later versions, and so on.
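
To make the difference concrete, here is a minimal sketch of what
following one link in a file image would involve. The struct and
function names (file_ref, file_ref_resolve) and the 32-bit field widths
are assumptions made up for illustration, not part of any existing
interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-file link: where the child data starts and how long it is. */
struct file_ref {
    uint32_t offset;   /* byte offset from the start of the file image */
    uint32_t length;   /* number of bytes belonging to the child */
};

/*
 * Resolve a link against a file image of image_len bytes.
 * Returns a pointer to the child data, or NULL if the tuple
 * points outside the image.
 */
const unsigned char *file_ref_resolve(const unsigned char *image,
                                      size_t image_len,
                                      struct file_ref ref)
{
    if (ref.offset > image_len || ref.length > image_len - ref.offset)
        return NULL;
    return image + ref.offset;
}
```

Any tool that compiled in a particular offset would break as soon as a
string ahead of that offset changed length, which is the problem
described above.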

The mapping callback from the object tree to a traditional file
system representation needs to use abstraction anyway; it can't expect
fixed offsets except in localized parts of the object tree that happen
not to have any variable length data. Relying on the actual memory
layout of the object tree is not robust, compared to using index
numbers to descend the tree to a particular set of leaf data.
If the callback routine supplies an index number sequence to an
internal tree-walk lookup (like snmp's 1.3.6.1.... indexes), then
the memory layout of the object tree only matters when the kernel
is building it. It doesn't matter to to_file mapping and it doesn't matter
to user-space software.
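
A rough sketch of such a tree-walk lookup, to show how little the caller
has to know about layout. Everything here is hypothetical: the node
struct, the name ptree_lookup, the use of 1-based index numbers, and the
tiny demo tree are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical node layout; field names are illustrative only. */
struct ptree_node {
    const char *value;                          /* leaf data, NULL for interior nodes */
    size_t nchildren;                           /* number of entries in children[] */
    const struct ptree_node *const *children;   /* child i has index number i + 1 */
};

/*
 * Descend from root following the index sequence oid[0..len-1].
 * Index numbers are assumed to be 1-based. Returns the node reached,
 * or NULL if any index is out of range at its level.
 */
const struct ptree_node *ptree_lookup(const struct ptree_node *root,
                                      const unsigned *oid, size_t len)
{
    const struct ptree_node *n = root;

    for (size_t i = 0; i < len; i++) {
        if (oid[i] == 0 || oid[i] > n->nchildren)
            return NULL;
        n = n->children[oid[i] - 1];
    }
    return n;
}

/* Made-up demo tree: root -> 1 (net) -> 1 (first interface) -> {1: rx, 2: tx} */
static const struct ptree_node leaf_rx = { "1024", 0, NULL };
static const struct ptree_node leaf_tx = { "2048", 0, NULL };
static const struct ptree_node *const if0_kids[] = { &leaf_rx, &leaf_tx };
static const struct ptree_node if0 = { NULL, 2, if0_kids };
static const struct ptree_node *const net_kids[] = { &if0 };
static const struct ptree_node net = { NULL, 1, net_kids };
static const struct ptree_node *const root_kids[] = { &net };
static const struct ptree_node demo_root = { NULL, 1, root_kids };
```

With the demo tree, the sequence 1.1.2 reaches the "tx" leaf, and the
caller never sees a pointer or an offset into the tree itself.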

If we need this kind of multi-level abstract indexed retrieval anyway
for the internal convenience of the objects-to-files mapping, why
not export that indexed lookup as a function call to software like top,
ps, and so on?

It would not be the same function exactly in both cases, because admin
tools don't need the /path/ components and inodes that the filesystem view
of /proc/* provides. They only need the data they asked for. But the tree
traversal by numbered index can be shared.

These tools don't need a view of the tree itself at runtime, they
just need the function to get tree-indexed data from it, so there
isn't any good reason to make the object tree itself visible as a
file.

And we only need one lookup function for any tool wanting any data
in the tree. Different data just has different object node numbers.

There are still a few complexities to account for. How does a
caller ask for more than one leaf value at the same time,
whether at different levels or at a single tree level? We could
make object numbers 1-indexed and make a zero value a wild
card, meaning "all objects at this level". This complicates
the return data type, because the returned block of values
has to tell the caller how many there were. If a caller is
just passing an array of index sequences for some arbitrary
collection of leaf values, the caller knows how many there are
so it should be able to figure out from the length-value
tuples where its data ends. But when a wild card is passed,
how many data values were returned? The return type of
the tree lookup syscall has to convey this information.

It's only complex because we have to cover both cases with
a single return type definition (or have multiple retrieval
functions, "single leaf", "multiple leaves", "multiple leaves
with wildcards", etc, which seems redundant to me compared
to one calling interface with a return type where some
parts of it aren't needed in the single leaf and no-wildcards cases).
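
One possible shape for that single return type, as a sketch only. The
names (ptree_reply, ptree_reply_add, ptree_get_level), the 256-byte
buffer, and the 16-bit length prefix are all assumptions for
illustration. The level being queried is stood in for by a plain array
of strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reply: a count, then that many length-value tuples in buf[]. */
struct ptree_reply {
    unsigned count;           /* number of values returned */
    size_t used;              /* bytes of buf[] in use */
    unsigned char buf[256];   /* tuples: 16-bit length, then the value bytes */
};

/* Append one length-value tuple. Returns 0 on success, -1 if it won't fit. */
int ptree_reply_add(struct ptree_reply *r, const void *val, unsigned short len)
{
    if (r->used + sizeof len + len > sizeof r->buf)
        return -1;
    memcpy(r->buf + r->used, &len, sizeof len);
    memcpy(r->buf + r->used + sizeof len, val, len);
    r->used += sizeof len + len;
    r->count++;
    return 0;
}

/*
 * Fill r from one tree level holding nvals string values.
 * index 0 is the wild card ("all objects at this level");
 * any other value selects a single 1-based entry.
 */
int ptree_get_level(const char *const *vals, unsigned nvals,
                    unsigned index, struct ptree_reply *r)
{
    unsigned first = index ? index - 1 : 0;
    unsigned last = index ? index : nvals;

    r->count = 0;
    r->used = 0;
    if (index > nvals)
        return -1;
    for (unsigned i = first; i < last; i++)
        if (ptree_reply_add(r, vals[i], (unsigned short)strlen(vals[i])) < 0)
            return -1;
    return 0;
}
```

In the single leaf case the count field is always 1 and the caller can
ignore it; only the wild card case actually needs it.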

Then there are files like /proc/cmdline. This file is a bit more
complicated. It's the one obvious case that would benefit from full BER
ASN.1, where the length and value are prefaced with the data type code,
because it can have an arbitrary mix of character strings and numeric
values in it that aren't determined until boot time. It's not like the
representation of an array of interfaces of a particular type,
where the number of them changes but their leaf structure is common
to each. You can't predefine what leaves are going to be in an object
node for kernel options at boot.

Perhaps enable BER (data type notation) for just those objects like
cmdline that need it?
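
For a feel of what the type code buys, here is a simplified
tag-length-value encoder for one command-line token. It borrows the
ASN.1 universal tag numbers for INTEGER (2) and OCTET STRING (4) and the
one-byte short form of the length, but it is only a BER-like sketch, not
a conforming encoder. The function name tlv_encode_token and the rule
"a token made only of a decimal number is an INTEGER" are assumptions
for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { TAG_INTEGER = 0x02, TAG_OCTET_STRING = 0x04 };

/*
 * Encode one token as tag, one-byte length, value.
 * Returns the number of bytes written to out, or 0 if the token
 * does not fit in cap bytes or needs a length above 127.
 */
size_t tlv_encode_token(const char *tok, unsigned char *out, size_t cap)
{
    char *end;
    unsigned long v;
    size_t len;

    if (isdigit((unsigned char)tok[0])) {
        v = strtoul(tok, &end, 10);
        if (*end == '\0') {
            /* Big-endian value bytes, built from the least significant end. */
            unsigned char tmp[sizeof v + 1];
            size_t pos = sizeof tmp;

            do {
                tmp[--pos] = (unsigned char)(v & 0xff);
                v >>= 8;
            } while (v != 0);
            if (tmp[pos] & 0x80)
                tmp[--pos] = 0;   /* leading zero keeps the value non-negative */
            len = sizeof tmp - pos;
            if (cap < 2 + len)
                return 0;
            out[0] = TAG_INTEGER;
            out[1] = (unsigned char)len;
            memcpy(out + 2, tmp + pos, len);
            return 2 + len;
        }
    }

    /* Anything else is carried as an opaque string. */
    len = strlen(tok);
    if (len > 127 || cap < 2 + len)
        return 0;
    out[0] = TAG_OCTET_STRING;
    out[1] = (unsigned char)len;
    memcpy(out + 2, tok, len);
    return 2 + len;
}
```

A reader of such a block can tell numbers from strings by the first
byte of each tuple, without knowing in advance which options were
passed at boot.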

Or just make those object strings, as in the current implementation,
and extracting numbers from it is some user-space software's problem?
This is perhaps not too painful for /proc/cmdline, because it isn't
dynamically updated; its value is encoded once and stays the same
until a reboot. So there is no reason why user software has to
hammer on it. It can read cmdline once if it wants to know what
was passed to the kernel at boot, and just do the strtoul() that's
inconvenient for software using dynamically updated numeric
/proc data.
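
A sketch of that user-space side, assuming the tool has already read
/proc/cmdline once into a string. The helper name cmdline_get_ulong and
the "key=value separated by spaces" convention it relies on are
assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look for "key=<number>" in a space-separated command line string.
 * On success stores the number in *out and returns 0; returns -1 if
 * the key is absent or its value does not start with a number.
 */
int cmdline_get_ulong(const char *cmdline, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    if (klen == 0)
        return -1;
    while ((p = strstr(p, key)) != NULL) {
        /* Only accept a match that starts a token and is followed by '='. */
        int at_start = (p == cmdline) || (p[-1] == ' ');

        if (at_start && p[klen] == '=') {
            const char *val = p + klen + 1;
            char *end;
            unsigned long v = strtoul(val, &end, 0);

            if (end != val) {
                *out = v;
                return 0;
            }
        }
        p += klen;
    }
    return -1;
}
```

Since the string never changes after boot, the conversion cost is paid
once per program run rather than once per sample.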

Finally, a last question: is the (apparently) despised strtoul()
conversion on /proc file data really more overhead than a function call to
retrieve indexed object tree data? There is a context switch either way.
Is composing the length-sequence data structures and decomposing a complex
return struct really less user-space overhead than calling strtoul() on a
string of digits at some offset into an mmap()ed /proc file? Does the
difference get lost in the noise (overhead of context switch)? Is it
faster for the kernel itself to handle /proc data as an object tree mapped
to a filesystem on demand than the current in-core filesystem?

It would be some work to define all of the MIB-type objects for a /proc object
tree, to code the kernel's update of tree data values, tree lookup
interface, the virtual file mapping callback interface for traditional
/proc files, and so on. Is it going to make a difference at runtime? It
would be a bit silly to radically change the whole /proc filesystem just
for the fun of it only to "use more elegant data structures" with no
runtime performance benefit. One good thing about this kind of structure
is that extending it to accommodate new data in /proc/ is painless: "add
node with higher index number than all existing nodes at this level". But
then, "add new file to /proc/subdir/" isn't exactly a "recompile the
world" change for user space tools, either.

Regards, Clayton Weaver <mailto:cgweav@eskimo.com> (Seattle)
