[RFC] Union mount readdir support in glibc

From: Bharata B Rao
Date: Tue Mar 11 2008 - 01:56:06 EST


Extending glibc readdir() to support union mounted directories
==============================================================

Any filesystem namespace unification solution needs to merge the
directory contents by eliminating duplicate(1) and whiteout(2) directory
entries. We (as part of the Union Mount(3) effort) have been trying to do this
inside the readdir/getdents system call in the kernel(4), but have found the
approach not very feasible. Maintaining a cache of dirents
within the kernel turned out to be expensive in both memory and CPU.
Here, we discuss the feasibility of achieving
duplicate elimination and whiteout suppression by extending the readdir
implementation of glibc.

Before actually going ahead with the implementation, this is an effort
to build consensus about the approach within the glibc community.

readdir/getdents in kernel
--------------------------
readdir/getdents system call on a union mounted directory would just return
all the entries of the union starting from the topmost directory.

readdir in glibc
----------------
glibc readdir would maintain a cache of dirents returned by the readdir/getdents
system call to perform duplicate elimination and whiteout suppression.

glibc readdir would need the following information from the kernel:

- Indication that this directory is a union mounted directory.
With this indication, glibc need not build and maintain a
dirent cache for normal non-union directories. We need to check whether glibc
can live without such an indication, which would mean that glibc readdir
builds a dirent cache for _all_ directories, and what overhead
that would incur.

- Indication that kernel has finished returning the dirents from the
topmost directory of the union.
This would tell glibc readdir to start eliminating duplicates
among the subsequent dirents returned from the lower directories of the union.
Can the kernel pass a special dirent, like a "." whiteout, to indicate this?
Are there other methods? Or should glibc not care which directory a dirent
comes from and start duplicate elimination right from the
beginning? Of course, this would result in some extra comparisons
(comparisons against d_name to detect duplicates) for the dirents
from the topmost directory.

- Indication that this directory entry is a whiteout entry.
The DT_WHT value for d_type (which already exists in Linux) will be
used to identify whiteouts.

- Indication that one of the directories of the union was modified (removal
or addition of a directory entry) during readdir.
Ideally, any change to the directories should result in
dirent cache invalidation. But should we even bother about this?
Should we just ignore directory modifications happening between glibc opendir
and closedir and live with whatever entries the kernel returns?
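
To make the cache-based filtering described above a bit more concrete, here
is a rough sketch in C. It is only an illustration, not the proposed glibc
implementation: struct name_cache, cache_contains(), cache_add(),
union_filter() and cache_free() are hypothetical names, the lookup is a plain
linear scan, and it takes the simplest option from the list (duplicate
elimination right from the first entry, no union or layer-boundary indication
from the kernel). The only thing assumed from the kernel is that whiteouts
arrive with d_type == DT_WHT.

```c
#define _DEFAULT_SOURCE   /* exposes DT_* constants and strdup() in glibc */
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

#ifndef DT_WHT
#define DT_WHT 14         /* fallback; glibc's <dirent.h> normally defines it */
#endif

/* Hypothetical per-stream cache of names already seen. */
struct name_cache {
    char **names;
    size_t count;
    size_t cap;
};

/* Return 1 if name is already cached, 0 otherwise (linear scan for brevity). */
int cache_contains(const struct name_cache *c, const char *name)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->names[i], name) == 0)
            return 1;
    return 0;
}

/* Remember name. Returns 0 on success, -1 on allocation failure. */
int cache_add(struct name_cache *c, const char *name)
{
    assert(c != NULL && name != NULL);
    if (c->count == c->cap) {
        size_t ncap = c->cap ? c->cap * 2 : 16;
        char **n = realloc(c->names, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        c->names = n;
        c->cap = ncap;
    }
    c->names[c->count] = strdup(name);
    if (c->names[c->count] == NULL)
        return -1;
    c->count++;
    return 0;
}

/*
 * Decide what to do with one dirent coming from the kernel, in the order
 * the kernel returns them (topmost directory first).
 * Returns 1 = hand to the application, 0 = hide, -1 = out of memory.
 */
int union_filter(struct name_cache *c, const char *name, unsigned char type)
{
    if (cache_contains(c, name))
        return 0;              /* duplicate, or hidden by an earlier whiteout */
    if (cache_add(c, name) != 0)
        return -1;
    return type != DT_WHT;     /* whiteouts are remembered but never shown */
}

/* Release everything held by the cache. */
void cache_free(struct name_cache *c)
{
    for (size_t i = 0; i < c->count; i++)
        free(c->names[i]);
    free(c->names);
    c->names = NULL;
    c->count = c->cap = 0;
}
```

A readdir wrapper would call union_filter() with d_name and d_type for each
dirent and skip the ones for which it returns 0. A real implementation would
likely want a hash table instead of the linear scan, since the scan makes
listing a directory of n entries cost O(n^2) string comparisons.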

Downsides of this approach
--------------------------

- Applications using the readdir/getdents system calls directly would have
to handle duplicate elimination and whiteout suppression themselves.
- Applications linked statically with glibc would have problems with
union mounted directories.

Directory seek support
----------------------
Can we simply not support seek on a union mounted directory?

But if directory seek support on a union mounted directory is a necessity,
my current thinking is that we can define the seek behaviour on the
dirent cache maintained by glibc readdir. With this, glibc would be
defining the seek behaviour for union mounted directories instead of
getting it from the kernel. But again, this would work only for applications
that make use of seekdir() and friends, and not for applications that
use lseek(2) directly. I have yet to look at glibc seekdir in detail to
ascertain whether the above approach is doable.
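
As a rough illustration of what "seek defined on the dirent cache" could
mean, the sketch below treats a directory position as nothing more than an
index into the merged, de-duplicated list of names. Everything here is
hypothetical (struct union_dir and the union_telldir(), union_seekdir() and
union_readdir() helpers are made-up names, not existing glibc interfaces),
and it assumes the whole merged listing is already held in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-stream state: merged entries plus a read cursor. */
struct union_dir {
    const char **names;   /* merged, de-duplicated names in return order */
    size_t count;         /* number of entries in names */
    size_t pos;           /* index of the next entry to return */
};

/* The position is simply the index of the next cached entry. */
long union_telldir(const struct union_dir *d)
{
    assert(d != NULL);
    return (long)d->pos;
}

/* Move the cursor; out-of-range values are clamped to the valid range. */
void union_seekdir(struct union_dir *d, long loc)
{
    assert(d != NULL);
    if (loc < 0)
        loc = 0;
    d->pos = (size_t)loc > d->count ? d->count : (size_t)loc;
}

/* Return the next cached name, or NULL at the end of the directory. */
const char *union_readdir(struct union_dir *d)
{
    assert(d != NULL);
    return d->pos < d->count ? d->names[d->pos++] : NULL;
}
```

With this scheme a value obtained from union_telldir() stays meaningful only
as long as the cache is not rebuilt, which ties seek behaviour to the cache
invalidation question raised earlier.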

Regards,
Bharata.
--

(1) Duplicates: With filesystem namespace unification, it is possible to
have same-named entries in multiple layers/branches/directories of the union.
In such cases only the entry from the topmost layer/branch/directory is
made available.

(2) Whiteouts: Whiteouts are placeholders for entries that don't exist
logically in the union. Typically, deletion of an entry present only in
the (read-only) lower layer of a union would result in a whiteout being
created in the topmost layer of the union. A lookup of a whiteout would return
-ENOENT.

(3) Union Mount: http://lkml.org/lkml/2007/7/30/193

(4) Directory listing approaches in the kernel:
http://lkml.org/lkml/2007/12/5/147