Re: [PATCH] fs: Restore files_lock and set_fs_root exports

From: Paul E. McKenney
Date: Wed Jan 12 2005 - 21:55:30 EST


On Thu, Jan 06, 2005 at 05:01:19PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 06, 2005 at 09:24:17PM +0000, Al Viro wrote:
> > "Use recursive bindings instead of trying to take over the entire mount tree
> > and mirroring it within your fs code. And do that explicitly from userland".
>
> Thank you for the pointer! By this, you mean do mount operations in
> conjunction with namespaces, right?
>
> I will follow up with more detail as I learn more. The current issue
> seems to be with removeable devices. Their users want to be accessing
> a particular version, but still see a memory stick that was subsequently
> mounted outside of the view. Straightforward use of mounts and namespaces
> would prevent the memory stick from being visible to users that were
> already in view.

OK, after much thrashing, here is what the ClearCase users need.
Sorry for the length -- the first few paragraphs gives the flavor of
it and the rest goes into more detail with examples.

Thoughts?

Thanx, Paul

------------------------------------------------------------------------

ClearCase provides a filesystem view of revision-control repositories.
ClearCase users can specify the desired revision directly in the pathname
("view extended naming", for example "/views/v1.1/vob/foo/bar.c")
or they can associate a particular revision with a process and its
descendants ("setview context naming", for example "/vob/foo/bar.c").
Specifying the revision in the pathname is useful for diffing and such,
and associating a revision with a process is useful for builds and
testing using standard tools, sort of like chroot jails.

ClearCase users need the root filesystem to be overlaid on each view,
so that "/etc/passwd" and "/views/v1.1/etc/passwd" reference the same
file.

The current hope is that adding (a) shared and asymmetrically shared
subtrees between namespaces/locations in the same namespace, (b) stackable
LSM modules, and (c) dynamic recursive union mount would enable Linux
to provide this in a technically sound manner. [But this is not clear
to me yet.]

More details on what ClearCase users want to see follow.

1. "View extended naming", where the revision is specified by
the pathname.

a. Users explicitly specify that they want to see a specific
revision (e.g., v1.1) using the ClearCase "startview"
command. This revision is then visible to all processes.

b. The pathname fully specifies the revision and file, e.g.,
"/views/v1.1/vob/foo/bar.c", where the "startview"
command has initialized an MVFS view of the v1.1 version on
"/views/v1.1/vob/foo". ("vob" stands for "versioned object
base".)

c. Note that the entire filesystem is also visible within
the view, so that /etc/passwd" is an synonym for
"/views/v1.1/etc/passwd". This can be thought of
as (currently mythical) "dynamic union mount rbind".
See #4 below for more detail.

2. "Setview context naming", where revision is -not- specified
by the pathname.

a. A particular process designates a particular revision
(e.g., "v1.1") using a ClearCase "setview" command.
This designation applies to both the process and its
future descendants (any descendants already created are
unaffected. The "setview" command does an implicit
"startview" if needed.

b. That process and any of its descendants will then see the
specified revision via "/vob/foo/bar.c", where an
MVFS filesystem has been mounted on "/vob/foo".

c. Changes made via the "setview context naming" paths
are visible in "view extended naming" paths, with
the same restrictions as called out in #1c above.

d. A given process can mix and match "view extended naming"
and "setview context naming" references, so that
both "/views/v1.1/vob/foo/bar.c" and "/vob/foo/bar.c"
may be used to refer to the same file, but in that case
"/views/v1.2/vob/foo/bar.c" might reference a different
file. These synonyms for the same file need to have the
properties of the mythical "dynamic union mount rbind"
described in #4 below.

3. ClearCase users are -not- permitted to mount over a file or
directory within an MVFS filesystem. Therefore, if a user
mounts a memory stick over (say) "/vob/foo/mnt", the results
are undefined.

4. ClearCase users need to be able to access multiple repositories
simultaneously (e.g., for related projects). So there might
be a "/vob/foo" and "/vob/oof" project visible simultaneously.

There are some specialized but important testing situations
that require separate top-level directories, e.g., "/vob"
and "/tmp/vobtest".

5. The ClearCase users need the mythical "dynamic union mount rbind"
to set things up so that any access to a non-MVFS file made from
within a view gives the same results as a direct access to that
file would give if no MVFS filesystems were mounted. For example,
any access to "/views/v1.1/etc/passwd" must give exactly the same
results as the same access to "/etc/passwd" would if there were
no MVFS filesystems mounted.

The "dynamic union mount rbind" therefore needs to have the
following properties:

a. The underlying inodes retain the same link count
no matter how many ClearCase views they are
accessible from. So, if the link count of /etc/passwd
is initially 1, it remains 1 even though it is
accessible via "/etc/passwd", "/views/v1.1/etc/passwd",
"/views/v1.2/etc/passwd", and so on. Similarly,
if the link count of foo/bar.c in the v1.1 revision
is initially 1, it remains 1 even when it is
accessible both via "/vob/foo/bar.c" and
"/views/v1.1/vob/foo/bar.c".

Hard-link tricks break user scripts and programs.

b. The types of the underlying files remain the same
regardless of how many views they are visible in
and regardless of how they are accessed. So, if
/etc/passwd is a normal file, it will appear to be
a normal file when accessed via "/etc/passwd",
"/views/v1.1/etc/passwd", "/views/v1.2/etc/passwd",
and so on. Similarly, if foo/bar.c is a normal file,
is will appear to be a normal file when accessed
via "/vob/foo/bar.c" and "/views/v1.1/vob/foo/bar.c".

Symbolic-link tricks break user scripts and Makefiles.

c. The types of the underlying filesystem remain the
same regardless of which view is used. This is
required by user scripts that use things like "df -k".

d. Any changes to any non-MVFS filesystem are immediately
visible in the view. Some examples:

i. A mount in the root filesystem, e.g.,

mount /dev/cdrom /mnt/cdrom

must result in the CDROM being visible in both
"/mnt/cdrom" and in "/views/v1.1/mnt/cdrom".
Ditto for NFS mounts and automounting.

In this case "immediately" means "before the
mount command's process exits". This need for
immediacy applies both to explicit mounts and
to autofs-induced mounts. Ditto for unmounts.

ii. Dynamic filesystems such as /proc must
exhibit their dynamic nature whether accessed
via "/proc", "/views/v1.1/proc", or
"/views/v1.2/proc".

Again, changes must be immediately visible in
all views, where "immediately" means that there
should be no "time warps". For example, any
monotonic counters must be seen as monotonic,
despite successive reads happening from different
views.

e. Any changes made via "view extended naming" or via
"setview context naming" are visible in the other
view. For example, the file created by

touch /vob/foo/bar.c.new

would also be immediately visible as
"/views/v1.1/vob/foo/bar.c.new" and vice versa.

f. Symbolic links from inside an MVFS filesystem act as
expected, with the expected advice that views use
relative rather than absolute symlinks for targets
within the view.

However, absolute symbolic links from inside an MVFS
filesystem to files that are outside an MVFS filesystem
work as expected.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/