Re: [PATCH 14/39] union-mount: Union mounts documentation

From: Valerie Aurora
Date: Mon Aug 23 2010 - 20:05:41 EST


On Thu, Aug 19, 2010 at 10:34:59AM +0900, J. R. Okajima wrote:
>
> Valerie Aurora:
> > According to Al Viro, unionfs has some fundamental architectural problems
> > that prevent it from being correct and lead to crashes:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0802.0/0839.html
> >
> > The main question for me is whether aufs has fixed these problems. If
> > it hasn't, then it can't be bug-free.
>
> Although I don't fully understand your question, aufs does verify
> the parent-child relationship after lock_rename() on the writable layer.
> Such verification is done in other operations too.
> And aufs provides three options to specify the level of
> verification. When the highest (most strict) level is given, aufs_rename
> looks up again after lock_rename() and compares the parent it gets with
> the given (cached) parent.
> Does this answer your question?

First, my theory when writing any file system code is that whenever Al
Viro says "You can deadlock easily" or "It violates the locking
rules", I have to understand the problem and fix it. I understand
why union mounts don't have the problems unionfs had when Al wrote
this email (because lower layers are not writable). But since aufs
allows directories on lower layers to be renamed in a way that
creates the problems Al describes, I assume it has the same problem
until the author understands the unionfs problem and can describe why
aufs didn't inherit it (or how it was fixed).

Second, why isn't the most strict level of lookup the only option? It
seems like anything else is a bug.
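
For reference, the strict level as described amounts to re-checking, with the rename lock held, that the cached parent is still the real parent, and refusing to proceed if not. A minimal user-space sketch, assuming a toy tree with parent pointers and a single pthread mutex standing in for lock_rename(). struct node, parent_still_valid() and move_checked() are hypothetical names, not aufs code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of one branch; not the aufs or VFS API. */
struct node {
    const char *name;
    struct node *parent;    /* directory this entry currently lives in */
};

/* Stands in for lock_rename(): one lock serializing all renames. */
static pthread_mutex_t rename_lock = PTHREAD_MUTEX_INITIALIZER;

/* The "lookup again" step: with the lock held, re-read child's real
 * parent and compare it with the parent cached before locking. */
static bool parent_still_valid(const struct node *cached_parent,
                               const struct node *child)
{
    return child->parent == cached_parent;
}

/* Move child under new_parent only if the cached view is still true.
 * Returns false on a stale cache so the caller can redo its lookup. */
static bool move_checked(const struct node *cached_parent,
                         struct node *child, struct node *new_parent)
{
    bool ok;

    assert(child != NULL && new_parent != NULL);
    pthread_mutex_lock(&rename_lock);
    ok = parent_still_valid(cached_parent, child);
    if (ok)
        child->parent = new_parent;
    pthread_mutex_unlock(&rename_lock);
    return ok;
}
```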

Third, you have this odd circular inheritance problem that comes from
moving a child directory on the lower layer to the path of its parent,
and vice versa. From Al's email:

> If you allow a mix of old and new mappings, you can easily run into the
> situations when at some moment X1 covers Y1, X2 covers Y2, X2 is a descendent
> of X1 and Y1 is a descendent of Y2. You *really* don't want to go there -
> if nothing else, defining behaviour of copyup in face of that insanity
> will be very painful.

I understand the circular inheritance problem but find this hard to
explain better than Al does above. But here's an example of how you
get there:

Start with parent_dir1/child_dir1 covering parent_dir2/child_dir2.
Thread 1 does a union lookup and gets:
    parent_dir1 covering parent_dir2
    child_dir1 covering child_dir2
    parent_dir1 parent of child_dir1
    parent_dir2 parent of child_dir2
Thread 2 swaps parent_dir2 with child_dir2 (using rename and a tmp dir).
Now the lower fs looks like: child_dir2/parent_dir2

Who inherits what? Does thread 1 see parent_dir2 as a descendant of
child_dir2, which is a descendant of parent_dir2 through the union with
parent_dir1? Can you sanely define the behavior here?
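
Al's condition can be written down directly as a check over two cached mappings. A minimal sketch, assuming each directory records only its parent within its own layer. struct dir, struct cover, is_descendant() and crossed() are hypothetical names, not union mount or aufs code. With thread 1's cached mappings (parent_dir1 covering parent_dir2, child_dir1 covering child_dir2), crossed() is false before thread 2's swap and true after it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each directory records its parent in its own layer. */
struct dir {
    const char *name;
    const struct dir *parent;   /* NULL at the layer root */
};

/* One cached union mapping: upper directory covers lower directory. */
struct cover {
    const struct dir *upper;
    const struct dir *lower;
};

/* True if d is a strict descendant of anc within its layer. */
static bool is_descendant(const struct dir *d, const struct dir *anc)
{
    assert(d != NULL && anc != NULL);
    for (const struct dir *p = d->parent; p != NULL; p = p->parent)
        if (p == anc)
            return true;
    return false;
}

/* Al's case: X1 covers Y1 (a), X2 covers Y2 (b), X2 is below X1 on the
 * upper layer, yet Y1 is below Y2 on the lower layer. */
static bool crossed(const struct cover *a, const struct cover *b)
{
    return is_descendant(b->upper, a->upper) &&
           is_descendant(a->lower, b->lower);
}
```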

Fourth, you have a potential deadlock now. Say thread 1 is operating
with the belief that parent_dir1/child_dir1 covers
parent_dir2/child_dir2. parent_dir2/child_dir2 gets renamed such that
the two switch places, as described above. And thread 2 is directly
accessing the lower file system, now with child_dir2/parent_dir2. The
locking order for thread 1 is:

parent_dir2 -> parent_dir1 -> child_dir1 -> child_dir2

For thread 2, it is:

child_dir2 -> parent_dir2

So if thread 1 gets a lock on parent_dir2, and then thread 2 gets a
lock on child_dir2, they will deadlock: thread 1 eventually waits for
child_dir2, which thread 2 holds, while thread 2 waits for parent_dir2,
which thread 1 holds. In general, this situation violates the
fundamental assumptions of correct directory locking, described in
Documentation/filesystems/directory-locking.
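
The conflict is the classic ABBA pattern: two threads take the same pair of locks in opposite orders. A minimal sketch of a checker for that pattern, assuming each thread's acquisition order is given as a list of directory names. lock_pos() and abba_inversion() are hypothetical helpers, not lockdep. Fed the two orders above, it returns true because of the parent_dir2/child_dir2 pair:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Position of lock `name` in an acquisition sequence, or -1 if absent. */
static int lock_pos(const char *const seq[], size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(seq[i], name) == 0)
            return (int)i;
    return -1;
}

/* True if some pair of locks taken by both threads is taken in opposite
 * orders, i.e. thread A takes x before y while thread B takes y before x. */
static bool abba_inversion(const char *const a[], size_t na,
                           const char *const b[], size_t nb)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < na; i++) {
        for (size_t j = i + 1; j < na; j++) {
            int bi = lock_pos(b, nb, a[i]);
            int bj = lock_pos(b, nb, a[j]);
            if (bi >= 0 && bj >= 0 && bj < bi)
                return true;
        }
    }
    return false;
}
```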

That's my attempt to explain Al's email, anyway. :) All errors are my
own.

> > Think about the case of two different RPM package database files. One
> > contains the info from newly installed packages on the top layer file
> > system. The lower layer contains info from packages newly installed
> > on the lower file system. You don't want either file; you want the
> > merged package database showing the info for all packages installed
> > on both layers. Any practical file-system-based system is only going
> > to be able to pick one file or the other, and it's going to be wrong
> > in some cases.
>
> Let me make sure.
> Do you mean something like this?
> - a user makes a union
> - fileA exists on the lower layer but not on the upper
> - modify fileA in the union
> --> the file is copied up and updated on the upper layer.
> - modify fileA on the lower layer directly (bypassing the union)
> --> the file on the lower is updated.
> - the user will not see the up-to-date fileA in the union; the
> modification made directly on the lower is missing.
>
> Then I'd say it is an expected behaviour. Simply the upper file hides
> the lower.
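
The hiding rule is just a topmost-wins search over the layers. A minimal sketch, assuming each layer is a flat name-to-contents table with index 0 as the top layer. struct entry, struct layer, layer_find() and union_read() are hypothetical, not the union mount or aufs lookup code. In the scenario above, reading fileA returns the copied-up contents regardless of later changes to the lower copy:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat model of one layer: file name -> contents. */
struct entry {
    const char *name;
    const char *data;
};

struct layer {
    const struct entry *entries;
    size_t n;
};

static const struct entry *layer_find(const struct layer *l, const char *name)
{
    for (size_t i = 0; i < l->n; i++)
        if (strcmp(l->entries[i].name, name) == 0)
            return &l->entries[i];
    return NULL;
}

/* Union lookup: the topmost layer that has the name wins; lower copies
 * are hidden, even if they were modified after the copy-up. */
static const char *union_read(const struct layer *layers, size_t nlayers,
                              const char *name)
{
    assert(layers != NULL);
    for (size_t i = 0; i < nlayers; i++) {   /* index 0 is the top */
        const struct entry *e = layer_find(&layers[i], name);
        if (e != NULL)
            return e->data;
    }
    return NULL;                             /* not in any layer */
}
```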

I am not arguing with you and I agree that this is the expected
behavior. I wrote about this case just to show that there is a case
in which what the user "wants" in an upgrade situation is impossible
to do automatically in the file system. So you need to have a smart
tool to do an upgrade of the lower layer file system. And I argue
that this smart tool should handle every case in which a file copied
up to the topmost file system covers an updated file on the lower file
system, instead of putting this policy decision into the VFS.
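
As a sketch only of how such a tool might find the cases it has to decide on: assume it records when each file was copied up and, after the upgrade, compares that with the lower copy's mtime. The mtime heuristic, struct shadowed and find_conflicts() are all assumptions for illustration; a real tool might compare checksums or package metadata instead:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical record a post-upgrade tool might build for each file
 * that exists on both layers. */
struct shadowed {
    const char *path;
    time_t copied_up_at;   /* when the file was copied up (assumed known) */
    time_t lower_mtime;    /* mtime of the lower copy after the upgrade */
};

/* Collect files whose lower copy changed after the copy-up. Each one needs
 * a policy decision (keep upper, take lower, or merge, as for a package
 * database) made by the tool rather than the VFS. Returns the number
 * found; stores at most out_cap paths in out. */
static size_t find_conflicts(const struct shadowed *files, size_t n,
                             const char **out, size_t out_cap)
{
    size_t found = 0;

    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < n; i++) {
        if (files[i].lower_mtime > files[i].copied_up_at) {
            if (found < out_cap)
                out[found] = files[i].path;
            found++;
        }
    }
    return found;
}
```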

-VAL