Re: Union mount and lockdep design issues
From: Michal Suchanek
Date: Mon Jul 11 2011 - 09:36:47 EST
On 11 July 2011 14:00, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
>> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>
>> > Also, why would you want to have a class per sb-instance? From last
>> > talking to David, he said there could only ever be 2 filesystems
>> > involved in this, the top and bottom, and it is determined on (union)
>> > mount time which is which.
>>
>> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
>> though I think only one lowerfs is accessed at a time.
>
> Right, however I understood from our earlier discussion that the vfs
> would only ever try to lock 2 filesystems at a time, the top and one
> lower.
This is true from a local point of view. However, it is technically
possible to use an overlayfs as the upper layer of another overlayfs,
which allows layering multiple read-only "branches" into a single
overlay. Since the vfs will lock the "union" and one (or possibly
both) of its branches, and one of the branches may itself be a union,
you can get arbitrary depth (currently limited by a constant in the
code that caps recursion depth and stack usage).
>
>> However, I was wondering that if in the future it could be possible to make it
>> possible to union over a union.  I think that conceptually it shouldn't be that
>> hard, but definitely lockdep presents a barrier unless the top union goes
>> behind the scenes of the lower union and interacts with its lowerfs's directly.
>
> Aside from lockdep, how many fs locks will you nest and how will you
> enforce the filesystem relations remain a DAG? But yeah, that'll be a
> tad harder to do. One of the ways we could tackle that is create a lock
> class per depth, and statically create say 16 of those, allowing for a
> DAG with span of 16.
This would be consistent with the limit on nesting imposed by stack
size, but there should probably be some mechanism to derive one of the
numbers from the other.
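
To illustrate what I mean by deriving one number from the other, here is
a minimal userspace sketch, not kernel code. It assumes a single
hypothetical constant UNION_MAX_DEPTH that sizes the static array of
per-depth classes and also bounds nesting at mount time. The names
struct depth_class (a stand-in for a lockdep class key), struct layer,
layer_init_plain() and layer_init_union() are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one constant bounds both nesting depth and class count. */
#define UNION_MAX_DEPTH 16

/* Userspace stand-in for a lockdep class key; only its address matters. */
struct depth_class { int unused; };

/* One statically allocated class per nesting level. */
static struct depth_class union_depth_classes[UNION_MAX_DEPTH];

/* Hypothetical per-filesystem record. */
struct layer {
    int depth;                     /* 0 for a plain (non-union) filesystem */
    const struct depth_class *key; /* class selected by depth */
};

/* A plain filesystem sits at depth 0. */
void layer_init_plain(struct layer *l)
{
    assert(l != NULL);
    l->depth = 0;
    l->key = &union_depth_classes[0];
}

/*
 * A union sits one level above its deepest branch, so its class always
 * differs from the classes of both branches and the relation stays
 * acyclic. Returns 0 on success, -1 if the limit would be exceeded.
 */
int layer_init_union(struct layer *u, const struct layer *upper,
                     const struct layer *lower)
{
    int deepest, depth;

    assert(u != NULL && upper != NULL && lower != NULL);
    deepest = upper->depth > lower->depth ? upper->depth : lower->depth;
    depth = deepest + 1;
    if (depth >= UNION_MAX_DEPTH)
        return -1;                 /* refuse the mount: too deeply nested */
    u->depth = depth;
    u->key = &union_depth_classes[depth];
    return 0;
}
```

With this arrangement the recursion limit and the number of classes
cannot drift apart, because both come from the same constant. Whether 16
is the right value for the stack budget is a separate question.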
>
>> > I'm also assuming that once a filesystem is part of a union mount, it
>> > cannot be accessed from outside of said union (can it? can the bottom
>> > itself be the top layer of another union?)
>>
>> Not at the moment; the hard read-only requirements on the lowerfs versus the
>> writeability requirements of the upperfs (you can't enter a directory that you
>> can't mirror up) prevent it.
>>
>> However, at some point I'd be interested in trying to make it possible to union
>> over a writeable filesystem.  This is pretty much a requirement for unioning
>> over NFS (as you can't tell the server to make the volume you're mounting hard
>> read-only).
I don't think that there is a hard read-only requirement. As far as I
understand, the current status is that "The filesystem should not be
modified directly" and "doing so will lead to undefined behaviour but
no crash or lockup". Unless there are bugs, obviously.
>> > Also, in what state are the filesystems on construction of the union?  Are
>> > they already fully formed and populated (do inodes already exist?)
>>
>> The lower filesystems must be fully formed and, at present, may not be modified
>> whilst in the union.
>>
>> The upper filesystem can be empty or filled by a previous union.  In fact,
>> there's nothing stopping the upper fs being an ordinary fs that's then used as
>> the upper layer in a union, but I'm not sure you can then access the lower
>> echelons as the directories don't contain fallthru entries.
As overlayfs does not have explicit fallthru entries, layering any two
fully formed filesystems gives a union of the two. You will only lose
access to entries that were previously deleted in a union and have a
whiteout entry in the upper layer.
Unionmount makes any directories which were touched in an upper union
layer opaque and requires explicit fallthru entries to access the
lower layer. A normal filesystem does not have opaque directories and
allows access to the lower layer when it is used as the top layer for
the first time. Traversing the union will make the traversed
directories opaque, though.
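
The difference in lookup behaviour can be sketched as follows. This is a
simplified userspace model under my own assumptions, not code from
either implementation: struct dir, struct entry, dir_find() and
union_lookup() are hypothetical names, and a directory is just a flat
array of names. With opaque set to 0 it behaves like the overlayfs case
described above; with opaque set to 1 only explicit fallthru entries
reach the lower layer, as in unionmount.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kinds of directory entry in this simplified model. */
enum entry_kind { ENTRY_FILE, ENTRY_WHITEOUT, ENTRY_FALLTHRU };

struct entry {
    const char *name;
    enum entry_kind kind;
};

/* Hypothetical directory: a flat entry array plus an opaque flag. */
struct dir {
    const struct entry *entries;
    size_t count;
    int opaque; /* nonzero: names missing here do not fall through */
};

enum lookup_result { FOUND_UPPER, FOUND_LOWER, NOT_FOUND };

/* Linear search for a name; returns NULL if the name is absent. */
const struct entry *dir_find(const struct dir *d, const char *name)
{
    size_t i;

    assert(d != NULL && name != NULL);
    for (i = 0; i < d->count; i++)
        if (strcmp(d->entries[i].name, name) == 0)
            return &d->entries[i];
    return NULL;
}

/* Resolve a name against an upper directory stacked on a lower one. */
enum lookup_result union_lookup(const struct dir *upper,
                                const struct dir *lower, const char *name)
{
    const struct entry *e = dir_find(upper, name);

    if (e != NULL) {
        if (e->kind == ENTRY_WHITEOUT)
            return NOT_FOUND;   /* deleted in the union: hide lower entry */
        if (e->kind == ENTRY_FILE)
            return FOUND_UPPER; /* upper entry shadows the lower one */
        /* ENTRY_FALLTHRU: explicitly continue to the lower layer */
    } else if (upper->opaque) {
        return NOT_FOUND;       /* opaque and no fallthru entry */
    }

    e = dir_find(lower, name);
    return (e != NULL && e->kind == ENTRY_FILE) ? FOUND_LOWER : NOT_FOUND;
}
```

In this model an ordinary filesystem used as the top layer has no
whiteouts and no opaque directories, so every lower name that is not
shadowed stays visible.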
Thanks
Michal