Re: [POC/RFC PATCH] overlayfs: constant inode numbers
From: Amir Goldstein
Date: Wed Nov 30 2016 - 10:05:39 EST
On Tue, Nov 29, 2016 at 11:49 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> On Tue, Nov 29, 2016 at 1:03 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
...
> I meant that we can unify OVL_XATTR_INO with "redirect/fh"
> functionality and get something good out of it.
>
>> Perhaps you meant for non-dir:
>>
>> 5. If redirect_dir=fh, *propagate* lowest-handle on non-dir copy up
>> 6. In ovl_lookup() of non-dir, decode lowest-handle to set oe->ino
>
> Yes.
>
> OVL_XATTR_FH would be safe to ignore, so this is back and forward
> compatible. And the cost is probably not prohibitive, since copy ups
> should be relatively rare.
>
> After a backup + restore it is not expected that we get back the old
> inode numbers so it's fine to ignore the stale file handles.
>
FYI, there are 2 interesting corner cases of "semi stale" handles:
- Copy of layers to same fs (without deleting old layers)
- Old layers are deleted but an old deleted file is still open
I have handled both these cases in the last version of redirect_fh
that I pushed yesterday, but not 100% sure that I handled them
correctly.
Anyway, I will get to work on adjusting redirect_fh for use by
stable inodes.
> The following issues are left:
>
> - performance of readdir;
Here is one very simple optimization for WIP:
@@ -157,6 +157,8 @@ static int ovl_fill_lowest(struct ovl_readdir_data *rdd,
list_move_tail(&p->l_node, &rdd->middle);
} else {
p = ovl_cache_entry_new(rdd, name, namelen, ino, d_type);
+ if (p)
+ p->ino = ino;
For non-lowest entries, we can provide a mount option 'readdir_ino'.
With readdir_ino, readdir pays a penalty of getxattr for any non-lowest
entry (either OVL_XATTR_FH or OVL_XATTR_INO).
Without readdir_ino, readdir will get d_ino = 0, in which case, at least
`find <path> -inum <n>` does the right thing (falls back to fstat for
this dirent).
> - what to do if not all layers are on the same fs;
Same as what I did for redirect_fh - turn the feature off.
We can also export this state in the /proc/mounts options and maybe allow
stable inodes to be turned off explicitly, but I don't think that we should, because
there shouldn't be a program which relies on inode numbers NOT being stable.
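
As a rough user-space analogue of the "all layers on the same fs" check, the
sketch below compares st_dev across a list of layer paths. This is only an
illustration: the function name layers_on_same_fs() is hypothetical, the
kernel-side check would not go through stat(), and st_dev is an imperfect
proxy on filesystems that report different device numbers per subvolume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return true only if every path in `layers` can be stat'ed and all of
 * them report the same st_dev. An empty list or any stat failure is
 * treated as "not the same fs", i.e. the feature would stay off.
 */
static bool layers_on_same_fs(const char *const *layers, size_t n)
{
    struct stat st;
    dev_t first = 0;

    assert(layers != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (stat(layers[i], &st) != 0)
            return false;
        if (i == 0)
            first = st.st_dev;
        else if (st.st_dev != first)
            return false;
    }
    return n > 0;
}
```

Treating every failure as "different fs" matches the conservative choice
above: when in doubt, the feature is simply turned off.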
> - hard link copy ups.
>
I'll start by setting up a TODO Wiki page and writing xfstests for all those
issues. Maybe even track them on GitHub.
Amir.