Re: [PATCH v2] ovl: introduce new "index=nouuid" option for inodes index feature

From: Amir Goldstein
Date: Thu Sep 24 2020 - 06:19:58 EST


On Thu, Sep 24, 2020 at 11:40 AM Pavel Tikhomirov
<ptikhomirov@xxxxxxxxxxxxx> wrote:
>
>
>
> On 9/23/20 7:09 PM, Amir Goldstein wrote:
> > On Wed, Sep 23, 2020 at 6:23 PM Pavel Tikhomirov
> > <ptikhomirov@xxxxxxxxxxxxx> wrote:
> >>
> >> This relaxes uuid checks for overlay index feature. It is only possible
> >> in case there is only one filesystem for all the work/upper/lower
> >> directories and bare file handles from this backing filesystem are uniq.
> >> In case we have multiple filesystems here just fall back to normal
> >> "index=on".
> >>
> >> This is needed when overlayfs is/was mounted in a container with
> >> index enabled (e.g.: to be able to resolve inotify watch file handles on
> >> it to paths in CRIU), and this container is copied and started alongside
> >> with the original one. This way the "copy" container can't have the same
> >> uuid on the superblock and mounting the overlayfs from it later would
> >> fail.
> >>
> >> That is an example of the problem on top of loop+ext4:
> >>
> >> dd if=/dev/zero of=loopbackfile.img bs=100M count=10
> >> losetup -fP loopbackfile.img
> >> losetup -a
> >> #/dev/loop0: [64768]:35 (/loop-test/loopbackfile.img)
> >> mkfs.ext4 loopbackfile.img
> >> mkdir loop-mp
> >> mount -o loop /dev/loop0 loop-mp
> >> mkdir loop-mp/{lower,upper,work,merged}
> >> mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
> >> upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
> >> umount loop-mp/merged
> >> umount loop-mp
> >> e2fsck -f /dev/loop0
> >> tune2fs -U random /dev/loop0
> >>
> >> mount -o loop /dev/loop0 loop-mp
> >> mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\
> >> upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
> >> #mount: /loop-test/loop-mp/merged:
> >> #mount(2) system call failed: Stale file handle.
> >>
> >> mount -t overlay overlay -oindex=nouuid,lowerdir=loop-mp/lower,\
> >> upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged
> >>
> >> If you just change the uuid of the backing filesystem, overlay is not
> >> mounting any more. In Virtuozzo we copy container disks (ploops) when
> >> crate the copy of container and we require fs uuid to be uniq for a new
> >> container.
> >>
> >> v2: in v1 I missed actual uuid check skip - add it
> >>
> >> CC: Amir Goldstein <amir73il@xxxxxxxxx>
> >> CC: Vivek Goyal <vgoyal@xxxxxxxxxx>
> >> CC: Miklos Szeredi <miklos@xxxxxxxxxx>
> >> CC: linux-unionfs@xxxxxxxxxxxxxxx
> >> CC: linux-kernel@xxxxxxxxxxxxxxx
> >> Signed-off-by: Pavel Tikhomirov <ptikhomirov@xxxxxxxxxxxxx>
> >> ---
> >
> > Look reasonable, but you need to rebase over
> > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
> >
> > Which makes a lot of your work unneeded.
> > ofs is propagated to most of the relevant helpers
> > and you should propagate it down to ovl_decode_real_fh().
>
> Thanks! Will do.
>
> >
> > Some minor comments below...
> >
> >> fs/overlayfs/Kconfig | 16 +++++++++++
> >> fs/overlayfs/export.c | 6 ++--
> >> fs/overlayfs/namei.c | 35 +++++++++++++++--------
> >> fs/overlayfs/overlayfs.h | 23 +++++++++++----
> >> fs/overlayfs/ovl_entry.h | 2 +-
> >> fs/overlayfs/super.c | 61 +++++++++++++++++++++++++++++-----------
> >> 6 files changed, 106 insertions(+), 37 deletions(-)
> >>
> >> diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
> >> index dd188c7996b3..b00fd44006f9 100644
> >> --- a/fs/overlayfs/Kconfig
> >> +++ b/fs/overlayfs/Kconfig
> >> @@ -61,6 +61,22 @@ config OVERLAY_FS_INDEX
> >>
> >> If unsure, say N.
> >>
> >> +config OVERLAY_FS_INDEX_NOUUID
> >> + bool "Overlayfs: relax uuid checks of inodes index feature"
> >> + depends on OVERLAY_FS
> >> + depends on OVERLAY_FS_INDEX
> >> + help
> >> + If this config option is enabled then overlay will skip uuid checks
> >> + for index lower to upper inode map, this only can be done if all
> >> + upper and lower directories are on the same filesystem where basic
> >> + fhandles are uniq.
> >> +
> >> + It is needed to overcome possible change of uuid on superblock of the
> >> + backing filesystem, e.g. when you copied the virtual disk and mount
> >> + both the copy of the disk and the original one at the same time.
> >> +
> >> + If unsure, say N.
> >> +
> >
> > Please do not add a config option for this.
> > Isn't a mount option sufficient for your needs?
>
> Users inside Virtuozzo container can mount overlayfs inside the CT (we
> assume that they do "regular" mounts without any "index=" option as
> docker does) so we wan't to setup the default in kernel config, so that
> all "regular" mounts of the user become "index=nouuid" automatically,
> and thus we would be able to both migrate (CRIU inotify resolution by
> fhandle on dump) and copy (copy disk with uuid change) the container
> without problem.
>

That is a problem.
So far, mount options always override module and kernel config options.
So if mount option says inodex=on explicitly, it should be index=on.

The only way I see for you to tackle this is if nouuid option is independent
of index option, e.g. index=on,anon_uuid (or uuid=off).

But if kernel config has a way to turn this on, then module param and
mount option should have a way to turn it off.

[...]
> >> @@ -1889,9 +1911,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
> >> if (err)
> >> goto out_free_oe;
> >>
> >> + if (ofs->config.index == OVL_INDEX_NOUUID && ofs->numfs > 1) {
> >> + pr_warn("The index=nouuid requires a single fs for lower and upper, falling back to index=on.\n");
> >> + ofs->config.index = OVL_INDEX_ON;
> >> + }
> >> +
> >
> > It's too late for this fallback now, you already did relaxed ovl_verify_origin()
> > and now we will continue as if all is ok.
> > Please fail the mount in this case.
> >
> > I don't think that users that specifically requested index=nouuid would care to
> > fallback to index=on.
>
> No, it's we who will force users to switch to index=nouuid in our
> Virtuozzo case through kernel config default, so probably having a
> fallback is a good thing, as users will be able to use their overlay at
> least until we "copy" their container.
>
> Maybe I can just move my check before ovl_get_indexdir (as far as I can
> see it is the only place from ovl_fill_super where we reach
> ovl_verify_fh or ovl_decode_real_fh) and after ovl_get_lowerstack where
> numfs is calculated. - Will do.

Yes, that would be fine techicaly.
I am not sure if the fallback is helpful though.

Anyway, you would need to get approval from Miklos
on the option itself and how it behaves.

Thanks,
Amir.