Re: [PATCH 0/2] overlayfs: C/R enhancements
From: Amir Goldstein
Date: Fri Jun 05 2020 - 10:36:25 EST
> > While at it, you copy pasted the text:
> > For more information, see Documentation/filesystems/overlayfs.txt
> > but there is no more information to be found.
>
> As far as I know, documentation patches must be sent to another mailing list.
> Of course I plan to add information about the new feature to the overlayfs documentation.
>
Please send the documentation patch together with the series
to this list. It's fine to wait with that until the concept is approved, though.
> > > > And if this works for you, you don't have to export the layers ovl_fh in
> > > > /proc/mounts, you can export them in numerous other ways.
> > > > One way, off the top of my head: getxattr on the overlay root dir.
> > > > "trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
> > > > for example could work.
> > >
> > > Thanks xattr might be a good option, but still don't forget about (a)
> > > and (b), users like to know all information about mount from
> > > /proc/pid/mountinfo.
> > >
> >
> > Let's stick to your use cases requirements. If you have other use cases
> > for this functionality lay them out explicitly.
>
> The requirements are very simple: at the "dump stage" we need to save all overlayfs mount options
> sufficient to fully reconstruct the overlayfs mount state at the "restore stage". We already
> have a proof-of-concept implementation for Docker overlayfs mounts when Docker is running in
> an OpenVZ container. In this case we dump the whole tree of mounts and all mount namespaces.
> The CRIU mount restore procedure first reconstructs the mount tree in special separate subtrees
> called "yards"; then, when all mounts are reconstructed, we do the "pivot_root" syscall. And
> with overlayfs that was a problem, because we mounted overlayfs with lowerdir,workdir,upperdir
> paths carrying the mount namespace "yard" path prefix, and after restore the user may see in the
> mount options that the lowerdir,workdir,upperdir paths were changed... That's a problem. It also
> makes a second C/R procedure impossible, because after the first C/R the lowerdir,workdir,upperdir
> paths are invalidated after pivot_root.
>
> Example for Docker (after first C/R procedure):
>
> options lowerdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/l/4BLZ4WH6GZIVKJE5QF62QUUKVZ:/var/lib/docker/overlay2/l/7FYRGAXT35JMKTXCHDNCQO3HKT,upperdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/diff,workdir=/tmp/.criu.mntns.owMo9C/9-0000000000//var/lib/docker/overlay2/30aa26fb5e5671fc0126f2fc0e84cc740ce6bf06ca6ad4ac877a3c60f5aceaf1/work
>
That reminds me.
I've read somewhere that those symlinks (l/4BLZ4WH6GZIVKJE5QF62QUUKVZ)
are meant to shorten the mount option string, because the mount
options are limited by page size, and with many lower layers that
limit can be reached.
That is one of the reasons the new mount API (i.e. fsconfig()) was created.
I wonder if /proc/mounts also has a similar limitation on options size.
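For the mount(2) side of that limit, a minimal userspace sketch of the
check could look like the following. The helper name is made up for
illustration; the only assumption is that the legacy mount(2) path takes
at most one page of option data, including the terminating NUL. It says
nothing about whether /proc/mounts has a limit of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if the option string, including
 * its terminating NUL, fits into a single page.
 */
static bool ovl_opts_fit_in_page(const char *opts)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(opts != NULL);
	if (page <= 0)
		return false;	/* page size unknown, be conservative */

	return strlen(opts) + 1 <= (size_t)page;
}
```

A caller could run this on the assembled lowerdir=...,upperdir=...,workdir=...
string before mounting and fall back to shorter paths if it does not fit.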
I also wonder why docker doesn't chdir into /var/lib/docker/overlay2/
before mounting overlay and use relative paths, though that would have
been worse for CRIU.
So at least for the docker use case, CRIU knows very well where the
underlying filesystem is mounted (/var/lib/docker/overlay2/ or above).
So if you got an API from overlayfs, something like:
    getxattr("/var/lib/docker/overlay2/XYZ/merged",
             "trusted.overlay.layers.0.fh",..)
which reads the ovl_fh encoding of the layer 0 (upper) rootdir, CRIU
can verify that the uuid matches the filesystem mounted at /var/lib/docker/overlay2/
and then call open_by_handle_at() to open an fd and resolve it to a path
under /var/lib/docker/overlay2.
I don't know if that provides what CRIU needs, but it would be no more
than a few lines of code in overlayfs:
    if (i < ofs->numlayer)
        fh = ovl_encode_real_fh(ofs->layers[i].mnt->mnt_root, ...
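
On the userspace side, reading such an xattr could be sketched roughly
as below. Note that "trusted.overlay.layers.N.fh" is only the name
suggested above and does not exist as an interface today, and both
helper names are made up for illustration. getxattr() itself is the
regular syscall wrapper from <sys/xattr.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Build the per-layer xattr name. The name is hypothetical (it is only
 * the proposal from this thread). Returns the name length, or -1 if
 * the buffer is too small.
 */
static int ovl_layer_fh_name(char *name, size_t size, unsigned int layer)
{
	int n = snprintf(name, size, "trusted.overlay.layers.%u.fh", layer);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Read the raw file handle blob of one layer from the overlay root dir.
 * Returns the number of bytes read, or -1 with errno set by getxattr().
 */
static ssize_t ovl_read_layer_fh(const char *merged, unsigned int layer,
				 void *buf, size_t size)
{
	char name[64];

	assert(merged != NULL && buf != NULL);
	if (ovl_layer_fh_name(name, sizeof(name), layer) < 0)
		return -1;

	return getxattr(merged, name, buf, size);
}
```

Decoding the returned blob, comparing the uuid and passing the handle to
open_by_handle_at() are left out here on purpose, since they depend on
the exact format that overlayfs would export.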
Thanks,
Amir.