Re: [PATCH V8 1/3] fuse: Definitions and ioctl() for passthrough

From: Alessio Balsini
Date: Tue Sep 22 2020 - 08:15:13 EST


On Fri, Sep 18, 2020 at 10:59:16PM +0300, Amir Goldstein wrote:
> On Fri, Sep 18, 2020 at 7:33 PM Alessio Balsini <balsini@xxxxxxxxxxx> wrote:
> [...]
> > > ... for example:
> > >
> > > if (fs_stack_depth && passthrough_sb->s_type == fuse_fs_type) {
> > > pr_err("FUSE: stacked passthrough file\n");
> > > goto out;
> > > }
> > >
> > > But maybe we want to ban passthrough to any lower FUSE at least for start.
> >
> >
> > Yes, what I proposed here is very conservative, and your solution sounds
> > good to me. Unfortunately I don't have a clear idea of what could go wrong
> > if we relax this constraint. I need some guidance from you experts here.
> >
>
> I guess the main concern would be locking order and deadlocks.
> With my suggestion I think deadlocks are avoided and I am less sure
> but think that lockdep should not have false positives either.
>
> If people do need the 1-level stacking, I can try to think harder
> if it is safe and maybe add some more compromise limitations.
>
> > What do you think if we keep this overly strict rule for now to avoid
> > unintended behaviors and come back as we find affected use case?
> >
>
> I can live with that if other designated users don't mind the limitation.
>
> I happen to be developing a passthrough FUSE fs [1] myself and
> I also happen to be using it to pass through to overlayfs.
> OTOH, the workloads for my use case are mostly large sequential IO,
> and the hardware can handle the few extra syscalls, so the passthrough
> fd feature is not urgent for my use case at this point in time.


This is something that only happens if the FUSE daemon opens a connection
wanting FUSE_PASSTHROUGH, so shouldn't affect existing use cases. Or am I
wrong?
If some users find this limitation to be an issue, we can rethink/relax
this policy in the future... Switching to something like the solution you
proposed does not break the current behavior, so we would be able to change
this with minimal effort.


>
>
> >
> > >
> > > > + ret = -EEXIST;
> > >
> > > Why EEXIST? Why not EINVAL?
> > >
> >
> >
> > Reaching the stacking limit sounded like an error caused by the undesired
> > existence of something, thus EEXIST sounded like a good fit.
> > No problem in changing that to EINVAL.
> >
> >
> >
> > > > + if (fs_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
> > > > + pr_err("FUSE: maximum fs stacking depth exceeded for passthrough\n");
> > > > + goto out;
> > > > + }
> > > > +
> > > > + req->args->passthrough_filp = passthrough_filp;
> > > > + return 0;
> > > > +out:
> > > > + fput(passthrough_filp);
> > > > + return ret;
> > > > +}
> > > > +
> > >
> > > And speaking of overlayfs, I believe you may be able to test your code with
> > > fuse-overlayfs (passthrough to upper files).
> > >
> ...
> >
> > This is indeed a project with several common elements to what we are doing
> > in Android,
>
> Are you in liberty to share more information about the Android project?
> Is it related to Incremental FS [2]?
>
> Thanks,
> Amir.
>
> [1] https://github.com/amir73il/libfuse/commits/cachegwfs
> [2] https://lore.kernel.org/linux-fsdevel/20190502040331.81196-1-ezemtsov@xxxxxxxxxx/

Thanks for the pointer to cachegwfs, I'm glad I'm seeing more and more
passthrough file systems in addition to our use case in Android.

I'm not directly involved in the Incremental FS project, but, as far as I
remember, only for the first PoC was actually developed as a FUSE file
system. Because of the overhead introduced by the user space round-trips,
that design was left behind and the whole Incremental FS infrastructure
switched to becoming a kernel module.
In general, the FUSE passthrough patch set proposed in this series wouldn't
be helpful for that use case because, for example, Incremental FS requires
live (de)compression of data, that can only be performed by the FUSE
daemon.

The specific use case I'm trying to improve with this FUSE passthrough
series is instead related to the scoped storage feature that we introduced
in Android 11, that is based on FUSE, and affects those folders that are
shared among all the Apps (e.g., DCIM, Downloads, etc). More details here:

https://developer.android.com/about/versions/11/privacy/storage

With FUSE we now have a flexible way to specify more fine-grained
permissions (e.g., specify if an App is allowed to access files depenind on
their type), create private App folders, maintain legacy paths for old
Apps, manipulate pieces of files at run-time, etc. Forgive me if I forgot
something. :)
These extra operations may slower the file system access comprared to a
native, direct access, but if:
- the file being accessed requires special treatment before being passed to
the requesting App, then further tests will be performed at every
read/write operation (with some optimizations). This overhead is of
course annoying, but is something we are happy to pay because is
beneficial to the user (i.e., improves privacy and security).
- Instead, if at open time a file is recognized as safe to access and does
not require any further enforcement from the FUSE daemon, there's no need
to pay for future read/write operations overheads, that wouldn't do
anything more than just copying data (possibly with the help of
splicing). In this case the FUSE passthrough feature proposed in this
series can be enabled to reduce this overhead.

Moreover, some Apps use big files that contain all their resources, then
access these files at random offsets, not taking advantage of read-ahead
cache. The same happens for files containing databases.
In addition, our specific use case involves a FUSE daemon is probably
heavier than the average passthrough file system (for example those that
are in libfuse/examples), so reducing user space round trips thanks to the
patchset proposed here gives a strong improvement.

Anyway, only showing the improvements in our extreme use case would have
brought a limited case for upstreaming, and that is why the benchmarks I
ran (presented in the cover letter), are based on the vanilla
passthrough_hp daemon managing kind of standard storage workloads, and
still show evident performance improvements.
Running and sharing benchmarks on Android is also tricky because at the
time of first submission the most recent public real device was running a
4.14 kernel, so that would have been maybe nice to see, but not that
interesting... :)

I hope I answered all your questions, please let me know if I missed
something and feel free to ask for more details!

Thanks again,
Alessio