Re: [PATCH v6] fuse: Add support for passthrough read/write

From: Alessio Balsini
Date: Tue Aug 18 2020 - 09:53:22 EST


Thank you both for the important feedback,

I tried to consolidate all your suggestions in the new version of the
patch, shared below.

As you both recommended, that tricky ki_filp swapping has been removed,
taking overlayfs as a reference for the management of asynchronous
requests.
The V7 below went again through some testing with fio (sync and libaio
engines with several jobs in randrw), fsstress, and a bunch of kernel
builds to trigger both the sync and async paths. Gladly everything worked
as expected.

What I didn't get from overlayfs are the temporary cred switchings, as I
think that in the FUSE use case it is still the FUSE daemon that should
take care of identifying if the requesting process has the right
permissions before allowing the passthrough feature.

On Thu, Aug 13, 2020 at 08:30:21PM +0200, 'Jann Horn' via kernel-team wrote:
> On Thu, Aug 13, 2020 at 3:28 PM Alessio Balsini <balsini@xxxxxxxxxxx> wrote:
> [...]
>
> My point is that you can use this security issue to steal file
> descriptors from processes that are _not_ supposed to act as FUSE
> daemons. For example, let's say there is some setuid root executable
> that opens /etc/shadow and then writes a message to stdout. If I
> invoke this setuid root executable with stdout pointing to a FUSE
> device fd, and I can arrange for the message written by the setuid
> process to look like a FUSE message, I can trick the setuid process to
> install its /etc/shadow fd as a passthrough fd on a FUSE file, and
> then, through the FUSE filesystem, mess with /etc/shadow.
>
> Also, for context: While on Android, access to /dev/fuse is
> restricted, desktop Linux systems allow unprivileged users to use
> FUSE.
> [...]
> The new fuse_dev_ioctl() command would behave just like
> fuse_dev_write(), except that the ioctl-based interface would permit
> OPEN/CREATE replies with FOPEN_PASSTHROUGH, while the write()-based
> interface would reject them.
> [...]
> I can't think of any checks that you could do there to make it safe,
> because fundamentally what you have to figure out is _userspace's
> intent_, and you can't figure out what that is if userspace just calls
> write().
> [...]
> In this case, an error indicates that the userspace programmer made a
> mistake. So even if the userspace programmer is not looking at kernel
> logs, we should indicate to them that they messed up - and we can do
> that by returning an error code from the syscall. So I think we should
> ideally abort the operation in this case.

I've been thinking about the security issue you mentioned, but I'm thinking
that besides the extra Android restrictions, it shouldn't be that easy to
play with /dev/fuse, also for a privileged process in Linux.
In order for a process to successfully send commands to /dev/fuse, the
command has to match a pending request coming from the FUSE filesystem,
meaning that the same privileged process should have mounted the FUSE
filesystem, actually becoming a FUSE daemon itself.
Is this still an eventually exploitable scenario?

>
> No, call_write_iter() doesn't do any of the required checking and
> setup and notification steps, it just calls into the file's
> ->write_iter handler:
>
> static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
> struct iov_iter *iter)
> {
> return file->f_op->write_iter(kio, iter);
> }
>
> Did you confuse call_write_iter() with vfs_iocb_iter_write(), or
> something like that?

Ops, you are right, I was referring to vfs_iocb_iter_write(), that was
actually part of my first design, but ended up simplifying in favor of
call_write_iter().

> Requests can complete asynchronously. That means call_write_iter() can
> return more or less immediately, while the request is processed in the
> background, and at some point, a callback is invoked. That's what that
> -EIOCBQUEUED return value is about. In that case, the current code
> will change ->ki_filp while the request is still being handled in the
> background.
>
> I recommend looking at ovl_write_iter() in overlayfs to see how
> they're dealing with this case.

That was a great suggestion, I hopefully picked the right things from
overlayfs, without overlooking some security aspects.

Thanks again,
Alessio

---8<---