Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

From: Greg KH
Date: Fri Feb 12 2021 - 03:39:13 EST


On Fri, Feb 12, 2021 at 04:20:17PM +0800, Nicolas Boichat wrote:
> On Fri, Feb 12, 2021 at 3:46 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Feb 12, 2021 at 12:44:00PM +0800, Nicolas Boichat wrote:
> > > Filesystems such as procfs and sysfs generate their content at
> > > runtime. This implies the file sizes do not usually match the
> > > amount of data that can be read from the file, and that seeking
> > > may not work as intended.
> > >
> > > This will be useful to disallow copy_file_range with input files
> > > from such filesystems.
> > >
> > > Signed-off-by: Nicolas Boichat <drinkcat@xxxxxxxxxxxx>
> > > ---
> > > I first thought of adding a new field to struct file_operations,
> > > but that doesn't quite scale as every single file creation
> > > operation would need to be modified.
> >
> > Even so, you missed a load of filesystems in the kernel with this patch
> > series, what makes the ones you did mark here different from the
> > "internal" filesystems that you did not?
>
> Which ones did I miss? (apart from configfs, see the conversation on
> patch 6/6). Anyway, hopefully auditing all filesystems is an order of
> magnitude easier task, and easier to maintain in the longer run ,-)

Are you going to be the one to add this to each new filesystem that is
added to the kernel?

I see filesystems in drivers/char/mem.c and
drivers/infiniband/hw/qib/qib_fs.c and drivers/misc/ibmasm/ibmasmfs.c
and loads of other driver-specific filesystems. Do all of those "just
work" somehow?

> > This feels wrong, why is userspace suddenly breaking? What changed in
> > the kernel that caused this? Procfs has been around for a _very_ long
> > time :)
>
> Some of this is covered in the cover letter 0/6. To expand a bit:
> copy_file_range has only supported cross-filesystem copies since 5.3
> [1], before that the kernel would return EXDEV and userspace
> application would fall back to a read/write based copy. After 5.3,
> copy_file_range copies no data as it thinks the input file is empty.

So older kernels work fine with this system call on procfs, but newer
ones do not? If so, that's a regression that should be fixed in the
original area, but should not require a new filesystem flag for every
individual one out there. That way lies madness and constant auditing
that I do not see anyone signing up for for the next 20 years, right?

> [1] I think it makes little sense to try to use copy_file_range
> between 2 files in /proc, but technically, I think that has been
> broken since copy_file_range fallback to do_splice_direct was
> introduced (eac70053a141 ("vfs: Add vfs_copy_file_range() support for
> pagecache copies", ~4.4).

Why are people trying to use copy_file_range on simple /proc and /sys
files in the first place? They can not seek (well most can not), so
that feels like a "oh look, a new syscall, let's use it everywhere!"
problem that userspace should not do.

thanks,

greg k-h