Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

From: Greg KH
Date: Fri Feb 12 2021 - 10:47:09 EST


On Fri, Feb 12, 2021 at 07:33:57AM -0800, Ian Lance Taylor wrote:
> On Fri, Feb 12, 2021 at 12:38 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Why are people trying to use copy_file_range on simple /proc and /sys
> > files in the first place? They cannot seek (well, most cannot), so
> > that feels like an "oh look, a new syscall, let's use it everywhere!"
> > problem that userspace should not have.
>
> This may have been covered elsewhere, but it's not that people are
> saying "let's use copy_file_range on files in /proc." It's that the
> Go language standard library provides an interface to operating system
> files. When Go code uses the standard library function io.Copy to
> copy the contents of one open file to another open file, then on Linux
> kernels 5.3 and greater the Go standard library will use the
> copy_file_range system call. That seems to be exactly what
> copy_file_range is intended for. Unfortunately it appears that when
> people writing Go code open a file in /proc and use io.Copy to copy
> the contents to another open file, copy_file_range does nothing and
> reports success. There isn't anything on the copy_file_range man page
> explaining this limitation, and there isn't any documented way to know
> that the Go standard library should not use copy_file_range on certain
> files.

But is this a bug in the kernel, in that the syscall is not working
properly, or a bug in Go, in that it decided to do this for all types of
files without knowing that some types of files cannot handle it?

If the kernel has always worked this way, I would say that Go is doing
the wrong thing here. If the kernel used to work properly, and then
changed, then it's a regression on the kernel side.

So which is it?

> So ideally the kernel will report EOPNOTSUPP or EINVAL when using
> copy_file_range on a file in /proc or some other file system that
> fails (and, minor side note, the copy_file_range man page should
> document that it can return EOPNOTSUPP or EINVAL in some cases, which
> does already happen on at least some kernel versions using at least
> some file systems).

Documentation is good, but what the kernel does is the true "definition"
of what is going right or wrong here.
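
To make the userspace side of the report above concrete, here is a minimal
Go sketch of one way a caller could guard against a copy that reports
success without moving any data. It is an illustration under stated
assumptions, not what the Go standard library does: copyWithFallback,
plainReader, plainWriter and roundTrip are hypothetical names, and
/proc/self/status is only an example path. The wrappers hide every method
except Read and Write, so io.Copy cannot pick an optimized path such as
copy_file_range and has to use its generic read/write loop.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// plainReader and plainWriter expose only Read and Write. Wrapping a value
// in them hides any io.WriterTo / io.ReaderFrom methods, so io.Copy has to
// use its generic read/write loop instead of an optimized path.
type plainReader struct{ io.Reader }
type plainWriter struct{ io.Writer }

// copyWithFallback is a hypothetical helper, not part of the Go standard
// library. It tries io.Copy first and, if that reports success with zero
// bytes copied, retries with plain reads and writes.
func copyWithFallback(dst io.Writer, src io.Reader) (int64, error) {
	n, err := io.Copy(dst, src)
	if err != nil || n > 0 {
		return n, err
	}
	// Zero bytes and no error: src may really be empty, or an optimized
	// path may have silently copied nothing. A plain read settles it; an
	// empty source simply yields zero bytes again.
	return io.Copy(plainWriter{dst}, plainReader{src})
}

// roundTrip copies s through copyWithFallback into memory and returns
// whatever arrived at the destination.
func roundTrip(s string) (string, error) {
	var out strings.Builder
	_, err := copyWithFallback(&out, strings.NewReader(s))
	return out.String(), err
}

func main() {
	got, err := roundTrip("hello")
	fmt.Printf("in-memory copy: %q, err=%v\n", got, err)

	// Linux-specific demonstration; the path is only an example.
	src, err := os.Open("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		return
	}
	defer src.Close()

	dst, err := os.CreateTemp("", "proc-copy-*")
	if err != nil {
		fmt.Fprintln(os.Stderr, "create temp:", err)
		return
	}
	defer os.Remove(dst.Name())
	defer dst.Close()

	n, err := copyWithFallback(dst, src)
	fmt.Printf("copied %d bytes from /proc/self/status, err=%v\n", n, err)
}
```

The retry costs one extra read when the source really is empty, which
seems a reasonable price for not trusting a zero-byte "success". It does
not answer the question of which side is at fault; it only shows that
userspace can detect the symptom.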

thanks,

greg k-h