Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

From: Ian Lance Taylor
Date: Fri Feb 12 2021 - 11:00:21 EST


On Fri, Feb 12, 2021 at 7:45 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Feb 12, 2021 at 07:33:57AM -0800, Ian Lance Taylor wrote:
> > On Fri, Feb 12, 2021 at 12:38 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Why are people trying to use copy_file_range on simple /proc and /sys
> > > files in the first place? They can not seek (well most can not), so
> > > that feels like a "oh look, a new syscall, let's use it everywhere!"
> > > problem that userspace should not do.
> >
> > This may have been covered elsewhere, but it's not that people are
> > saying "let's use copy_file_range on files in /proc." It's that the
> > Go language standard library provides an interface to operating system
> > files. When Go code uses the standard library function io.Copy to
> > copy the contents of one open file to another open file, then on Linux
> > kernels 5.3 and greater the Go standard library will use the
> > copy_file_range system call. That seems to be exactly what
> > copy_file_range is intended for. Unfortunately it appears that when
> > people writing Go code open a file in /proc and use io.Copy the
> > contents to another open file, copy_file_range does nothing and
> > reports success. There isn't anything on the copy_file_range man page
> > explaining this limitation, and there isn't any documented way to know
> > that the Go standard library should not use copy_file_range on certain
> > files.
>
> But, is this a bug in the kernel in that the syscall being made is not
> working properly, or a bug in that Go decided to do this for all types
> of files not knowing that some types of files can not handle this?
>
> If the kernel has always worked this way, I would say that Go is doing
> the wrong thing here. If the kernel used to work properly, and then
> changed, then it's a regression on the kernel side.
>
> So which is it?

I don't work on the kernel, so I can't tell you which it is. You will
have to decide.

>From my perspective, as a kernel user rather than a kernel developer,
a system call that silently fails for certain files and that provides
no way to determine either 1) ahead of time that the system call will
fail, or 2) after the call that the system call did fail, is a useless
system call. I can never use that system call, because I don't know
whether or not it will work. So as a kernel user I would say that you
should fix the system call to report failure, or document some way to
know whether the system call will fail, or you should remove the
system call. But I'm not a kernel developer, I don't have all the
information, and it's obviously your call.

I'll note that to the best of my knowledge this failure started
happening with the 5.3 kernel, as before 5.3 the problematic calls
would report a failure (EXDEV). Since 5.3 isn't all that old I
personally wouldn't say that the kernel "has always worked this way."
But I may be mistaken about this.


> > So ideally the kernel will report EOPNOTSUPP or EINVAL when using
> > copy_file_range on a file in /proc or some other file system that
> > fails (and, minor side note, the copy_file_range man page should
> > document that it can return EOPNOTSUPP or EINVAL in some cases, which
> > does already happen on at least some kernel versions using at least
> > some file systems).
>
> Documentation is good, but what the kernel does is the true "definition"
> of what is going right or wrong here.

Sure. The documentation comment was just a side note. I hope that we
can all agree that accurate man pages are better than inaccurate ones.

Ian