Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

From: Amir Goldstein
Date: Mon Feb 15 2021 - 00:59:16 EST


On Mon, Feb 15, 2021 at 3:27 AM Nicolas Boichat <drinkcat@xxxxxxxxxxxx> wrote:
>
> On Mon, Feb 15, 2021 at 9:12 AM Ian Lance Taylor <iant@xxxxxxxxxx> wrote:
> >
> > On Sun, Feb 14, 2021 at 4:38 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Feb 12, 2021 at 03:54:48PM -0800, Darrick J. Wong wrote:
> > > > On Sat, Feb 13, 2021 at 10:27:26AM +1100, Dave Chinner wrote:
> > > >
> > > > > If you can't tell from userspace that a file has data in it other
> > > > > than by calling read() on it, then you can't use cfr on it.
> > > >
> > > > I don't know how to do that, Dave. :)
> > >
> > > If stat returns a non-zero size, then userspace knows it has at
> > > least that much data in it, whether it be zeros or previously
> > > written data. cfr will copy that data. The special zero length
> > > regular pipe files fail this simple "how much data is there to copy
> > > in this file" check...
> >
> > This suggests that if the Go standard library sees that
> > copy_file_range reads zero bytes, we should assume that it failed, and
> > should use the fallback path as though copy_file_range returned
> > EOPNOTSUPP or EINVAL. This will cause an extra system call for an
> > empty file, but as long as copy_file_range does not return an error
> > for cases that it does not support we are going to need an extra
> > system call anyhow.
> >
> > Does that seem like a possible mitigation? That is, are there cases
> > where copy_file_range will fail to do a correct copy, and will return
> > success, and will not return zero?
>
> I'm a bit worried about the sysfs files that report a 4096 bytes file
> size, for 2 reasons:
> - I'm not sure if the content _can_ actually be longer (I couldn't
> find a counterexample)
> - If you're unlucky enough to have a partial write in the output
> filesystem, you'll get a non-zero return value and probably corrupted
> content.
>

First of all, I am in favor of fixing the regression in the kernel caused
by the change of behavior in v5.3.

Having said that, I don't think it is a good idea to use ANY tool to copy
a zero size pseudo file.
rsync doesn't copy any data either if you try to use it to copy tracing into
a temp file.
I think it is perfectly sane for any tool to check file size before trying
to read/copy.

w.r.t. short write/short read, did you consider using the off_in/off_out
arguments? I *think* current kernel CFR should be safe to use as long
as the tool:
1. Checks file size before copy
2. Does not try to copy a range beyond EOF
3. Pass off_in/off_out args and verify that off_in/off_out advances
as expected

Thanks,
Amir.