Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it
From: NeilBrown
Date: Mon Apr 03 2017 - 23:04:26 EST
On Mon, Apr 03 2017, Jeff Layton wrote:
> On Mon, 2017-04-03 at 12:16 -0700, Matthew Wilcox wrote:
>> On Mon, Apr 03, 2017 at 01:47:37PM -0400, Jeff Layton wrote:
>> > > I wonder whether it's even worth supporting both EIO and ENOSPC for a
>> > > writeback problem. If I understand correctly, at the time of write(),
>> > > filesystems check to see if they have enough blocks to satisfy the
>> > > request, so ENOSPC only comes up in the writeback context for thinly
>> > > provisioned devices.
>> >
>> > No, ENOSPC on writeback can certainly happen with network filesystems.
>> > NFS and CIFS have no way to reserve space. You wouldn't want to have to
>> > do an extra RPC on every buffered write. :)
>>
>> Aaah, yes, network filesystems. I would indeed not want to do an extra
>> RPC on every write to a hole (it's a hole vs non-hole question, rather
>> than a buffered/unbuffered question ... unless you're WAFLing and not
>> reclaiming quickly enough, I suppose).
>>
>> So, OK, that makes sense, we should keep allowing filesystems to report
>> ENOSPC as a writeback error. But I think much of the argument below
>> still holds, and we should continue to have a prior EIO to be reported
>> over a new ENOSPC (even if the program has already consumed the EIO).
>>
>
> I'm fine with that (though I'd like Neil's thoughts before we decide
> anything) there.
I'd like there to be a well-defined time when old errors are forgotten.
It does make sense for EIO to persist even if ENOSPC or EDQUOT is
received, but not forever.
Clearing the remembered errors when put_write_access() causes
i_writecount to reach zero is one option (as suggested), but I'm not
sure I'm happy with it.
Local filesystems, or network filesystems which receive strong write
delegations, should only ever return EIO to fsync. We should
concentrate on them first, I think. As there is only one possible
error, the seq counter is sufficient to "clear" it once it has been
reported to fsync() (or write()?).
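To illustrate the single-error case, here is a minimal userspace sketch of how a sequence counter can "clear" an error per file descriptor without touching shared state. It is not the code from the patch set; the names wb_err_state, wb_err_cursor and the three helpers are made up for illustration, and locking and counter wraparound are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode state: bumped once for every writeback failure. */
struct wb_err_state {
	uint32_t seq;
};

/* Hypothetical per-open-file state: the last seq value this fd has seen. */
struct wb_err_cursor {
	uint32_t seen;
};

/* At open time the fd starts out "caught up", so it never sees older errors. */
static void wb_err_open(const struct wb_err_state *st, struct wb_err_cursor *cur)
{
	cur->seen = st->seq;
}

/* Called when writeback fails. Only EIO is possible, so no code is stored. */
static void wb_err_record(struct wb_err_state *st)
{
	st->seq++;
}

/*
 * Called from fsync() (or write()?). Returns true if an error was recorded
 * since this fd last checked, and advances the cursor so that the same
 * error is not reported to this fd again.
 */
static bool wb_err_check(const struct wb_err_state *st, struct wb_err_cursor *cur)
{
	if (cur->seen == st->seq)
		return false;
	cur->seen = st->seq;
	return true;
}
```

Each descriptor sees a given error once, and nothing in the inode needs to be reset, because "cleared" just means the descriptor's cursor has caught up with the counter.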
Other network filesystems could return a whole host of errors: ENOSPC,
EDQUOT, ESTALE, EPERM, EFBIG ...
Do we want to limit exactly which errors are allowed in generic code, or
do we just support EIO generically and expect the filesystem to sort out
the details for anything else?
One possible approach a filesystem could take is just to allow a single
async writeback error. After that error, all subsequent write()
system calls become synchronous. As write() or fsync() is called on each
file descriptor (which could possibly have sent the write which caused
the error), an error is returned and that fact is counted. Once we have
returned as many errors as there are open file descriptors
(i_writecount?), and have seen a successful write, the filesystem
forgets all recorded errors and switches back to async writes (for that
inode). NFS does this switch-to-sync-on-error. See nfs_need_check_write().
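A rough sketch of the bookkeeping such a filesystem might keep, to make the forgetting rule concrete. This is not how NFS implements it; struct inode_err_state and the helpers are hypothetical, open_writers merely stands in for i_writecount, and the sketch counts reports without tracking which descriptor received each one.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-inode bookkeeping for the scheme described above. */
struct inode_err_state {
	int  error;          /* 0, or the single remembered errno */
	int  open_writers;   /* stand-in for i_writecount */
	int  reported;       /* errors handed back since the failure */
	bool write_ok_since; /* a write has succeeded since the failure */
};

/* While an error is remembered, write() must be synchronous. */
static bool must_write_sync(const struct inode_err_state *s)
{
	return s->error != 0;
}

/* Forget only once every writer has been told AND a write has succeeded. */
static void maybe_forget(struct inode_err_state *s)
{
	if (s->error && s->write_ok_since && s->reported >= s->open_writers) {
		s->error = 0;
		s->reported = 0;
		s->write_ok_since = false;
	}
}

/* Async writeback failed. Only a single error is remembered. */
static void record_async_error(struct inode_err_state *s, int err)
{
	if (s->error)
		return;
	s->error = err;
	s->reported = 0;
	s->write_ok_since = false;
}

/* A synchronous write completed successfully. */
static void note_sync_write_ok(struct inode_err_state *s)
{
	if (s->error)
		s->write_ok_since = true;
	maybe_forget(s);
}

/* Called from write() or fsync(): returns 0 or a negative errno. */
static int report_error(struct inode_err_state *s)
{
	int err = s->error;

	if (!err)
		return 0;
	s->reported++;
	maybe_forget(s);
	return -err;
}
```

A real implementation would also need a per-descriptor "already told" flag, otherwise one descriptor calling fsync() twice would use up another descriptor's report.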
The "which could possibly have sent the write which caused the error" is
an explicit reference to NFS. NFS doesn't use the AS_EIO/AS_ENOSPC
flags to return async errors. It allocates an nfs_open_context for each
user who opens a given inode, and stores an error in there. Each dirty
page is associated with one of these, so errors are sure to go to the
correct user, though not necessarily the correct fd at present.
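As a toy model of that per-opener arrangement (these are not the actual NFS structures; open_ctx, dirty_page and both helpers are invented for illustration):

```c
#include <errno.h>

/* Toy stand-in for a per-opener context: one per user who opened the inode. */
struct open_ctx {
	int error; /* 0, or the errno from a failed async write */
};

/* Toy stand-in for a dirty page: remembers which opener dirtied it. */
struct dirty_page {
	struct open_ctx *ctx;
};

/* Writeback of this page failed: charge the error to the page's opener. */
static void page_writeback_failed(struct dirty_page *pg, int err)
{
	if (pg->ctx->error == 0)
		pg->ctx->error = err;
}

/* Called at fsync() time: hand back and clear this opener's error, if any. */
static int ctx_take_error(struct open_ctx *ctx)
{
	int err = ctx->error;

	ctx->error = 0;
	return -err;
}
```

Because the error lives in the opener's context rather than in the shared mapping, an opener who wrote nothing to the failing pages never sees it.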
When we specify the new behaviour we should be careful to be as vague as
possible while still saying what we need. This allows filesystems some
flexibility.
If an error happens during writeback, the next write() or fsync() (or
....) on the file descriptor to which data was written will return -1
with errno set to EIO or some other relevant error. Other file
descriptors open on the same file may receive EIO or some other error
on a subsequent appropriate system call.
It should not be assumed that close() will return an error. fsync()
must be called before close() if writeback errors are important to the
application.
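From the application side, that wording implies a pattern like the following sketch, which uses plain POSIX calls. The helper name write_file_durably is made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a buffer to a file and report any writeback error.
 * Returns 0 on success or a negative errno value.
 */
static int write_file_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int err;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = errno;
			close(fd);
			return -err;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Writeback errors are collected here; do not rely on close(). */
	if (fsync(fd) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	if (close(fd) < 0)
		return -errno;
	return 0;
}
```

The fsync() call is where EIO, ENOSPC or similar is expected to surface. The close() return value is still checked, but only as a secondary signal.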
Thanks,
NeilBrown