Re: [PATCH v11 2/2] zonefs: Add documentation

From: Damien Le Moal
Date: Wed Feb 05 2020 - 20:42:47 EST


On 2020/02/06 7:29, Dave Chinner wrote:
> On Wed, Feb 05, 2020 at 09:08:37PM +0900, Damien Le Moal wrote:
>> Add the new file Documentation/filesystems/zonefs.txt to document
>> zonefs principles and user-space tool usage.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
> .....
>
> Just looking at the error handling text...
>
>> +Several optional features of zonefs can be enabled at format time.
>> +* Conventional zone aggregation: ranges of contiguous conventional zones can be
>> + aggregated into a single larger file instead of the default one file per zone.
>> +* File ownership: The owner UID and GID of zone files is by default 0 (root)
>> + but can be changed to any valid UID/GID.
>> +* File access permissions: the default 640 access permissions can be changed.
>> +
>> +zonefs mount options
>> +--------------------
>
> This section is really all about error handling, not so much mount
> options in general...

Indeed. Section title fix is needed.

>
>> +
>> +zonefs defines several mount options allowing the user to control the file
>> +system behavior when write I/O errors occur and when inconsistencies between a
>> +file size and its zone write pointer position are discovered. The handling of
>> +read I/O errors is not changed by these options as long as no inode size
>> +corruption is detected.
>> +
>> +These options are as follows.
>> +* errors=remount-ro (default)
>> + All write IO errors and errors due to a zone of the device going "bad"
>> + (condition changed to offline or read-only), the file system is remounted
>> + read-only after fixing the size and access permissions of the inode that
>> + suffered the IO error.
>
> What does "fixing the size and access permissions of the inode"
> mean?
>
>> +* errors=zone-ro
>> + Any write IO error to a file zone result in the zone being considered as in a
>> + read-only condition, preventing any further modification to the file. This
>> + option does not affect the handling of errors due to offline zones. For these
>> + zones, all accesses (read and write) are disabled.
>
> If the zone is marked RO, then shouldn't reads still work?. Oh, hold
> on, you're now talking about errors that take the zone oflfine at
> the device level?
>
> Perhaps a table describing what IO can be done to a zone vs the
> device once an error occurs would be a clearer way of describing the
> behaviour.
>
>
> It seems to me that a table might be a better way of decribing all
> the different conditions
>
> Post error access permissions
> zone device
> mountopt zone state read write read write
> -------- ---------- ---- ----- ---- -----
> remount-ro good yes no yes no
> RO yes no yes no
> Offline no no yes no
>
> zone-ro good yes no yes yes
> RO yes no yes yes
> Offline no no yes yes
>
> zone-offline good no no yes yes
> RO no no yes yes
> Offline no no yes yes
>
> repair good yes yes yes yes
> RO yes no yes yes
> Offline no no yes yes
>
> And then you can document that an offline zone will always appear to
> have a size of 0, be immutable, etc, while a read-only zone will
> have a size that reflects the amount of valid data in the zone that
> can be read.
>
> IOWs, you don't need to mix the definitions of zone state appearence
> and behaviour with descriptions of what actions the different mount
> options take when a write error occurs.

Excellent idea ! That will clarify things a lot.

>> +* errors=zone-offline
>> + Any write IO error to a file zone result in the zone being considered as in
>> + an offline condition. This implies that the file size is changed to 0 and all
>> + read/write accesses to the file disabled, preventing all accesses by the user.
>> +* errors=repair
>> + Any inconsistency between an inode size and its zone amount of written data
>> + due to IO errors or external corruption are fixed without any change to file
>> + access rights. This option does not affect the processing of zones that were
>> + signaled as read-only or offline by the device. For read-only zones, the file
>> + read accesses are disabled and for offline zones, all access permissions are
>> + removed.
>> +
>> +For sequential zone files, inconsistencies between an inode size and the amount
>> +of data writen in its zone, that is, the position of the file zone write
>> +pointer, can result from different events:
>> +* When the device write cache is enabled, a differed write error can occur
>
> "a different write error"?

Nope. I really meant differed, as in "delayed" since the write command
succeeded but the cache flush for the data passed by the already completed
write command fails later. The sentence is not clear. I will clarify this
error pattern.

>
>> + resulting in the amount of data written in the zone being less than the inode
>> + size.
>
> Though I suspect that what you really mean to say is that errors can
> occur in ranges of previously completed writes can occur when the
> cache is flushed, hence less data being physically written than the
> OS has previously be told was written by the hardware. i.e. visible
> inode size goes backwards.

Yes, exactly. I will copy-paste this very clear explanation :)

>> +Finally, defective drives may change the condition of any zone to offline (zone
>> +dead) or read-only. Such changes, when discovered with the IO errors they can
>> +cause, are handled automatically regardless of the options specified at mount
>> +time. For offline zones, the action taken is similar to the action defined by
>> +the errors=zone-offline mount option. For read-only zones, the action used is
>> +as defined by the errors=zone-ro mount option.
>
> Hmmmm. I think that's over-complicating things and takes control of
> error handling away from the user. That is, regardless of the reason
> for the error, if we get a write error and the user specified
> errors=remount-ro, the entire device should go read-only because
> that's what the user has told the filesystem to do on write error.

Yes, and that is the case. Any IO error with errors=remount-ro will turn
the FS read-only. What I tried to say here is that this option will not
affect the handling of zones that went offline (done by the device). For
these, the file will not even be read-only. The table will clarify that. I
also need to clarify the different causes for errors, e.g. "regular"
read-write errors due to bad sectors, excessive vibrations, etc, which are
generally recoverable (rewrite over bad sectors fixes the sector most of
the time) and the ones due to the device changing zones condition, which
are not recoverable (no condition can get these zones out of their bad state).

> This seems pretty user-unfriendly - giving them a way to control
> error handling behaviour and then ignoring it for specific errors
> despite the fact they mean exactly the same thing to the user: the
> write failed because a zone has gone bad since the last time that
> zone was accessed by the application....

I think it is only the explanation that is bad. The error control mount
options are enforced correctly as defined.

Thanks !

>
> Cheers,
>
> Dave.
>


--
Damien Le Moal
Western Digital Research