> > On Wed 08-01-25 22:44:42, Baokun Li wrote:
> > > On 2025/1/8 21:43, Jan Kara wrote:
> > > > On Wed 08-01-25 11:43:08, Baokun Li wrote:
> > > > > On 2025/1/6 22:32, Jan Kara wrote:
> > > > > > > But as you said, we don't track overwrite writes for performance reasons.
> > > > > > > But compared to the poor performance of journal_data and the risk of
> > > > > > > drop cache exposing stale data, not being able to sense data errors on
> > > > > > > overwrite writes is acceptable.
> > > > > > >
> > > > > > > After enabling ‘data_err=abort’ in dioread_nolock mode, after drop_cache
> > > > > > > or remount, the user will not see unexpected all-zero data in the
> > > > > > > unwritten area, but rather the earlier consistent data, and the data in
> > > > > > > the file is trustworthy, at the cost of some trailing data.
> > > > > > >
> > > > > > > On the other hand, adding new written extents and converting
> > > > > > > unwritten extents to written both expose the data to the user, so the
> > > > > > > user is concerned about whether the data is correct at that point.
> > > > > > >
> > > > > > > In general, I think we can update the semantics of “data_err=abort” to
> > > > > > > “Abort the journal if the file fails to write back data on extended
> > > > > > > writes in ORDERED mode”. Do you have any thoughts on this?
> > > > > >
> > > > > > I agree it makes sense to make the semantics of data_err=abort more
> > > > > > obvious. Based on the use case you've described - i.e., rather take the
> > > > > > filesystem down on write IO error than risk returning old data later - it
> > > > > > would make sense to me to also do this on direct IO writes.
> > > > >
> > > > > Okay, I will update the semantics of data_err=abort in the next version.
> > > > > For direct I/O writes, I think we don't need it because users can
> > > > > perceive errors in time.
> > > >
> > > > So I agree that direct IO users will generally notice the IO error so the
> > > > chances for bugs due to missing the IO error are low. But I think the
> > > > question is really the other way around: Is there a good reason to make
> > > > direct IO writes different? Because if I as a sysadmin want to secure a
> > > > system from IO error handling bugs, then having to think whether some
> > > > application uses direct IO or not is another nuisance. Why should I be
> > > > bothered?
> > >
> > > This is not quite right. Regardless of whether it is a buffered (BIO) write
> > > or a DIO write, users will check the return value of the write operation,
> > > because errors can occur not only when data is written to disk.
> >
> > Yes, they *should* check the return value of write(2) and take appropriate
> > action. But do all of them check and mainly do they do something meaningful
> > with the error? That's what I'm not so sure about :).
>
> Indeed, we cannot confirm that all users will check the return value.

Yeah, indeed.

> > > It's just that when a DIO write returns successfully, users can be sure
> > > that the data has been written to the disk.
> > > However, when a BIO write returns successfully, it only means that the
> > > data has been copied into the buffer. Whether it has been successfully
> > > written back to the disk is unknown to the user.
> > > That's why we need data_err=abort to ensure that users are aware when
> > > page writeback fails and to prevent data corruption from spreading.
> >
> > I understand including DIO need not be interesting for your use case but I
> > still think it may be a more consistent overall decision. But perhaps I'll
> > ask Ted what he thinks about it.
>
> Okay, thanks for asking Ted for his opinion on this.
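The buffered-versus-direct distinction above can be made concrete with a small userspace sketch in POSIX C. This is not ext4 code, and `write_and_sync()` and `demo()` are hypothetical helper names. On a buffered file descriptor, a successful write() only means the data reached the page cache. On Linux, a later writeback failure is reported to the application through fsync(), so an application that never checks fsync() cannot see it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all of buf to fd, then fsync() so that a
 * writeback failure is reported to the caller. Returns 0 or -errno.
 */
static int write_and_sync(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before writing; retry */
			return -errno;		/* error known at write() time */
		}
		p += n;
		len -= (size_t)n;
	}
	/*
	 * Success above only means the data is in the page cache. An I/O
	 * error hit during writeback is reported here instead.
	 */
	if (fsync(fd) < 0)
		return -errno;
	return 0;
}

/* Usage example: write a temporary file and check the whole path. */
static int demo(void)
{
	char path[] = "/tmp/wb-sketch-XXXXXX";
	int fd = mkstemp(path);
	int ret;

	if (fd < 0)
		return -errno;
	ret = write_and_sync(fd, "hello\n", 6);
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	unlink(path);
	return ret;
}
```

Suppose a program drops the fsync() call or ignores its result. It then has no way to learn that writeback failed, and that is the gap the thread expects data_err=abort to cover.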
>
> > > > > > > For data=journal mode, the journal itself will abort when data is
> > > > > > > abnormal. However, as you pointed out, the above bug may cause errors to
> > > > > > > be missed. Therefore, we can perform this check by default for journaled
> > > > > > > files.
> > > > > >
> > > > > > Also I would do this regardless of data=writeback/ordered/journalled mode
> > > > > > because although users wanting data_err=abort behavior will also likely
> > > > > > want the guarantees of data=ordered mode, these are two different things
> > > > > > and I can imagine use cases for setups with data=writeback and
> > > > > > data_err=abort as well (e.g. for scratch filesystems which get recreated
> > > > > > on each system startup).
> > > > >
> > > > > Users using data=writeback often do not care about data consistency.
> > > > > I did not understand your example. Could you please explain it in detail?
> > > >
> > > > Well, they don't care about data consistency after a crash. But they
> > > > usually do care about data consistency while the system is running. And
> > > > unhandled IO errors can lead to data consistency problems without crashing
> > > > the system (for example if writeback fails and the page gets evicted from
> > > > memory later, you have lost the new data and may see an old version of it).
> > >
> > > I see your point. I concur that it is indeed meaningful for
> > > data_err=abort to be supported in data=writeback mode.
> > > Thank you for your explanation!
> > >
> > > > And I see data_err=abort as a way to say: "I don't trust my applications
> > > > to handle IO errors well. Rather take the filesystem down in that case
> > > > than risk data consistency issues".
> > > >
> > > > Honza
> > >
> > > I still prefer to think of this as a supplement for users not being able
> > > to perceive page writeback errors in a timely manner. The fsync operation
> > > is complex, requires frequent waiting, and may have omissions.
> >
> > I agree properly checking for errors from buffered writes is much more
> > painful.
>
> In addition, because ext4_end_bio() runs in interrupt context, we can't
> abort the journal directly there due to potential locking issues.
> Instead, we now add writeback error checks and journal abort logic
> to ext4_end_io_end(), which is called by a kworker during unwritten
> extent conversion.
> Consequently, for modes that don't support unwritten extents (e.g.,
> nodelalloc, journal_data, see ext4_should_dioread_nolock()), only the
> check in journal_submit_data_buffers() will be effective. Should we
> call the kworker for all files in ext4_end_bio()?

So how I imagined this would work is that if we get an error in
ext4_end_bio() and data_err=abort is set, we will queue work (probably
stored in the superblock) to abort the filesystem. Alternatively, a slightly
more generic approach might be to store the error state in the io_end and
implement something like:

static bool ext4_io_end_need_defered_completion(ext4_io_end_t *io_end)
{
	return io_end->flag & (EXT4_IO_END_UNWRITTEN | EXT4_IO_END_ERROR);
}

and use it in ext4_end_bio() and ext4_put_io_end_defer() to determine
whether the io_end needs processing in the workqueue or not. And
ext4_put_io_end() can then abort the filesystem if EXT4_IO_END_ERROR is
set.

Honza
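The io_end approach outlined above can be modeled in a few lines of userspace C. This is a toy sketch under stated assumptions, not kernel code:

- `IO_END_UNWRITTEN` and `IO_END_ERROR` stand in for EXT4_IO_END_UNWRITTEN and the proposed EXT4_IO_END_ERROR, with made-up values.
- A singly linked list stands in for the workqueue.
- `struct sb_state`, `end_bio()` and `run_deferred()` are hypothetical names.

The sketch shows the property the proposal relies on. The completion path only records the error and defers the io_end. The abort decision is made later by the worker, where in the real kernel sleeping locks could be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; not ext4's real definitions. */
#define IO_END_UNWRITTEN 0x1u	/* stands in for EXT4_IO_END_UNWRITTEN */
#define IO_END_ERROR     0x2u	/* stands in for the proposed EXT4_IO_END_ERROR */

struct io_end {
	unsigned int flag;
	struct io_end *next;	/* link in the deferred-completion list */
};

struct sb_state {
	bool data_err_abort;	/* models the data_err=abort mount option */
	bool aborted;		/* models an aborted journal */
	struct io_end *deferred;	/* stands in for the per-sb workqueue */
};

/* Same test as the helper proposed in the mail above. */
static bool io_end_need_deferred_completion(const struct io_end *io_end)
{
	return (io_end->flag & (IO_END_UNWRITTEN | IO_END_ERROR)) != 0;
}

/*
 * Models bio completion in interrupt context: it must not abort the
 * journal itself, so it only records the error and queues the io_end.
 */
static void end_bio(struct sb_state *sb, struct io_end *io_end, int status)
{
	if (status != 0)
		io_end->flag |= IO_END_ERROR;
	if (io_end_need_deferred_completion(io_end)) {
		io_end->next = sb->deferred;
		sb->deferred = io_end;
	}
	/* Otherwise the io_end would simply be completed inline. */
}

/* Models the worker, which runs in process context where aborting is safe. */
static void run_deferred(struct sb_state *sb)
{
	while (sb->deferred) {
		struct io_end *io_end = sb->deferred;

		sb->deferred = io_end->next;
		assert(io_end_need_deferred_completion(io_end));
		/* On success, unwritten-extent conversion would happen here. */
		if ((io_end->flag & IO_END_ERROR) && sb->data_err_abort)
			sb->aborted = true;
	}
}
```

In this model, clean overwrites still complete inline with no workqueue round trip. Only io_ends that already needed conversion, or that failed, pay for the deferral. Whether the real implementation keeps that property is an assumption here.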