Re: [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error
From: Dave Chinner
Date: Mon Jan 16 2012 - 19:36:23 EST
On Mon, Jan 16, 2012 at 10:55:55AM -0800, Linus Torvalds wrote:
> On Mon, Jan 16, 2012 at 8:01 AM, Jan Kara <jack@xxxxxxx> wrote:
> >
> > Hum, let me understand this. I understand the meaning of buffer_uptodate
> > bit as "the buffer has at least as new content as what is on disk". Now
> > when storage cannot write the block under the buffer, the contents of the
> > buffer is still "at least as new as what is (was) on disk".
>
> No.
>
> Stop making crap up.
Jan is right, Linus. His definition of what up-to-date means for
dirty buffers is correct, especially in the case of write errors.
> If the write fails, the buffer contents have *nothing* to do with what
> is on disk.
The dirty buffer contains what is *supposed* to be on disk. If we
fail to write it, we corrupt some application's data.
> You don't know what the disk contents are.
But *we don't care* what is on disk after a write error because
there is no guarantee that after a write error we can even read the
previous data that was on disk. IOWs, the contents of the region on
disk where the write failed is -undefined- and cannot be trusted.
> So clearly the buffer cannot be up-to-date.
What we have in memory is what is *supposed* to be on disk, and the
error is telling us that the disk is failing to be made up-to-date.
IOWs, the disk is stale after a write error, not what is in memory.
So clearly the buffer contains the up-to-date version of the data
after a write error.
How the filesystem handles that error is now up to the filesystem.
For example, the filesystem can chose to allocate new blocks for the
failed write and write the valid, up-to-date in-memory data to a
different location and continue onwards without errors. From this
example, it's pretty obvious that the data in memory contains the
data that what we need to care about after a write error, not what
is on disk.
> Now, feel free to use *other* arguments for why we shouldn't clear the
> up-to-date bit, but using the disk contents as one is pure and utter
> garbage. And it is *obviously* pure and utter garbage.
For the read case you are correct, but that logic (that the disk
version is always correct) does not apply to handling write errors.
It's an important distinction....
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/