Re: [PATCH] clear PageError bit in msync & fsync
From: Jeff Layton
Date: Fri Nov 12 2010 - 10:53:10 EST
On Thu, 11 Nov 2010 23:36:39 -0500
Rik van Riel <riel@xxxxxxxxxx> wrote:
> On 11/09/2010 04:41 PM, Andrew Morton wrote:
>
> > yup. It's a userspace bug, really. Although that bug might be
> > expressed as "userspace didn't know about linux-specific EIO
> > behaviour".
>
> Looking at this some more, I am not convinced this is a userspace
> bug.
>
> First, let me describe the problem scenario:
> 1) process A calls write
> 2) process B calls write
> 3) process A calls fsync, runs into an IO error, returns -EIO
> 4) process B calls fsync, returns success
> (even though data could have been lost!)
>
> Common sense, as well as these snippets from the fsync man
> page, suggest that this behaviour is incorrect:
>
> DESCRIPTION
>     fsync() transfers ("flushes") all modified in-core data of (i.e.,
>     modified buffer cache pages for) the file referred to by the file
>     descriptor fd to the disk device
>     ...
> RETURN VALUE
>     On success, these system calls return zero. On error, -1 is
>     returned, and errno is set appropriately.
>
I'll agree that that situation sucks for userspace, but I'm not sure
that problem scenario is technically wrong. The error did get reported
to userspace after all, just not to both processes that had done writes.

The root cause here is that we don't track which file descriptor was
used to dirty specific pages. The reason is simple, IMO -- tracking
that would be an unmanageable rabbit-hole.
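
To make the quoted scenario concrete, here is a toy model in C. It is
not kernel code: it just assumes the writeback error is kept as a single
per-file flag that fsync tests and clears, with no record of who dirtied
what. The names toy_file, toy_writeback_failed and toy_fsync are made up
for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model, not kernel code: one error flag per file, shared by
 * every process that has the file open. */
struct toy_file {
    bool io_error;  /* set when any writeback for this file fails */
};

/* Writeback of some dirty page failed. Nothing records which
 * process or file descriptor dirtied that page. */
static void toy_writeback_failed(struct toy_file *f)
{
    f->io_error = true;
}

/* fsync reports a pending error once and clears it, so only the
 * first caller after the failure sees -EIO. */
static int toy_fsync(struct toy_file *f)
{
    if (f->io_error) {
        f->io_error = false;
        return -EIO;
    }
    return 0;
}
```

In this model, if a writeback failure is followed by process A calling
toy_fsync() and then process B calling it on the same toy_file, A gets
-EIO and B gets 0, which is steps 3 and 4 of the scenario.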

Here's another related "problem" scenario (for purposes of argument):

Suppose between steps 2 and 3, the VM decides to flush out the pages
dirtied by process A, but not the ones from process B. That succeeds,
but just afterward the disk goes toes-up.

Now, process A issues an fsync. He gets an error, but his data was
flushed to disk just fine. Is that also incorrect behavior?
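
A rough sketch of this second scenario, again as a toy model in C and
not kernel code. It assumes a per-file error flag and a disk that
either accepts or rejects every write. All names (toy_page, toy_file,
toy_run_scenario, and so on) are hypothetical. The owner field exists
only so the demo can check whose pages reached disk; the point of the
model is that the error path never looks at it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

enum toy_owner { TOY_A, TOY_B };

struct toy_page {
    enum toy_owner owner;  /* kept only so the demo can inspect it */
    bool dirty;
    bool on_disk;
};

struct toy_file {
    struct toy_page pages[TOY_NPAGES];
    bool disk_ok;   /* false once the disk has died */
    bool io_error;  /* one flag for the whole file */
};

/* Write one dirty page; a failure is recorded per file, not per owner. */
static void toy_write_page(struct toy_file *f, struct toy_page *p)
{
    if (!p->dirty)
        return;
    if (f->disk_ok) {
        p->dirty = false;
        p->on_disk = true;
    } else {
        f->io_error = true;
    }
}

/* Background writeback that happens to pick only one owner's pages. */
static void toy_background_flush(struct toy_file *f, enum toy_owner who)
{
    for (size_t i = 0; i < TOY_NPAGES; i++)
        if (f->pages[i].owner == who)
            toy_write_page(f, &f->pages[i]);
}

/* fsync tries every dirty page in the file, then test-and-clears the flag. */
static int toy_fsync(struct toy_file *f)
{
    for (size_t i = 0; i < TOY_NPAGES; i++)
        toy_write_page(f, &f->pages[i]);
    if (f->io_error) {
        f->io_error = false;
        return -EIO;
    }
    return 0;
}

/* Returns A's fsync result; *a_durable says whether A's pages hit disk. */
static int toy_run_scenario(bool *a_durable)
{
    struct toy_file f = {
        .pages = {
            { TOY_A, true, false }, { TOY_A, true, false },
            { TOY_B, true, false }, { TOY_B, true, false },
        },
        .disk_ok = true,
        .io_error = false,
    };

    toy_background_flush(&f, TOY_A);  /* A's pages are written fine */
    f.disk_ok = false;                /* disk goes toes-up */
    int ret = toy_fsync(&f);          /* A's fsync trips over B's pages */

    *a_durable = f.pages[0].on_disk && f.pages[1].on_disk;
    return ret;
}
```

toy_run_scenario() returns -EIO while setting *a_durable to true: A is
told about an error even though every page A dirtied is on disk.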
--
Jeff Layton <jlayton@xxxxxxxxxx>