Re: [btrfs] WARNING: CPU: 0 PID: 6379 at fs/direct-io.c:293 dio_complete+0x1d4/0x220

From: Darrick J. Wong
Date: Mon Nov 13 2017 - 16:56:27 EST


[cc xfs list]

[We're discussing the WARN_ON in *dio_complete when invalidation fails]

[yes, again]

On Mon, Nov 13, 2017 at 11:21:50AM -0800, Linus Torvalds wrote:
> On Mon, Nov 13, 2017 at 11:16 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> >
> > I would tend to agree with you, it's annoying to dump a full stack trace
> > for an expected (even for a rare situation) condition. But it's not the
> > first one, there's also one in XFS that always triggers for test runs. I
> > complained about that one in the past.
>
> Yeah, we should always consider a WARN_ON() that is triggerable from
> user space to be a kernel bug.
>
> If it's a "cannot happen", then the bug should be fixed.
>
> If it's a "can happen, but I want to see the trace", then you just got
> the trace and you're done, and the WARN_ON() should be removed.
>
> It could possibly be replaced with a "pr_warn()" or something, so that
> it still shows up as a "the user did something dodgy", but honestly,
> even that is questionable. We do that for things like "we just removed
> support for this, we want to see if somebody is using it"
>
> So in no case is "let's just keep things as is" the right answer.

Wellll... the WARN_ON in question happens when:

a) two programs race to write to the same part of a file, one via the page
cache and the other via directio
b) the dio write completes, tries to invalidate the page cache, and fails
because the corresponding page cannot be invalidated

At this point, the page cache contents don't reflect what's on disk, so
I don't think we can quietly ignore the situation. Clearly, enough
people dislike the WARN to complain repeatedly, so perhaps we should try
to barf evidence of this situation up to userspace? The dio write
succeeded, which is why we don't turn err into ret; but now that we can
store and forward error codes through the mapping, how about we just:

errseq_set(&dio->inode->i_mapping->wb_error, -EIO);

and then let the writers pick up the EIO the next time they fsync?
Though I can already imagine the complaints about writes that used to
work and suddenly start returning error codes.
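
To make the "store and forward" idea concrete, here is a rough userspace model of what I have in mind. It is not the kernel's errseq_t (which packs the error, a SEEN flag and a counter into one u32 and updates it with cmpxchg); every model_* name below is made up for illustration, and the whole thing assumes a single thread. The point is only that the dio completion path records the error once, and each open file reports it at most once on its next fsync.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Simplified, single-threaded model of a per-mapping error cursor.
 * Hypothetical names; NOT the kernel's errseq_t implementation.
 */
struct model_errseq {
	int      err;	/* most recent error recorded (negative errno), or 0 */
	uint32_t seq;	/* bumped each time an error is recorded */
};

/* Per-open-file state: the sequence value this opener last observed. */
struct model_file {
	uint32_t seen_seq;
};

/* What the dio completion path would do when invalidation fails. */
static void model_errseq_set(struct model_errseq *e, int err)
{
	assert(err < 0);	/* callers pass a negative errno such as -EIO */
	e->err = err;
	e->seq++;
}

/* Sample the current sequence at open time, so older errors are not reported. */
static void model_open(const struct model_errseq *e, struct model_file *f)
{
	f->seen_seq = e->seq;
}

/*
 * What fsync would do: report the recorded error if this file has not
 * seen it yet, then advance the file's cursor so it is reported only once.
 */
static int model_fsync(const struct model_errseq *e, struct model_file *f)
{
	if (f->seen_seq == e->seq)
		return 0;
	f->seen_seq = e->seq;
	return e->err;
}
```

With two files open on the same mapping, a single model_errseq_set(&wb, -EIO) makes the next model_fsync() on each file return -EIO, and the one after that return 0 again. A file opened after the error was recorded never sees it, which is roughly the behaviour the buffered writer would get from the proposal above.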

--D

> Linus