Re: WARNING at fs/nfs/write.c:743 nfs_inode_remove_request with -rc6

From: Will Deacon
Date: Tue Sep 23 2014 - 09:59:40 EST


On Tue, Sep 23, 2014 at 02:33:06PM +0100, Weston Andros Adamson wrote:
> On Sep 23, 2014, at 9:03 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
> > I've been running into the following warning on an arm64 system running
> > 3.17-rc6 with 64k pages. I've been unable to reproduce with a smaller page
> > size (4k).
> >
> > I don't yet have a concrete reproducer, but I've seen it hit a few times
> > today just running a machine with an NFS root filesystem and using ssh.
> > The warning seems to happen in parallel on the two CPUs, but I'm pretty
> > confident that our test_and_clear_bit implementation has the relevant
> > atomic instructions and memory barriers.
> >
> > Any ideas?
>
> So it looks like we're either calling nfs_inode_remove_request twice on a request,
> or somehow not grabbing the inode reference for some request that is in the async
> write path. It's interesting that these come in pairs - that has to mean something!

Indeed. I have 6 CPUs on this system too, so it's not a per-cpu thing.

> Any more info on how to reproduce this would be really great. Unfortunately I don't
> have access to an arm64 system.

I've not spotted a pattern other than using 64k pages, yet. If I manage to
get a reproducer, I'll let you know.

> If it's possible, could we get a packet trace around when this happens? This is pure
> speculation, but this might have something to do with the resend path - a commit fails
> and all the requests on the commit list have to be resent.

Sure, once I can reproduce it reliably, then I'll try to do that.

> Have you noticed any side effects from this? That WARN_ON_ONCE was added
> to sanity test the new page group code and we need to fix this, but I'm wondering
> if anything "bad" happens...

I've not noticed anything. In fact, this happened during an LTP run and I
didn't see any regressions in the test results.

Will