Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

From: J. Bruce Fields
Date: Thu Nov 09 2017 - 15:48:06 EST


On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
> Anyway, that cmovne noise makes it a bit hard to see the actual part
> that matters (and that traps) but I'm almost certain that it's the
> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
> when it then does
>
> flags_by_sb(mnt->mnt_sb->s_flags);
>
> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
> NULL, because we wouldn't have gotten this far if it was.
>
> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
> proper path. And the vfs_statfs() code itself hasn't changed in a
> while.
>
> Which does seem to implicate nfsd as having passed in a bad path to
> vfs_statfs(). But I'm not seeing any changes in nfsd either.
>
> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
> range. There is a bunch of xfs changes, though. What's the underlying
> filesystem that you are exporting?
>
> But bringing in Al Viro and Bruce Fields explicitly in case they see
> something. And Darrick, just in case it might be xfs.

Looking at https://lkml.org/lkml/2017/11/8/1086 for the actual oops...

It doesn't remind me of any known issue.

And I don't see how we can call vfs_statfs() with a bad path:
nfsd4_encode_getattr would have to have been called with nfserr 0 and
ga_fhp->fh_export bad.

Looking at nfsd4_proc_compound, I can't see how we could get there in
the op->status == 0 case without the fh_verify() in nfsd4_getattr having
succeeded and assigned the result to ga_fhp.

So either I'm overlooking something or the bug's elsewhere.

It sounds like you're varying *only* the server version, so there's not
much chance that this could be triggered by changes in client behavior?

--b.