Re: [GIT PULL] Please pull NFS client changes for Linux 4.13

From: Trond Myklebust
Date: Tue Aug 01 2017 - 13:30:45 EST


On Tue, 2017-08-01 at 10:20 -0700, Linus Torvalds wrote:
> On Tue, Aug 1, 2017 at 8:51 AM, davej@xxxxxxxxxxxxxxxxx
> <davej@xxxxxxxxxxxxxxxxx> wrote:
> > On Mon, Jul 31, 2017 at 10:35:45PM -0700, Linus Torvalds wrote:
> > > Any chance of getting the output from
> > >
> > > ./scripts/faddr2line vmlinux nfs4_exchange_id_done+0x3d7/0x8e0
> >
> >
> > Hm, that points to this..
> >
> > 7463 /* Save the EXCHANGE_ID verifier session trunk tests */
> > 7464 memcpy(clp->cl_confirm.data, cdata->args.verifier->data,
> > 7465 sizeof(clp->cl_confirm.data));
>
> Ok, that certainly made no sense to me, because the KASAN report made
> it look like a stale pathname access (allocated in getname, freed in
> putname), but I think the issue is more fundamental than that.
>
> That cdata->args.verifier seems to be entirely broken. At least for
> the "xprt == NULL" case, it does the following:
>
> - use the address of a local variable ("&verifier")
>
> - wait for the rpc completion using rpc_wait_for_completion_task().
>
> That's unacceptably buggy crap. rpc_wait_for_completion_task() will
> happily exit on a deadly signal even if the rpc hasn't been completed,
> so now you'll have a stale pointer to a stack that has been freed.
>
> So I think the 'pathname' part may actually be entirely a red herring,
> and it's the underlying access itself that just picks up a random
> pointer from a stack that now contains something different. And KASAN
> didn't notice the stale stack access itself, because the stack slot is
> still valid - it's just no longer the original 'verifier' allocation.
>
> Or *something* like that.
>
> None of this looks even remotely new, though - the code seems to go
> back to 2009. Have you just changed what you're testing to trigger
> these things?
>
> I'm not even sure why it does that stupid stack allocation. It does a
> *real* allocation just a few lines later:
>
> struct nfs41_exchange_id_data *calldata
> ...
> calldata = kzalloc(sizeof(*calldata), GFP_NOFS);
>
> and the whole verifier structure could easily have been part of that
> same allocation as far as I can tell.
>
> And that really might seem to be the right thing to do.
>
> TOTALLY UNTESTED PROBABLY COMPLETE CRAP patch attached.
>
> That patch compiles for me. It *might* even work. Or it might just be
> the ramblings of a diseased mind.
>
> Anna? Trond?
>

I came to the same conclusion yesterday, and have a stable patch that
does something similar. I just got distracted with the other bugs that
were introduced by the exchangeid patch series in Linux-4.9 (including
what looks like a duplicate free issue in nfs4_test_session_trunk()).
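
As a rough illustration of the lifetime problem described above, here is a
minimal user-space sketch in plain C. It is not the patch from either
message and not kernel code: every name in it (struct verifier,
struct exchange_id_calldata, calldata_alloc(), exchange_id_done(),
calldata_release(), VERIFIER_SIZE) is hypothetical, and calloc()/free()
merely stand in for the kzalloc() call quoted above. The only point it makes
is that a verifier embedded in the heap-allocated call data stays valid for
as long as that allocation does, so a completion callback that runs after
the launching function has returned never reads a dead stack slot.

```c
#include <stdlib.h>
#include <string.h>

#define VERIFIER_SIZE 8 /* hypothetical size, chosen only for the sketch */

/* Hypothetical stand-in for the verifier type. */
struct verifier {
	char data[VERIFIER_SIZE];
};

/*
 * Hypothetical call data. The verifier is embedded rather than pointed to,
 * so its lifetime is tied to the heap allocation, not to the stack frame
 * of whoever started the call.
 */
struct exchange_id_calldata {
	struct verifier verifier;    /* owned by the call data */
	char confirm[VERIFIER_SIZE]; /* filled in by the completion callback */
};

/* Allocate zeroed call data and copy the verifier bytes into it. */
struct exchange_id_calldata *calldata_alloc(const char *seed)
{
	struct exchange_id_calldata *c = calloc(1, sizeof(*c));
	size_t n;

	if (!c)
		return NULL;
	n = strlen(seed);
	if (n > VERIFIER_SIZE)
		n = VERIFIER_SIZE;
	memcpy(c->verifier.data, seed, n);
	return c;
}

/*
 * Completion callback. It may run after the launching function has
 * returned (for example if the waiter was interrupted), which is safe
 * here because it only touches memory owned by the call data.
 */
void exchange_id_done(struct exchange_id_calldata *c)
{
	memcpy(c->confirm, c->verifier.data, sizeof(c->confirm));
}

/* Release the call data once the callback can no longer run. */
void calldata_release(struct exchange_id_calldata *c)
{
	free(c);
}
```

A caller would allocate with calldata_alloc(), let the callback run whenever
the operation completes, and call calldata_release() only from the path that
is guaranteed to run last. The broken pattern is the opposite: storing the
address of a local variable in the call data and then returning early.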

I can pass a few of the more critical patches on to Anna for merging in
this cycle, then I've got some clean ups ready for the 4.14 merge
window.

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx