Re: [2.6.31-rc5] oops: NFS4 client manager kthread...

From: Daniel J Blueman
Date: Mon Aug 17 2009 - 09:53:19 EST


Hi Trond,

On Mon, Aug 17, 2009 at 2:12 PM, Trond Myklebust
<Trond.Myklebust@xxxxxxxxxx> wrote:
> On Sun, 2009-08-16 at 23:40 +0100, Daniel J Blueman wrote:
>> After losing and regaining ethernet link a few times with 2.6.31-rc5
>> [1], I've hit an oops in the NFS4 client manager kthread [2] on my
>> client with NFS4 homedir mount.
>>
>> Do you have a frequent test-case for when the client's manager kthread
>> gets invoked (with and without succeeding callbacks, due to eg a
>> firewall)? Server here is unpatched 2.6.30-rc6; I recall seeing
>> problems when the manager kthread gets invoked, across quite a few
>> kernel releases, just wasn't lucky enough to catch an oops.
>>
>> Oopsing in allow_signal() suggests task state corruption perhaps? I'm
>> downloading the debug kernel to match up the disassembly and line
>> numbers, if that helps? This time, the client had no firewall (but
>> have seen other issues when the callback has failed due to the
>> firewall).
>
> Those aren't Oopses. They are 'soft lockup' warnings. Basically, they're
> saying that the CPU is getting stuck waiting for a spin lock or a mutex.
>
> In this case, it is probably the fact that the state manager is going
> nuts trying to recover, while the connection to the server keeps coming
> up and going down.
>
> What does 'netstat -t' say when you get into this situation?

Whoops; it's true the stack-trace comes from the soft-lockup detector.

There was a single 200s link outage, but the client didn't recover,
seemingly because locks are held and never released; I observe the
'192.168.1.250-m' NFS4 manager kthread being created and not going
away, despite IP connectivity with the server being fine afterwards.

I'll reproduce it with stock 2.6.31-rc6 on the client and get 'netstat
-t' output.
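
To keep the capture manageable, something like the following could trim
the 'netstat' output down to the NFS connections. It is only a minimal
sketch: Python is my choice here, the helper names nfs_tcp_rows() and
snapshot() are hypothetical, and it assumes the usual net-tools column
layout for 'netstat -tn' with the server listening on the standard NFS
port 2049.

```python
import subprocess

# Assumption: the server uses the standard NFS port. "nfs" covers the case
# where netstat was run without -n and resolved the service name.
NFS_PORTS = ("2049", "nfs")


def nfs_tcp_rows(netstat_output):
    """Return (local, remote, state) for TCP rows whose remote port is NFS."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row layout: Proto Recv-Q Send-Q Local Foreign State.
        # Banner and header lines do not start with "tcp" and are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        remote_port = remote.rsplit(":", 1)[-1]
        if remote_port in NFS_PORTS:
            rows.append((local, remote, state))
    return rows


def snapshot():
    """Run 'netstat -tn' once and return only the NFS connection rows."""
    result = subprocess.run(
        ["netstat", "-tn"], capture_output=True, text=True, check=True
    )
    return nfs_tcp_rows(result.stdout)
```

Printing snapshot() every few seconds while the link is down and again
after it returns should show whether the connection to the server is
being re-established or is stuck in one state.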

Thanks for looking at this!
Daniel

> Cheers
>  Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@xxxxxxxxxx
> www.netapp.com
>



--
Daniel J Blueman