Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure

From: Robert Wimmer
Date: Thu May 06 2010 - 17:19:27 EST


I don't know if anyone is still interested in this,
but I think Trond has lost interest because
the last error was of course a "page allocation
failure" and not the "soft lockup" which Trond was
trying to solve. But the patch was for 2.6.34, and
the "soft lockup" comes up only with some 2.6.30 and
maybe some 2.6.31 kernel versions. The first error
I reported was a "page allocation failure", which
all kernels >= 2.6.32 produce with the configuration
I use (NFSv4).

Michael suggested solving the "soft lockup" first,
before further investigating the "page allocation
failure". We know that the "soft lockup" only
pops up with NFSv4 and not v3. I really want to
use v4, but since I'm not a kernel hacker someone
must guide me on what to try next.

I know that you all have a lot of other work to
do, but if there are no ideas left on what to do next,
it's maybe best to close the bug for now. I will stay with
kernel 2.6.30, or go back to NFS v3 if I
upgrade to a newer kernel. Maybe the error will
be fixed "by accident" in >= 2.6.35 ;-)

Thanks!
Robert



On 05/03/10 10:11, kernel@xxxxxxxxxxx wrote:
> Anything we can do to investigate this further?
>
> Thanks!
> Robert
>
>
> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@xxxxxxxxxxx>
> wrote:
>
>> I've applied the patch against the kernel which I got
>> from "git clone ....", which resulted in a kernel 2.6.34-rc5.
>>
>> The stack trace after mounting NFS is here:
>> https://bugzilla.kernel.org/attachment.cgi?id=26166
>> /var/log/messages after soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26167
>>
>> I hope that there is some useful information in there.
>>
>> Thanks!
>> Robert
>>
>> On 04/27/10 01:28, Trond Myklebust wrote:
>>
>>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>>>
>>>
>>>>> Sure. In addition to what you did above, please do
>>>>>
>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>
>>>>> and then cat the contents of the pseudofile at
>>>>>
>>>>> /sys/kernel/debug/tracing/stack_trace
>>>>>
>>>>> Please do this more or less immediately after you've finished mounting
>>>>> the NFSv4 client.
>>>>>
>>>>>
>>>>>
>>>> I've uploaded the stack trace. It was generated
>>>> directly after mounting. Here are the stacks:
>>>>
>>>> After mounting:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>>> After the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>>> The dmesg output of the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>>
>>>>
>>>>
>>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>>>> use the 'refer' export option anywhere? If so, then we might have to
>>>>> test further, since those may trigger the NFSv4 submount feature.
>>>>>
>>>>>
>>>>>
>>>> The server has the following settings:
>>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>>
>>>> Thanks!
>>>> Robert
>>>>
>>>>
>>>>
>>>>
>>> That second trace is more than 5.5K deep, more than half of which is
>>> socket overhead :-(((.
>>>
>>> The process stack does not appear to have overflowed, however that trace
>>> doesn't include any IRQ stack overhead.
>>>
>>> OK... So what happens if we get rid of half of that trace by forcing
>>> asynchronous tasks such as this to run entirely in rpciod instead of
>>> first trying to run in the process context?
>>>
>>> See the attachment...
>>>
>>>
