Re: BUG in 3.19.0-rc3+

From: Chris Clayton
Date: Sun Jan 11 2015 - 03:17:16 EST


Hi,

I've done the bisect and the outcome is below. Because I almost always forget to mention it, I'll say up front that I
am running a 32-bit user space on a 64-bit kernel.

On 01/10/15 20:17, Chris Clayton wrote:
> Hi,
>
> I'm getting a BUG report from a kernel built from a pull (earlier today) of the current development tree
> (git describe gives v3.19-rc3-169-geb74926). So that I have usable wireless networking, I have also applied the
> latest seven iwlwifi patches from the wireless-drivers git tree. Prior to today's pull, I was not seeing anything
> unusual in dmesg.
>
> The BUG reported is as follows:
>
> Jan 10 19:41:32 laptop kernel: ------------[ cut here ]------------
> Jan 10 19:41:32 laptop kernel: kernel BUG at mm/rmap.c:399!
> Jan 10 19:41:32 laptop kernel: invalid opcode: 0000 [#1] PREEMPT SMP
> Jan 10 19:41:32 laptop kernel: Modules linked in: rfcomm snd_hda_codec_via iwlmvm coretemp snd_hda_codec_hdmi
> snd_hda_codec_generic snd_hda_intel mac80211 hwmon snd_hda_controller x86_pkg_temp_thermal acpi_cpufreq iwlwifi cfg80211
> snd_hda_codec snd_hwdep
> Jan 10 19:41:32 laptop kernel: CPU: 1 PID: 353 Comm: fc-cache Not tainted 3.19.0-rc3+ #42
> Jan 10 19:41:32 laptop kernel: Hardware name: Notebook W65_67SZ /W65_67SZ , BIOS 1.03.05 02/26/2014
> Jan 10 19:41:32 laptop kernel: task: ffff8800da98c5c0 ti: ffff880408dd4000 task.ti: ffff880408dd4000
> Jan 10 19:41:32 laptop kernel: RIP: 0010:[<ffffffff810ef7ea>] [<ffffffff810ef7ea>] unlink_anon_vmas+0x17a/0x200
> Jan 10 19:41:33 laptop kernel: RSP: 0018:ffff880408dd7d88 EFLAGS: 00010286
> Jan 10 19:41:33 laptop kernel: RAX: ffff88040b79e150 RBX: ffff88040b79e140 RCX: 00000000ffffffff
> Jan 10 19:41:33 laptop kernel: RDX: ffffffff00000001 RSI: ffff880409f04360 RDI: ffff880409f04320
> Jan 10 19:41:33 laptop kernel: RBP: ffff88040cb13278 R08: 0000000000000000 R09: ffff88040d801c00
> Jan 10 19:41:33 laptop kernel: R10: ffff88041fa546e0 R11: ffff88040b79e160 R12: ffff880409f04320
> Jan 10 19:41:33 laptop kernel: R13: ffff88040cb13278 R14: ffff88040cb13288 R15: ffff88040cb13210
> Jan 10 19:41:33 laptop kernel: FS: 0000000000000000(0000) GS:ffff88041fa40000(0000) knlGS:0000000000000000
> Jan 10 19:41:33 laptop kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> Jan 10 19:41:33 laptop kernel: CR2: 00000000f722c8d4 CR3: 00000004082a8000 CR4: 00000000001407e0
> Jan 10 19:41:33 laptop kernel: Stack:
> Jan 10 19:41:33 laptop kernel: ffff88040d6cfbd8 ffff88040d6cfba0 ffff88040cecd160 ffff88040cb13210
> Jan 10 19:41:33 laptop kernel: ffff88040cbbb630 00000000f7151000 ffff880408dd7e28 0000000000000000
> Jan 10 19:41:33 laptop kernel: 0000000000000000 ffffffff810e3633 0000000000000000 0000000000000000
> Jan 10 19:41:33 laptop kernel: Call Trace:
> Jan 10 19:41:33 laptop kernel: [<ffffffff810e3633>] ? free_pgtables+0x83/0xf0
> Jan 10 19:41:34 laptop kernel: [<ffffffff810ec3c3>] ? exit_mmap+0xc3/0x150
> Jan 10 19:41:34 laptop kernel: [<ffffffff8103980d>] ? __do_page_fault+0x17d/0x4b0
> Jan 10 19:41:34 laptop kernel: [<ffffffff81042a21>] ? mmput+0x21/0xc0
> Jan 10 19:41:34 laptop kernel: [<ffffffff8104673d>] ? do_exit+0x26d/0xa50
> Jan 10 19:41:34 laptop kernel: [<ffffffff8111fe89>] ? mntput_no_expire+0x9/0x140
> Jan 10 19:41:34 laptop kernel: [<ffffffff8105ca1c>] ? task_work_run+0xbc/0xf0
> Jan 10 19:41:34 laptop kernel: [<ffffffff81047d44>] ? do_group_exit+0x34/0xb0
> Jan 10 19:41:34 laptop kernel: [<ffffffff81047dcf>] ? SyS_exit_group+0xf/0x10
> Jan 10 19:41:34 laptop kernel: [<ffffffff815e0f9f>] ? sysenter_dispatch+0x7/0x1e
> Jan 10 19:41:34 laptop kernel: Code: 00 ad de 48 89 43 18 e8 c5 f9 00 00 48 8b 45 10 48 8d 55 10 48 83 e8 10 49 39 d6 74
> 54 48 8b 7d 08 48 89 eb 8b 57 34 85 d2 74 9e <0f> 0b 0f 1f 40 00 e8 6b fc ff ff eb 9a 66 0f 1f 84 00 00 00 00
> Jan 10 19:41:34 laptop kernel: RIP [<ffffffff810ef7ea>] unlink_anon_vmas+0x17a/0x200
> Jan 10 19:41:34 laptop kernel: RSP <ffff880408dd7d88>
> Jan 10 19:41:34 laptop kernel: ---[ end trace 4aa713b2a9aa664b ]---
> Jan 10 19:41:34 laptop kernel: Fixing recursive fault but reboot is needed!
> Jan 10 19:41:34 laptop kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 max)

[snip]

>
> I won't have time tonight, but I can bisect it tomorrow, so this is just a heads-up in case the problem (and fix) jumps
> out at anyone. Before I bisect, I'll build and run a kernel without the iwlwifi patches.

The bisect ended up at:

7a3ef208e662f4b63d43a23f61a64a129c525bbc is the first bad commit
commit 7a3ef208e662f4b63d43a23f61a64a129c525bbc
Author: Konstantin Khlebnikov <koct9i@xxxxxxxxx>
Date: Thu Jan 8 14:32:15 2015 -0800

mm: prevent endless growth of anon_vma hierarchy

A constantly forking task causes unlimited growth of the anon_vma chain.
Each successive child allocates a new level of anon_vmas and links its vma
to all previous levels, because pages might be inherited from any level.

This patch adds a heuristic which decides to reuse an existing anon_vma
instead of forking a new one. It adds a counter, anon_vma->degree, which
counts linked vmas and directly descending anon_vmas, and reuses an anon_vma
if the counter is lower than two. As a result, each anon_vma has either a vma
or at least two descending anon_vmas. In such trees, half of the nodes are
leaves with live vmas, so the number of anon_vmas is at most twice the
number of vmas.

This heuristic reuses anon_vmas as rarely as possible, because each reuse
adds false aliasing among vmas and the rmap walker has to scan more ptes
when it searches for where a page might be mapped.

Link: http://lkml.kernel.org/r/20120816024610.GA5350@xxxxxxxxxxxxxxxxxxxxxxx
Fixes: 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue")
[akpm@xxxxxxxxxxxxxxxxxxxx: fix typo, per Rik]
Signed-off-by: Konstantin Khlebnikov <koct9i@xxxxxxxxx>
Reported-by: Daniel Forrest <dan.forrest@xxxxxxxxxxxxx>
Tested-by: Michal Hocko <mhocko@xxxxxxx>
Tested-by: Jerome Marchand <jmarchan@xxxxxxxxxx>
Reviewed-by: Michal Hocko <mhocko@xxxxxxx>
Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> [2.6.34+]
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

:040000 040000 ca27f69d02743e7347b19b1e07976732a49698d1 7104c9ec5eb200ee4a21548e15d2b71a5806e107 M include
:040000 040000 5440efbda5ac44c2a2da7e068f40ee6f0d4c0c7e b76fd93bffebec1acdef5f1785eb578c5f4f6cc3 M mm

I'm more than happy to provide additional diagnostics and/or test any patches, but please cc me as I'm not subscribed.

In case it helps, I've attached the xz-compressed config file for this kernel.

Chris

>
> I've attached the full kernel log file for that boot.
>
> Chris
>

Attachment: config-kbug.txt.xz