Re: OOPSes in mem_cgroup_protected
From: John Stultz
Date: Wed Jun 13 2018 - 00:08:35 EST
On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> Hey Tejun,
> With the current linus/master, I'm able to fairly regularly trip
> OOPSes (two examples below) in mem_cgroup_protected(), which seems to
> be new. I haven't managed to trigger this sort of thing with v4.17.
>
> I've not had much time to dig in or bisect it - I only know that
> enabling most of the memory debugging config options didn't seem to
> trip anything prior to the issue. So I wanted to send you a heads up
> to see if this was already known, or if there was anything you might
> suggest to help chase this down.
So the line where we're crashing seems to be in mem_cgroup_protected():

    parent_emin = READ_ONCE(parent->memory.emin);

where I'm guessing the parent->memory value is null, and emin is at
the 0x120 offset in the structure.
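
As a rough illustration of why a null base pointer would show up as a fault
at 0x120, here is a minimal userspace sketch. The struct names and the
padding size are hypothetical stand-ins chosen so that emin lands at 0x120;
they are not the real kernel definitions, and the real layout depends on the
kernel config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the counter holding emin (not the kernel struct). */
struct demo_page_counter {
	uint64_t usage;
	uint64_t emin;
};

/* Hypothetical stand-in for the cgroup struct; the padding is an assumption
 * picked only so that memory.emin ends up at offset 0x120. */
struct demo_mem_cgroup {
	unsigned char other_fields[0x118];
	struct demo_page_counter memory;
};

/* Offset of memory.emin from the start of the outer struct. */
static size_t emin_offset(void)
{
	return offsetof(struct demo_mem_cgroup, memory) +
	       offsetof(struct demo_page_counter, emin);
}

/* Address that a load of base->memory.emin would touch, computed with plain
 * integer arithmetic so nothing is actually dereferenced. */
static uintptr_t emin_load_address(uintptr_t base)
{
	return base + emin_offset();
}
```

With a base of 0, emin_load_address() returns 0x120, which matches the
faulting virtual address in both traces.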
Reverting the following commits seems to avoid the issue.
bf8d5d52ffe8 ("memcg: introduce memory.min")
5f93ad67436b ("mm: treat memory.low value inclusive")
230671533d64 ("mm: memory.low hierarchical behavior")
I'm guessing I'm tripping over some path where the memory value never
gets initialized?
Any ideas or suggestions?
thanks
-john
(usually I'd trim the backtraces below, but keeping them as I added
Roman to the CC list)
> console:/ $ [ 170.530896] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [ 170.540158] Mem abort info:
> [ 170.543092] ESR = 0x96000005
> [ 170.546193] Exception class = DABT (current EL), IL = 32 bits
> [ 170.552251] SET = 0, FnV = 0
> [ 170.555444] EA = 0, S1PTW = 0
> [ 170.558698] Data abort info:
> [ 170.561624] ISV = 0, ISS = 0x00000005
> [ 170.565572] CM = 0, WnR = 0
> [ 170.568650] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000190bb04e
> [ 170.575374] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [ 170.582297] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [ 170.587929] CPU: 7 PID: 663 Comm: kswapd0 Not tainted
> 4.17.0-11699-gb4f23f3 #411
> [ 170.595358] Hardware name: HiKey Development Board (DT)
> [ 170.600623] pstate: a0400005 (NzCv daif +PAN -UAO)
> [ 170.605478] pc : mem_cgroup_protected+0x34/0x120
> [ 170.610142] lr : shrink_node+0x120/0x478
> [ 170.614093] sp : ffffff8009d23c50
> [ 170.617438] x29: ffffff8009d23c50 x28: ffffff8009d23d48
> [ 170.622808] x27: ffffffc074ca1000 x26: ffffff8009d23e28
> [ 170.628160] x25: ffffff8009d23d88 x24: 0000000000000000
> [ 170.633481] x23: 0000000000000000 x22: ffffff8009071f80
> [ 170.638802] x21: 0000000000000012 x20: 0000000000000012
> [ 170.644124] x19: 0000000000000000 x18: 0000000000000400
> [ 170.649444] x17: 0000000000000000 x16: ffffffc074ca2000
> [ 170.654765] x15: 0000000000000000 x14: 0000000000000400
> [ 170.660087] x13: 00000000000000b1 x12: 0000000000000003
> [ 170.665408] x11: 0000000000000020 x10: 0000000000000000
> [ 170.670729] x9 : 0000000000000001 x8 : 0000000000000004
> [ 170.676050] x7 : ffffffc074d43c00 x6 : 0000000000000000
> [ 170.681370] x5 : 0000000000000000 x4 : 0000000000000000
> [ 170.686690] x3 : 000000000000dafa x2 : 0000000000000000
> [ 170.692010] x1 : ffffffc074ca1000 x0 : ffffffc0386e8000
> [ 170.697335] Process kswapd0 (pid: 663, stack limit = 0x00000000e0f0ae51)
> [ 170.704039] Call trace:
> [ 170.706497] mem_cgroup_protected+0x34/0x120
> [ 170.710775] balance_pgdat+0x1cc/0x418
> [ 170.714529] kswapd+0x180/0x3b8
> [ 170.717674] kthread+0xf8/0x128
> [ 170.720824] ret_from_fork+0x10/0x18
> [ 170.724411] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [ 170.730542] ---[ end trace 7c961b6d409886f1 ]---
> [ 170.839299] Kernel panic - not syncing: Fatal exception
> [ 170.844549] SMP: stopping secondary CPUs
> [ 170.848488] Kernel Offset: disabled
> [ 170.851982] CPU features: 0x24802004
> [ 170.855556] Memory Limit: none
> [ 170.888494] Rebooting in 5 seconds..
>
>
>
>
> console:/ # [ 348.612152] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [ 348.617384] Unable to handle kernel access to user memory outside
> uaccess routines at virtual address 0000000000000120
> [ 348.621360] Mem abort info:
> [ 348.632086] Mem abort info:
> [ 348.634870] ESR = 0x96000005
> [ 348.634885] Exception class = DABT (current EL), IL = 32 bits
> [ 348.637686] ESR = 0x96000005
> [ 348.640785] SET = 0, FnV = 0
> [ 348.646740] Exception class = DABT (current EL), IL = 32 bits
> [ 348.649799] EA = 0, S1PTW = 0
> [ 348.652892] SET = 0, FnV = 0
> [ 348.652901] EA = 0, S1PTW = 0
> [ 348.652913] Data abort info:
> [ 348.658905] Data abort info:
> [ 348.662041] ISV = 0, ISS = 0x00000005
> [ 348.662050] CM = 0, WnR = 0
> [ 348.662071] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000697cecc4
> [ 348.665129] ISV = 0, ISS = 0x00000005
> [ 348.668298] [0000000000000120] pgd=000000003a915003, pud=000000003a915003
> [ 348.671224] CM = 0, WnR = 0
> [ 348.671242] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c568bd29
> [ 348.674193] , pmd=0000000000000000
> [ 348.678021] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [ 348.691540] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [ 348.723733] CPU: 5 PID: 3246 Comm: CrRendererMain Not tainted
> 4.17.0-11699-gb4f23f3 #412
> [ 348.731857] Hardware name: HiKey Development Board (DT)
> [ 348.737121] pstate: a0400005 (NzCv daif +PAN -UAO)
> [ 348.741975] pc : mem_cgroup_protected+0x34/0x120
> [ 348.746640] lr : shrink_node+0x120/0x478
> [ 348.750590] sp : ffffff800ac9b8a0
> [ 348.753934] x29: ffffff800ac9b8a0 x28: ffffff800ac9b9d8
> [ 348.759304] x27: ffffffc071982480 x26: ffffff800ac9bb30
> [ 348.764673] x25: ffffff800ac9ba18 x24: 0000000000000000
> [ 348.770038] x23: 0000000000000000 x22: ffffff8009113d00
> [ 348.775404] x21: 000000000000000f x20: 000000000000000f
> [ 348.780769] x19: 0000000000000000 x18: 0000000000000000
> [ 348.786134] x17: 0000000000000000 x16: ffffffc071985a80
> [ 348.791500] x15: 0000000000000000 x14: 00000000d5e75c2f
> [ 348.796868] x13: 00000000d7237d18 x12: 0000000000000003
> [ 348.802233] x11: 0000000000000020 x10: 0000000000000000
> [ 348.807598] x9 : 0000000000000001 x8 : 0000000000000004
> [ 348.812963] x7 : ffffffc072d58c80 x6 : 0000000000000000
> [ 348.818311] x5 : 0000000000000000 x4 : 0000000000000000
> [ 348.823626] x3 : 000000000000e1fc x2 : 0000000000000000
> [ 348.828941] x1 : ffffffc071982480 x0 : ffffffc038700080
> [ 348.834258] Process CrRendererMain (pid: 3246, stack limit =
> 0x00000000b82069c1)
> [ 348.841652] Call trace:
> [ 348.844100] mem_cgroup_protected+0x34/0x120
> [ 348.848370] do_try_to_free_pages+0xd0/0x3c0
> [ 348.852639] try_to_free_pages+0xf8/0x120
> [ 348.856651] __alloc_pages_nodemask+0x460/0xb68
> [ 348.861181] do_huge_pmd_anonymous_page+0x328/0x7d8
> [ 348.866061] __handle_mm_fault+0x57c/0xea0
> [ 348.870157] handle_mm_fault+0x128/0x1f8
> [ 348.874082] do_page_fault+0x1d0/0x490
> [ 348.877830] do_translation_fault+0x5c/0x68
> [ 348.882012] do_mem_abort+0x54/0x118
> [ 348.885587] el0_da+0x20/0x24
> [ 348.888557] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [ 348.894651] ---[ end trace 58afd90183767ac2 ]---
> [ 348.942150] Kernel panic - not syncing: Fatal exception
> [ 348.947448] SMP: stopping secondary CPUs
> [ 349.784747] SMP: failed to stop secondary CPUs 2,5
> [ 349.789569] Kernel Offset: disabled
> [ 349.793089] CPU features: 0x24802004
> [ 349.796691] Memory Limit: none
> [ 349.909567] Rebooting in 5 seconds..
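
For anyone reading the "Mem abort info" lines above, the ESR value can be
split into its fields by hand. This is a small sketch based on the arm64
ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for
data aborts WnR in bit 6 and the fault status code in bits 5:0). The struct
and function names are made up for illustration.

```c
#include <stdint.h>

/* Illustrative container for the decoded fields (hypothetical name). */
struct esr_fields {
	uint32_t ec;    /* exception class */
	uint32_t il;    /* instruction length bit: 1 means 32-bit */
	uint32_t iss;   /* instruction specific syndrome */
	uint32_t wnr;   /* data abort: 1 = write, 0 = read */
	uint32_t dfsc;  /* data abort: fault status code */
};

/* Split a 32-bit ESR value into the fields printed in the oops. */
static struct esr_fields decode_esr(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 0x1;
	f.iss = esr & 0x1ffffff;
	f.wnr = (f.iss >> 6) & 0x1;
	f.dfsc = f.iss & 0x3f;
	return f;
}
```

For 0x96000005 this gives EC = 0x25 (data abort taken from the current EL),
IL = 1, ISS = 0x5 and WnR = 0, in line with the "DABT (current EL)",
"IL = 32 bits" and "kernel read" lines in the log.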