[nfsd] page allocation failure -> kernel oops, even local fs hangs.
From: Ian Kumlien
Date: Wed Jul 04 2018 - 12:58:41 EST
Hi,
I just had this happen a little while ago, got different weird
deadlocks but this one actually generated a oops..
This is basic operation, a machine with 16 gb memory mainly doing NFS traffic.
I'm currently playing with RDMA for this, which is why mlx4 is included.
When the crash occurs, the local filesystem is just as inaccessible as
trying to use it remotely - i couldn't find any change that looked
related to it in v4.18-rc3 so here goes:
(the same report is attached as well)
[1609167.131537] nfsd: page allocation failure: order:8,
mode:0x14000c0(GFP_KERNEL), nodemask=(null)
[1609167.131548] nfsd cpuset=/ mems_allowed=0
[1609167.131554] CPU: 5 PID: 1310 Comm: nfsd Not tainted 4.17.1 #196
[1609167.131557] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./X370 Taichi, BIOS P4.70 04/17/2018
[1609167.131561] Call Trace:
[1609167.131567] dump_stack+0x46/0x5b
[1609167.131571] warn_alloc+0xf8/0x180
[1609167.131575] ? __alloc_pages_direct_compact+0x8b/0x110
[1609167.131578] __alloc_pages_slowpath+0xb86/0xbc0
[1609167.131583] ? set_next_entity+0x47b/0x680
[1609167.131586] __alloc_pages_nodemask+0x1f9/0x230
[1609167.131589] dma_direct_alloc+0xbd/0x120
[1609167.131593] alloc_coherent+0x60/0x160
[1609167.131599] mlx4_buf_direct_alloc.isra.9+0xc1/0x170 [mlx4_core]
[1609167.131604] mlx4_buf_alloc+0x158/0x1d0 [mlx4_core]
[1609167.131608] ? _raw_spin_lock_irqsave+0x1d/0x50
[1609167.131612] create_qp_common.isra.48+0x4bc/0x10d0 [mlx4_ib]
[1609167.131616] ? _raw_spin_unlock_irqrestore+0xf/0x30
[1609167.131620] ? mlx4_free_cmd_mailbox.part.20+0x17/0x20 [mlx4_core]
[1609167.131624] mlx4_ib_create_qp+0x128/0x880 [mlx4_ib]
[1609167.131627] ? mlx4_ib_create_cq+0x25f/0x420 [mlx4_ib]
[1609167.131633] ib_create_qp+0x9c/0x310 [ib_core]
[1609167.131636] rdma_create_qp+0x39/0xb0 [rdma_cm]
[1609167.131641] svc_rdma_accept+0x325/0x4d0 [rpcrdma]
[1609167.131644] ? _raw_spin_unlock_irqrestore+0xf/0x30
[1609167.131648] ? try_to_del_timer_sync+0x47/0x70
[1609167.131651] ? svc_rdma_detach+0x10/0x10 [rpcrdma]
[1609167.131655] svc_recv+0x2cc/0x7d0
[1609167.131659] ? nfsd_destroy+0x80/0x80
[1609167.131661] nfsd+0xd5/0x150
[1609167.131665] kthread+0x109/0x120
[1609167.131668] ? kthread_create_worker_on_cpu+0x70/0x70
[1609167.131671] ret_from_fork+0x22/0x40
[1609167.131674] Mem-Info:
[1609167.131678] active_anon:16522 inactive_anon:21289 isolated_anon:0
active_file:736374 inactive_file:2641993 isolated_file:23
unevictable:0 dirty:2 writeback:123602 unstable:0
slab_reclaimable:103077 slab_unreclaimable:76545
mapped:10454 shmem:318 pagetables:2054 bounce:0
free:42337 free_pcp:12 free_cma:0
[1609167.136395] Node 0 active_anon:66088kB inactive_anon:85156kB
active_file:2945496kB inactive_file:10567972kB unevictable:0kB
isolated(anon):0kB isolated
(file):92kB mapped:41816kB dirty:8kB writeback:494408kB shmem:1272kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 98304kB
writeback_tmp:0kB unstable:0kB al
l_unreclaimable? no
[1609167.138299] Node 0 DMA free:15900kB min:64kB low:80kB high:96kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB wri
tepending:0kB present:15984kB managed:15900kB mlocked:0kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
[1609167.140313] lowmem_reserve[]: 0 3449 15946 15946
[1609167.141305] Node 0 DMA32 free:75596kB min:14604kB low:18252kB
high:21900kB active_anon:7208kB inactive_anon:10828kB
active_file:651676kB inactive_file:
2600688kB unevictable:0kB writepending:151540kB present:3597884kB
managed:3597884kB mlocked:0kB kernel_stack:288kB pagetables:20kB
bounce:0kB free_pcp:0kB l
ocal_pcp:0kB free_cma:0kB
[1609167.143453] lowmem_reserve[]: 0 0 12497 12497
[1609167.144600] Node 0 Normal free:77852kB min:52912kB low:66140kB
high:79368kB active_anon:58880kB inactive_anon:74328kB
active_file:2293820kB inactive_fi
le:7967284kB unevictable:0kB writepending:342876kB present:13094400kB
managed:12802572kB mlocked:0kB kernel_stack:5760kB pagetables:8196kB
bounce:0kB free_p
cp:48kB local_pcp:48kB free_cma:0kB
[1609167.146913] lowmem_reserve[]: 0 0 0 0
[1609167.148033] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB
2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M)
3*4096kB (M) = 15900
kB
[1609167.149199] Node 0 DMA32: 564*4kB (UME) 640*8kB (UME) 465*16kB
(UME) 358*32kB (UME) 275*64kB (UME) 119*128kB (UME) 45*256kB (UM)
10*512kB (M) 0*1024kB
0*2048kB 0*4096kB = 75744kB
[1609167.150389] Node 0 Normal: 1517*4kB (UMEH) 1231*8kB (UMEH)
840*16kB (UMEH) 545*32kB (UM) 304*64kB (UM) 62*128kB (UM) 14*256kB (M)
1*512kB (M) 0*1024kB
0*2048kB 0*4096kB = 78284kB
[1609167.151592] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[1609167.152783] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[1609167.153980] 3379126 total pagecache pages
[1609167.155155] 368 pages in swap cache
[1609167.156328] Swap cache stats: add 19348, delete 18980, find 6477/7599
[1609167.157509] Free swap = 20942168kB
[1609167.158679] Total swap = 20971516kB
[1609167.159839] 4177067 pages RAM
[1609167.161028] 0 pages HighMem/MovableOnly
[1609167.162177] 72978 pages reserved
[1609167.163319] 0 pages cma reserved
[1609167.164443] 0 pages hwpoisoned
[1609167.131537] nfsd: page allocation failure: order:8, mode:0x14000c0(GFP_KERNEL), nodemask=(null)
[1609167.131548] nfsd cpuset=/ mems_allowed=0
[1609167.131554] CPU: 5 PID: 1310 Comm: nfsd Not tainted 4.17.1 #196
[1609167.131557] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P4.70 04/17/2018
[1609167.131561] Call Trace:
[1609167.131567] dump_stack+0x46/0x5b
[1609167.131571] warn_alloc+0xf8/0x180
[1609167.131575] ? __alloc_pages_direct_compact+0x8b/0x110
[1609167.131578] __alloc_pages_slowpath+0xb86/0xbc0
[1609167.131583] ? set_next_entity+0x47b/0x680
[1609167.131586] __alloc_pages_nodemask+0x1f9/0x230
[1609167.131589] dma_direct_alloc+0xbd/0x120
[1609167.131593] alloc_coherent+0x60/0x160
[1609167.131599] mlx4_buf_direct_alloc.isra.9+0xc1/0x170 [mlx4_core]
[1609167.131604] mlx4_buf_alloc+0x158/0x1d0 [mlx4_core]
[1609167.131608] ? _raw_spin_lock_irqsave+0x1d/0x50
[1609167.131612] create_qp_common.isra.48+0x4bc/0x10d0 [mlx4_ib]
[1609167.131616] ? _raw_spin_unlock_irqrestore+0xf/0x30
[1609167.131620] ? mlx4_free_cmd_mailbox.part.20+0x17/0x20 [mlx4_core]
[1609167.131624] mlx4_ib_create_qp+0x128/0x880 [mlx4_ib]
[1609167.131627] ? mlx4_ib_create_cq+0x25f/0x420 [mlx4_ib]
[1609167.131633] ib_create_qp+0x9c/0x310 [ib_core]
[1609167.131636] rdma_create_qp+0x39/0xb0 [rdma_cm]
[1609167.131641] svc_rdma_accept+0x325/0x4d0 [rpcrdma]
[1609167.131644] ? _raw_spin_unlock_irqrestore+0xf/0x30
[1609167.131648] ? try_to_del_timer_sync+0x47/0x70
[1609167.131651] ? svc_rdma_detach+0x10/0x10 [rpcrdma]
[1609167.131655] svc_recv+0x2cc/0x7d0
[1609167.131659] ? nfsd_destroy+0x80/0x80
[1609167.131661] nfsd+0xd5/0x150
[1609167.131665] kthread+0x109/0x120
[1609167.131668] ? kthread_create_worker_on_cpu+0x70/0x70
[1609167.131671] ret_from_fork+0x22/0x40
[1609167.131674] Mem-Info:
[1609167.131678] active_anon:16522 inactive_anon:21289 isolated_anon:0
active_file:736374 inactive_file:2641993 isolated_file:23
unevictable:0 dirty:2 writeback:123602 unstable:0
slab_reclaimable:103077 slab_unreclaimable:76545
mapped:10454 shmem:318 pagetables:2054 bounce:0
free:42337 free_pcp:12 free_cma:0
[1609167.136395] Node 0 active_anon:66088kB inactive_anon:85156kB active_file:2945496kB inactive_file:10567972kB unevictable:0kB isolated(anon):0kB isolated
(file):92kB mapped:41816kB dirty:8kB writeback:494408kB shmem:1272kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 98304kB writeback_tmp:0kB unstable:0kB al
l_unreclaimable? no
[1609167.138299] Node 0 DMA free:15900kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB wri
tepending:0kB present:15984kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[1609167.140313] lowmem_reserve[]: 0 3449 15946 15946
[1609167.141305] Node 0 DMA32 free:75596kB min:14604kB low:18252kB high:21900kB active_anon:7208kB inactive_anon:10828kB active_file:651676kB inactive_file:
2600688kB unevictable:0kB writepending:151540kB present:3597884kB managed:3597884kB mlocked:0kB kernel_stack:288kB pagetables:20kB bounce:0kB free_pcp:0kB l
ocal_pcp:0kB free_cma:0kB
[1609167.143453] lowmem_reserve[]: 0 0 12497 12497
[1609167.144600] Node 0 Normal free:77852kB min:52912kB low:66140kB high:79368kB active_anon:58880kB inactive_anon:74328kB active_file:2293820kB inactive_fi
le:7967284kB unevictable:0kB writepending:342876kB present:13094400kB managed:12802572kB mlocked:0kB kernel_stack:5760kB pagetables:8196kB bounce:0kB free_p
cp:48kB local_pcp:48kB free_cma:0kB
[1609167.146913] lowmem_reserve[]: 0 0 0 0
[1609167.148033] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900
kB
[1609167.149199] Node 0 DMA32: 564*4kB (UME) 640*8kB (UME) 465*16kB (UME) 358*32kB (UME) 275*64kB (UME) 119*128kB (UME) 45*256kB (UM) 10*512kB (M) 0*1024kB
0*2048kB 0*4096kB = 75744kB
[1609167.150389] Node 0 Normal: 1517*4kB (UMEH) 1231*8kB (UMEH) 840*16kB (UMEH) 545*32kB (UM) 304*64kB (UM) 62*128kB (UM) 14*256kB (M) 1*512kB (M) 0*1024kB
0*2048kB 0*4096kB = 78284kB
[1609167.151592] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[1609167.152783] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[1609167.153980] 3379126 total pagecache pages
[1609167.155155] 368 pages in swap cache
[1609167.156328] Swap cache stats: add 19348, delete 18980, find 6477/7599
[1609167.157509] Free swap = 20942168kB
[1609167.158679] Total swap = 20971516kB
[1609167.159839] 4177067 pages RAM
[1609167.161028] 0 pages HighMem/MovableOnly
[1609167.162177] 72978 pages reserved
[1609167.163319] 0 pages cma reserved
[1609167.164443] 0 pages hwpoisoned