Re: kernel BUG at mm/huge_memory.c:242 (was put_page vs release_pages)
From: Hillf Danton
Date: Fri Sep 30 2016 - 04:38:51 EST
On Thursday, September 29, 2016 10:42 PM Nicolas Morey-Chaisemartin wrote
>
> Hi,
>
> Can someone explain to me what is the difference between put_page and release_pages and which one should be used when ?
>
> To explain my case:
> We have a PCIe driver than transfers data between user process (read/write scalls) to a PCI device, using DMA.
> We followed the DMA guide so the code basically does this:
>
> ----
> get_user_pages()
> pci_map_sg()
>
> // All DMA setup/wait for completion
>
> pci_unmap_sg
> if (mode == PCIE_DMA_FROMDEVICE)
> foreach(page) { set_page_dirty_lock(page) }
> release_pages(pages, count, 0)
> ----
> It works nicely on centos 7 and few other kernels we tried.
>
> However we switch to the latest kernel from el-repo (4.7.5-1.el7.elrepo.x86_64), and one of our tests fails with the following kernel
> bug. The bug happens systematically.
> ---------------------
> [ 1905.322093] ------------[ cut here ]------------
> [ 1905.322119] kernel BUG at mm/huge_memory.c:242!
> [ 1905.322129] invalid opcode: 0000 [#1] SMP
> [ 1905.322137] Modules linked in: mppapcie_tty(O) mppapcie(O) fuse netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE
> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
> nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cts nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs
> fscache dm_mirror dm_region_hash dm_log dm_mod edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel kvm snd_hda_codec snd_hda_core irqbypass snd_hwdep
> crct10dif_pclmul crc32_pclmul snd_seq ghash_clmulni_intel snd_seq_device aesni_intel lrw gf128mul glue_helper snd_pcm
> snd_timer snd ftdi_sio ablk_helper cryptd eeepc_wmi intel_cstate intel_rapl_perf soundcore asus_wmi sparse_keymap rfkill mei_me
> video iTCO_wdt mei i2c_algo_bit iTCO_vendor_support mxm_wmi sg lpc_ich mfd_core i2c_i801 shpchp wmi nfsd nfs_acl lockd grace
> auth_rpcgss sunrpc
> ip_tables ext4 jbd2 mbcache sr_mod sd_mod cdrom crc32c_intel serio_raw ahci libahci libata e1000e ptp pps_core fjes i2c_dev
> [ 1905.322441] CPU: 0 PID: 15051 Comm: simultaneous_wr Tainted: G O 4.7.5-1.el7.elrepo.x86_64 #1
> [ 1905.322455] Hardware name: KALRAY DEVELOPER/RAMPAGE IV GENE, BIOS 3204-1 11/19/2012
> [ 1905.322468] task: ffff88042c179680 ti: ffff8803dab30000 task.ti: ffff8803dab30000
> [ 1905.322480] RIP: 0010:[<ffffffff811f5de4>] [<ffffffff811f5de4>] put_huge_zero_page+0x14/0x20
> [ 1905.322504] RSP: 0018:ffff8803dab33ce8 EFLAGS: 00010046
> [ 1905.322512] RAX: ffffea000f690000 RBX: 0000000000000001 RCX: ffffea000f690000
> [ 1905.322521] RDX: ffff8803dab33d18 RSI: ffff8803dab33d18 RDI: ffffea000fe424e0
> [ 1905.322529] RBP: ffff8803dab33ce8 R08: ffff8803dab33d18 R09: 0000000000000000
> [ 1905.322537] R10: ffffea000fe424e0 R11: 0000000000000000 R12: ffff8803dab33e68
> [ 1905.322546] R13: ffffea000fe424c0 R14: ffff8803dab33e38 R15: ffff88043ffdbd80
> [ 1905.322555] FS: 00007f07c76f6740(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
> [ 1905.322568] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1905.322576] CR2: 00007f07c70600d0 CR3: 00000003dab39000 CR4: 00000000000406f0
> [ 1905.322585] Stack:
> [ 1905.322592] ffff8803dab33d58 ffffffff811990b9 ffff88042f01c350 0000000000000001
> [ 1905.322611] ffffea000fe424e0 0000000000000246 ffffea000fe424e0 ffffea000fe424e0
> [ 1905.322629] 000000009b255e4f ffff8803dab33e68 ffffea000f690000 ffff8803dab33e68
> [ 1905.322649] Call Trace:
> [ 1905.322664] [<ffffffff811990b9>] release_pages+0x2a9/0x340
> [ 1905.322676] [<ffffffff811d199a>] free_pages_and_swap_cache+0x8a/0xa0
> [ 1905.322686] [<ffffffff811b8ba6>] tlb_flush_mmu_free+0x36/0x60
> [ 1905.322695] [<ffffffff811b9e6c>] tlb_finish_mmu+0x1c/0x50
> [ 1905.322705] [<ffffffff811c20d4>] unmap_region+0xf4/0x130
> [ 1905.322716] [<ffffffff811c4707>] do_munmap+0x217/0x370
> [ 1905.322726] [<ffffffff811c4e21>] SyS_munmap+0x51/0x70
> [ 1905.322738] [<ffffffff81003b12>] do_syscall_64+0x62/0x110
> [ 1905.322749] [<ffffffff817227e1>] entry_SYSCALL64_slow_path+0x25/0x25
> [ 1905.322758] Code: 02 00 00 00 48 8b 3d 3c c9 b6 00 eb 8a 65 48 ff 05 02 d5 e1 7e eb 80 66 66 66 66 90 55 48 89 e5 f0 ff 0d 20 1b fb 00 74
> 02 5d c3 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 f6 46 50 02
> [ 1905.322896] RIP [<ffffffff811f5de4>] put_huge_zero_page+0x14/0x20
> [ 1905.322908] RSP <ffff8803dab33ce8>
> [ 1905.322923] ---[ end trace 9b58553cd98c0019 ]---
> [ 1905.322932] Kernel panic - not syncing: Fatal exception
> [ 1905.322990] Kernel Offset: disabled
> [ 1905.323000] ---[ end Kernel panic - not syncing: Fatal exception
> -----------------
>
> I tried replacing the release_pages by a call to put_page for each page and it works. Also disabling thp works with both version of the
> driver.
>
> Is it just pure luck if the crash disappeared ?
> Should we not use release_pages in this case?
>
See if the following diff(not tested) can help only after trying all others including 4.8-rc8.
thanks
Hillf
--- linux-4.7/mm/huge_memory.c Mon Jul 25 03:23:50 2016
+++ b/mm/huge_memory.c Fri Sep 30 15:55:03 2016
@@ -1409,6 +1409,7 @@ struct page *follow_trans_huge_pmd(struc
{
struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;
+ bool is_huge_zero;
assert_spin_locked(pmd_lockptr(mm, pmd));
@@ -1424,6 +1425,7 @@ struct page *follow_trans_huge_pmd(struc
goto out;
page = pmd_page(*pmd);
+ is_huge_zero = is_huge_zero_page(page);
VM_BUG_ON_PAGE(!PageHead(page), page);
if (flags & FOLL_TOUCH)
touch_pmd(vma, addr, pmd);
@@ -1450,8 +1452,12 @@ struct page *follow_trans_huge_pmd(struc
}
page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
VM_BUG_ON_PAGE(!PageCompound(page), page);
- if (flags & FOLL_GET)
- get_page(page);
+ if (flags & FOLL_GET) {
+ if (is_huge_zero)
+ page = get_huge_zero_page();
+ else
+ get_page(page);
+ }
out:
return page;
--