[RFC PATCH] mm: kasan: suppress soft lockup in slub when !CONFIG_PREEMPT
From: Yang Shi
Date: Thu Dec 07 2017 - 18:30:35 EST
When running stress test with KASAN enabled, the below softlockup may
happen occasionally:
NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
[blogbench:2997]
Modules linked in: dm_mod(E) binfmt_misc(E)
salsa20_generic(E) salsa20_x86_64(E) camellia_generic(E)
camellia_aesni_avx_x86_64(E) camellia_x86_64(E) cast6_avx_x86_64(E)
cast6_generic(E) cast_common(E) serpent_avx_x86_64(E)
serpent_sse2_x86_64(E) serpent_generic(E) twofish_generic(E)
twofish_avx_x86_64(E) twofish_x86_64_3way(E) twofish_x86_64(E)
twofish_common(E) xts(E) tgr192(E) wp512(E) rmd320(E) rmd256(E)
rmd160(E) rmd128(E) md4(E) sha512_ssse3(E) sha512_generic(E) tcp_diag(E)
inet_diag(E) bonding(E) intel_rapl(E) iosf_mbi(E)
x86_pkg_temp_thermal(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
iTCO_wdt(E) iTCO_vendor_support(E) crct10dif_pclmul(E) dcdbas(E)
crc32_pclmul(E) sg(E) ipmi_devintf(E) ghash_clmulni_intel(E)
aesni_intel(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E)
cryptd(E) ipmi_si(E) mei_me(E) pcspkr(E) lpc_ich(E) mfd_core(E)
mei(E) shpchp(E) wmi(E) ipmi_msghandler(E) acpi_power_meter(E) nfsd(E)
auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E)
ext4(E) jbd2(E) mbcache(E) sd_mod(E) mgag200(E) drm_kms_helper(E)
syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ixgbe(E)
igb(E) mdio(E) ttm(E) ptp(E) drm(E) pps_core(E) crc32c_intel(E)
i2c_algo_bit(E) i2c_core(E) megaraid_sas(E) dca(E)
irq event stamp: 0
hardirqs last enabled at (0): [<     (null)>]   Â(null)
hardirqs last disabled at (0): [] copy_process.part.30+0x5c6/0x1f50
softirqs last enabled at (0): [] copy_process.part.30+0x5c6/0x1f50
softirqs last disabled at (0): [<Â Â Â Â Â (null)>] Â Â Â(null)
CPU: 7 PID: 2997 Comm: blogbench Tainted: GÂ Â Â Â W Â E
4.9.44-003.ali3000.alios7.x86_64.debug #1
Hardware name: Dell Inc. PowerEdge R720xd/0X6FFV, BIOS 1.6.0 03/07/2013
task: ffff884f7ca93340 task.stack: ffff884f7caa0000
RIP: 0010:[]Â [] put_cpu_partial+0xf5/0x160
RSP: 0018:ffff884f7caa7c90Â EFLAGS: 00000202
RAX: ffff884f7ca93340 RBX: ffffea0146720a00 RCX: 0000000000000002
RDX: ffff884f7ca93340 RSI: 0000000000000001 RDI: 0000000000000202
RBP: ffff884f7caa7cb0 R08: 0000000000000000 R09: 0000000000000630
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8852bb0f5f80
R13: 0000000000000001 R14: 0000000000000202 R15: 0000000000000000
FS:Â 00007fb74867c700(0000) GS:ffff8852bc400000(0000) knlGS:0000000000000000
CS:Â 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fded6b49c90 CR3: 0000005032bc6000 CR4: 00000000000406e0
Stack:
0000000000000000 0000000000070006 ffff8852bb0f5f80 ffffea0146720a00
ffff884f7caa7d60 ffffffff8126798c 0000000000000286 ffff88519c828000
ffff88525334a200 0000000100070001 ffff88519c828000 ffff8852bb0f0001
Call Trace:
[] __slab_free+0x19c/0x270
[] ___cache_free+0xa6/0xb0
[] qlist_free_all+0x47/0x80
[] quarantine_reduce+0x159/0x190
[] kasan_kmalloc+0xaf/0xc0
[] kasan_slab_alloc+0x12/0x20
[] kmem_cache_alloc+0xfa/0x360
[] ? getname_flags+0x4f/0x1f0
[] getname_flags+0x4f/0x1f0
[] getname+0x12/0x20
[] do_sys_open+0xf9/0x210
[] SyS_open+0x1e/0x20
[] entry_SYSCALL_64_fastpath+0x1f/0xc2
Code: ff 31 c9 e9 65 ff ff ff 41 8b 44 24 24 85 c0 74 2d
65 ff 0d 16 17 db 7e 5b 41 5c 41 5d 41 5e 5d c3 e8 60 c3 e9 ff 4c 89 f7
57 9d <66> 66 90 66 90 ba 01 00 00 00 31 ff 31 c9 e9 2f ff ff ff e8 83
The code is run in irq disabled or preempt disabled context, so
cond_resched() can't be used in this case. Touch softlockup watchdog when
KASAN is enabled to suppress the warning.
Signed-off-by: Yang Shi <yang.s@xxxxxxxxxxxxxxx>
---
mm/slub.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/slub.c b/mm/slub.c
index cfd56e5..4ae435e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -35,6 +35,7 @@
#include <linux/prefetch.h>
#include <linux/memcontrol.h>
#include <linux/random.h>
+#include <linux/nmi.h>
#include <trace/events/kmem.h>
@@ -2266,6 +2267,10 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
page->pobjects = pobjects;
page->next = oldpage;
+#ifdef CONFIG_KASAN
+ touch_softlockup_watchdog();
+#endif
+
} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
!= oldpage);
if (unlikely(!s->cpu_partial)) {
--
1.8.3.1