Re: kernel panics with 4.14.X versions

From: Pavlos Parissis
Date: Tue Apr 17 2018 - 07:18:58 EST


On 17/04/2018 01:31 ÏÎ, Pavlos Parissis wrote:
> On 16/04/2018 04:40 ÎÎ, Jan Kara wrote:
>> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
>>> Fwiw, there have been already reports of similar soft lockups in
>>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
>>>
>>> We have also noticed similar softlockups with 4.14.22 here.
>>
>> Yeah.
>>
>>> On 16 Apr 13:54, Pavlos Parissis wrote:
>>>>
>>>> Hi all,
>>>>
>
> [..snip..]
>
>>>> [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [kube-apiserver:24261]
>>>> [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
>>>> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
>>>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>>>> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
>>>> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
>>>> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
>>>> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
>>>> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
>>>> pps_core scsi_transport_sas
>>>> [373782.516807] dm_mirror dm_region_hash dm_log dm_mod dax
>>>> [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 4.14.32-1.el7.x86_64 #1
>>>> [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
>>>> [373782.567486] task: ffff882f66d28000 task.stack: ffffc9002120c000
>>>> [373782.583441] RIP: 0010:fsnotify+0x197/0x510
>>>> [373782.597319] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
>>>> [373782.615308] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
>>>> [373782.632950] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
>>>> [373782.650616] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
>>>> [373782.668287] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>> [373782.685918] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>> [373782.703302] FS: 000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
>>>> [373782.721887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [373782.737741] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
>>>> [373782.755247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [373782.772722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [373782.790043] Call Trace:
>>>> [373782.802041] vfs_write+0x151/0x1b0
>>>> [373782.815081] ? syscall_trace_enter+0x1cd/0x2b0
>>>> [373782.829175] SyS_write+0x55/0xc0
>>>> [373782.841870] do_syscall_64+0x79/0x1b0
>>>> [373782.855073] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>
>> Can you please run RIP through ./scripts/faddr2line to see where exactly
>> are we looping? I expect the loop iterating over marks to notify but better
>> be sure.
>>
>
> I am very newbie on this and I tried with:
> ../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64
> 0010:fsnotify+0x197/0x510
> readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
> size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> IMAGE_SCN_MEM_NOT_PAGED in section .bss
> nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> no match for 0010:fsnotify+0x197/0x510
>
> Obviously, I am doing something very wrong.
>

I produced an uncompressed image(the error above caused by giving a compressed
image to faddr2line) by compiling 4.14.32 with config which we have in production
and now faddr2line reports:

../repo/Linux/linux/scripts/faddr2line ./vmlinux 0010:fsnotify+0x197/0x510
no match for 0010:fsnotify+0x197/0x510


../repo/Linux/linux/scripts/faddr2line ./vmlinux fsnotify+0x197/0x510



skipping fsnotify address at 0xffffffff8129baf7 due to size mismatch (0x510 != 0x520)
no match for fsnotify+0x197/0x510

what am I doing wrong?

Cheers,
Pavlos


Attachment: signature.asc
Description: OpenPGP digital signature