BUG: kernel panic after jbd bugs / kernel paging request

From: kardan
Date: Tue Aug 20 2013 - 02:55:17 EST


Dear developers,

First of all, thanks for all your work!

kernel version: 3.9.9-t23
kernel config: https://paste.debian.net/27351/
lspci -nvv output is attached.

I merged two kernel issues into one mail to make it easier to spot
relations between them.

As both appeared only once, I did not invest more time in trying
newer kernels, but I will do so to test patches. Please give me
pointers on where to dig in to reproduce them.

1) jbd2_journal_dirty_metadata

I reported this in #linuxfs and was advised to forward it here.

12:55 < kardan:#linuxfs> it seems like my hdd is hanging (hdd led
turned. jbd is busy for over an hour now:
1326 be/3 root 0.00 B 16.00 K 0.00 % 98.47 % [jbd2/sda1-8]

The load was caused by iceape (or something stacked below it):
#1 0xb764680e in wait4 () at ../sysdeps/unix/syscall-template.S:81
#2 0xb76467e7 in __wait3 (stat_loc=..., options=0, usage=0x0)
at ../sysdeps/unix/bsd/bsd4.4/wait3.c:33

This led to several jbd-related kernel bugs and, in the end, a kernel
panic. I attached the jbd-schedulings-bugs to avoid wrapping issues.

jbd2_journal_dirty_metadata+0x162/0x188
kmem_cache_alloc+0x26/0x9f
spin_unlock.isra.6+0x1e/0x1e
ext4_file_open+0x13e/0x1b2
spin_lock.isra.7+0xa/0xb
__d_instantiate+0x59/0x63
fsnotify_perm+0x4d/0x58

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4
ttwu_do_wakeup.constprop.111+0x39/0x56
try_to_wake_up+0xe7/0xef
autoremove_wake_function+0xd/0x29
activate_page+0xae/0xfc
__cond_resched+0xf/0x19
_cond_resched+0x10/0x18
Aug 15 18:06:10 delight
unmap_single_vma+0x3fc/0x49c
unmap_vmas+0x30/0x4d
exit_mmap+0x68/0xcb
get_signal_to_deliver+0x202/0x4d1

kmem_cache_alloc+0x26/0x9f
spin_unlock.isra.6+0x1e/0x1e
ext4_file_open+0x13e/0x1b2
fsnotify+0x1fa/0x22c
__d_instantiate+0x59/0x63

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4
blk_peek_request+0x16f/0x1a4
scsi_request_fn+0x35d/0x3fe
activate_page+0xae/0xfc
__cond_resched+0xf/0x19
_cond_resched+0x10/0x18
unmap_single_vma+0x3fc/0x49c
unmap_vmas+0x30/0x4d
exit_mmap+0x68/0xcb
get_signal_to_deliver+0x202/0x4d1

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4
__free_one_page+0xeb/0x1c4
free_pcppages_bulk+0xbb/0x103
__cond_resched+0xf/0x19
_cond_resched+0x10/0x18
unmap_single_vma+0x3fc/0x49c
unmap_vmas+0x30/0x4d
exit_mmap+0x68/0xcb
get_signal_to_deliver+0x202/0x4d1

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4
smp_apic_timer_interrupt+0x58/0x60
apic_timer_interrupt+0x34/0x3c
activate_page+0xae/0xfc
__cond_resched+0xf/0x19
_cond_resched+0x10/0x18
unmap_single_vma+0x3fc/0x49c
unmap_vmas+0x30/0x4d
exit_mmap+0x68/0xcb

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4
vm_acct_memory+0x26/0x3c
__cache_free.isra.57+0xf/0x8f
percpu_counter_add.constprop.21+0x26/0x3e
spin_lock.isra.7+0xa/0xb
dput+0x11/0x96
spin_unlock.isra.11+0xa/0x1e
__fput+0x15f/0x17e
mnt_add_count.isra.16+0x1c/0x34
__cond_resched+0xf/0x19
_cond_resched+0x10/0x18
task_work_run+0x4f/0x5a
do_exit+0x2c6/0x796
kmsg_dump+0x1d/0xcc
oops_end+0x86/0x8a
do_bounds+0x4c/0x4c

Full log: https://paste.debian.net/27347/

2) unable to handle kernel paging request

INFO: task kswapd0:21 blocked for more than 120 seconds.
[289200.502665] [<c10b4258>] ? kmem_cache_alloc+0x2f/0x9f
[289200.502677] [<c108b07f>] ? mempool_alloc+0x3b/0xee
[289200.502690] [<c104c01f>] ? timekeeping_get_ns.constprop.
[289200.502703] [<c13310e9>] ? io_schedule+0x34/0x47
[289200.502715] [<c117e062>] ? get_request+0x416/0x4ae
[289200.502728] [<c1005a8f>] ? native_sched_clock+0x48/0x94
[289200.502741] [<c11811f1>] ? ioc_lookup_icq+0x41/0x49

[289800.503037] INFO: task kswapd0:21 blocked for more than 120 seconds.
[289800.503126] [<c104300b>] ? sched_slice.isra.36+0x67/0x85
[289800.503139] [<c104c01f>] ? timekeeping_get_ns.constprop.
[289800.503153] [<c13310e9>] ? io_schedule+0x34/0x47
[289800.503165] [<c117e062>] ? get_request+0x416/0x4ae
[289800.503178] [<c11811f1>] ? ioc_lookup_icq+0x41/0x49
[289800.503189] [<c1038faa>] ? abort_exclusive_wait+0x64/0x64
[289800.503199] [<c117f938>] ? blk_queue_bio+0x185/0x26d

This issue dates back some weeks, sorry for not reporting it earlier.

I had two occurrences of this with several days in between.
One week before the first occurrence, a new RAM bank and a PCMCIA card
USB hub were added to the laptop.

Some days ago I saw a lot of I/O errors once; they did not reappear.
On #linux-fs I was told the first one looks like a use-after-free or
some other type of software-induced memory corruption.

"Those tend to be nasty problems that can take months to track down.
Some of the crazy-looking problems end up as bad hardware.

have you also experienced crashes of userspace programs?"
kswap/kworker were followed by Xorg, iceweasel, claws and Xorg.

Awesome was unresponsive afterwards and I needed to restart lightdm.
In a new X session parts of old windows reappeared; this was
reproducible.

Log is attached.

--
Kardan <kardan@xxxxxxxxxx>
Encrypt your email: http://gnupg.org/documentation
Public GPG key 9D6108AE58C06558 at hkp://pool.sks-keyservers.net
fpr: F72F C4D9 6A52 16A1 E7C9 AE94 9D61 08AE 58C0 6558

Attachment: kernel-paging-bug
Description: Binary data

Attachment: lspci
Description: Binary data

Attachment: signature.asc
Description: PGP signature