BUG: assertion failure in fs/jbd/checkpoint.c persists in 2.6.11.12

From: David Wilk
Date: Thu Jun 16 2005 - 13:15:01 EST


Howdy all,

We've been plagued by this ext3 bug since 2.6.10, and it only happens
on heavily loaded postgres systems. We run our postgres DB on ext3
data=journal on a dmcrypt partition. Our kernel is also patched with
grsec, but that doesn't appear to play any role.

After upgrading to 2.6.11.12 (specifically for the ext3 checkpoint.c
fix) we noticed two things. The assertion failure persists, and now
we get a condition where a postgres process will spin in state 'D'
forever and hog 100% of a CPU (in system, not user).
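
For anyone trying to spot the same symptom, here is a minimal sketch (not something from the original report) that lists processes currently in state 'D' along with their user and system CPU tick counters. It assumes the /proc/<pid>/stat field layout described in proc(5), where state is field 3, utime is field 14 and stime is field 15. The helper names parse_stat and d_state_processes are made up for illustration.

```python
import os


def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state, utime, stime)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    # Everything after the closing paren starts at field 3 (state).
    fields = line[rparen + 1:].split()
    state = fields[0]
    utime = int(fields[11])  # field 14: user-mode time, in clock ticks
    stime = int(fields[12])  # field 15: kernel-mode time, in clock ticks
    return pid, comm, state, utime, stime


def d_state_processes(proc="/proc"):
    """Return parsed stat tuples for every process in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                info = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process may have exited between listdir() and open().
            continue
        if info[2] == "D":
            found.append(info)
    return found


if __name__ == "__main__":
    for pid, comm, state, utime, stime in d_state_processes():
        print(pid, comm, state, "utime=%d stime=%d" % (utime, stime))
```

Running it twice a few seconds apart and comparing stime for the same pid would show whether the stuck process is accumulating system time, as described above.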

I've attached the trace in plain text so the formatting doesn't get screwed.

Let me know if anyone would like more information. I'm no programmer,
but I'd like to help in any way that I can.
Jun 14 17:21:24 qacluster4 kernel: Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:627: "transaction->t_forget == NULL"
Jun 14 17:21:24 qacluster4 kernel: ------------[ cut here ]------------
Jun 14 17:21:26 qacluster4 kernel: kernel BUG at fs/jbd/checkpoint.c:627!
Jun 14 17:21:26 qacluster4 kernel: invalid operand: 0000 [#1]
Jun 14 17:21:26 qacluster4 kernel: SMP
Jun 14 17:21:26 qacluster4 kernel: Modules linked in: memstat
Jun 14 17:21:26 qacluster4 kernel: CPU: 0
Jun 14 17:21:26 qacluster4 kernel: EIP: 0060:[<c0266520>] Tainted: P VLI
Jun 14 17:21:26 qacluster4 kernel: EFLAGS: 00010292 (2.6.11.12-grsec)
Jun 14 17:21:26 qacluster4 kernel: EIP is at __journal_drop_transaction+0x2d0/0x384
Jun 14 17:21:26 qacluster4 kernel: eax: 00000071 ebx: f7f7b500 ecx: c05cdff0 edx: 00000286
Jun 14 17:21:26 qacluster4 kernel: esi: f7101200 edi: c5ab532c ebp: f762a000 esp: f762ad68
Jun 14 17:21:26 qacluster4 kernel: ds: 007b es: 007b ss: 0068
Jun 14 17:21:26 qacluster4 kernel: Process kjournald (pid: 6953, threadinfo=f762a000 task=f7628020)
Jun 14 17:21:26 qacluster4 kernel: Stack: c0529ac0 c05016f5 c051ddd3 00000273 c051de23 f7f7b500 f7101200 c026611a
Jun 14 17:21:26 qacluster4 kernel: f7101200 f7f7b500 f24b668c f7f7b500 c026584a c5ab532c c5ab532c c02660c8
Jun 14 17:21:26 qacluster4 kernel: c5ab532c c5ab532c 00000001 da2cf780 da2cf780 e5911d4c f764c480 00000000
Jun 14 17:21:26 qacluster4 kernel: Call Trace:
Jun 14 17:21:26 qacluster4 kernel: [<c026611a>] __journal_remove_checkpoint+0x4a/0xa0
Jun 14 17:21:26 qacluster4 kernel: [<c026584a>] __try_to_free_cp_buf+0x5a/0x90
Jun 14 17:21:26 qacluster4 kernel: [<c02660c8>] __journal_clean_checkpoint_list+0x98/0xa0
Jun 14 17:21:26 qacluster4 kernel: [<c0263ee9>] journal_commit_transaction+0x1d9/0x11d0
Jun 14 17:21:26 qacluster4 kernel: [<c01bcc50>] autoremove_wake_function+0x0/0x60
Jun 14 17:21:26 qacluster4 kernel: [<c01bcc50>] autoremove_wake_function+0x0/0x60
Jun 14 17:21:26 qacluster4 kernel: [<c01a3b93>] scheduler_tick+0x63/0x320
Jun 14 17:21:26 qacluster4 kernel: [<c01a30c4>] find_busiest_group+0xd4/0x300
Jun 14 17:21:26 qacluster4 kernel: [<c01a337c>] find_busiest_queue+0x8c/0xb0
Jun 14 17:21:26 qacluster4 kernel: [<c01a35ab>] load_balance_newidle+0x8b/0xa0
Jun 14 17:21:26 qacluster4 kernel: [<c01a275c>] finish_task_switch+0x3c/0x90
Jun 14 17:21:26 qacluster4 kernel: [<c04f3142>] schedule+0x3e2/0xc90
Jun 14 17:21:26 qacluster4 kernel: [<c01b0b3c>] del_timer_sync+0x9c/0xe0
Jun 14 17:21:26 qacluster4 kernel: [<c02672d5>] kjournald+0xe5/0x250
Jun 14 17:21:26 qacluster4 kernel: [<c01bcc50>] autoremove_wake_function+0x0/0x60
Jun 14 17:21:26 qacluster4 kernel: [<c01bcc50>] autoremove_wake_function+0x0/0x60
Jun 14 17:21:26 qacluster4 kernel: [<c018a076>] ret_from_fork+0x6/0x20
Jun 14 17:21:26 qacluster4 kernel: [<c02671d0>] commit_timeout+0x0/0x10
Jun 14 17:21:26 qacluster4 kernel: [<c02671f0>] kjournald+0x0/0x250
Jun 14 17:21:26 qacluster4 kernel: [<c0188375>] kernel_thread_helper+0x5/0x10
Jun 14 17:21:26 qacluster4 kernel: Code: 52 c0 b8 23 de 51 c0 89 44 24 10 b8 73 02 00 00 89 44 24 0c b8 d3 dd 51 c0 89 44 24 08 b8 f5 16 50 c0 89 44 24 04 e8 60 19 f4 ff <0f> 0b 73 02 d3 dd 51 c0 e9 ba fd ff ff 8d 76 00 c7 04 24 c0 9a
Jun 15 12:21:43 qacluster4 syslogd 1.4.1: restart.
Jun 15 12:21:43 qacluster4 syslog: syslogd startup succeeded
Jun 15 12:21:43 qacluster4 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 15 12:21:43 qacluster4 kernel: Linux version 2.6.11.12-grsec (root@xxxxxxxxxxxxxxx) (gcc version 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)) #1 SMP Mon Jun 13 11:17:39 MDT 2005
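
When comparing several traces like the one above, it can help to reduce each to the failed assertion plus the list of functions in the call trace. The following is a rough sketch only, assuming the syslog line format shown in this log. The regular expressions and the summarize helper are illustrative and are not part of any kernel tool.

```python
import re

# Matches: Assertion failure in FUNC() at FILE:LINE: "EXPR"
ASSERT_RE = re.compile(r'Assertion failure in (\S+)\(\) at (\S+):(\d+): "(.*)"')
# Matches call-trace entries such as: [<c026611a>] func+0x4a/0xa0
FRAME_RE = re.compile(r'\[<([0-9a-f]+)>\] (\S+)\+0x[0-9a-f]+/0x[0-9a-f]+')


def summarize(lines):
    """Return (assertion, frames) extracted from an iterable of log lines.

    assertion is a dict (or None if no assertion line was seen);
    frames is the list of function names in the order they appear.
    """
    assertion = None
    frames = []
    for line in lines:
        m = ASSERT_RE.search(line)
        if m:
            assertion = {
                "function": m.group(1),
                "file": m.group(2),
                "line": int(m.group(3)),
                "expr": m.group(4),
            }
            continue
        m = FRAME_RE.search(line)
        if m:
            frames.append(m.group(2))
    return assertion, frames


if __name__ == "__main__":
    import sys
    # Usage: python summarize_oops.py < /var/log/messages
    assertion, frames = summarize(sys.stdin)
    print(assertion)
    for name in frames:
        print("  " + name)
```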