2.6.10-ac10 oops in journal_commit_transaction

From: Brice Figureau
Date: Thu Mar 03 2005 - 08:51:52 EST


Hi,


I'm reporting an oops on a bi-Xeon database server under 2.6.10-ac10
quite similar to:
http://marc.theaimsgroup.com/?l=ext3-users&m=110848085314238&w=2

I also got another server crashing (a mail server this time), but I
couldn't get the oops/panic.

This was after more than two weeks of uptime, I was running 2.6.10-ac1
before and never got this problem.

Here are the oops information:

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
c01a858d
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
CPU: 2
EIP: 0060:[journal_commit_transaction+877/5264] Not tainted VLI
EFLAGS: 00010286 (2.6.10-ac10)
EIP is at journal_commit_transaction+0x36d/0x1490
eax: db38a56c ebx: 00000000 ecx: 00000000 edx: f7779480
esi: f76fa000 edi: db38a56c ebp: f76fbf60 esp: f76fbdc8
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1206, threadinfo=f76fa000 task=f7454020)
Stack: f191fadc f191fadc 00000008 00000aa2 f76fbe04 f7fea4c0 f7c305b0 00000000
f77794b8 f7fea414 00000000 00000000 00000000 00000000 00000000 db313efc
f7779480 e4079c2c 00000aa2 00000001 f76fbe28 c01239b0 00000001 f76fbea8
Call Trace:
[show_stack+127/160] show_stack+0x7f/0xa0
[show_registers+351/464] show_registers+0x15f/0x1d0
[die+256/400] die+0x100/0x190
[do_page_fault+672/1712] do_page_fault+0x2a0/0x6b0
[error_code+43/48] error_code+0x2b/0x30
[kjournald+212/576] kjournald+0xd4/0x240
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Code: 8b 85 a0 fe ff ff 85 c0 0f 85 4f 0e 00 00 8b 95 a8 fe ff ff 8b 42 18 85 c0 0f 84 85 00 00 00 be 00 e0 ff ff 21 e6 8b 78 20 8b 1f <f0> ff 43 0c 8b 03 a8 04 0f 85 de 0d 00 00 89 5c 24 04 8b 4d 08
<6>note: kjournald[1206] exited with preempt_count 1

The code crashes in fs/jbd/commit.c journal_commit_transaction in this
particular area at line 314:

...
/*
* Wait for all previously submitted IO to complete.
*/
while (commit_transaction->t_locked_list) {
struct buffer_head *bh;

jh = commit_transaction->t_locked_list->b_tprev;
bh = jh2bh(jh);
get_bh(bh); <--- crash here because bh is NULL
if (buffer_locked(bh)) {
spin_unlock(&journal->j_list_lock);
wait_on_buffer(bh);
if (unlikely(!buffer_uptodate(bh)))
err = -EIO;
spin_lock(&journal->j_list_lock);
}
if (!inverted_lock(journal, bh)) {
put_bh(bh);
spin_lock(&journal->j_list_lock);
continue;
}
if (buffer_jbd(bh) && jh->b_jlist == BJ_Locked) {
__journal_unfile_buffer(jh);
jbd_unlock_bh_state(bh);
journal_remove_journal_head(bh);
put_bh(bh);
} else {
jbd_unlock_bh_state(bh);
}
put_bh(bh);
if (need_resched()) {
spin_unlock(&journal->j_list_lock);
cond_resched();
spin_lock(&journal->j_list_lock);
}
}
...

And more precisely at this stage of the code:

jh = commit_transaction->t_locked_list->b_tprev;
8b 78 20 mov 0x20(%eax),%edi

bh = jh2bh(jh);
8b 1f mov (%edi),%ebx

get_bh(bh);
f0 ff 43 0c lock incl 0xc(%ebx) <-- crash because ebx is null
8b 03 mov (%ebx),%eax

Unfortunately I don't have the knowledge (and time to aquire it) that
will help me chase down this bug/problem.

If you need more information (including .config and other) I'll be happy
to provide it.

Can you CC: me as I'm not subscribed to the list.

Regards,
--
Brice Figureau <brice+lklm@xxxxxxxxxxxxxxxx>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/