XFS: xfs_trans_cancel Filesystem shutdown

From: Carsten Menke
Date: Tue Jan 06 2009 - 15:08:57 EST


Hi,

Yesterday I was called for a problem, one of the servers hat hung up during
the christmas days. When I looked into it, I saw there were a filesystem error.

This is the first time I expierence this with an XFS Filesystem, however as the machine is in full production I had to solve the problem and what I post here is what I have as debug information to investigate. The machine was running
on a 2.6.23.17 for 8 month without a problem. Then on Dec 22th, the machine first reported a Filesystem shutdown to syslog for the /opt partition and later hung up completley.

I have unmounted the filesystem, run xfs_repair on it, (didn't create any files in lost+found) and did not report any errors, then mounted the filesystem again
no problem so far. But as soon as I started the mailserver the Filesystem was shutdown again. I did the steps several times always with the same result. I then updated to Kernel 2.6.27.10 with the same symphtoms. This is a Debian 3.0 system with xfs_reapir Version 2.6.20. The /opt partition is on a soft raid
/dev/md4 (/dev/sda8, /dev/sdb8). After destroying the partition, the raid and repartition and reformat with xfsprogs 2.10.2 and restore from backup, the server now runs again stable. The /opt partition is only 51% full

/dev/md4 47G 24G 24G 51% /opt

Following are the first debug message and then the next after inital xfs_repair and clean mount

Dec 22 17:40:19 hauptserver kernel: Filesystem "md4": XFS internal error xfs_trans_cancel at line 1169 of file fs/xfs/xfs_trans.c. Caller 0xc
Dec 22 17:40:19 hauptserver kernel: [xfs_trans_cancel+93/231] xfs_trans_cancel+0x5d/0xe7
Dec 22 17:40:19 hauptserver kernel: [xfs_create+1438/1499] xfs_create+0x59e/0x5db
Dec 22 17:40:19 hauptserver kernel: [xfs_create+1438/1499] xfs_create+0x59e/0x5db
Dec 22 17:40:19 hauptserver kernel: [xfs_acl_vhasacl_default+54/71] xfs_acl_vhasacl_default+0x36/0x47
Dec 22 17:40:19 hauptserver kernel: [xfs_vn_mknod+379/654] xfs_vn_mknod+0x17b/0x28e
Dec 22 17:40:19 hauptserver kernel: [xfs_vn_permission+15/19] xfs_vn_permission+0xf/0x13
Dec 22 17:40:19 hauptserver kernel: [permission+209/214] permission+0xd1/0xd6
Dec 22 17:40:19 hauptserver kernel: [vfs_create+198/261] vfs_create+0xc6/0x105
Dec 22 17:40:19 hauptserver kernel: [open_namei_create+71/136] open_namei_create+0x47/0x88
Dec 22 17:40:19 hauptserver kernel: [open_namei+342/1294] open_namei+0x156/0x50e
Dec 22 17:40:19 hauptserver kernel: [vfs_stat_fd+25/64] vfs_stat_fd+0x19/0x40
Dec 22 17:40:19 hauptserver kernel: [do_filp_open+37/57] do_filp_open+0x25/0x39
Dec 22 17:40:19 hauptserver kernel: [get_unused_fd_flags+89/194] get_unused_fd_flags+0x59/0xc2
Dec 22 17:40:19 hauptserver kernel: [do_sys_open+68/197] do_sys_open+0x44/0xc5
Dec 22 17:40:19 hauptserver kernel: [sys_open+26/28] sys_open+0x1a/0x1c
Dec 22 17:40:19 hauptserver kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Dec 22 17:40:19 hauptserver kernel: =======================
Dec 22 17:40:19 hauptserver kernel: xfs_force_shutdown(md4,0x8) called from line 1170 of file fs/xfs/xfs_trans.c. Return address = 0xc022041e
Dec 22 17:40:19 hauptserver kernel: Filesystem "md4": Corruption of in-memory data detected. Shutting down filesystem: md4
Dec 22 17:40:19 hauptserver kernel: Please umount the filesystem, and rectify the problem(s)

Jan 5 16:36:25 hauptserver kernel: BUG: soft lockup - CPU#1 stuck for 11s! [mailserver:8172]
Jan 5 16:36:25 hauptserver kernel:
Jan 5 16:36:25 hauptserver kernel: Pid: 8172, comm: mailserver
Jan 5 16:36:25 hauptserver kernel: EIP: 0060:[block_read_full_page+499/737] CPU: 1
Jan 5 16:36:25 hauptserver kernel: EIP is at block_read_full_page+0x1f3/0x2e1
Jan 5 16:36:25 hauptserver kernel: EFLAGS: 00210246 Not tainted (2.6.23.17 #1)
Jan 5 16:36:25 hauptserver kernel: EAX: 00000000 EBX: c710c8d0 ECX: 000003d0 EDX: 00001000
Jan 5 16:36:25 hauptserver kernel: ESI: c10610a0 EDI: c30850c0 EBP: 00000000 DS: 007b ES: 007b FS: 00d8
Jan 5 16:36:25 hauptserver kernel: CR0: 8005003b CR2: 08d86b18 CR3: 01bee000 CR4: 00000690
Jan 5 16:36:25 hauptserver kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jan 5 16:36:25 hauptserver kernel: DR6: ffff0ff0 DR7: 00000400
Jan 5 16:36:25 hauptserver kernel: [update_curr+271/310] update_curr+0x10f/0x136
Jan 5 16:36:25 hauptserver kernel: [xfs_get_blocks+0/45] xfs_get_blocks+0x0/0x2d
Jan 5 16:36:25 hauptserver kernel: [update_curr+271/310] update_curr+0x10f/0x136
Jan 5 16:36:25 hauptserver kernel: [do_mpage_readpage+1338/1364] do_mpage_readpage+0x53a/0x554
Jan 5 16:36:25 hauptserver kernel: [__check_preempt_curr_fair+67/103] __check_preempt_curr_fair+0x43/0x67
Jan 5 16:36:25 hauptserver kernel: [run_timer_softirq+48/370] run_timer_softirq+0x30/0x172
Jan 5 16:36:25 hauptserver kernel: [__rcu_process_callbacks+143/161] __rcu_process_callbacks+0x8f/0xa1
Jan 5 16:36:25 hauptserver kernel: [tasklet_action+77/145] tasklet_action+0x4d/0x91
Jan 5 16:36:25 hauptserver kernel: [mpage_readpage+75/94] mpage_readpage+0x4b/0x5e
Jan 5 16:36:25 hauptserver kernel: [xfs_get_blocks+0/45] xfs_get_blocks+0x0/0x2d
Jan 5 16:36:25 hauptserver kernel: [find_lock_page+26/126] find_lock_page+0x1a/0x7e
Jan 5 16:36:25 hauptserver kernel: [filemap_fault+608/661] filemap_fault+0x260/0x295
Jan 5 16:36:25 hauptserver kernel: [__do_fault+104/764] __do_fault+0x68/0x2fc
Jan 5 16:36:25 hauptserver kernel: [handle_mm_fault+248/579] handle_mm_fault+0xf8/0x243
Jan 5 16:36:25 hauptserver kernel: [do_page_fault+565/1466] do_page_fault+0x235/0x5ba
Jan 5 16:36:25 hauptserver kernel: [do_futex+93/242] do_futex+0x5d/0xf2
Jan 5 16:36:25 hauptserver kernel: [do_page_fault+0/1466] do_page_fault+0x0/0x5ba
Jan 5 16:36:25 hauptserver kernel: [error_code+114/120] error_code+0x72/0x78
Jan 5 16:36:25 hauptserver kernel: [xfrm_state_find+665/1175] xfrm_state_find+0x299/0x497
Jan 5 16:36:25 hauptserver kernel: =======================

Carsten
--
"There are two major products that came out of Berkeley: LSD and UNIX.
We don't believe this to be a coincidence." --Jeremy S. Anderson
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/