Re: Btrfs: blocked for more than 120 seconds, made worse by 3.2 rc7

From: Konstantinos Skarlatos
Date: Wed Dec 28 2011 - 16:58:20 EST


On Wednesday, 28 December 2011 11:48:32 PM, Dave Chinner wrote:
On Wed, Dec 28, 2011 at 09:26:07PM +0200, Konstantinos Skarlatos wrote:
Hello all:
I have two machines with btrfs that give me the "blocked for more
than 120 seconds" message. After that I cannot write anything to
disk, I am unable to unmount the btrfs filesystem, and I can only
reboot with sysrq-trigger.

It always happens when I write many files with rsync over the network.
With 3.2-rc6 it happened randomly on both machines after 50-500 GB of
writes. With rc7 it happens after far fewer writes, probably 10 GB or
so, but only on machine 1 for the time being. Machine 2 has not
crashed yet after 200 GB of writes, and I am still testing that.

machine 1: btrfs on a 6 TB sparse file, mounted via loop, on an XFS
filesystem that sits on a 10 TB md RAID5. Mount options:
compress=zlib,compress-force

machine 2: btrfs on md RAID5 (4 x 2 TB), a 5.5 TB filesystem. Mount
options: compress=zlib,compress-force

pastebins:

machine1:
3.2rc7 http://pastebin.com/u583G7jK
3.2rc6 http://pastebin.com/L12TDaXa

These two are caused by it taking longer than 120s for XFS to fsync
the loop file. Writing a significant chunk of a sparse 6TB file on a
software RAID5 volume is going to take some time. However, if IO
is not occurring, then somewhere below XFS an IO has gone missing
(MD or hardware problem) because the fsync on the XFS file is
blocked waiting for an IO completion.
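
One rough way to tell "slow but progressing" from "IO has gone missing" is to sample /proc/diskstats twice and look for devices that have requests in flight but no new completions. The sketch below is only an illustration, not something from this thread: the function names are made up, it assumes the classic 14-field diskstats layout, and whether md devices report a meaningful in-flight count depends on the kernel version.

```python
def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed, in_flight).

    Assumes the classic /proc/diskstats layout: major, minor, name,
    followed by at least 11 counters.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        name = fields[2]
        # fields[3] = reads completed, fields[7] = writes completed,
        # fields[11] = I/Os currently in progress
        stats[name] = (int(fields[3]), int(fields[7]), int(fields[11]))
    return stats


def stalled_devices(before, after):
    """Return devices with I/O in flight but no completions between samples."""
    stalled = []
    for name, (reads, writes, in_flight) in after.items():
        if name not in before:
            continue  # device appeared between samples; nothing to compare
        prev_reads, prev_writes, _ = before[name]
        if in_flight > 0 and reads == prev_reads and writes == prev_writes:
            stalled.append(name)
    return sorted(stalled)
```

To use it, read /proc/diskstats, wait a while (the interval is an arbitrary choice), read it again, and pass both parsed samples to stalled_devices(). A device listed there across several samples is a hint, not proof, that a request is stuck below the filesystem.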

machine2:
3.2rc6 http://pastebin.com/khD0wGXx
3.2rc7 (not crashed yet)
Crashed a few hours ago, here is the rc7 pastebin
http://pastebin.com/gvfUm0az

These don't have XFS in the picture, but also appear to be hung
waiting on IO completion with MD stuck in
make_request()->get_active_stripe(). That, to me, indicates an MD
problem.....

I have added the linux-raid mailing list.
Please reply to me too, because I am not subscribed.

Cheers,

Dave.