Re: spurious -ENOSPC on XFS

From: Dave Chinner
Date: Sat Jan 31 2009 - 19:01:58 EST

In reply to: Mikulas Patocka: "Re: spurious -ENOSPC on XFS"

On Thu, Jan 29, 2009 at 11:39:00AM -0500, Mikulas Patocka wrote:
> On Sat, 24 Jan 2009, Dave Chinner wrote:
> > On Fri, Jan 23, 2009 at 03:14:30PM -0500, Mikulas Patocka wrote:
> > > If I placed
> > > xfs_sync_inodes(ip->i_mount, SYNC_DELWRI);
> > > xfs_sync_inodes(ip->i_mount, SYNC_DELWRI | SYNC_IOWAIT);
> > > directly into xfs_flush_device, I got a lock dependency warning (though
> > > not a real deadlock).
> >
> > Same reason memory reclaim gives lockdep warnings on XFS - we're
> > recursing into operations that take inode locks while we currently
> > hold an inode lock. However, it shouldn't deadlock because
> > we should never try to take the iolock on the inode that we
> > currently hold it on.
> >
> > > So synchronous flushing definitely needs some thinking and lock
> > > rearchitecting.
> >
> > No, not at all. At most the grabbing of the iolock in
> > xfs_sync_inodes_ag() needs to become a trylock....
>
> You are wrong, the comments in the code are right. It really
> deadlocks if it is modified to use synchronous wait for the end of
> the flush and if the flush is done with xfs_sync_inodes:
>
> xfssyncd D 0000000000606808 0 4819 2
> Call Trace:
> [0000000000606788] rwsem_down_failed_common+0x1ac/0x1d8
> [0000000000606808] rwsem_down_read_failed+0x24/0x34
> [0000000000606848] __down_read+0x30/0x40
> [0000000000468a64] down_read_nested+0x48/0x58
> [00000000100e6cc8] xfs_ilock+0x48/0x8c [xfs]
> [000000001011018c] xfs_sync_inodes_ag+0x17c/0x2dc [xfs]
> [000000001011034c] xfs_sync_inodes+0x60/0xc4 [xfs]
> [00000000101103c4] xfs_flush_device_work+0x14/0x2c [xfs]
> [000000001010ff1c] xfssyncd+0x110/0x174 [xfs]
> [000000000046556c] kthread+0x54/0x88
> [000000000042b2a0] kernel_thread+0x3c/0x54
> [00000000004653b8] kthreadd+0xac/0x160

So it is stuck:

                /*
                 * If we have to flush data or wait for I/O completion
                 * we need to hold the iolock.
                 */
                if ((flags & SYNC_DELWRI) && VN_DIRTY(inode)) {
>>>>>>>>                xfs_ilock(ip, XFS_IOLOCK_SHARED);
                        lock_flags |= XFS_IOLOCK_SHARED;
                        error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE);
                        if (flags & SYNC_IOWAIT)
                                xfs_ioend_wait(ip);
                }

Given that we are stuck on the iolock in xfs_sync_inodes_ag(), I
suspect you should re-read my comments above about "lock
re-architecting" ;).

If you change the xfs_ilock() there to xfs_ilock_nowait() and skip data
writeback when we don't get the lock, the deadlock goes away, right?
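
To make the trylock idea concrete, here is a minimal userspace sketch of
the pattern. It is not XFS code: struct demo_inode, demo_inode_init() and
demo_sync_inode() are hypothetical names, and a C11 atomic_flag stands in
for the iolock (so it models an exclusive lock only, not the shared mode
used above). The point is just that the sync path never sleeps on a lock
that may already be held further up the call chain; it skips the inode
and leaves it dirty for a later pass.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; this is NOT struct xfs_inode. */
struct demo_inode {
	atomic_flag iolock;	/* plays the role of the iolock (exclusive only) */
	bool dirty;		/* plays the role of VN_DIRTY(inode) */
	int flushes;		/* counts simulated data flushes */
};

static void demo_inode_init(struct demo_inode *ip, bool dirty)
{
	atomic_flag_clear(&ip->iolock);		/* start unlocked */
	ip->dirty = dirty;
	ip->flushes = 0;
}

/* Non-blocking lock attempt: returns true only if we took the lock. */
static bool demo_iolock_trylock(struct demo_inode *ip)
{
	return !atomic_flag_test_and_set(&ip->iolock);
}

static void demo_iolock_unlock(struct demo_inode *ip)
{
	atomic_flag_clear(&ip->iolock);
}

/*
 * Flush one inode without ever blocking on its iolock.
 * Returns true if the data was flushed, false if the inode was clean
 * or the lock was busy and writeback was skipped.
 */
static bool demo_sync_inode(struct demo_inode *ip)
{
	if (!ip->dirty)
		return false;

	/* Lock is held elsewhere: skip writeback instead of waiting. */
	if (!demo_iolock_trylock(ip))
		return false;

	ip->flushes++;		/* simulated flush of the dirty pages */
	ip->dirty = false;
	demo_iolock_unlock(ip);
	return true;
}
```

A skipped inode stays dirty, so a later sync pass can pick it up once the
lock holder has dropped the lock.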

BTW, can you post the patch you are working on?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
