Re: EXT3 deadlock in 2.4.22 and 2.4.23-pre7 - quota related?

From: Andrew Morton
Date: Mon Oct 27 2003 - 05:03:05 EST


Neil Brown <neilb@xxxxxxxxxxxxxxx> wrote:
>
>
> Hi all, and particularly Andrew and Stephen,
>
> I recently "upgraded' one of my NFS fileservers from (patched)2.4.18
> to 2.4.23-pre7 (in order to resolve a HIMEM related memory pressure
> problem).
>
> Unfortunately I have experienced what appears to be a deadlock.
>

I don't recall a time when quotas on ext3 were not deadlocky :(

> ...
>
> rquotad Call Trace: [sleep_on+75/124]
> [start_this_handle+205/368] [journal_start+149/196]
> [ext3_dirty_inode+116/268] [__mark_inode_dirty+50/168]
> [update_atime+75/80] [do_generic_file_read+1158/1172]
> [generic_file_read+147/400] [file_read_actor+0/224]
> [do_get_write_access+1382/1420] [v1_read_dqblk+121/196]
> [read_dqblk+76/128] [dqget+344/484] [vfs_get_dqblk+21/64]
> [v1_get_dqblk+39/172] [link_path_walk+2680/2956]
> [do_compat_quotactl+417/688] [resolve_dev+89/108]
> [sys_quotactl+166/275] [system_call+51/56]

read_dqblk() took dqio_sem, then ext3_dirty_inode() did journal_start().

> At the same time, "sync" is running:
>
> sync Call Trace: [__down+109/208] [__down_failed+8/12]
> [.text.lock.dquot+73/286] [ext3_sync_dquot+337/462]
> [vfs_quota_sync+102/372] [sync_dquots_dev+194/260]
> [fsync_dev+66/128] [sys_sync+7/16] [system_call+51/56]

ext3_sync_dquot() did journal_start(), then someone (commit_dqblk?) tried
to take dqio_sem.

Probably it is not too hard to flip the ordering in the latter case, but all
other code paths need to be checked, and perhaps even documented.

Jan, didn't we fix this very problem in 2.6 recently?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/