Re: Problem in "prune_icache"

From: Jan Kara
Date: Thu Apr 02 2009 - 11:29:13 EST


Hi,

> I'm from Lustre, a scalable distributed filesystem developed by Sun
> Microsystems, and we have hit a deadlock in prune_icache. The details are:
>
> While truncating a file, a new update is started in the current journal
> transaction. During this processing memory runs low, so the task calls
> try_to_free_pages to free some pages, which eventually calls
> shrink_icache_memory/prune_icache to free the memory occupied by cached
> inodes. Note: prune_icache takes and holds "iprune_mutex" for the whole
> of its pruning work.
>
> At the same time, kswapd has already called shrink_icache_memory/prune_icache
> and holds "iprune_mutex". It finds some inodes to dispose of and calls
> clear_inode/DQUOT_DROP/the fs-specific quota-drop operation
> ("ldiskfs_dquot_drop" in our case) to drop the dquot. This quota-drop
> operation can call journal_start to start a new update, but it finds that
> the current transaction has already reached j_max_transaction_buffers, so
> it wakes up kjournald to commit the transaction. kjournald then calls
> journal_commit_transaction, which sets the transaction state to T_LOCKED
> and checks whether there are still pending updates on the committing
> transaction. It finds one (the update started by the truncate, see above),
> so it waits for that update to complete. BUT that update can never complete,
> because it cannot get "iprune_mutex", which is held by kswapd, and so the
> deadlock is triggered.
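If I am reading this right, the dependency cycle is roughly (my summary,
not your wording):

        truncate task:  holds a handle on the running transaction,
                        waits for iprune_mutex
        kswapd:         holds iprune_mutex,
                        waits in journal_start for the commit to finish
        kjournald:      transaction is T_LOCKED,
                        waits for the truncate task's handle to be stopped
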
Yes, this has happened with other filesystems as well (ext3,
ext4, ...). The usual solution to this problem is to pass GFP_NOFS to
all allocations that happen while the transaction is open. That way we
never end up recursing back into the filesystem from the allocation. Is
there some reason why that is a no-go for you?
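
I.e., just as a sketch with made-up names ("journal", "nblocks", "size"
are placeholders, this is not your actual code):

        handle_t *handle;
        void *buf;

        handle = journal_start(journal, nblocks);
        if (IS_ERR(handle))
                return PTR_ERR(handle);

        /*
         * GFP_NOFS: direct reclaim must not re-enter the filesystem
         * (shrink_icache_memory -> prune_icache -> clear_inode ->
         * ->dquot_drop) while we hold an open transaction handle.
         */
        buf = kmalloc(size, GFP_NOFS);
        if (!buf) {
                journal_stop(handle);
                return -ENOMEM;
        }

        /* ... use buf inside the transaction ... */

        kfree(buf);
        journal_stop(handle);

The same goes for any page allocations done under the handle - they need
a GFP_NOFS mask as well.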

Honza

--
Jan Kara <jack@xxxxxxx>
SuSE CR Labs