Re: [Ext2-devel] [RFC] [PATCH] Reducing average ext2 fsck time through fs-wide dirty bit

From: Andreas Dilger
Date: Fri Mar 24 2006 - 14:25:21 EST


On Mar 24, 2006 06:32 -0800, Valerie Henson wrote:
> However, half the reason
> I'm working on ext2 is the simplicity of the code - stubbing it out
> would solve the performance problem but not the complexity problem.

But by the same token, adding the ext3 reservation code to ext2 isn't
doing anything to improve the simplicity of the ext2 code. That is
one reason why we've frowned upon adding any features to ext2, except
critical disk-format compatibility ones.

> Note that ext3's habit of clearing indirect blocks on truncate would
> break some things I want to do in the future. (Insert secret plans
> here.)

Ah, this is a long-standing ext3 wart that I've wanted to fix. In the
vast majority of cases (especially when there is a large journal in use)
it is possible to do the truncate in a single transaction. The only issue
is figuring out how big the transaction should be.

The good news is that fixing the "ext3 clearing indirect blocks" problem
not only allows undelete to work again, but also improves truncate
performance because (a) we only modify 1/32 of the blocks we would in the
old case (we don't need to modify any {d,t,}indirect blocks), and (b) we walk
the indirect blocks in the forward direction, so we could submit {d,}indirect
block requests in a batch instead of one at a time.
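A back-of-the-envelope check of the 1/32 figure, assuming a contiguously
allocated file, 32-bit block pointers and one block bitmap per group. An
indirect block maps block_size/4 data blocks and a bitmap covers
8*block_size blocks, so the zeroing truncate dirties 32 indirect blocks for
every bitmap block it touches, independent of block size. This is one
plausible reading of the number, not necessarily the derivation intended
above, and it ignores the few {d,t}indirect blocks on top. The helper names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: classic ext2/ext3 layout with 32-bit block pointers and one
 * block bitmap per group, so a group spans 8 * block_size blocks. */

/* Data blocks mapped by one (single) indirect block. */
static unsigned ptrs_per_indirect(unsigned block_size)
{
	assert(block_size % sizeof(uint32_t) == 0);
	return block_size / sizeof(uint32_t);
}

/* Data blocks covered by one block bitmap (one bit per block). */
static unsigned blocks_per_bitmap(unsigned block_size)
{
	return 8 * block_size;
}

/* Indirect blocks needed to map the data covered by one bitmap, i.e. the
 * extra blocks the old zeroing truncate dirties per bitmap block. */
static unsigned indirect_blocks_per_bitmap(unsigned block_size)
{
	return blocks_per_bitmap(block_size) / ptrs_per_indirect(block_size);
}
```

With 4 KiB blocks that is 32768 / 1024 = 32; with 1 KiB blocks it is
8192 / 256 = 32 as well.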

Proposed fix for this problem (the inode is already locked):
- create a modified ext3_free_branches() that does the tree walking and calls
  a caller-supplied method instead of always calling
  ext3_free_data->ext3_clear_blocks
- walk the inode's {d,t,}indirect blocks in the forward direction, counting
  the bitmaps and groups that will be modified (essentially
  ext3_free_branches() with a NULL method)
- try to start a journal handle for this many blocks + 1 (inode) +
  1 (super) + quota + EXT3_RESERVE_TRANS_BLOCKS
- if the journal handle is too large (journal_start() returns -ENOSPC), fall
  back to the old zero-in-steps method (the vast majority of cases will be OK
  because far fewer blocks are modified)
- walk the inode's {d,t,}indirect blocks again, freeing blocks via
  ext3_free_blocks_sb() (which updates the group descriptors, bitmaps and
  quota), but without journaling or modifying the indirect blocks
- update i_size/i_disksize/i_blocks to the new values, as ext2 does
- close the transaction
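The two-pass structure above can be sketched as a small userspace model.
This is a sketch under heavily simplified assumptions, not ext3 code: the
blocks to free arrive as a flat array in forward order instead of a real
{d,t,}indirect tree, "journal_start" is reduced to a size check against a
hypothetical max_txn_blocks, and every name (walk_branches, count_method,
free_method, truncate_one_txn, RESERVE_TRANS_BLOCKS) is made up for
illustration. The credit formula conservatively charges one bitmap and one
descriptor block per touched group.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCKS_PER_GROUP 32768u  /* assumption: 4 KiB blocks */
#define MAX_GROUPS 64u           /* assumption: tiny model filesystem */
#define RESERVE_TRANS_BLOCKS 12u /* hypothetical stand-in for EXT3_RESERVE_TRANS_BLOCKS */

/* The per-block "method", like the proposed modified ext3_free_branches(). */
typedef void (*branch_method)(void *ctx, uint32_t block);

/* Forward walk over the blocks; zero entries are holes and are skipped. */
static void walk_branches(const uint32_t *blocks, size_t n,
			  branch_method fn, void *ctx)
{
	for (size_t i = 0; i < n; i++)
		if (blocks[i] != 0)
			fn(ctx, blocks[i]);
}

/* Pass 1: record which groups (hence bitmaps/descriptors) get touched. */
struct count_ctx {
	unsigned char seen[MAX_GROUPS];
	unsigned groups;
};

static void count_method(void *p, uint32_t block)
{
	struct count_ctx *c = p;
	uint32_t g = block / BLOCKS_PER_GROUP;

	assert(g < MAX_GROUPS);
	if (!c->seen[g]) {
		c->seen[g] = 1;
		c->groups++;
	}
}

/* Pass 2: per-group free counts stand in for the bitmap and group
 * descriptor updates that ext3_free_blocks_sb() would do. */
struct free_ctx {
	unsigned freed[MAX_GROUPS];
	unsigned total;
};

static void free_method(void *p, uint32_t block)
{
	struct free_ctx *f = p;

	f->freed[block / BLOCKS_PER_GROUP]++;
	f->total++;
}

/* bitmaps + descriptors + 1 (inode) + 1 (super) + quota + reserve */
static unsigned truncate_credits(unsigned groups, unsigned quota_credits)
{
	return 2 * groups + 1 + 1 + quota_credits + RESERVE_TRANS_BLOCKS;
}

/* Returns 0 if the truncate fits in one transaction, or -ENOSPC if the
 * caller should fall back to the old zero-in-steps path. */
static int truncate_one_txn(const uint32_t *blocks, size_t n,
			    unsigned quota_credits, unsigned max_txn_blocks,
			    struct free_ctx *out)
{
	struct count_ctx cnt;

	memset(&cnt, 0, sizeof(cnt));
	walk_branches(blocks, n, count_method, &cnt);
	if (truncate_credits(cnt.groups, quota_credits) > max_txn_blocks)
		return -ENOSPC;

	/* Free the blocks; the indirect blocks themselves are never touched. */
	memset(out, 0, sizeof(*out));
	walk_branches(blocks, n, free_method, out);
	/* Real code would update i_size/i_disksize/i_blocks, then close the handle. */
	return 0;
}
```

For example, freeing blocks {100, 101, 40000, 40001} touches groups 0 and 1,
giving 2*2 + 1 + 1 + 0 + 12 = 18 credits with no quota, so a limit of 17
forces the fallback while a limit of 18 succeeds. Reusing one walker with a
counting method and a freeing method mirrors the "NULL method" first pass
described in the list.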

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/