Re: [Ext2-devel] [PATCH] ext3: Extends blocksize up to pagesize

From: Andreas Dilger
Date: Mon Jan 23 2006 - 00:37:01 EST


On Jan 20, 2006 23:10 -0800, Andrew Morton wrote:
> Andreas Dilger <adilger@xxxxxxxxxxxxx> wrote:
> >
> > On Jan 18, 2006 22:06 +0900, Takashi Sato wrote:
> > > As a disk tends to get large, a disk storage has had a capacity to
> > > supply multi-TB. But now, ext3 can't support more than 8TB filesystem
> > > when blocksize is 4KB. That's why I think ext3 needs to be
> > > more than 8TB.
> > >
> > > Therefore I think filesystem size can increase on architectures
> > > which has more than 4KB pagesize by extending blocksize to pagesize on
> > > ext3. For example, the following is in case of ia64. (Blocksize have
> > > already been supported up to pagesize on ext2. Why is the max blocksize
> > > restricted to 4KB on ext3?)
> > >
> > > Max filesystem size on ia64:
> > > Original :4096(blocksize) * 2^31 = 8TB
> > > After modification [pagesize=16KB(default)]:16384(blocksize) * 2^31 = 32TB
> > > After modification [pagesize=64KB(max)] :65536(blocksize) * 2^31 = 128TB
> >
> > Just for others' info - the fill_super change has been tested in the past
> > by Sonny Rao at IBM also. e2fsprogs has supported this for a long time
> > already.
>
> I have a vague memory that there's some piece of metadata (per-block-group
> info, I think) which will overflow at 8kb blocksize. I say this in the
> hope that you'll remmeber what it was ;)

This is the group descriptor per-group blocks and inode free counts. The
current mke2fs code limits the number of blocks and inodes per group to
EXT2_MAX_BLOCKS_PER_GROUP (2^16 - 8) and (2^16 - EXT2_INODES_PER_BLOCK)
so that this won't overflow. We still get linear growth of filesystem
limits with blocksize, but not cubic growth that would otherwise be there.

Bull has a patch which enlarges the group descriptor fields to allow
more blocks and inodes per group.

> I've rewritten ext3 directory readahead to use the generic pagecache
> functions, so changes here shouldn't be needed.
>
> But I haven't yet got around to performance-testing it.
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc1/2.6.16-rc1-mm2/broken-out/ext3_readdir-use-generic-readahead.patch

Hmm, I'll have to take a look at that when I get a chance.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/