Re: [Ext2-devel] [PATCH 1/2] ext2/3: Support 2^32-1 blocks(Kernel)

From: Stephen C. Tweedie
Date: Mon Mar 20 2006 - 17:36:34 EST


Hi,

On Sun, 2006-03-19 at 23:36 -0700, Andreas Dilger wrote:

> What happens to existing filesystems with large inodes that don't have
> enough space for the extra timestamps in the first place?

Sadly, they are basically out of luck, unless we change the way that
space in the extended inode is used.

In retrospect, perhaps we goofed. We added that space into the inode,
but there is no guarantee that it can be used on demand for anything
other than xattrs --- precisely because xattrs can grow to use all
available space both in the external xattr block *and* in the inode.

We could have defined things such that you could either use the in-inode
space, OR the external space, for xattrs, but not both. But that would
be a performance compromise at best, for some of the most important
xattrs (like SELinux labels, which are always there and are always
needed) really want to be accelerated in the inode.

We really ought to have reserved *some* space in the extended inode for
non-xattr fields, for compatibility purposes.

But it's probably not too late. I would expect that the vast majority
of filesystems won't have any inodes that have fully-occupied xattr
space. It would be easy enough to define a new flag that indicates that
there is always X amount of space reserved for inode fields, and to set
that in fsck if all inodes on the fs obey that restriction. Then it
just comes down to picking a number X that is likely to satisfy all the
short-term demands for new inode fields.

> Also, if files
> are created while the filesystem is mounted without usecond timestamps
> they would get no usecond fields anyways. I agree that there are some
> unlikely corner conditions that might be hit (large inode filesystem, on
> older kernel without usec support, fills both the in-inode and external
> block so much that there isn't 12 bytes left for the usecond timestamps,
> and that file happens to depend on the exact accuracy of the timestamp).
> IMHO the inconvenience of the ROCOMPAT outweighs the benefits.

That's precisely the corner case that concerns me. The question is, do
we want the filesystem to behave correctly in all cases, or do we take
short-cuts?

I think we're probably early enough in the adoption of large inodes that
we don't have to make that compromise, and we can reserve some space for
guaranteed use by inode fields with a single minimally-invasive compat
change (say, a flag enabling a field in the superblock which defines how
many bytes we can always safely use for extended inode fields.)

--Stephen


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/