Re: Filesize limitation

Albert D. Cahalan (acahalan@cs.uml.edu)
Sat, 8 Nov 1997 14:31:18 -0500 (EST)


The fields I wish to use are unused by all 3 operating systems.
In any case, operating systems should not define conflicting
fields. It would be dangerous and incompatible.

>> Linux and Masix have mostly reserved space, but look at what
>> is already filled in for the hurd:
>
>> __u8 h_i_frag; /* Fragment number */
>> __u8 h_i_fsize; /* Fragment size */
>> __u16 h_i_mode_high;
>> __u16 h_i_uid_high;
>> __u16 h_i_gid_high;
>> __u32 h_i_author;
>
>> Without trouble, we have 32-bit UID and GID support.
>> Now look at h_i_frag and h_i_fsize. Those exist to support
>> a feature that was never implemented in ext2. Since the feature
>> does more harm than good, the osd2 union (for Linux) can become:
>
> The nice thing about ext2 is that it is available for
> Linux/Masix/Hurd; I think it would be foolish to add support for
> larger files to the Linux port, when this would make it
> incompatible with the other OSs that use ext2.

If it were incompatible, maybe. It isn't incompatible,
though, so you can continue to pretend that there are other
useful OSs that run natively on ext2.

There are at least 48 bits we can grab _without_ causing any
trouble for various research OSs. If we get rid of i_dtime, as
Linus Torvalds once suggested (for performance), then 80 bits
(those 48 plus the 32 bits of i_dtime) are available.
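For the record, here is the bit budget as a small C sketch. It assumes
the 48 free bits are the old fragment address word (the slot called
i_reserved, "was for fragments", in the layouts that follow) plus the
two 8-bit fragment fields (h_i_frag and h_i_fsize in the Hurd union).
The enum names and ext2_free_bits() are hypothetical, made up for
illustration.

```c
#include <assert.h>

/* Widths, in bits, of the fields assumed free on every OS.
 * Assumption: fragment address word + fragment number + fragment size. */
enum {
    FRAG_ADDR_BITS = 32, /* i_reserved: old fragment address */
    FRAG_NUM_BITS  = 8,  /* fragment number, e.g. h_i_frag */
    FRAG_SIZE_BITS = 8,  /* fragment size, e.g. h_i_fsize */
    DTIME_BITS     = 32  /* i_dtime, if it were dropped */
};

/* Compile-time check that the fragment fields add up to 48 bits. */
static_assert(FRAG_ADDR_BITS + FRAG_NUM_BITS + FRAG_SIZE_BITS == 48,
              "fragment fields give 48 free bits");

/* Free bits usable by all OSs, optionally counting a dropped i_dtime. */
static int ext2_free_bits(int drop_dtime)
{
    int bits = FRAG_ADDR_BITS + FRAG_NUM_BITS + FRAG_SIZE_BITS;
    return drop_dtime ? bits + DTIME_BITS : bits;
}
```

ext2_free_bits(0) gives 48 and ext2_free_bits(1) gives 80, matching
the two figures above.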

Before anybody gets too excited about 64-bit file sizes and
64-bit block counts, think about the future a bit. We will
run out of time stamp bits before file sizes exceed 48 bits.
It would be wise to avoid wasting bits! Going all the way to
64-bit file sizes is far beyond what ext2 was designed for.
Going past 48 bits without also extending the block numbers
(stored all over the disk in indirect blocks) is pointless.
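A rough C sketch of the block-number argument, assuming 32-bit block
numbers and power-of-two block sizes, and ignoring the depth limit of
the indirect-block tree. The function names are hypothetical. With
4 KiB blocks the block numbers run out at 2^44 bytes; only 64 KiB
blocks reach 2^48.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable through 32-bit block numbers for a given block
 * size (assumed to be a power of two, in bytes). */
static uint64_t max_block_addressable_bytes(uint32_t block_size)
{
    return ((uint64_t)1 << 32) * block_size;
}

/* Largest byte count a size field of `bits` bits can express. */
static uint64_t max_size_for_bits(unsigned bits)
{
    assert(bits < 64); /* shifting a 64-bit value by 64 is undefined */
    return (uint64_t)1 << bits;
}
```

So max_block_addressable_bytes(4096) is 2^44 (16 TiB), while a 48-bit
size field allows 2^48 (256 TiB), a limit the block numbers reach only
at a 64 KiB block size.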

I have two suggested inode layouts that are compatible.
From a quick grep of the source, I think these modifications
are even read-write compatible.

Suggestion 1 is nice and clean:

struct ext2_inode {
	__u16 i_mode;		/* File mode */
	__u16 i_uid;		/* Owner Uid */
	__u32 i_size;		/* Size in bytes */
	__u32 i_atime;		/* Access time */
	__u32 i_ctime;		/* Inode change time */
	__u32 i_mtime;		/* Modification time */
	__u32 i_dtime;		/* Deletion Time */
	__u16 i_gid;		/* Group Id */
	__u16 i_links_count;	/* Links count */
	__u32 i_blocks;		/* Blocks count */
	__u32 i_flags;		/* File flags */
	__u32 i_translator;	/* Translator (Hurd only) */
	__u32 i_block[EXT2_N_BLOCKS];	/* Pointers to blocks */
	__u32 i_version;	/* File version (for NFS) */
	__u32 i_file_acl;	/* File ACL */
	__u32 i_dir_acl;	/* Directory ACL */
	__u32 i_reserved;	/* was for fragments */
	__u16 i_size_high;	/* high 16 bits of 48-bit file size */
	__u16 i_mode_high;	/* Extra mode bits (Hurd only) */
	__u16 i_uid_high;	/* high 16 bits of UID */
	__u16 i_gid_high;	/* high 16 bits of GID */
	__u32 i_author;		/* Author info (Hurd only) */
};
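To check that Suggestion 1 really fits the existing 128-byte inode,
here is a userspace sketch. It restates the struct with <stdint.h>
types (assumption: __u16/__u32 are plain 16/32-bit unsigned integers;
on-disk byte order is ignored) under the hypothetical name
ext2_inode_s1, with EXT2_N_BLOCKS = 15 as in ext2 (12 direct plus
3 indirect pointers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT2_N_BLOCKS 15 /* 12 direct + single, double, triple indirect */

/* Suggestion 1 with standard fixed-width types. */
struct ext2_inode_s1 {
    uint16_t i_mode;
    uint16_t i_uid;
    uint32_t i_size;
    uint32_t i_atime;
    uint32_t i_ctime;
    uint32_t i_mtime;
    uint32_t i_dtime;
    uint16_t i_gid;
    uint16_t i_links_count;
    uint32_t i_blocks;
    uint32_t i_flags;
    uint32_t i_translator;
    uint32_t i_block[EXT2_N_BLOCKS];
    uint32_t i_version;
    uint32_t i_file_acl;
    uint32_t i_dir_acl;
    uint32_t i_reserved;
    uint16_t i_size_high;
    uint16_t i_mode_high;
    uint16_t i_uid_high;
    uint16_t i_gid_high;
    uint32_t i_author;
};

/* The new fields must not grow the 128-byte inode. */
static_assert(sizeof(struct ext2_inode_s1) == 128, "inode stays 128 bytes");
/* i_size_high occupies the two bytes right after the old fragment word. */
static_assert(offsetof(struct ext2_inode_s1, i_size_high) == 116,
              "i_size_high at byte 116");
static_assert(offsetof(struct ext2_inode_s1, i_author) == 124,
              "i_author at byte 124");
```

Every field is naturally aligned, so no compiler padding sneaks in and
the offsets match a packed layout.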

Suggestion 2 is ugly, but some people might like it.
The on-disk layout is exactly the same as in Suggestion 1.

struct ext2_inode {
	__u16 i_mode;		/* File mode */
	__u16 i_uid;		/* Owner Uid */
	__u32 i_size;		/* Size in bytes */
	__u32 i_atime;		/* Access time */
	__u32 i_ctime;		/* Inode change time */
	__u32 i_mtime;		/* Modification time */
	__u32 i_dtime;		/* Deletion Time */
	__u16 i_gid;		/* Group Id */
	__u16 i_links_count;	/* Links count */
	__u32 i_blocks;		/* Blocks count */
	__u32 i_flags;		/* File flags */
	union {
		struct {
			__u32 l_i_reserved1;
		} linux1;
		struct {
			__u32 h_i_translator;
		} hurd1;
		struct {
			__u32 m_i_reserved1;
		} masix1;
	} osd1;			/* OS dependent 1 */
	__u32 i_block[EXT2_N_BLOCKS];	/* Pointers to blocks */
	__u32 i_version;	/* File version (for NFS) */
	__u32 i_file_acl;	/* File ACL */
	__u32 i_dir_acl;	/* Directory ACL */
	__u32 i_reserved;	/* was for fragments */
	union {
		struct {
			__u16 l_i_size_high;
			__u16 i_pad1;
			__u16 l_i_uid_high;
			__u16 l_i_gid_high;
			__u32 l_i_reserved2[1];
		} linux2;
		struct {
			__u16 h_i_size_high;
			__u16 h_i_mode_high;
			__u16 h_i_uid_high;
			__u16 h_i_gid_high;
			__u32 h_i_author;
		} hurd2;
		struct {
			__u16 m_i_size_high;
			__u16 m_pad1;
			__u32 m_i_reserved2[2];
		} masix2;
	} osd2;			/* OS dependent 2 */
};
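Either way, reading the wider values means gluing a high half onto the
existing low field. A minimal sketch, with hypothetical helper names
and on-disk byte-order conversion left out:

```c
#include <assert.h>
#include <stdint.h>

/* 48-bit file size from i_size (low 32 bits) and i_size_high. */
static uint64_t ext2_get_size48(uint32_t i_size, uint16_t i_size_high)
{
    return ((uint64_t)i_size_high << 32) | i_size;
}

/* Split a 48-bit file size back into the two on-disk fields. */
static void ext2_set_size48(uint64_t size, uint32_t *i_size,
                            uint16_t *i_size_high)
{
    assert(size < ((uint64_t)1 << 48)); /* must fit in 48 bits */
    *i_size = (uint32_t)size;
    *i_size_high = (uint16_t)(size >> 32);
}

/* 32-bit UID or GID from the 16-bit low and high halves. */
static uint32_t ext2_get_id32(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}
```

Assuming existing kernels left the old fragment bytes zero, an old
inode decodes to its plain 32-bit size, which is what makes the change
compatible.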

Looking beyond this: if we get rid of i_dtime and do some bit
bashing, each of the 3 standard time stamps can get an extra
high bit (another century) and 20 low bits for microsecond
resolution. A less aggressive scheme would use 2 flag bits and
the remaining __u32 to give all 4 time stamps an extra high bit
and 10 low bits for millisecond resolution.
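The aggressive scheme needs 3 x 21 = 63 bits, which fits in the 64
bits left once i_size_high takes 16 of the 80. Below is a sketch of
one possible packing; the bit order inside the 64-bit word and every
name here are assumptions for illustration, not a proposed format.
2^20 = 1048576 covers 0..999999 microseconds.

```c
#include <assert.h>
#include <stdint.h>

#define EXTRA_BITS 21 /* per time stamp: 1 high seconds bit + 20 usec bits */
#define USEC_BITS  20

/* One time stamp's extension: high seconds bit above 20 usec bits. */
static uint32_t ts_extra_pack(unsigned sec_high_bit, uint32_t usec)
{
    assert(sec_high_bit <= 1);
    assert(usec < 1000000u);
    return ((uint32_t)sec_high_bit << USEC_BITS) | usec;
}

/* Store the extension for time stamp idx (0..2) in the 64-bit word. */
static uint64_t ts_extra_store(uint64_t word, unsigned idx, uint32_t extra)
{
    assert(idx < 3 && extra < (1u << EXTRA_BITS));
    uint64_t mask = (((uint64_t)1 << EXTRA_BITS) - 1) << (idx * EXTRA_BITS);
    return (word & ~mask) | ((uint64_t)extra << (idx * EXTRA_BITS));
}

/* Fetch the extension for time stamp idx (0..2). */
static uint32_t ts_extra_load(uint64_t word, unsigned idx)
{
    assert(idx < 3);
    return (uint32_t)(word >> (idx * EXTRA_BITS)) & ((1u << EXTRA_BITS) - 1);
}

/* 33-bit seconds: the 32-bit on-disk field (treated as unsigned)
 * plus the extra high bit. */
static uint64_t ts_seconds(uint32_t sec_low, uint32_t extra)
{
    return ((uint64_t)(extra >> USEC_BITS) << 32) | sec_low;
}
```

The top bit of the word stays unused, leaving one bit of slack.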

We can mostly ignore time stamps for now. The only concern is
not to waste the bits we will need. This filesystem is great,
so it should keep being extended in a compatible way for many years.

BTW, Solaris has a file type code for ACL inodes.
Unless there is a good reason to be incompatible...