Re: [PATCH 09/20] ext4: Initialize timestamps limits
From: Andreas Dilger
Date: Wed Aug 07 2019 - 14:04:51 EST
On Aug 3, 2019, at 2:24 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Sat, Aug 3, 2019 at 6:03 PM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
>> On Sat, Aug 03, 2019 at 11:30:22AM +0200, Arnd Bergmann wrote:
>>> I see in the ext4 code that we always try to expand i_extra_size
>>> to s_want_extra_isize in ext4_mark_inode_dirty(), and that
>>> s_want_extra_isize is always at least s_min_extra_isize, so
>>> we constantly try to expand the inode to fit.
>> Yes, we *try*. But we may not succeed. There may actually be a
>> problem here if the cause is due to there simply is no space in the
>> external xattr block, so we might try and try every time we try to
>> modify that inode, and it would be a performance mess. If it's due to
>> there being no room in the current transaction, then it's highly
>> likely it will succeed the next time.
>>> Did older versions of ext4 or ext3 ignore s_min_extra_isize
>>> when creating inodes despite
>>> or is there another possibility I'm missing?
>> s_min_extra_isize could get changed in order to make room for some new
>> file system feature --- such as extended timestamps.
> Ok, that explains it. I assumed s_min_extra_isize was meant to
> not be modifiable, and did not find a way to change it using the
> kernel or tune2fs, but now I can see that debugfs can set it.
>> If you want to pretend that file systems never get upgraded, then life
>> is much simpler. The general approach is that for less-sophisticated
>> customers (e.g., most people running enterprise distros) file system
>> upgrades are not a thing. But for sophisticated users, we do try to
>> make thing work for people who are aware of the risks / caveats /
>> rough edges. Google won't have been able to upgrade thousands and
>> thousands of servers in data centers all over the world if we limited
>> ourselves to Red Hat's support restrictions. Backup / reformat /
>> restore really isn't a practical rollout strategy for many exabytes of
>> file systems.
>> It sounds like your safety checks / warnings are mostly targeted at
>> low-information customers, no?
> Yes, that seems like a reasonable compromise: just warn based
> on s_min_extra_isize, and assume that anyone who used debugfs
> to set s_min_extra_isize to a higher value from an ext3 file system
> during the migration to ext4 was aware of the risks already.
> That leaves the question of what we should set the s_time_gran
> and s_time_max to on a superblock with s_min_extra_isize<16
> and s_want_extra_isize>=16.
> If we base it on s_min_extra_isize, we never try to set a timestamp
> later than 2038 and so will never fail, but anyone with a grandfathered
> s_min_extra_isize from ext3 won't be able to set extended
> timestamps on any files any more. Based on s_want_extra_isize
> we would keep the current behavior, but could add a custom
> warning in the ext4 code about the small s_min_extra_isize
> indicating a theoretical problem.
I think it makes the most sense to always try to set timestamps on
inodes that have enough space for them. The chance of running into
a filesystem with 256-byte inode size but *no* space in the inode to
store an extended timestamp, but is *also* being modified by a new
kernel after 2038 is vanishingly small. This would require formatting
the filesystem with non-default mke2fs for ext3, using the filesystem
and storing enough xattrs on inodes that there isn't space for 12 bytes
of extra isize, and using it for 30+ years without upgrading to use
ext4 (which will also try to expand the inode to store the nsec
timestamps) and then modifying the inode after 2038.
Rather than printing a warning at mount time (which may be confusing
to users for a problem they may never see), it makes sense to only
print such a warning in the vanishingly small case that someone actually
tries to modify the inode timestamp but it doesn't fit, rather than on
the theoretical case that may never happen.
Description: Message signed with OpenPGP