Re: [PATCH 09/20] ext4: Initialize timestamps limits

From: Arnd Bergmann
Date: Fri Aug 02 2019 - 15:01:13 EST

On Fri, Aug 2, 2019 at 5:43 PM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> On Fri, Aug 02, 2019 at 12:39:41PM +0200, Arnd Bergmann wrote:
> > Is it correct to assume that this kind of file would have to be
> > created using the ext3.ko file system implementation that was
> > removed in linux-4.3, but not using ext2.ko or ext4.ko (which
> > would always set the extended timestamps even in "-t ext2" or
> > "-t ext3" mode)?
>
> Correct. Some of the enterprise distros were using ext4 to support
> "mount -t ext3" even before 4.3. There's a CONFIG option to enable
> using ext4 for ext2 or ext3 if they aren't enabled.
>
> > If we check for s_min_extra_isize instead of s_inode_size
> > to determine s_time_gran/s_time_max, we would warn
> > at mount time and consistently truncate all
> > timestamps to full 32-bit seconds, regardless of whether
> > there is actually space or not.
> >
> > Alternatively, we could warn if s_min_extra_isize is
> > too small, but use s_inode_size to determine
> > s_time_gran/s_time_max anyway.
>
> Even with ext4, s_min_extra_isize doesn't guarantee that we will be
> able to expand the inode. This can fail if (a) we aren't able to expand
> the existing transaction handle because there isn't enough space in
> the journal, or (b) there is already an external xattr block which is
> also full, so there is no space to evacuate an extended attribute out
> of the inode's extra space.

I must have misunderstood what the field says. I expected that
with s_min_extra_isize set beyond the nanosecond fields, there
would be a guarantee that all inodes have at least as many
extra bytes already allocated. What circumstances would lead to
an i_extra_isize smaller than s_min_extra_isize?
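
To make the question concrete, here is a minimal userspace sketch of the
per-inode check: whether an extended timestamp field is usable depends on
that inode's own i_extra_isize, not on the superblock's s_min_extra_isize.
The constants (a 128-byte base inode, i_atime_extra ending at byte 144) are
assumptions about the on-disk layout, and fits_in_inode() is a hypothetical
helper for illustration, not the kernel's macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout constants, not copied from the kernel headers. */
#define GOOD_OLD_INODE_SIZE 128u /* size of the original ext2/ext3 inode */
#define ATIME_EXTRA_END     144u /* assumed end offset of i_atime_extra */

/*
 * Hypothetical helper: true if a field ending at byte offset field_end
 * lies inside the space this particular inode has reserved beyond the
 * base 128 bytes, as recorded in its own i_extra_isize.
 */
static bool fits_in_inode(uint16_t i_extra_isize, unsigned int field_end)
{
	return field_end <= GOOD_OLD_INODE_SIZE + i_extra_isize;
}
```

Under these assumptions, an inode with i_extra_isize of 0 or 4 has no room
for the extended atime, while 16 or more is enough, whatever the superblock
says.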

> We could be more aggressive by trying to make room in the inode
> in ext4_iget (when we're reading in the inode, assuming the file
> system isn't mounted read/only), instead of in the middle of
> mark_inode_dirty(). That will eliminate failure mode (a) --- which is
> statistically rare --- but it won't eliminate failure mode (b).
> Ultimately, the question is which is worse: having a timestamp be
> wrong, or randomly dropping an xattr from the inode to make room for
> the extended timestamp. We've come down on it being less harmful to
> have the timestamp be wrong.
> But again, this is a pretty rare case. I'm not convinced it's worth
> stressing about, since it's going to require multiple things to go
> wrong before a timestamp will be bad.

Agreed, I'm not overly worried about this happening frequently,
I'd just feel better if we could reliably warn about the few instances
that might theoretically be affected.
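
A minimal sketch of the second alternative quoted above (warn when
s_min_extra_isize is too small, but still derive s_time_gran/s_time_max
from the inode size) might look like the following. Everything here is an
assumption for illustration: pick_limits() and struct ts_limits are
hypothetical names, the offsets are the same assumed layout constants, and
the 34-bit maximum assumes two extra epoch bits on top of a signed 32-bit
seconds field. Per the reply quoted above, such a warning would remain a
heuristic, since a large s_min_extra_isize does not guarantee every inode
can actually be expanded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants, for illustration only. */
#define GOOD_OLD_INODE_SIZE 128u
#define ATIME_EXTRA_END     144u /* assumed end offset of i_atime_extra */
#define NSEC_PER_SEC        1000000000u
#define TS_MAX_32BIT        ((int64_t)INT32_MAX)
#define TS_MAX_34BIT        ((((int64_t)1) << 34) - 1 + INT32_MIN)

/* Hypothetical result type: limits to advertise, plus a warning flag. */
struct ts_limits {
	uint32_t gran; /* timestamp granularity in nanoseconds */
	int64_t max;   /* largest representable timestamp, in seconds */
	bool warn;     /* some inodes may lack room for extended timestamps */
};

static struct ts_limits pick_limits(unsigned int s_inode_size,
				    unsigned int s_min_extra_isize)
{
	struct ts_limits l;
	/* Limits follow the on-disk inode size alone. */
	bool room = s_inode_size >= ATIME_EXTRA_END;

	l.gran = room ? 1u : NSEC_PER_SEC;
	l.max = room ? TS_MAX_34BIT : TS_MAX_32BIT;
	/* Warn separately when the guaranteed minimum does not cover atime. */
	l.warn = GOOD_OLD_INODE_SIZE + s_min_extra_isize < ATIME_EXTRA_END;
	return l;
}
```

With these assumptions, a file system with 256-byte inodes but
s_min_extra_isize of 0 would advertise nanosecond granularity and still
trigger the mount-time warning.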