Re: [PATCH RFC 5/5] ext4: Add fallocate2() support

From: Andreas Dilger
Date: Sat Feb 29 2020 - 15:13:07 EST


On Feb 28, 2020, at 2:16 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Fri, Feb 28, 2020 at 08:35:19AM -0700, Andreas Dilger wrote:
>> On Feb 27, 2020, at 5:24 AM, Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>>>
>>> So, this interface is 3-in-1:
>>>
>>> 1)finds a placement for inodes extents;
>>
>> The target allocation size would be sum(size of inodes), which should
>> be relatively small in your case.
>>
>>> 2)assigns this space to some temporary donor inode;
>>
>> Maybe yes, or just reserves that space from being allocated by anyone.
>>
>>> 3)calls ext4_move_extents() for each of them.
>>
>> ... using the target space that was reserved earlier
>>
>>> Do I understand you right?
>>
>> Correct. That is my "5 minutes thinking about an interface for grouping
>> small files together without exposing kernel internals" proposal for this.
>
> You don't need any special kernel interface with XFS for this. It is
> simply:
>
> mkdir tmpdir
> create O_TMPFILEs in tmpdir
>
> Now all the tmpfiles you create and their data will be co-located
> around the location of the tmpdir inode. This is the natural
> placement policy of the filesystem. i.e. the filesystem assumes that
> files in the same directory are all related, so will be accessed
> together and so should be located in relatively close proximity to
> each other.

Sure, this will likely get inodes allocated _close_ to each other on
ext4 as well (the new directory will preferentially be located in a
group that has free space), but it doesn't necessarily result in
all of the files being packed densely. Files of 1MB+4KB and 1MB-4KB,
for example, will still prefer to be aligned on 1MB boundaries rather
than packed together.
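
To put rough numbers on that, here is a toy model in C. It is only a
sketch: it assumes each file's start is simply rounded up to the
alignment boundary, which is not how the ext4 allocator actually makes
decisions, and the helper names (round_up_pow2, layout_end) are made
up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)

/* Round offset up to the next multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t offset, uint64_t align)
{
	return (offset + align - 1) & ~(align - 1);
}

/*
 * Toy layout: place n files one after another, rounding each file's
 * start offset up to 'align'.  Returns the end offset of the last file.
 * This is an illustrative assumption, not the real allocator policy.
 */
static uint64_t layout_end(const uint64_t *sizes, size_t n, uint64_t align)
{
	uint64_t pos = 0;

	for (size_t i = 0; i < n; i++) {
		pos = round_up_pow2(pos, align);
		pos += sizes[i];
	}
	return pos;
}
```

With sizes of { 1MB+4KB, 1MB-4KB }, layout_end() with a 4KB alignment
ends at exactly 2MB, while a 1MB alignment ends at 3MB-4KB, because the
second file cannot start until the 2MB boundary.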

>>> Can we introduce a flag to mark some inodes as unmovable?
>>
>> There are very few flags left in the ext4_inode->i_flags for use.
>> You could use "IMMUTABLE" or "APPEND_ONLY" to mean that, but they
>> also have other semantics. The EXT4_NOTAIL_FL is for not merging the
>> tail of a file, but ext4 doesn't have tails (that was in Reiserfs),
>> so we might consider it a generic "do not merge" flag if set?
>
> Indeed, thanks to XFS, ext4 already has an interface that can be
> used to set/clear a "no defrag" flag such as you are asking for.
> It's the FS_XFLAG_NODEFRAG bit in the FS_IOC_FS[GS]ETXATTR ioctl.
> In XFS, that manages the XFS_DIFLAG_NODEFRAG on-disk inode flag,
> and it has special meaning for directories. From the 'man 3 xfsctl'
> man page where this interface came from:
>
> Bit 13 (0x2000) - XFS_XFLAG_NODEFRAG
> No defragment file bit - the file should be skipped during a
> defragmentation operation. When applied to a directory,
> new files and directories created will inherit the no-defrag
> bit.
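
For reference, toggling that bit from userspace would look roughly like
the sketch below. It uses the generic definitions from <linux/fs.h>
(struct fsxattr, FS_IOC_FSGETXATTR, FS_IOC_FSSETXATTR,
FS_XFLAG_NODEFRAG). The helper names are made up, and the sketch does
not assume that any particular filesystem accepts or honors the bit;
the ioctl may simply fail.

```c
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_NODEFRAG */
#include <sys/ioctl.h>

/* Set or clear the no-defrag bit in an already-fetched fsxattr. */
static void fsx_set_nodefrag(struct fsxattr *fsx, int enable)
{
	if (enable)
		fsx->fsx_xflags |= FS_XFLAG_NODEFRAG;
	else
		fsx->fsx_xflags &= ~FS_XFLAG_NODEFRAG;
}

/*
 * Read-modify-write the xflags of an open file.
 * Returns 0 on success, -1 with errno set if either ioctl fails
 * (for example when the filesystem does not support the interface
 * or rejects the flag).
 */
static int set_nodefrag(int fd, int enable)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	fsx_set_nodefrag(&fsx, enable);
	return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}
```

The get-then-set pattern matters because FS_IOC_FSSETXATTR writes the
whole structure, so the other fields need to be preserved.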

The interface is not the limiting factor here, but rather the number
of flags available in the inode. Since chattr/lsattr from e2fsprogs
was used as "common ground" for a few years, there are a number of
flags in the namespace that don't actually have any meaning for ext4.

One of those flags is:

#define EXT4_NOTAIL_FL 0x00008000 /* file tail should not be merged */

This was added for Reiserfs, but it is not used by any other filesystem,
so generalizing it slightly to mean "no migrate" is reasonable. That
doesn't affect Reiserfs in any way, and it would still be possible to
also wire up the XFS_XFLAG_NODEFRAG bit to be stored as that flag.
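
From userspace this is the flag that chattr/lsattr already reach via
FS_IOC_GETFLAGS/FS_IOC_SETFLAGS, where it appears as FS_NOTAIL_FL in
<linux/fs.h>. A minimal sketch of toggling it follows. The helper names
are made up, and the "no migrate" meaning is only the proposal above,
not something the kernel implements today.

```c
#include <linux/fs.h>	/* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOTAIL_FL */
#include <sys/ioctl.h>

/* Return 'flags' with the NOTAIL bit set or cleared. */
static int flags_with_notail(int flags, int enable)
{
	return enable ? (flags | FS_NOTAIL_FL) : (flags & ~FS_NOTAIL_FL);
}

/*
 * Read-modify-write the inode flags of an open file, the same way
 * chattr does it (note the int argument).  Returns 0 on success,
 * -1 with errno set if the filesystem rejects either ioctl.
 */
static int set_notail(int fd, int enable)
{
	int flags;

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -1;
	flags = flags_with_notail(flags, enable);
	return ioctl(fd, FS_IOC_SETFLAGS, &flags);
}
```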

It wouldn't be any issue at all to choose an arbitrary unused flag to
store this in the ext4 inode internally, except that chattr/lsattr are
used by a variety of different filesystems, so whatever flag is chosen
will immediately also apply to any other filesystem that users use
those tools on.

Cheers, Andreas




