AFAICS, forcealign behaviour is same as RT, so then this would be a mainline
bug, right?
No, I don't think so. I think this is a case where forcealign and RT
behaviours differ, primarily because RT doesn't actually care about
file offset -> physical offset alignment and can do unaligned IO
whenever it wants. Hence having an unaligned written->unwritten
extent state transition doesnt' affect anything for rt files...
We change the space reservation in xfs-setattr_size() for this case(patch 9) but then don't do any alignment there - it relies on
xfs_itruncate_extents_flags() to do the right thing w.r.t. extent
removal alignment w.r.t. the new EOF.
i.e. The xfs_setattr_size() code takes care of EOF block zeroing and
page cache removal so the user doesn't see old data beyond EOF,
whilst xfs_itruncate_extents_flags() is supposed to take care of the
extent removal and the details of that operation (e.g. alignment).
So we should roundup the unmap block to the alloc unit, correct? I have my
doubts about that, and thought that xfs_bunmapi_range() takes care of any
alignment handling.
The start should round up, yes. And, no, xfs_bunmapi_range() isn't
specifically "taking care" of alignment. It's just marking a partial
alloc unit range as unwritten, which means we now have -unaligned
extents- in the BMBT.
Patch 10 also modifies xfs_can_free_eofblocks() to take alignment
into account for the post-eof block removal, but doesn't change
xfs_free_eofblocks() at all. i.e it also relies on
xfs_itruncate_extents_flags() to do the right thing for force
aligned inodes.
What state should the blocks post-EOF blocks be? A simple example of
partially truncating an alloc unit is:
$xfs_io -c "extsize" mnt/file
[16384] mnt/file
$xfs_bmap -vvp mnt/file
mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..20479]: 192..20671 0 (192..20671) 20480 000000
$truncate -s 10461184 mnt/file # 10M - 6FSB
$xfs_bmap -vvp mnt/file
mnt/file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..20431]: 192..20623 0 (192..20623) 20432 000000
1: [20432..20447]: 20624..20639 0 (20624..20639) 16 010000
FLAG Values:
0010000 Unwritten preallocated extent
Is that incorrect state?
Think about it: what happens if you now truncate it back up to 10MB
(i.e. aligned length) and then do an aligned atomic write on it.
First: What happens when you truncate up?
......
Yes, iomap_zero_range() will see the unwritten extent and skip it.
i.e. The unwritten extent stays as an unwritten extent, it's now
within EOF. That written->unwritten extent boundary is not on an
aligned file offset.
Second: What happens when you do a correctly aligned atomic write
that spans this range now?
......
Iomap only maps a single extent at a time, so it will only map the
written range from the start of the IO (aligned) to the start of the
unwritten extent (unaligned). Hence the atomic write will be
rejected because we can't do the atomic write to such an unaligned
extent.
That's not a bug in the atomic write path - this failure occurs
because of the truncate behaviour doing post-eof unwritten extent
conversion....
Yes, I agree that the entire -physical- extent is still correctly
aligned on disk so you could argue that the unwritten conversion
that xfs_bunmapi_range is doing is valid forced alignment behaviour.
However, the fact is that breaking the aligned physical extent into
two unaligned contiguous extents in different states in the BMBT
means that they are treated as two seperate unaligned extents, not
one contiguous aligned physical extent.
This extent state discontiunity is results in breaking physical IO
across the extent state boundary. Hence such an unaligned extent
state change violates the physical IO alignment guarantees that
forced alignment is supposed to provide atomic writes...
So how do you think that we should actually implement xfs_itruncate_extents_flags() properly for forcealign? Would it simply be like:
This is the reason why operations like hole punching round to extent
size for forced alignment at the highest level. i.e. We cannot allow
xfs_bunmapi_range() to "take care" of alignment handling because
managing partial extent unmappings with sub-alloc-unit unwritten
extents does not work for forced alignment.
Again, this is different to the traditional RT file behaviour - it
can use unwritten extents for sub-alloc-unit alignment unmaps
because the RT device can align file offset to any physical offset,
and issue unaligned sector sized IO without any restrictions. Forced
alignment does not have this freedom, and when we extend forced
alignment to RT files, it will not have the freedom to use
unwritten extents for sub-alloc-unit unmapping, either.