On Wed, Jun 05, 2024 at 03:26:11PM +0100, John Garry wrote:
Hi Dave,
I still think that there is a problem with this code or some other allocator
code which gives rise to unexpected -ENOSPC. I just highlight this code,
above, as I get an unexpected -ENOSPC failure here when the fs does have
many free (big enough) extents. I think that the problem may be elsewhere,
though.
Initially we have a file like this:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..127]:        62592..62719      0 (62592..62719)     128
   1: [128..895]:      hole                                   768
   2: [896..1023]:     63616..63743      0 (63616..63743)     128
   3: [1024..1151]:    64896..65023      0 (64896..65023)     128
   4: [1152..1279]:    65664..65791      0 (65664..65791)     128
   5: [1280..1407]:    68224..68351      0 (68224..68351)     128
   6: [1408..1535]:    76416..76543      0 (76416..76543)     128
   7: [1536..1791]:    62720..62975      0 (62720..62975)     256
   8: [1792..1919]:    60032..60159      0 (60032..60159)     128
   9: [1920..2047]:    63488..63615      0 (63488..63615)     128
  10: [2048..2303]:    63744..63999      0 (63744..63999)     256
forcealign extsize is 16 4k fsb, so the layout looks ok.
Then we truncate the file to 454 sectors (or 56.75 fsb). This gives:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..127]:        62592..62719      0 (62592..62719)     128
   1: [128..455]:      hole                                   328
We have 57 fsb.
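For reference, the unit conversions behind those numbers can be sketched as below. This is only illustrative arithmetic, assuming 512-byte sectors (the unit bmap reports) and the 4k fsb stated above; the helper names are made up for this sketch and are not XFS functions.

```python
# Illustrative unit conversions only; helper names are hypothetical.
SECTOR_BYTES = 512   # bmap reports 512-byte sectors
FSB_BYTES = 4096     # 4k filesystem block, as stated in the thread


def sectors_to_bytes(sectors):
    """Byte offset corresponding to a sector count."""
    return sectors * SECTOR_BYTES


def bytes_to_fsb_exact(nbytes):
    """Size in filesystem blocks, possibly fractional."""
    return nbytes / FSB_BYTES


def fsb_needed(nbytes):
    """Whole filesystem blocks needed to cover nbytes (round up)."""
    return -(-nbytes // FSB_BYTES)


eof_bytes = sectors_to_bytes(454)
print(eof_bytes, bytes_to_fsb_exact(eof_bytes), fsb_needed(eof_bytes))
```

The round-up to 57 fsb is why the bmap output ends at sector 455 rather than 453: 57 fsb is 456 sectors.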
Then I attempt to write from byte offset 232448 (sector 454) and get a
write failure in xfs_bmap_select_minlen() returning -ENOSPC; at that point
the file looks like this:
So you are doing an unaligned write of some size at EOF and EOF is
not aligned to the extsize?
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..127]:        62592..62719      0 (62592..62719)     128
   1: [128..447]:      hole                                   320
   2: [448..575]:      62720..62847      0 (62720..62847)     128
That hole in ext #1 is 40 fsb, and not aligned with forcealign granularity.
This means that ext #2 is misaligned wrt forcealign granularity.
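A quick way to check that claim against the bmap output is sketched below. It assumes 8 sectors per 4k fsb and the 16 fsb forcealign granularity mentioned earlier; the helpers are hypothetical and only inspect file offsets, not the on-disk block numbers.

```python
# Hypothetical helpers for checking bmap file-offset ranges; not XFS code.
SECTORS_PER_FSB = 8   # 4096-byte fsb / 512-byte sector
ALIGN_FSB = 16        # forcealign extsize in fsb, from the thread


def fsb_range(first_sector, last_sector):
    """Convert an inclusive bmap sector range to (start_fsb, length_fsb)."""
    start = first_sector // SECTORS_PER_FSB
    length = (last_sector - first_sector + 1) // SECTORS_PER_FSB
    return start, length


def is_aligned(first_sector, last_sector, align=ALIGN_FSB):
    """True if both start and length are multiples of the granularity."""
    start, length = fsb_range(first_sector, last_sector)
    return start % align == 0 and length % align == 0


for name, rng in [("ext #0", (0, 127)),
                  ("ext #1 (hole)", (128, 447)),
                  ("ext #2", (448, 575))]:
    print(name, fsb_range(*rng), "aligned" if is_aligned(*rng) else "MISALIGNED")
```

The hole starts on an aligned boundary (fsb 16) but is 40 fsb long, so ext #2 begins at fsb 56, which is 8 fsb past the previous 16 fsb boundary.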
OK, so the command to produce this would be something like this?
# xfs_io -fd -c "truncate 0" \
-c "chattr +<forcealign>" -c "extsize 64k" \
-c "pwrite 0 64k -b 64k" -c "pwrite 448k 64k -b 64k" \
-c "bmap -vvp" \
-c "truncate 227k" \
-c "bmap -vvp" \
-c "pwrite 227k 64k -b 64k" \
-c "bmap -vvp" \
/mnt/scratch/testfile
This is strange.
I notice that when we allocate ext #2, xfs_bmap_btalloc() returns
ap->blkno=7840, length=16, offset=56. I would expect offset % 16 == 0, which
it is not.
IOWs, the allocation was not correctly rounded down to an aligned
start offset. What were the initial parameters passed to this
allocation?
i.e. why didn't it round the start offset down to 48?
Answering that question will tell you where the bug is.
Of course, if the allocation start is rounded down to 48, then
the length should be rounded up to 32 to cover the entire range we
are writing new data to.
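To make the expected rounding concrete, here is a minimal sketch of it in fsb units. It is not the kernel's allocation setup code, just the arithmetic I'd expect it to end up with; align_request() is a made-up name.

```python
# Sketch of the expected start/length rounding; not the XFS implementation.
def align_request(offset_fsb, length_fsb, align_fsb):
    """Round the start down and the end up to the alignment granularity.

    Returns (aligned_offset, aligned_length) in filesystem blocks.
    """
    start = (offset_fsb // align_fsb) * align_fsb
    end = -(-(offset_fsb + length_fsb) // align_fsb) * align_fsb
    return start, end - start


# The unaligned allocation reported above: offset=56, length=16, alignment=16.
print(align_request(56, 16, 16))
```

With those inputs the request becomes offset 48, length 32, i.e. fsb 48 through 79 inclusive.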
In the following sub-io block zeroing, I note that we zero the front padding
from pos=196608 (or fsb 48 or sector 384) for len=35840, and back padding
from pos=263680 for len=64000 (up to sector 640 or fsb 80). That seems wrong,
as we are zeroing data in the ext #1 hole, right?
The sub block zeroing is doing exactly the right thing - it is
demonstrating the exact range that the force aligned allocation
should have covered.
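The zeroing numbers fall straight out of rounding the write range to the 64k extsize. A rough sketch follows, assuming the write length is 31232 bytes (inferred here from 263680 - 232448, it is not stated in the report); zeroing_ranges() is a hypothetical helper, not an iomap function.

```python
# Sketch of the front/back padding arithmetic; not the iomap code.
EXTSIZE_BYTES = 16 * 4096   # 16 x 4k fsb = 64k, from the thread


def zeroing_ranges(pos, length, granule=EXTSIZE_BYTES):
    """Return ((front_pos, front_len), (back_pos, back_len)) in bytes.

    Front padding runs from the aligned start up to pos; back padding
    runs from the end of the write up to the next aligned boundary.
    """
    end = pos + length
    front_start = pos - pos % granule
    back_end = -(-end // granule) * granule
    return (front_start, pos - front_start), (end, back_end - end)


# Write length of 31232 bytes is an assumption inferred from the report.
front, back = zeroing_ranges(232448, 31232)
print(front, back)
print(front[0] // 4096, (back[0] + back[1]) // 4096)
```

The zeroed span runs from fsb 48 to fsb 80, which is the same 32 fsb range an aligned allocation would have covered.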
As above, the problem seems to be in the processing fix-up.
Now the actual -ENOSPC comes from xfs_bmap_btalloc() -> ... ->
xfs_bmap_select_minlen() with initially blen=32 args->alignment=16
ap->minlen=1 args->maxlen=8. There xfs_bmap_btalloc() has ap->length=8
initially. This may be just a symptom.
Yeah, now the allocator is trying to fix up the mess that the first unaligned
allocation created, and it's tripping over ENOSPC because it's not
allowed to do sub-extent size hint allocations when forced alignment
is enabled....
I guess that there is something wrong in the block allocator for ext #2. Any
idea where to check?
Start with tracing exactly what range iomap is requesting be
allocated, and then follow that through into the allocator to work
out why the offset being passed to the allocation never gets rounded
down to be aligned. There's a mistake in the logic somewhere that is
failing to apply the start alignment to the allocation request, i.e.
the bug will be in the allocation setup code path, somewhere
in the xfs_bmapi_write -> xfs_bmap_btalloc path *before* we get to
the xfs_alloc_vextent...() calls.