Re: [PATCH] ext4: fix quota accounting WARN in bigalloc punch hole

From: Zhang Yi

Date: Fri May 29 2026 - 06:38:05 EST


On 5/29/2026 4:34 PM, Qiliang Yuan wrote:
> Hi Zhang Yi,
>
> On 5/29/2026 3:32 PM, Zhang Yi wrote:
>> On 5/28/2026 6:21 PM, Qiliang Yuan wrote:
>>> __es_remove_extent() -> get_rsvd() already correctly excludes
>>> boundary clusters that still contain delayed blocks from resv_used.
>>> Adding pending to resv_used double-counts those boundary clusters,
>>> erroneously releasing reservations that are still needed.
>>
>> Hmm, the analysis doesn't seem correct to me. Do you mean the
>> following case?
>>
>> # Assume the cluster size is 16KB.
>> xfs_io -f -c "pwrite 12k 4k" /mnt/foo
>> xfs_io -d -c "pwrite 0 4k" /mnt/foo
>> xfs_io -c "fpunch 0 4k" /mnt/foo
>>
>> During the direct I/O write, quota space will be added in
>> ext4_mb_new_blocks() because the EXT4_MB_DELALLOC_RESERVED flag is
>> not set. Therefore, in ext4_es_insert_extent(), we should release the
>> quota reservations, since this cluster has already been allocated.
>>
>> Then, in the third operation (punch hole), it will reclaim the added
>> dqb_curspace. This should not cause an insufficiency.
>>
>> Am I missing something?
>
> Thanks for the review! Let me explain the issue with your specific
> example.

Thanks for the explanation, some comments below.

>
> After step 1 (delalloc write 4KB@12KB in a 16KB cluster), we have:
>
> ES tree: [0,1) hole, [1,3) delayed, [3,4) hole (blocks 0..3)

[0,3) hole, [3,4) delayed(blocks 0..3) ?

> Quota: dqb_rsvspace += 16KB (one cluster reserved)

Right.

>
> Step 2 (DIO write 4KB@0KB, RWF_DSYNC):
>
> The DIO allocates one cluster, but the mapped extent from
> ext4_ext_map_blocks() only covers the written range, e.g. [0,0].
>
> In ext4_es_insert_extent():
>
> a) __es_remove_extent([0,0]) removes the [0,1) hole entry, but
> there are no delayed extents within [0,0], so resv_used = 0.
>
> This is correct: the DIO extent [0,0] does not overlap the
> delayed region [1,3).

Right.

>
> b) __revise_pending() scans outside the newly inserted extent [0,0]:
>
> - Left boundary (block 0): the range starts at cluster
> boundary, no blocks to scan on the left → no pending insert.
>
> - Right boundary (block 3): blocks [1,3] are outside [0,0]
> and are delayed → __revise_pending() inserts a pending
> reservation and returns pending = 1.
>
> This pending reservation means: "cluster containing block 3
> still has delayed blocks, keep this cluster reserved."

Right.

>
> c) Then comes the bug:
>
> resv_used += pending; // resv_used = 0 + 1 = 1
>
> This causes ext4_da_update_reserve_space() to release 16KB
> of quota reservation (dquot_release_reservation_block()). But
> block 3 is still delayed! Its quota reservation should NOT
> be released — no blocks within [0,0] actually used the delalloc
> reservation.

No. Because this cluster has already been allocated, it needs to occupy
actual quota space and can no longer use reserved counts, so the
reserved counts must be released.

For better understanding, please see the implementation of
ext4_insert_delayed_blocks(). When a new extent is delallocated to an
already allocated cluster using the the reserved count is not increased
(ext4_clu_alloc_state() returns 2).

>
> So after DIO:
> dqb_rsvspace: originally 16KB, now 0 (incorrectly released)
> dqb_curspace: 16KB (from ext4_mb_new_blocks)
>
> Step 3 (punch 4KB@0KB):
>
> ext4_remove_blocks() sees that block 0's cluster has a pending
> reservation → calls ext4_rereserve_cluster() → dquot_reclaim_block()
> tries to move 16KB from dqb_curspace to dqb_rsvspace. But
> dqb_curspace may already be insufficient (depending on whether
> other allocs/frees have happened), triggering:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
How exactly did it happen? All data cluster allocations are precisely
calculated. In addition, the current example cannot reproduce the issue
you encountered.

>
> WARNING at dquot_reclaim_space_nodirty
>
> Step 4 (delalloc writeback of block 3):
>
> ext4_da_update_reserve_space() → dquot_claim_block() tries to
> move 16KB from dqb_rsvspace to dqb_curspace. Since rsvspace was
> incorrectly released and not fully restored by the punch hole's
> rereseve, this triggers:
>
> WARNING at dquot_claim_space_nodirty
>
> The key point is: pending from __revise_pending() indicates clusters
> that *still contain delayed blocks outside the newly inserted extent*.
> These clusters' quota reservations must be preserved — get_rsvd()

Whether to reserve a quota is determined not by whether there are
delalloc extents left, but by whether there are only pure delalloc
extents in this cluster range.

Cheers,
Yi.

> inside __es_remove_extent() already correctly excludes them from
> resv_used. Adding pending to resv_used double-counts them.
>
> I also found a pre-existing retry loop issue: if __es_insert_extent()
> fails with -ENOMEM on the first pass, resv_used carries a stale value
> back to retry and could be double-released. I'll include a fix for
> that in v2.
>