Re: [PATCH v3] ext4: fix indirect punch hole corruption

From: Chris J Arges
Date: Mon Feb 09 2015 - 16:04:07 EST


On 02/09/2015 12:21 PM, Chris J Arges wrote:
> On 02/08/2015 06:15 AM, Omar Sandoval wrote:
>> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
>> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
>> mapping. However, there are bugs in a few cases.
>>
>> In the case where the punch happens within one level of indirection, we
>> expect the start and end shared branches to converge on an indirect
>> block. However, because the branches returned from ext4_find_shared do
>> not necessarily start at the same level (e.g., the partial2 chain will
>> be shallower if the last block occurs at the beginning of an indirect
>> group), the walk of the two chains can end up "missing" each other and
>> freeing a bunch of extra blocks in the process. This mismatch can be
>> handled by first making sure that the chains are at the same level, then
>> walking them together until they converge.
>>
>> In the case that a punch spans different levels of indirection, the
>> original code skips freeing the intermediate indirect trees if the last
>> block is the first triply-indirected block because it returns instead of
>> jumping to do_indirects. Additionally, a non-zero nr2 does not mean that
>> there's nothing else to free at the level of partial2: consider the case
>> where the all_zeroes in ext4_find_shared backed up the shared branch.
>>
>> Signed-off-by: Omar Sandoval <osandov@xxxxxxxxxxx>
>
> Omar,
> With this patch I no longer seem to hit the original corruption I
> detected with my test case; however, I eventually get errors when
> trying to delete qcow2 snapshots. After those errors, running
> 'qemu-img check <image>' reports the following:
>
> ERROR OFLAG_COPIED data cluster: l2_entry=800000018f7f0000 refcount=0
> ERROR OFLAG_COPIED data cluster: l2_entry=800000018f800000 refcount=0
> ERROR OFLAG_COPIED data cluster: l2_entry=800000018f810000 refcount=0
>
> 16941 errors were found on the image.
> Data may be corrupted, or further writes to the image may corrupt it.
>
> 60459 leaked clusters were found on the image.
> This means waste of disk space, but no harm to data.
> 88629/262144 = 33.81% allocated, 9.57% fragmented, 0.00% compressed clusters
> Image end offset: 10438180864
>
> So this patch seems to have moved the problem. I can collect additional
> logs if necessary.
>
> Thanks,
> --chris j arges
>

After ignoring the snapshot deletion errors, I still hit the original
corruption problem with your patch applied. I'll continue debugging this.
--chris j arges
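
For anyone following the thread, the "get both chains to the same level,
then walk them together" idea from the commit message can be modelled
outside the kernel roughly as below. This is only a toy sketch and not
the patch itself: ancestor(), converge_level(), the fixed fanout and the
"level" numbering (0 = lowest indirect level) are all hypothetical names
and simplifications, standing in for the partial/partial2 chains and the
depth/depth2 comparison in the diff.

```c
#include <assert.h>

/*
 * Toy model, NOT kernel code. Returns the id of the indirect block that
 * covers data block `blk` at `level` (0 = lowest indirect level), for a
 * tree where every indirect block holds `fanout` pointers.
 */
static unsigned long ancestor(unsigned long blk, int level,
			      unsigned long fanout)
{
	assert(fanout > 1);
	for (int i = 0; i <= level; i++)
		blk /= fanout;
	return blk;
}

/*
 * Walk a start cursor (at level `lvl`) and an end cursor (at level
 * `lvl2`) towards the root. The lower cursor climbs alone until both
 * are at the same level; after that they climb in step. Returns the
 * level at which both cursors sit on the same block, or -1 if they
 * reach `height` without meeting.
 */
static int converge_level(unsigned long first, unsigned long last,
			  int lvl, int lvl2,
			  unsigned long fanout, int height)
{
	while (lvl < height || lvl2 < height) {
		/* Snapshot both levels, like depth/depth2 in the diff. */
		int d = lvl, d2 = lvl2;

		if (lvl < height && lvl2 < height && lvl == lvl2 &&
		    ancestor(first, lvl, fanout) ==
		    ancestor(last, lvl2, fanout))
			return lvl;	/* converged on the same block */

		if (lvl < height && d <= d2)
			lvl++;		/* start cursor is not higher: climb */
		if (lvl2 < height && d2 <= d)
			lvl2++;		/* end cursor is not higher: climb */
	}
	return -1;
}
```

With a fanout of 4 and a height of 3, converge_level(5, 18, 0, 1, 4, 3)
lets the start cursor climb one level on its own, then both climb
together and meet at level 2. Without the level comparison, the two
cursors would step past each other, which is the "missing" behaviour
the commit message describes.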

>> ---
>> Here are a couple more fixes folded in. Still applies to v3.19-rc7.
>>
>> Changes from v2:
>> Handle skipped do_indirects when n < 4, n2 == 4, and partial2 == chain2
>> and skipped ext4_free_branches when nr2 != 0
>>
>> Changes from v1:
>> Handle partial == chain || partial2 == chain2 cases.
>> fs/ext4/indirect.c | 62 ++++++++++++++++++++++++++++--------------------------
>> 1 file changed, 32 insertions(+), 30 deletions(-)
>>
>> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
>> index 36b3696..279d9ba 100644
>> --- a/fs/ext4/indirect.c
>> +++ b/fs/ext4/indirect.c
>> @@ -1393,10 +1393,7 @@ end_range:
>> * to free. Everything was covered by the start
>> * of the range.
>> */
>> - return 0;
>> - } else {
>> - /* Shared branch grows from an indirect block */
>> - partial2--;
>> + goto do_indirects;
>> }
>> } else {
>> /*
>> @@ -1434,49 +1431,54 @@ end_range:
>> * in punch_hole so we need to point to the next element
>> */
>> partial2->p++;
>> - while ((partial > chain) || (partial2 > chain2)) {
>> - /* We're at the same block, so we're almost finished */
>> - if ((partial->bh && partial2->bh) &&
>> - (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
>> - if ((partial > chain) && (partial2 > chain2)) {
>> - ext4_free_branches(handle, inode, partial->bh,
>> - partial->p + 1,
>> - partial2->p,
>> - (chain+n-1) - partial);
>> - BUFFER_TRACE(partial->bh, "call brelse");
>> - brelse(partial->bh);
>> - BUFFER_TRACE(partial2->bh, "call brelse");
>> - brelse(partial2->bh);
>> - }
>> + while (partial > chain || partial2 > chain2) {
>> + int depth = (chain+n-1) - partial;
>> + int depth2 = (chain2+n2-1) - partial2;
>> +
>> + if (partial > chain && partial2 > chain2 &&
>> + partial->bh->b_blocknr == partial2->bh->b_blocknr) {
>> + /*
>> + * We've converged on the same block. Clear the range,
>> + * then we're done.
>> + */
>> + ext4_free_branches(handle, inode, partial->bh,
>> + partial->p + 1,
>> + partial2->p,
>> + (chain+n-1) - partial);
>> + BUFFER_TRACE(partial->bh, "call brelse");
>> + brelse(partial->bh);
>> + BUFFER_TRACE(partial2->bh, "call brelse");
>> + brelse(partial2->bh);
>> return 0;
>> }
>> +
>> /*
>> - * Clear the ends of indirect blocks on the shared branch
>> - * at the start of the range
>> + * The start and end partial branches may not be at the same
>> + * level even though the punch happened within one level. So, we
>> + * give them a chance to arrive at the same level, then walk
>> + * them in step with each other until we converge on the same
>> + * block.
>> */
>> - if (partial > chain) {
>> + if (partial > chain && depth <= depth2) {
>> ext4_free_branches(handle, inode, partial->bh,
>> - partial->p + 1,
>> - (__le32 *)partial->bh->b_data+addr_per_block,
>> - (chain+n-1) - partial);
>> + partial->p + 1,
>> + (__le32 *)partial->bh->b_data+addr_per_block,
>> + (chain+n-1) - partial);
>> BUFFER_TRACE(partial->bh, "call brelse");
>> brelse(partial->bh);
>> partial--;
>> }
>> - /*
>> - * Clear the ends of indirect blocks on the shared branch
>> - * at the end of the range
>> - */
>> - if (partial2 > chain2) {
>> + if (partial2 > chain2 && depth2 <= depth) {
>> ext4_free_branches(handle, inode, partial2->bh,
>> (__le32 *)partial2->bh->b_data,
>> partial2->p,
>> - (chain2+n-1) - partial2);
>> + (chain2+n2-1) - partial2);
>> BUFFER_TRACE(partial2->bh, "call brelse");
>> brelse(partial2->bh);
>> partial2--;
>> }
>> }
>> + return 0;
>>
>> do_indirects:
>> /* Kill the remaining (whole) subtrees */
>>