Re: [PATCH v4] ext4: fix indirect punch hole corruption

From: Chris J Arges
Date: Tue Feb 10 2015 - 21:59:32 EST


On 02/10/2015 03:44 PM, Omar Sandoval wrote:
> Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
> mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
> mapping. However, there are bugs in several corner cases. This fixes 5
> distinct bugs:
>
> 1. When there is at least one entire level of indirection between the
> start and end of the punch range and the end of the punch range is the
> first block of its level, we can't return early; we have to free the
> intervening levels.
>
> 2. When the end is at a higher level of indirection than the start and
> ext4_find_shared returns a top branch for the end, we still need to free
> the rest of the shared branch it returns; we can't decrement partial2.
>
> 3. When a punch happens within one level of indirection, we need to
> converge on an indirect block that contains the start and end. However,
> because the branches returned from ext4_find_shared do not necessarily
> start at the same level (e.g., the partial2 chain will be shallower if
> the last block occurs at the beginning of an indirect group), the walk
> of the two chains can end up "missing" each other and freeing a bunch of
> extra blocks in the process. This mismatch can be handled by first
> making sure that the chains are at the same level, then walking them
> together until they converge.
>
> 4. When the punch happens within one level of indirection and
> ext4_find_shared returns a top branch for the start, we must free it,
> but only if the end does not occur within that branch.
>
> 5. When the punch happens within one level of indirection and
> ext4_find_shared returns a top branch for the end, then we shouldn't
> free the block referenced by the end of the returned chain (this mirrors
> the different levels case).
>
> Signed-off-by: Omar Sandoval <osandov@xxxxxxxxxxx>
> ---
> Okay, two more bugfixes folded in, all described in the commit message.
> I'm finally no longer seeing xfstest generic/270 cause corruptions, even
> after running it overnight, so hopefully this is it. Chris, would you
> mind trying this out?
>

Omar,
I've completed 80 iterations of this patch so far without failure!
Normally failures have occurred between 2-15 runs. Great job, and thanks
for your persistence in fixing this issue!

Tested-by: Chris J Arges <chris.j.arges@xxxxxxxxxxxxx>


> fs/ext4/indirect.c | 105 ++++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 71 insertions(+), 34 deletions(-)
>
> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index 36b3696..5e7af1c 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -1393,10 +1393,7 @@ end_range:
> * to free. Everything was covered by the start
> * of the range.
> */
> - return 0;
> - } else {
> - /* Shared branch grows from an indirect block */
> - partial2--;
> + goto do_indirects;
> }
> } else {
> /*
> @@ -1427,56 +1424,96 @@ end_range:
> /* Punch happened within the same level (n == n2) */
> partial = ext4_find_shared(inode, n, offsets, chain, &nr);
> partial2 = ext4_find_shared(inode, n2, offsets2, chain2, &nr2);
> - /*
> - * ext4_find_shared returns Indirect structure which
> - * points to the last element which should not be
> - * removed by truncate. But this is end of the range
> - * in punch_hole so we need to point to the next element
> - */
> - partial2->p++;
> - while ((partial > chain) || (partial2 > chain2)) {
> - /* We're at the same block, so we're almost finished */
> - if ((partial->bh && partial2->bh) &&
> - (partial->bh->b_blocknr == partial2->bh->b_blocknr)) {
> - if ((partial > chain) && (partial2 > chain2)) {
> +
> + /* Free top, but only if partial2 isn't its subtree. */
> + if (nr) {
> + int level = min(partial - chain, partial2 - chain2);
> + int i;
> + int subtree = 1;
> +
> + for (i = 0; i <= level; i++) {
> + if (offsets[i] != offsets2[i]) {
> + subtree = 0;
> + break;
> + }
> + }
> +
> + if (!subtree) {
> + if (partial == chain) {
> + /* Shared branch grows from the inode */
> + ext4_free_branches(handle, inode, NULL,
> + &nr, &nr+1,
> + (chain+n-1) - partial);
> + *partial->p = 0;
> + } else {
> + /* Shared branch grows from an indirect block */
> + BUFFER_TRACE(partial->bh, "get_write_access");
> ext4_free_branches(handle, inode, partial->bh,
> - partial->p + 1,
> - partial2->p,
> + partial->p,
> + partial->p+1,
> (chain+n-1) - partial);
> - BUFFER_TRACE(partial->bh, "call brelse");
> - brelse(partial->bh);
> - BUFFER_TRACE(partial2->bh, "call brelse");
> - brelse(partial2->bh);
> }
> - return 0;
> }
> + }
> +
> + if (!nr2) {
> /*
> - * Clear the ends of indirect blocks on the shared branch
> - * at the start of the range
> + * ext4_find_shared returns Indirect structure which
> + * points to the last element which should not be
> + * removed by truncate. But this is end of the range
> + * in punch_hole so we need to point to the next element
> */
> - if (partial > chain) {
> + partial2->p++;
> + }
> +
> + while (partial > chain || partial2 > chain2) {
> + int depth = (chain+n-1) - partial;
> + int depth2 = (chain2+n2-1) - partial2;
> +
> + if (partial > chain && partial2 > chain2 &&
> + partial->bh->b_blocknr == partial2->bh->b_blocknr) {
> + /*
> + * We've converged on the same block. Clear the range,
> + * then we're done.
> + */
> ext4_free_branches(handle, inode, partial->bh,
> - partial->p + 1,
> - (__le32 *)partial->bh->b_data+addr_per_block,
> - (chain+n-1) - partial);
> + partial->p + 1,
> + partial2->p,
> + (chain+n-1) - partial);
> BUFFER_TRACE(partial->bh, "call brelse");
> brelse(partial->bh);
> - partial--;
> + BUFFER_TRACE(partial2->bh, "call brelse");
> + brelse(partial2->bh);
> + return 0;
> }
> +
> /*
> - * Clear the ends of indirect blocks on the shared branch
> - * at the end of the range
> + * The start and end partial branches may not be at the same
> + * level even though the punch happened within one level. So, we
> + * give them a chance to arrive at the same level, then walk
> + * them in step with each other until we converge on the same
> + * block.
> */
> - if (partial2 > chain2) {
> + if (partial > chain && depth <= depth2) {
> + ext4_free_branches(handle, inode, partial->bh,
> + partial->p + 1,
> + (__le32 *)partial->bh->b_data+addr_per_block,
> + (chain+n-1) - partial);
> + BUFFER_TRACE(partial->bh, "call brelse");
> + brelse(partial->bh);
> + partial--;
> + }
> + if (partial2 > chain2 && depth2 <= depth) {
> ext4_free_branches(handle, inode, partial2->bh,
> (__le32 *)partial2->bh->b_data,
> partial2->p,
> - (chain2+n-1) - partial2);
> + (chain2+n2-1) - partial2);
> BUFFER_TRACE(partial2->bh, "call brelse");
> brelse(partial2->bh);
> partial2--;
> }
> }
> + return 0;
>
> do_indirects:
> /* Kill the remaining (whole) subtrees */
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/