Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?

From: David Chinner
Date: Wed May 09 2007 - 19:17:19 EST


On Wed, May 09, 2007 at 02:09:50PM -0700, Jeremy Fitzhardinge wrote:
> I've had a couple of instances of a linux-2.6 mercurial repo getting
> corrupted in some odd way this morning. It looks like files are being
> truncated; not to size 0, but losing something off the end.
>
> This is on an xfs filesystem. I haven't had any crashes/oops, and I
> don't think its the normal files getting filled with 0 problem. I saw
> this before the most recent set of xfs updates, but it happened again
> afterwards too.

It looks like the latest XFS changes haven't been pulled yet, so
it's not new code that is triggering this....

> Mercurial uses a strictly append-only model for updating its repo files,
> but it looks like maybe an append operation didn't stick.
>
> I'm repulling a fresh copy of the repo; I'll be able to compare
> before/after. Update: yep, definitely truncated:
>
> $ ls -l .hg-new/store/data/_documentation/pi-futex.txt.i .hg-broken/store/data/_documentation/pi-futex.txt.i
> 4 -rw-rw-r-- 1 jeremy jeremy 3309 May 9 09:43 .hg-broken/store/data/_documentation/pi-futex.txt.i
> 4 -rw-rw-r-- 1 jeremy jeremy 3797 May 9 13:38 .hg-new/store/data/_documentation/pi-futex.txt.i
>
> also
> 3476 -rw-rw-r-- 1 jeremy jeremy 3558208 May 9 13:55 00manifest.i
> 3476 -rw-rw-r-- 1 jeremy jeremy 3555200 May 9 09:41 00manifest.i~
>
>
> where 00manifest.i~ is the broken one. The files are identical up to the
> truncation point.

Hmmm - that is bizarre. What is the output of xfs_bmap -vvp <filename>
on each of those files?

what happens to these files after then are downloaded? Does it only
happen to append-only files or are other files affected as well?

BTW, what's the 'xfs_info <mntpt>' output for this filesystem?

> The repo passed "hg verify" just after I pulled it, so this corruption
> came about after a while.
>
> Hm, the other possibility is that nlinks is being misreported. When
> cloning a repo, mercurial will generally hard-link files where possible,
> and then break the link if it sees nlink > 1. If xfs is mis-reporting
> the link count, then this will cause havok. Is that possible? Seems
> unlikely, but it would also explain the symptoms. I just did a linking
> clone with an older kernel, and the link count is as expected.

I'd be surprised if it was a link count problem - that would cause
all sorts of other problems as well....

> xfs_check passes without any output, which I presume is good.

Yes, it means everythign is ok. You only have to worry when xfs_check
says something - it only brings bad news ;)

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/