Re: [bisected] NFS corruption with 3.4

From: Holger Hoffstaette
Date: Tue Jun 05 2012 - 09:46:02 EST


On Tue, 05 Jun 2012 08:45:37 -0400, Dave Jones wrote:

> > It worked fine for years, until now. With kernel 3.4, everything works
> > only for the first time after boot (and not always). Next time (next
> > machine), partimage aborts almost immediately as it's probably unable
> > to decompress the image file. md5sum is different on my machine vs. on
> > the target (through NFS). Also SystemRescueCD boot aborts with md5
> > error sometimes. Everything works fine after rebooting back to 3.3.
> >
> > Bisection found this:
> >
> > 0fc9d1040313047edf6a39fd4d7c7defdca97c62 is the first bad commit
> > commit 0fc9d1040313047edf6a39fd4d7c7defdca97c62
> > Author: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx>
> > Date:   Wed Mar 28 14:42:54 2012 -0700
> >
> > radix-tree: use iterators in find_get_pages* functions
> >
> > Reverting this commit in 3.4 fixes the problem.
>
> I meant to come back to this, because I saw this problem too.

Same here, seen just yesterday.

> is this patch a problem for the client, or the server ? I'm assuming the

In my case I tried to unpack a remote kernel tarball (on the server)
locally on a client and suddenly got gzip/tar checksum/EOF errors, which
reproducibly did not show up when unpacking the same archive directly on
the server. Somewhat confused, I re-created a fresh tarball, which then
unpacked fine on the client. Looks like this is a pagecache
race/staleness issue.
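
For anyone who wants to check whether they are hit by this, here is a rough sketch of the comparison described above: checksum the same file once as seen through the NFS mount on the client and once from a copy known to be good, and report where the two first diverge. The function names and the paths in the usage note are made up for illustration, and the 4096-byte step is only an assumption to roughly match the page size.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_mismatch(path_a, path_b, chunk_size=4096):
    """Return the offset of the first differing chunk, or None if equal.

    The offset is the start of the chunk that differs, not the exact byte.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return offset
            if not chunk_a:  # both files ended without a difference
                return None
            offset += len(chunk_a)
```

Something like md5_of("/mnt/nfs/linux.tar.gz") on the client versus the digest computed on the server (hypothetical path) shows whether the data differs at all; first_mismatch() on two locally reachable copies then narrows down where. Note that repeated reads on the client may be served from its page cache, which is exactly where the stale data seems to live.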

-h

