Re: bio linked list corruption.

From: Josef Bacik
Date: Fri Oct 21 2016 - 16:44:12 EST


On 10/21/2016 04:38 PM, Chris Mason wrote:


On 10/21/2016 04:23 PM, Dave Jones wrote:
On Fri, Oct 21, 2016 at 04:17:48PM -0400, Chris Mason wrote:

> > BTRFS warning (device sda3): csum failed ino 130654 off 0 csum 2566472073 expected csum 3008371513
> > BTRFS warning (device sda3): csum failed ino 131057 off 4096 csum 3563910319 expected csum 738595262
> > BTRFS warning (device sda3): csum failed ino 131176 off 4096 csum 1344477721 expected csum 441864825
> > BTRFS warning (device sda3): csum failed ino 131241 off 245760 csum 3576232181 expected csum 2566472073
> > BTRFS warning (device sda3): csum failed ino 131429 off 0 csum 1494450239 expected csum 2646577722
> > BTRFS warning (device sda3): csum failed ino 131471 off 0 csum 3949539320 expected csum 3828807800
> > BTRFS warning (device sda3): csum failed ino 131471 off 4096 csum 3475108475 expected csum 2566472073
> > BTRFS warning (device sda3): csum failed ino 131471 off 958464 csum 142982740 expected csum 2566472073
> > BTRFS warning (device sda3): csum failed ino 131471 off 0 csum 3949539320 expected csum 3828807800
> > BTRFS warning (device sda3): csum failed ino 131532 off 270336 csum 3138898528 expected csum 2566472073
> > BTRFS warning (device sda3): csum failed ino 131532 off 1249280 csum 2169165042 expected csum 2566472073
> > BTRFS warning (device sda3): csum failed ino 131649 off 16384 csum 2914965650 expected csum 1425742005
> >
> >
> > A curious thing: the expected csum 2566472073 turns up a number of times
> > for different inodes, and gets differing actual csums each time. I suppose
> > this could be something like a block of all zeros in multiple files,
> > but it struck me as surprising.
> >
> > btrfs people: is there an easy way to map those inodes to a filename ?
> > I'm betting those are the test files that trinity generates. If so, it
> > might point to a race somewhere.
>
> btrfs inspect inode 130654 mntpoint

Interesting, they all return

ERROR: ino paths ioctl: No such file or directory

So these files got deleted perhaps ?

Yeah, they must have.


So one thing that will cause spurious csum errors is changing the memory of a buffer while it is still in flight during O_DIRECT: the csum gets computed from one version of the data and a different version ends up on disk. Does trinity do that? If so then that would explain it. If not we should probably dig into it. Thanks,
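
To make the O_DIRECT case concrete, here is a rough userspace sketch of the ordering problem. It is not btrfs code and does no real I/O: `simulate_write` and `read_ok` are made-up names, the 4096-byte block size is taken from the offsets in the log, and the pure-Python `crc32c` is only a stand-in for the kernel's crc32c. The idea is just that the csum is taken from the buffer at submit time, the buffer is changed, and then the changed data is what lands on disk.

```python
# Illustrative model only: hypothetical helper names, no real I/O.

def _make_table():
    poly = 0x82F63B78  # reflected CRC-32C (Castagnoli) polynomial
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _make_table()

def crc32c(data: bytes) -> int:
    """Plain table-driven CRC-32C with the usual initial and final inversion."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

BLOCK = 4096  # assumed block size, matching the 4096-aligned offsets in the log

def simulate_write(buf: bytearray, mutate_in_flight: bool):
    """Model one direct write; returns (stored_csum, on_disk_block)."""
    stored_csum = crc32c(bytes(buf))  # csum computed when the write is submitted
    if mutate_in_flight:
        buf[0] ^= 0xFF                # userspace scribbles on the buffer
    on_disk = bytes(buf)              # device picks up the buffer contents later
    return stored_csum, on_disk

def read_ok(stored_csum: int, on_disk: bytes) -> bool:
    """Model the read-side check: recompute the csum and compare."""
    return crc32c(on_disk) == stored_csum

if __name__ == "__main__":
    quiet = simulate_write(bytearray(b"a" * BLOCK), mutate_in_flight=False)
    racy = simulate_write(bytearray(b"a" * BLOCK), mutate_in_flight=True)
    print("untouched buffer verifies:", read_ok(*quiet))
    print("modified buffer verifies: ", read_ok(*racy))
    # Side check for Dave's guess about the repeated value in the log:
    print("zero block csum == 2566472073:", crc32c(bytes(BLOCK)) == 2566472073)
```

In this model the data on disk is perfectly fine from the application's point of view, but the stored csum describes the older contents, so a later read reports a csum failure without any hardware or list corruption being involved.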

Josef