Re: NULL pointer dereference in ext4_ext_remove_space on 3.5.1

From: Fengguang Wu
Date: Fri Aug 17 2012 - 02:01:08 EST


On Thu, Aug 16, 2012 at 11:25:13AM -0400, Theodore Ts'o wrote:
> On Thu, Aug 16, 2012 at 07:10:51PM +0800, Fengguang Wu wrote:
> >
> > Here is the dmesg. BTW, it seems 3.5.0 doesn't have this issue.
>
> Fengguang,
>
> It sounds like you have a (at least fairly) reliable reproduction for
> this problem? Is it something you can share? It would be good to get

Right, it can be easily reproduced here. I'm running these writeback
performance tests:

https://github.com/fengguang/writeback-tests

They basically run N parallel dd writes to JBOD/RAID arrays on
various filesystems. The RAID test seems to trigger the problem
reliably.

> this into our test suites, since it was _not_ something that was
> caught by xfstests, apparently.
>
> Can you see if this patch addresses it? (The first two patch hunks
> are the same debugging additions I had posted before.)
>
> It looks like the responsible commit is 968dee7722: "ext4: fix hole
> punch failure when depth is greater than 0". I had thought this patch
> was low risk if you weren't using the new punch ioctl, but it turns
> out it did make a critical change in the non-punch (i.e., truncate)
> code path, which is what the addition of "i = 0;" in the patch below
> addresses.

Yes, I can confirm the patch fixes the bug. With the fix applied, the
writeback tests have run flawlessly for a dozen hours.

Thanks,
Fengguang