Re: [GIT PULL] Core block IO bits for 2.6.39 - early Oops

From: Jens Axboe
Date: Fri Mar 25 2011 - 03:26:32 EST


On 2011-03-25 05:41, Dave Chinner wrote:
> On Thu, Mar 24, 2011 at 08:34:41PM +0100, Markus Trippelsdorf wrote:
>> On 2011.03.24 at 19:58 +0100, Jens Axboe wrote:
>>> On 2011-03-24 19:54, Markus Trippelsdorf wrote:
>>>> On 2011.03.24 at 19:51 +0100, Jens Axboe wrote:
>>>>> On 2011-03-24 19:36, Jens Axboe wrote:
>>>>>> On 2011-03-24 19:30, Markus Trippelsdorf wrote:
>>>>>>> On 2011.03.24 at 14:43 +0100, Jens Axboe wrote:
>>>>>>>>
>>>>>>>> This is the main pull request for the block IO layer and friends for
>>>>>>>> 2.6.39.
>>>>>>>
>>>>>>> This merge results in an early oops on my system (amd64, xfs).
>>>>>>> See the attached photo.
>>>>>>>
>>>>>>
>>>>>> Ouch. Can you ensure that you have CONFIG_DEBUG_INFO=y in your .config
>>>>>> and then do:
>>>>>>
>>>>>> $ gdb vmlinux
>>>>>> ...
>>>>>> l *cfq_insert_request+0x32
>>>>>>
>>>>>> and send that output?
>>>>>
>>>>> I took a closer look at the oops, and it looks most likely that q ==
>>>>> NULL (offset 0x18 == q->elevator). You left out the Code part, so I
>>>>> can't verify that for certain. That makes very little sense. I take it
>>>>> this is 100% reproducible? When you send the gdb output, please also
>>>>> attach your .config.
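The q == NULL inference rests on one observation: reading a field through a NULL struct pointer touches address 0 + offsetof(field), so a fault at 0x18 matches a NULL q if `elevator` sits at offset 0x18. Below is a minimal sketch of that arithmetic. It assumes a simplified, hypothetical `struct request_queue_sketch` in which `elevator` follows three pointer-sized fields, which puts it at 0x18 on a 64-bit (LP64) machine such as amd64. The names are illustrative only and are not the real kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real kernel structures. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

struct request_queue_sketch {
	struct list_head_sketch queue_head; /* 0x00 and 0x08 on LP64 */
	void *last_merge;                   /* 0x10 on LP64 */
	void *elevator;                     /* 0x18 on LP64 */
};

/* On a 64-bit build, elevator lands at 0x18 (three 8-byte pointers before it). */
_Static_assert(sizeof(void *) != 8 ||
	       offsetof(struct request_queue_sketch, elevator) == 0x18,
	       "elevator expected at offset 0x18 on LP64");

/* Address that a load of q->elevator would touch for a given q. */
static uintptr_t elevator_access_addr(uintptr_t q)
{
	return q + offsetof(struct request_queue_sketch, elevator);
}

/* Returns 1 if a faulting address equals a NULL q plus the elevator offset. */
static int looks_like_null_queue(uintptr_t fault_addr)
{
	return fault_addr == elevator_access_addr(0);
}
```

On an LP64 build, `looks_like_null_queue(0x18)` returns 1. A non-NULL base plus the same offset does not match, which is why a small faulting address such as 0x18 points at a NULL pointer rather than at a corrupted but non-NULL one.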
>>>>
>>>> Yes, it's 100% reproducible here. My .config follows:
>>>
>>> Can you try this patch and see if it makes a difference?
>>
>> There's no patch ;-)
>>
>>> If you boot without the patch and add elevator=noop, does it then work?
>>
>> It works insofar as the Oops is gone. But my xfs partitions apparently
>> still get corrupted (I had to run xfs_repair on several of them, because
>> they would not mount otherwise).
>
> So the patchset is causing repeatable filesystem corruption? Sounds
> to me like this series is not yet ready for mainline merging. Last
> thing I want to spend the .39 cycle helping people recover busted
> filesystems as a result of undercooked block layer changes...

Well, the last thing I want to do is be responsible for screwing up people's
file systems. I have been running these changes on my laptop, desktop,
and test machines for at least the last month. It's been in linux-next
for about that long, too. I'm extremely puzzled by the issue that
Markus reports.

So believe me, if we can't resolve this very quickly then we'll pull it
back out.

--
Jens Axboe
