Re: 2.6.30-rc1: invalid opcode with call trace

From: Vegard Nossum
Date: Wed Apr 08 2009 - 12:15:37 EST


2009/4/8 Ingo Molnar <mingo@xxxxxxx>:
>
> * Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
>
>> On Wed, Apr 08 2009, Vegard Nossum wrote:
>> >
>> > Would you please try this patch? It has the same symptoms as a few
>> > other reports, only that this is 32-bit (and that makes it a bit
>> > different).
>> >
>> > http://marc.info/?l=linux-kernel&m=123909566829773&w=2
>> >
>> > I think Len Brown has applied it to the ACPI tree already.
>>
>> Works for me!
>
> My 'boot hang' problem is independent of that bug i think.

I agree.

The problem is that you have two async port probes:

[ 24.177306] calling 1_async_port_probe+0x0/0xaa @ 2841
[ 24.177825] calling 2_async_port_probe+0x0/0xaa @ 2842

of which only the first completes, because the first async call itself
tries to flush the async list while holding a lock (the
&shost->scan_mutex in __scsi_add_device), which causes a deadlock.
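
To make the cycle concrete, here is a rough userspace sketch of the
wait-for relationship. It is not kernel code and not the attached patch.
It assumes the second probe blocks on the same mutex that the first
probe holds while flushing. All names (wait_graph, wait_graph_block,
scan_scenario_deadlocks, and so on) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8
#define NO_TASK   (-1)

/* waits_for[i] is the task that task i is blocked on, or NO_TASK. */
struct wait_graph {
	int waits_for[MAX_TASKS];
	size_t ntasks;
};

static void wait_graph_init(struct wait_graph *g, size_t ntasks)
{
	assert(ntasks <= MAX_TASKS);
	g->ntasks = ntasks;
	for (size_t i = 0; i < ntasks; i++)
		g->waits_for[i] = NO_TASK;
}

/* Record that 'waiter' cannot make progress until 'blocker' does. */
static void wait_graph_block(struct wait_graph *g, int waiter, int blocker)
{
	assert(waiter >= 0 && (size_t)waiter < g->ntasks);
	assert(blocker >= 0 && (size_t)blocker < g->ntasks);
	g->waits_for[waiter] = blocker;
}

/* Follow the wait chain from 'start'; true if it leads back to 'start'. */
static bool wait_graph_in_cycle(const struct wait_graph *g, int start)
{
	int cur = g->waits_for[start];

	for (size_t steps = 0; steps < g->ntasks && cur != NO_TASK; steps++) {
		if (cur == start)
			return true;
		cur = g->waits_for[cur];
	}
	return false;
}

/*
 * Hypothetical model of the two probes (task 0 and task 1).
 * Assumption: task 0 owns the scan mutex and task 1 needs it.
 * If task 0 also flushes the async list while holding the mutex,
 * it waits for task 1 to finish, which closes the cycle.
 */
static bool scan_scenario_deadlocks(bool flush_under_lock)
{
	struct wait_graph g;

	wait_graph_init(&g, 2);
	wait_graph_block(&g, 1, 0);		/* probe 1 waits for the mutex */
	if (flush_under_lock)
		wait_graph_block(&g, 0, 1);	/* probe 0 waits for probe 1 */
	return wait_graph_in_cycle(&g, 0);
}
```

In this model, scan_scenario_deadlocks(true) reports a cycle, while
scan_scenario_deadlocks(false) does not: once the flush is dropped,
probe 0 can finish and release the mutex, and probe 1 then proceeds.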

In short, I don't think we should call async_synchronize_full() from
scsi_complete_async_scans() at all. I'm including a more detailed
description/justification in the patch (attached).

Arjan, can you comment?


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

Attachment: 0001-scsi-don-t-wait-for-async-operations-in-scsi_comple.patch