Re: [PATCH] igb: fix kexec with igb

From: Yinghai Lu
Date: Sat Mar 07 2009 - 13:50:42 EST


On Sat, Mar 7, 2009 at 10:20 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Yinghai Lu <yinghai@xxxxxxxxxx> writes:
>
>> On Fri, Mar 6, 2009 at 11:18 PM, Jesse Brandeburg
>> <jesse.brandeburg@xxxxxxxxx> wrote:
>>> On Fri, Mar 6, 2009 at 8:33 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>>>>
>>>> Impact: igb can be probed again in the kexec'ed kernel
>>>>
>>>> Found one system with an 82575EB where the igb probe failed with -2
>>>> in the kexec'ed kernel.
>>>>
>>>> It looks like the same behavior happens with forcedeth.
>>>>
>>>> Check system_state to decide whether to put the device into D3.
>>>>
>>>> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>>>>
>>>> ---
>>>>  drivers/net/igb/igb_main.c |   19 ++++++++++++++-----
>>>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>>
>>> I see the point of the patch, but I know for a fact that ixgbe when
>>> enabled for MSI-X also doesn't work with kexec.
>>>
>>> So my questions are:
>>> are you going to change every driver?
>>
>> I tend to only change drivers for which I have the related hardware.
>>
>>> Why can't this be fixed in core kernel code instead?
>> Will check it.
>>
>>> Shouldn't pci_enable_device take it out of D3?
>>> Or maybe it should be taken out of D3 immediately if someone tries to
>>> ioremap any of the BARx registers?
>>
>>
>> It looks like the second kernel cannot detect the state any more.
>
> I know this has historically been a problem with the e1000 NICs:
> the reboot path places the hardware in a state that the driver
> cannot get it out of.
>
> Last I heard (a couple of weeks ago?) we had code to bring devices out
> of a low power state that was working for the e1000 driver.

In the net-next tree?

>
> YH, can you look and see if you can find that code and if it works?

It seems e1000 and e1000e have worked well for a long time.

>
> <rant>
> Frankly I don't understand why anyone would want to power down a device
> when they are rebooting or shutting down a computer.  That is a
> system state change.  But it seems to be bleed-over from the confusion
> that has been the power management code.
> </rant>

Agreed.

>
> If we can teach the kernel to handle this case with the proper enables
> and disables that would be great.  Otherwise let's look at getting the
> responsibilities of the various methods sorted out so we can at least
> say with certainty what the various methods are supposed to do and
> not do.

The root cause could be the BIOS/ACPI lying to the kernel when it queries
the device state.

For igb, the problem only shows up on one platform; another platform is fine.
For forcedeth, it seems every platform has the problem, i.e. you cannot
put it in D3 on the kexec path.
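
A minimal userspace sketch of the check the patch describes (leave the NIC
in D3 only when the machine is really powering off). It assumes that both
reboot and kexec reach the driver's shutdown hook with system_state set to
SYSTEM_RESTART. The enums are simplified local stand-ins for the kernel's
system_state values and pci_power_t, and shutdown_power_state() is a
hypothetical helper, not code from the actual patch:

```c
/* Simplified stand-in for the kernel's enum system_states (illustration only). */
enum system_states {
	SYSTEM_BOOTING,
	SYSTEM_RUNNING,
	SYSTEM_HALT,
	SYSTEM_POWER_OFF,
	SYSTEM_RESTART,
};

/* Local stand-ins for the PCI power states D0 and D3hot. */
enum pci_power {
	PCI_D0,
	PCI_D3hot,
};

/*
 * Hypothetical helper: pick the power state a NIC's shutdown hook should
 * leave the device in. Only a real power-off gets D3hot; halt, reboot and
 * kexec (assumed to arrive as SYSTEM_RESTART) leave the device in D0 so
 * the next kernel can still probe it.
 */
static enum pci_power shutdown_power_state(enum system_states state)
{
	return state == SYSTEM_POWER_OFF ? PCI_D3hot : PCI_D0;
}
```

In a real driver the D3hot branch would be where pci_set_power_state() is
called; keeping the decision in one place makes the kexec case easy to audit.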

YH