Re: [Updated patch] Re: [PATCH] igb: fix kexec with igb

From: Jeff Kirsher
Date: Mon Mar 30 2009 - 17:37:05 EST


On Sun, Mar 29, 2009 at 4:19 AM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
> On Sunday 29 March 2009, Jeff Kirsher wrote:
>> On Sat, Mar 28, 2009 at 2:27 PM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> > On Tuesday 24 March 2009, Yinghai Lu wrote:
>> >> Rafael J. Wysocki wrote:
>> >> > On Thursday 12 March 2009, Yinghai Lu wrote:
>> >> >> Rafael J. Wysocki wrote:
>> >> >>> On Sunday 08 March 2009, Rafael J. Wysocki wrote:
>> >> >>>> On Sunday 08 March 2009, Yinghai Lu wrote:
>> >> >>>>> Rafael J. Wysocki wrote:
>> >> >>>>>> On Saturday 07 March 2009, Yinghai Lu wrote:
>> >> >>>>>>> On Fri, Mar 6, 2009 at 11:18 PM, Jesse Brandeburg
>> >> >>>>>>> <jesse.brandeburg@xxxxxxxxx> wrote:
>> >> >>>>>>>> On Fri, Mar 6, 2009 at 8:33 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>> >> >>>>>>>>> Impact: could probe igb
>> >> >>>>>>>>>
>> >> >>>>>>>>> Found one system with 82575EB, in the kernel that is kexeced, probe igb
>> >> >>>>>>>>> failed with -2.
>> >> >>>>>>>>>
>> >> >>>>>>>>> it looks like the same behavior happened on forcedeth.
>> >> >>>>>>>>>
>> >> >>>>>>>>> try to check system_state to make sure if put it on D3
>> >> >>>>>>>>>
>> >> >>>>>>>>> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>> >> >>>>>>>>>
>> >> >>>>>>>>> ---
>> >> >>>>>>>>> Âdrivers/net/igb/igb_main.c | Â 19 ++++++++++++++-----
>> >> >>>>>>>>> Â1 file changed, 14 insertions(+), 5 deletions(-)
>> >> >>>>>>>> I see the point of the patch, but I know for a fact that ixgbe when
>> >> >>>>>>>> enabled for MSI-X also doesn't work with kexec.
>> >> >>>>>>>>
>> >> >>>>>>>> so my questions are:
>> >> >>>>>>>> are you going to change every driver?
>> >> >>>>>>> i tend to only change driver that i have related HW.
>> >> >>>>>>>
>> >> >>>>>>>> why can't this be fixed in core kernel code instead?
>> >> >>>>>>> will check it.
>> >> >>>>>>>
>> >> >>>>>>>> Shouldn't pci_enable_device take it out of D3?
>> >> >>>>>>>> Or maybe it should be taken out of D3 immediately if someone tries to
>> >> >>>>>>>> ioremap any of the BARx registers?
>> >> >>>>>>> looks like second kernel can not detect the state any more.
>> >> >>>>>> In fact pci_enable_device() calls pci_set_power_state(dev, PCI_D0) as the first
>> >> >>>>>> thing. ÂThe question is why it doesn't work as expected.
>> >> >>>>> not sure... please check the version for forcedeth that you made.
>> >> >>>>>
>> >> >>>>> commit 3cb5599a84c557c0dd9a19feb63a3788268cf249
>> >> >>>>> Author: Rafael J. Wysocki <rjw@xxxxxxx>
>> >> >>>>> Date: Â Fri Sep 5 14:00:19 2008 -0700
>> >> >>>>>
>> >> >>>>> Â Â forcedeth: fix kexec regression
>> >> >>>>>
>> >> >>>>> Â Â Fix regression tracked as http://bugzilla.kernel.org/show_bug.cgi?id=11361
>> >> >>>>> Â Â and caused by commit f735a2a1a4f2a0f5cd823ce323e82675990469e2 ("[netdrvr]
>> >> >>>>> Â Â forcedeth: setup wake-on-lan before shutting down") that makes network
>> >> >>>>> Â Â adapters integrated into the NVidia MCP55 chipsets fail to work in kexeced
>> >> >>>>> Â Â kernels. ÂThe problem appears to be that if the adapter is put into D3_hot
>> >> >>>>> Â Â during ->shutdown(), it cannot be brought back into D0 after kexec (ref.
>> >> >>>>> Â Â http://marc.info/?l=linux-kernel&m=121900062814967&w=4). ÂTherefore, only
>> >> >>>>> Â Â put forcedeth into D3 during ->shutdown() if the system is to be powered
>> >> >>>>> Â Â off.
>> >> >>>> Thanks, I remember now.
>> >> >>> In which case you need to rework igb_shutdown() rather than igb_suspend().
>> >> >>>
>> >> >>> Something like the patch below, perhaps (totally untested).
>> >> >> it works, David, can you picked it up
>> >> >
>> >> > Still, Yinghai, can you please also test the patch below? ÂIt fixes all
>> >> > shortcomings in the driver's suspend and shutdown methods I was talking about
>> >> > in one of the previous messages. ÂIf it works, IMO it will be a preferable fix
>> >> > (in particular, it would be good to check if WoL still works with it,
>> >> > but I don't have the hardware).
>> >> >
>> >> > Thanks,
>> >> > Rafael
>> >> >
>> >> > ---
>> >> > From: Rafael J. Wysocki <rjw@xxxxxxx>
>> >> > Subject: net/igb: Fix kexec with igb (rev. 3)
>> >> > Impact: Fix
>> >> >
>> >> > Yinghai Lu found one system with 82575EB where, in the kernel that is
>> >> > kexeced, probe igb failed with -2, the reason being that the adapter
>> >> > could not be brought back from D3 by the kexec kernel, most probably
>> >> > due to quirky hardware (it looks like the same behavior happened on
>> >> > forcedeth).
>> >> >
>> >> > Prevent igb from putting the adapter into D3 during shutdown except
>> >> > when we going to power off the system. ÂFor this purpose, seperate
>> >> > igb_shutdown() from igb_suspend() and use the appropriate PCI PM
>> >> > callbacks in both of them.
>> >> >
>> >> > Signed-off-by: Rafael J. Wysocki <rjw@xxxxxxx>
>> >> > Reported-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>> >> > ---
>> >> > Âdrivers/net/igb/igb_main.c | Â 42 ++++++++++++++++++++++++++++++------------
>> >> > Â1 file changed, 30 insertions(+), 12 deletions(-)
>> >> >
>> >> > Index: linux-2.6/drivers/net/igb/igb_main.c
>> >> > ===================================================================
>> >> > --- linux-2.6.orig/drivers/net/igb/igb_main.c
>> >> > +++ linux-2.6/drivers/net/igb/igb_main.c
>> >> > @@ -4277,7 +4277,7 @@ int igb_set_spd_dplx(struct igb_adapter
>> >> > Â}
>> >> >
>> >> >
>> >> > -static int igb_suspend(struct pci_dev *pdev, pm_message_t state)
>> >> > +static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake)
>> >> > Â{
>> >> > Â Â struct net_device *netdev = pci_get_drvdata(pdev);
>> >> > Â Â struct igb_adapter *adapter = netdev_priv(netdev);
>> >> > @@ -4336,15 +4336,9 @@ static int igb_suspend(struct pci_dev *p
>> >> > Â Â Â Â Â Â wr32(E1000_WUFC, 0);
>> >> > Â Â }
>> >> >
>> >> > - Â /* make sure adapter isn't asleep if manageability/wol is enabled */
>> >> > - Â if (wufc || adapter->en_mng_pt) {
>> >> > - Â Â Â Â Â pci_enable_wake(pdev, PCI_D3hot, 1);
>> >> > - Â Â Â Â Â pci_enable_wake(pdev, PCI_D3cold, 1);
>> >> > - Â } else {
>> >> > + Â *enable_wake = wufc || adapter->en_mng_pt;
>> >> > + Â if (!*enable_wake)
>> >> > Â Â Â Â Â Â igb_shutdown_fiber_serdes_link_82575(hw);
>> >> > - Â Â Â Â Â pci_enable_wake(pdev, PCI_D3hot, 0);
>> >> > - Â Â Â Â Â pci_enable_wake(pdev, PCI_D3cold, 0);
>> >> > - Â }
>> >> >
>> >> > Â Â /* Release control of h/w to f/w. ÂIf f/w is AMT enabled, this
>> >> > Â Â Â* would have already happened in close and is redundant. */
>> >> > @@ -4352,12 +4346,29 @@ static int igb_suspend(struct pci_dev *p
>> >> >
>> >> > Â Â pci_disable_device(pdev);
>> >> >
>> >> > - Â pci_set_power_state(pdev, pci_choose_state(pdev, state));
>> >> > -
>> >> > Â Â return 0;
>> >> > Â}
>> >> >
>> >> > Â#ifdef CONFIG_PM
>> >> > +static int igb_suspend(struct pci_dev *pdev, pm_message_t state)
>> >> > +{
>> >> > + Â int retval;
>> >> > + Â bool wake;
>> >> > +
>> >> > + Â retval = __igb_shutdown(pdev, &wake);
>> >> > + Â if (retval)
>> >> > + Â Â Â Â Â return retval;
>> >> > +
>> >> > + Â if (wake) {
>> >> > + Â Â Â Â Â pci_prepare_to_sleep(pdev);
>> >> > + Â } else {
>> >> > + Â Â Â Â Â pci_wake_from_d3(pdev, false);
>> >> > + Â Â Â Â Â pci_set_power_state(pdev, PCI_D3hot);
>> >> > + Â }
>> >> > +
>> >> > + Â return 0;
>> >> > +}
>> >> > +
>> >> > Âstatic int igb_resume(struct pci_dev *pdev)
>> >> > Â{
>> >> > Â Â struct net_device *netdev = pci_get_drvdata(pdev);
>> >> > @@ -4412,7 +4423,14 @@ static int igb_resume(struct pci_dev *pd
>> >> >
>> >> > Âstatic void igb_shutdown(struct pci_dev *pdev)
>> >> > Â{
>> >> > - Â igb_suspend(pdev, PMSG_SUSPEND);
>> >> > + Â bool wake;
>> >> > +
>> >> > + Â __igb_shutdown(pdev, &wake);
>> >> > +
>> >> > + Â if (system_state == SYSTEM_POWER_OFF) {
>> >> > + Â Â Â Â Â pci_wake_from_d3(pdev, wake);
>> >> > + Â Â Â Â Â pci_set_power_state(pdev, PCI_D3hot);
>> >> > + Â }
>> >> > Â}
>> >> >
>> >> > Â#ifdef CONFIG_NET_POLL_CONTROLLER
>> >>
>> >> it works too.
>> >
>> > Great, thanks for testing.
>> >
>> > Jeff, Jesse, is the patch fine with you?
>> >
>> > Rafael
>>
>> Let me talk with Jesse on Monday about it. ÂI know that Jesse's
>> concern's were that there were more than one driver afflicted with the
>> same or similar issue and that it made more sense to fix this in the
>> core for all drivers. ÂFor instance ixgbe, IIRC, but because Yinghai
>> did not have any other hardware, it was unclear if this was an issue
>> on other drivers.
>
> I think it was, at least theoretically. ÂWe should generally avoid powering
> off things on shutdown when it's not really necessary. ÂAlso, the use of
> pci_enable_wake() in these drivers is not really correct.
>
> And in fact this is why I'm asking, because I have analogouns patches for some
> other drivers in the works too. ÂIf you are fine with the approach, I'll post
> the whole series.
>
>> After discussing this with Jesse, one of us will respond. ÂI will
>> apply this patch to my tree anyway and if we are alright with the
>> patch, I will push it out with my other patches to Dave.
>
> Thanks!
>
> Best,
> Rafael
> --

After talking with Jesse, we are fine with it if our testers buy off
on it. I have the patch in my tree and so Emil will run a barrage of
regression/stress tests on the patch overnight. If everything looks
good, I will send it out with the other igb patches I have.

Thanks.

--
Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/