Re: [PATCH 4/5] net: macb: WoL support for GEM type of Ethernet controller

From: Nicolas.Ferre
Date: Tue Apr 21 2020 - 04:21:56 EST


On 17/04/2020 at 19:14, Nicolas Ferre wrote:
> On 16/04/2020 at 19:44, nicolas.ferre@xxxxxxxxxxxxx wrote:
>> From: Nicolas Ferre <nicolas.ferre@xxxxxxxxxxxxx>
>>
>> Adapt the Wake-on-Lan feature to the Cadence GEM Ethernet controller.
>> This controller has different register layout and cannot be handled by
>> previous code.
>> We disable completely interrupts on all the queues but the queue 0.
>> Handling of WoL interrupt is done in another interrupt handler
>> positioned depending on the controller version used, just between
>> suspend() and resume() calls.
>> It allows to lower pressure on the generic interrupt hot path by
>> removing the need to handle 2 tests for each IRQ: the first figuring out
>> the controller revision, the second for actually knowing if the WoL bit
>> is set.
>>
>> Queue management in suspend()/resume() functions inspired from RFC patch
>> by Harini Katakam <harinik@xxxxxxxxxx>, thanks!
>>
>> Signed-off-by: Nicolas Ferre <nicolas.ferre@xxxxxxxxxxxxx>
>> ---
>> drivers/net/ethernet/cadence/macb.h | 3 +
>> drivers/net/ethernet/cadence/macb_main.c | 121 ++++++++++++++++++++---
>> 2 files changed, 109 insertions(+), 15 deletions(-)
>
> [..]
>
>> @@ -4534,23 +4564,56 @@ static int __maybe_unused macb_suspend(struct device *dev)
>> struct macb_queue *queue = bp->queues;
>> unsigned long flags;
>> unsigned int q;
>> + int err;
>>
>> if (!netif_running(netdev))
>> return 0;
>>
>> if (bp->wol & MACB_WOL_ENABLED) {
>> - macb_writel(bp, IER, MACB_BIT(WOL));
>> - macb_writel(bp, WOL, MACB_BIT(MAG));
>> - enable_irq_wake(bp->queues[0].irq);
>> - netif_device_detach(netdev);
>> - } else {
>> - netif_device_detach(netdev);
>> + spin_lock_irqsave(&bp->lock, flags);
>> + /* Flush all status bits */
>> + macb_writel(bp, TSR, -1);
>> + macb_writel(bp, RSR, -1);
>> for (q = 0, queue = bp->queues; q < bp->num_queues;
>> - ++q, ++queue)
>> - napi_disable(&queue->napi);
>> - rtnl_lock();
>> - phylink_stop(bp->phylink);
>> - rtnl_unlock();
>> + ++q, ++queue) {
>> + /* Disable all interrupts */
>> + queue_writel(queue, IDR, -1);
>> + queue_readl(queue, ISR);
>> + if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>> + queue_writel(queue, ISR, -1);
>> + }
>> + /* Change interrupt handler and
>> + * Enable WoL IRQ on queue 0
>> + */
>> + if (macb_is_gem(bp)) {
>> + devm_free_irq(dev, bp->queues[0].irq, bp->queues);
>> + err = devm_request_irq(dev, bp->queues[0].irq, gem_wol_interrupt,
>> + IRQF_SHARED, netdev->name, bp->queues);
>> + if (err) {
>> + dev_err(dev,
>> + "Unable to request IRQ %d (error %d)\n",
>> + bp->queues[0].irq, err);
>> + return err;
>> + }
>> + queue_writel(bp->queues, IER, GEM_BIT(WOL));
>> + gem_writel(bp, WOL, MACB_BIT(MAG));
>> + } else {
>> + queue_writel(bp->queues, IER, MACB_BIT(WOL));
>> + macb_writel(bp, WOL, MACB_BIT(MAG));
>> + }
>> + spin_unlock_irqrestore(&bp->lock, flags);
>> +
>> + enable_irq_wake(bp->queues[0].irq);
>> + }
>> +
>> + netif_device_detach(netdev);
>> + for (q = 0, queue = bp->queues; q < bp->num_queues;
>> + ++q, ++queue)
>> + napi_disable(&queue->napi);
>> +
>> + if (!(bp->wol & MACB_WOL_ENABLED)) {
>> + phy_stop(netdev->phydev);
>> + phy_suspend(netdev->phydev);
>
> Bug here: you must read:
>
> rtnl_lock();
> phylink_stop(bp->phylink);
> rtnl_unlock();
>
> Instead of the 2 previous lines. I'll correct in v2.
>
> Sorry for the regression.
>
>
>> spin_lock_irqsave(&bp->lock, flags);
>> macb_reset_hw(bp);
>> spin_unlock_irqrestore(&bp->lock, flags);
>> @@ -4575,20 +4638,48 @@ static int __maybe_unused macb_resume(struct device *dev)
>
> [..]
> BTW: I have issue having a real resume event from the phy with this
> series. I'm investigating that but didn't find anything for now.
>
> Observation #1: when the WoL is not enabled, I don't have link issue.
> But the path in suspend/resume is far more intrusive in phy state.
>
> Observation #2: when WoL is enabled, I need to do a full ifdown/ifup
> sequence for gain access again to the link:
>
> ip link show eth0
> 2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
> state DOWN mode DEFAULT group default qlen 1000
> link/ether 54:10:ec:be:50:b0 brd ff:ff:ff:ff:ff:ff
>
> ifdown eth0 && ifup eth0
>
> ip link show eth0
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
> state UP mode DEFAULT group default qlen 1000
> link/ether 54:10:ec:be:50:b0 brd ff:ff:ff:ff:ff:ff
>
> Observation #3: I didn't experience this behavior while playing with the
> WoL on my 4.19 kernel before porting to Mainline.

I've reviewed this series to fix this last issues. It's was a
combination of runtime_pm not handled properly and a mix of
netif_carrier_* call with phylink calls not well positioned nor balanced
between suspend and resume.

I have a v2 series that I'm preparing for today: Harini, I prefer to
post it now so it could avoid that you hit the same issues as me while
testing on your platform.

Best regards,
Nicolas


--
Nicolas Ferre