Re: [PATCH net] r8169: correct the reset timing of RTL8125 for link-change event

From: En-Wei WU
Date: Mon Sep 09 2024 - 01:25:32 EST


Hi Heiner,

Thank you for the quick response.

On Sat, 7 Sept 2024 at 05:17, Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:
>
> On 06.09.2024 10:35, En-Wei Wu wrote:
> > The commit 621735f59064 ("r8169: fix rare issue with broken rx after
> > link-down on RTL8125") set a reset work for RTL8125 in
> > r8169_phylink_handler() to avoid the MAC from locking up, this
> > makes the connection broken after unplugging then re-plugging the
> > Ethernet cable.
> >
> > This is because the commit mistakenly put the reset work in the
> > link-down path rather than the link-up path (The commit message says
> > it should be put in the link-up path).
> >
> That's not what the commit message is saying. It says vendor driver
> r8125 does it in the link-up path.
> I moved it intentionally to the link-down path, because traffic may
> be flowing already after link-up.
>
> > Moving the reset work from the link-down path to the link-up path fixes
> > the issue. Also, remove the unnecessary enum member.
> >
> The user who reported the issue at that time confirmed that the original
> change fixed the issue for him.
> Can you explain, from the NICs perspective, what exactly the difference
> is when doing the reset after link-up?
> Including an explanation how the original change suppresses the link-up
> interrupt. And why that's not the case when doing the reset after link-up.

The host-plug test under original change does have the link-up
interrupt and r8169_phylink_handler() called. There is not much clue
why calling reset in link-down path doesn't work but in link-up does.

After several new tests, I found that with the original change, the
link won't break if I unplug and then plug the cable within about 3
seconds. On the other hand, the connections always break if I re-plug
the cable after a few seconds.

With this new patch (reset in link-up path), both of the tests work
without any error.

>
> I simply want to be convinced enough that your change doesn't break
> behavior for other users.
>
> > Fixes: 621735f59064 ("r8169: fix rare issue with broken rx after link-down on RTL8125")
> > Signed-off-by: En-Wei Wu <en-wei.wu@xxxxxxxxxxxxx>
> > ---
> > drivers/net/ethernet/realtek/r8169_main.c | 11 +++++------
> > 1 file changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> > index 3507c2e28110..632e661fc74b 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -590,7 +590,6 @@ struct rtl8169_tc_offsets {
> > enum rtl_flag {
> > RTL_FLAG_TASK_ENABLED = 0,
> > RTL_FLAG_TASK_RESET_PENDING,
> > - RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE,
> > RTL_FLAG_TASK_TX_TIMEOUT,
> > RTL_FLAG_MAX
> > };
> > @@ -4698,8 +4697,6 @@ static void rtl_task(struct work_struct *work)
> > reset:
> > rtl_reset_work(tp);
> > netif_wake_queue(tp->dev);
> > - } else if (test_and_clear_bit(RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE, tp->wk.flags)) {
> > - rtl_reset_work(tp);
> > }
> > out_unlock:
> > rtnl_unlock();
> > @@ -4729,11 +4726,13 @@ static void r8169_phylink_handler(struct net_device *ndev)
> > if (netif_carrier_ok(ndev)) {
> > rtl_link_chg_patch(tp);
> > pm_request_resume(d);
> > - netif_wake_queue(tp->dev);
> > - } else {
> > +
> > /* In few cases rx is broken after link-down otherwise */
> > if (rtl_is_8125(tp))
> > - rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE);
> > + rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
> > + else
> > + netif_wake_queue(tp->dev);
>
> This call to netif_wake_queue() isn't needed any longer, it was introduced with
> the original change only.
>
> > + } else {
> > pm_runtime_idle(d);
> > }
> >
>

CC. Martin Kjær Jørgensen <me@xxxxxxxx>, could you kindly test if
this new patch works on your environment? Thanks!

En-Wei,
Best regards.