Re: [PATCH net-next v2 5/6] net/ncsi: Reset channel state in ncsi_start_dev()
From: Samuel Mendoza-Jonas
Date: Mon Oct 29 2018 - 20:23:22 EST
On Fri, 2018-10-26 at 17:25 +0000, Justin.Lee1@xxxxxxxx wrote:
> Hi Samuel,
>
> I noticed a few issues and commented below.
>
> Thanks,
> Justin
>
>
> > /* Resources */
> > +int ncsi_reset_dev(struct ncsi_dev *nd);
> > void ncsi_start_channel_monitor(struct ncsi_channel *nc);
> > void ncsi_stop_channel_monitor(struct ncsi_channel *nc);
> > struct ncsi_channel *ncsi_find_channel(struct ncsi_package *np,
> > diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> > index 014321ad31d3..9bad03e3fa5e 100644
> > --- a/net/ncsi/ncsi-manage.c
> > +++ b/net/ncsi/ncsi-manage.c
> > @@ -550,8 +550,10 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp)
> >  		spin_lock_irqsave(&nc->lock, flags);
> >  		nc->state = NCSI_CHANNEL_INACTIVE;
> >  		spin_unlock_irqrestore(&nc->lock, flags);
> > -		ncsi_process_next_channel(ndp);
> > -
> > +		if (ndp->flags & NCSI_DEV_RESET)
> > +			ncsi_reset_dev(nd);
> > +		else
> > +			ncsi_process_next_channel(ndp);
> >  		break;
> >  	default:
> >  		netdev_warn(nd->dev, "Wrong NCSI state 0x%x in suspend\n",
> > @@ -1554,7 +1556,7 @@ int ncsi_start_dev(struct ncsi_dev *nd)
> >  		return 0;
> >  	}
> >
> > -	return ncsi_choose_active_channel(nd);
> > +	return ncsi_reset_dev(nd);
>
> If no channel is available because of the whitelist, ncsi_start_dev() returns a failure
> status and the network interface may fail to come up as well. A user may want to disable all
> channels and still leave the interface up to check the LOM status.
>
I'm not sure that that is a bug, or at least not in the scope of this
series. If the whitelist is set such that no channels are valid then
there's nothing for NCSI to do. If we want to do something like always
monitor all channels then that would be best to do in another patch.
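
To make the point above concrete, here is a rough userspace sketch of a
selection loop that honours a package mask and per-package channel masks.
The demo_* types and function are made up for illustration and are not the
code in ncsi-manage.c; returning -ENODEV for "nothing to configure" is also
an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's ncsi_package/ncsi_channel. */
struct demo_channel {
	unsigned int id;
};

struct demo_package {
	unsigned int id;
	uint32_t channel_whitelist;	/* bit n set: channel n may be used */
	struct demo_channel channels[2];
	size_t n_channels;
};

/*
 * Pick the first channel allowed by both masks.
 * Returns 0 and fills *pkg_id and *ch_id on success, or -ENODEV when the
 * masks leave no channel to configure.
 */
int demo_choose_channel(const struct demo_package *pkgs, size_t n_pkgs,
			uint32_t package_whitelist,
			unsigned int *pkg_id, unsigned int *ch_id)
{
	assert(pkgs && pkg_id && ch_id);

	for (size_t i = 0; i < n_pkgs; i++) {
		const struct demo_package *np = &pkgs[i];

		assert(np->id < 32);
		/* Package not in the package mask: skip all its channels. */
		if (!(package_whitelist & (UINT32_C(1) << np->id)))
			continue;

		for (size_t j = 0; j < np->n_channels; j++) {
			unsigned int id = np->channels[j].id;

			assert(id < 32);
			if (np->channel_whitelist & (UINT32_C(1) << id)) {
				*pkg_id = np->id;
				*ch_id = id;
				return 0;
			}
		}
	}

	return -ENODEV;	/* the whitelist excludes every channel */
}
```

With a package mask of 0x1 and a channel mask of 0x3 on package 0 this
picks pkg 0 ch 0; clearing either mask makes it return -ENODEV, which is the
"nothing for NCSI to do" case.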
> > }
> > EXPORT_SYMBOL_GPL(ncsi_start_dev);
>
> Also, if I send the set_package_mask and set_channel_mask commands back to back from a program,
> the state machine misbehaves. If I issue them from the command line and wait for each step to
> complete, it works fine.
Yeah that's not great; probably hitting some corner cases in the NCSI
locking. I'll look into the multi-channel related stuff but I have a
feeling that if you tried this with the existing set/clear commands you
would probably hit something similar, especially on your dual core
platform. If so this is probably something to fix separately.
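
Purely as an illustration of the kind of corner case back-to-back requests
can hit, and not a proposed fix for NCSI: one common pattern is to let a
request that arrives while a reset is still running set a "rerun" flag
instead of touching the state machine directly. Everything named demo_* below
is hypothetical, and real code would need to hold a lock around these
helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; callers are assumed to hold a lock. */
struct demo_reset_ctl {
	bool in_progress;	/* a reset is currently running */
	bool rerun;		/* another request arrived in the meantime */
	unsigned int runs;	/* number of resets started so far */
};

/*
 * Ask for a reset. Returns true if the caller should start one now,
 * false if one is already running and this request was deferred.
 */
bool demo_request_reset(struct demo_reset_ctl *ctl)
{
	assert(ctl);

	if (ctl->in_progress) {
		ctl->rerun = true;	/* coalesce with later requests */
		return false;
	}

	ctl->in_progress = true;
	ctl->runs++;
	return true;
}

/*
 * Called when a reset has finished. Returns true if a deferred request
 * means another reset starts right away, false if the device is idle.
 */
bool demo_reset_done(struct demo_reset_ctl *ctl)
{
	assert(ctl && ctl->in_progress);

	if (ctl->rerun) {
		ctl->rerun = false;
		ctl->runs++;		/* in_progress stays set */
		return true;
	}

	ctl->in_progress = false;
	return false;
}
```

Two mask updates sent back to back would then result in one reset running to
completion followed by exactly one more that sees both updates, rather than
the second request changing state underneath the first.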
>
> npcm7xx-emc f0825000.eth eth2: NCSI: Multi-package enabled on ifindex 2, mask 0x00000001
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 0
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0400
> npcm7xx-emc f0825000.eth eth2: NCSI: pkg 0 ch 0 set as preferred channel
> npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000003
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 1
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 1 state 0400
> npcm7xx-emc f0825000.eth eth2: NCSI: Package 1 set to all channels disabled
> npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000000
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel()
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 0
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass pkg whitelist
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 0
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 1
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - next pkg
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 1
> npcm7xx-emc f0825000.eth eth2: NCSI: No channel found to configure!
> npcm7xx-emc f0825000.eth eth2: NCSI interface down
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> npcm7xx-emc f0825000.eth eth2: Wrong NCSI state 0x100 in workqueue
>
> All masks are set correctly, but you can see that the PS column is not right and the channel
> is not configured correctly.
>
> /sys/kernel/debug/ncsi_protocol# cat ncsi_device_status
> IFIDX IFNAME NAME  PID CID RX TX MP MC WP WC PC PS LS RU CR NQ HA
> =================================================================
>     2 eth2   ncsi0 000 000  1  1  1  1  1  1  1  0  1  1  1  0  1
>     2 eth2   ncsi1 000 001  1  0  1  1  1  1  0  0  1  1  1  0  1
>     2 eth2   ncsi2 001 000  0  0  1  1  0  0  0  0  1  1  1  0  1
>     2 eth2   ncsi3 001 001  0  0  1  1  0  0  0  0  1  1  1  0  1
> =================================================================
> MP: Multi-mode Package     WP: Whitelist Package
> MC: Multi-mode Channel     WC: Whitelist Channel
> PC: Primary Channel
> PS: Poll Status
> LS: Link Status
> RU: Running
> CR: Carrier OK
> NQ: Queue Stopped
> HA: Hardware Arbitration
>
> The PS column is taken from (int)nc->monitor.enabled.
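
For anyone scripting checks against the status output quoted above, a small
sketch that looks up a column by name in one row may help. It assumes the
whitespace-separated layout shown in this mail; demo_status_field() and
next_token() are hypothetical helpers, not part of any kernel or tool
interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the next whitespace-separated token in s, or NULL at the end. */
static const char *next_token(const char *s, size_t *len)
{
	s += strspn(s, " \t");
	*len = strcspn(s, " \t");
	return *len ? s : NULL;
}

/*
 * Walk the header and the row in step and return the row's value for the
 * column called name, parsed as a decimal integer. Returns -1 if the
 * column is missing. Non-numeric fields such as IFNAME parse as 0.
 */
int demo_status_field(const char *header, const char *row, const char *name)
{
	size_t hlen, rlen, nlen;
	const char *h, *r;

	assert(header && row && name);

	nlen = strlen(name);
	h = next_token(header, &hlen);
	r = next_token(row, &rlen);
	while (h && r) {
		if (hlen == nlen && strncmp(h, name, nlen) == 0)
			return (int)strtol(r, NULL, 10);
		h = next_token(h + hlen, &hlen);
		r = next_token(r + rlen, &rlen);
	}

	return -1;
}
```

Given the header line and the ncsi0 row from the output above, looking up
"PS" yields 0 while "PC" yields 1, matching the mismatch described in the
report.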