Re: [PATCH] PCI: qcom: Implement shutdown() callback
From: Niklas Cassel
Date: Tue Apr 15 2025 - 04:31:01 EST
On Tue, Apr 15, 2025 at 01:07:23PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Apr 03, 2025 at 09:51:31AM +0200, Niklas Cassel wrote:
> > Hello Krishna,
> >
> > On Thu, Apr 03, 2025 at 09:26:08AM +0530, Krishna Chaitanya Chundru wrote:
> > > On 4/2/2025 6:14 PM, Niklas Cassel wrote:
> > > >
> > > > Out of curiosity, I tried something similar on pcie-dw-rockchip.c
> > > >
> > > > Simply having a ->shutdown() callback that only calls dw_pcie_host_deinit()
> > > > was enough for me to produce:
> > > >
> > > > [ 40.209887] r8169 0004:41:00.0 eth0: Link is Down
> > > > [ 40.216572] ------------[ cut here ]------------
> > > > [ 40.216986] called from state HALTED
> > > > [ 40.217317] WARNING: CPU: 7 PID: 265 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> > > > [ 40.218024] Modules linked in: rk805_pwrkey hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_common vidf
> > > > [ 40.220267] CPU: 7 UID: 0 PID: 265 Comm: init Not tainted 6.14.0+ #134 PREEMPT
> > > > [ 40.220908] Hardware name: Radxa ROCK 5B (DT)
> > > > [ 40.221289] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [ 40.221899] pc : phy_stop+0x134/0x1a0
> > > > [ 40.222222] lr : phy_stop+0x134/0x1a0
> > > > [ 40.222546] sp : ffff800082213820
> > > > [ 40.222836] x29: ffff800082213820 x28: ffff45ec84b30000 x27: 0000000000000000
> > > > [ 40.223463] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbe8df7fde030
> > > > [ 40.224088] x23: ffff800082213990 x22: 0000000000000001 x21: ffff45ec80e10000
> > > > [ 40.224714] x20: ffff45ec82cb40c8 x19: ffff45ec82ccc000 x18: 0000000000000006
> > > > [ 40.225340] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0720072007200720
> > > > [ 40.225966] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
> > > > [ 40.226592] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffbe8df556469c
> > > > [ 40.227217] x8 : 0000000000000268 x7 : ffffbe8df7a48648 x6 : ffffbe8df7a48648
> > > > [ 40.227842] x5 : 0000000000017fe8 x4 : 0000000000000000 x3 : 0000000000000000
> > > > [ 40.228468] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff45ec84b30000
> > > > [ 40.229093] Call trace:
> > > > [ 40.229308] phy_stop+0x134/0x1a0 (P)
> > > > [ 40.229634] rtl8169_down+0x34/0x280
> > > > [ 40.229952] rtl8169_close+0x64/0x100
> > > > [ 40.230275] __dev_close_many+0xbc/0x1f0
> > > > [ 40.230621] dev_close_many+0x94/0x160
> > > > [ 40.230951] unregister_netdevice_many_notify+0x14c/0x9c0
> > > > [ 40.231426] unregister_netdevice_queue+0xe4/0x100
> > > > [ 40.231848] unregister_netdev+0x2c/0x60
> > > > [ 40.232193] rtl_remove_one+0xa0/0xe0
> > > > [ 40.232517] pci_device_remove+0x4c/0xf8
> > > > [ 40.232864] device_remove+0x54/0x90
> > > > [ 40.233182] device_release_driver_internal+0x1d4/0x238
> > > > [ 40.233643] device_release_driver+0x20/0x38
> > > > [ 40.234019] pci_stop_bus_device+0x84/0xe0
> > > > [ 40.234381] pci_stop_bus_device+0x40/0xe0
> > > > [ 40.234741] pci_stop_root_bus+0x48/0x80
> > > > [ 40.235087] dw_pcie_host_deinit+0x34/0xe0
> > > > [ 40.235452] rockchip_pcie_shutdown+0x24/0x48
> > > > [ 40.235839] platform_shutdown+0x2c/0x48
> > > > [ 40.236187] device_shutdown+0x150/0x278
> > > > [ 40.236533] kernel_restart+0x4c/0xb8
> > > > [ 40.236859] __do_sys_reboot+0x178/0x280
> > > > [ 40.237206] __arm64_sys_reboot+0x2c/0x40
> > > > [ 40.237561] invoke_syscall+0x50/0x120
> > > > [ 40.237891] el0_svc_common.constprop.0+0x48/0xf0
> > > > [ 40.238305] do_el0_svc+0x24/0x38
> > > > [ 40.238597] el0_svc+0x30/0xd0
> > > > [ 40.238868] el0t_64_sync_handler+0x10c/0x138
> > > > [ 40.239251] el0t_64_sync+0x198/0x1a0
> > > > [ 40.239575] ---[ end trace 0000000000000000 ]---
> > > >
> > > Hi Niklas,
> > >
> > > The issue you are seeing is specific to the RealTek ethernet
> > > driver and should be fixed in the RealTek driver; it has nothing to do
> > > with the host controller.
> >
> > The warning comes from:
> > drivers/net/phy/phy.c:phy_stop()
> > so from the networking phylib.
> >
> > Doing a simple:
> > $ git grep -p phy_stop
> >
> > shows that practically all Ethernet drivers call phy_stop() from the
> > .ndo_stop() callback.
> >
> > So after your suggested patch, you should see this warning appear with
> > any NIC, if connected to your PCIe controller.
> >
>
> I think the issue here is that phy_stop() is called without calling
> corresponding phy_start(). This means that either rtl8169_up() is never called
> or rtl8169_down() is called twice.

phy_start() is called.

>
> I don't think this issue is applicable to all drivers. It'd be worth
> investigating what is going wrong with this r8169 driver.

In case you are curious, I added a dump_stack() call right before phy_stop() is called:

First phy_stop() call:
[ 21.733574] platform fc400000.usb: deferred probe pending: dwc3: failed to initialize core
[ 23.827753] CPU: 6 UID: 0 PID: 238 Comm: init Not tainted 6.15.0-rc2+ #149 PREEMPT
[ 23.827762] Hardware name: Radxa ROCK 5B (DT)
[ 23.827764] Call trace:
[ 23.827765] show_stack+0x20/0x40 (C)
[ 23.827774] dump_stack_lvl+0x60/0x80
[ 23.827778] dump_stack+0x18/0x24
[ 23.827782] rtl8169_down+0x30/0x2a0
[ 23.827788] rtl_shutdown+0xb0/0xc0
[ 23.827792] pci_device_shutdown+0x3c/0x88
[ 23.827797] device_shutdown+0x150/0x278
[ 23.827802] kernel_restart+0x4c/0xb8
[ 23.827807] __do_sys_reboot+0x178/0x280
[ 23.827811] __arm64_sys_reboot+0x2c/0x40
[ 23.827816] invoke_syscall+0x50/0x120
[ 23.827822] el0_svc_common.constprop.0+0x48/0xf0
[ 23.827826] do_el0_svc+0x24/0x38
[ 23.827828] el0_svc+0x30/0xd0
[ 23.827834] el0t_64_sync_handler+0x10c/0x138
[ 23.827837] el0t_64_sync+0x198/0x1a0
[ 23.827841] calling phy_stop() rtl8169_down:4844
[ 23.834789] r8169 0004:41:00.0 eth0: Link is Down

Second phy_stop() call:
[ 23.841458] CPU: 6 UID: 0 PID: 238 Comm: init Not tainted 6.15.0-rc2+ #149 PREEMPT
[ 23.841467] Hardware name: Radxa ROCK 5B (DT)
[ 23.841468] Call trace:
[ 23.841470] show_stack+0x20/0x40 (C)
[ 23.841478] dump_stack_lvl+0x60/0x80
[ 23.841483] dump_stack+0x18/0x24
[ 23.841486] rtl8169_down+0x30/0x2a0
[ 23.841492] rtl8169_close+0x64/0x100
[ 23.841496] __dev_close_many+0xbc/0x1f0
[ 23.841502] dev_close_many+0x94/0x160
[ 23.841505] unregister_netdevice_many_notify+0x160/0x9d0
[ 23.841510] unregister_netdevice_queue+0xf0/0x100
[ 23.841515] unregister_netdev+0x2c/0x58
[ 23.841519] rtl_remove_one+0xa0/0xe0
[ 23.841524] pci_device_remove+0x4c/0xf8
[ 23.841528] device_remove+0x54/0x90
[ 23.841534] device_release_driver_internal+0x1d4/0x238
[ 23.841539] device_release_driver+0x20/0x38
[ 23.841544] pci_stop_bus_device+0x84/0xe0
[ 23.841548] pci_stop_bus_device+0x40/0xe0
[ 23.841552] pci_stop_root_bus+0x48/0x80
[ 23.841555] dw_pcie_host_deinit+0x34/0xe0
[ 23.841559] rockchip_pcie_shutdown+0x20/0x38
[ 23.841565] platform_shutdown+0x2c/0x48
[ 23.841571] device_shutdown+0x150/0x278
[ 23.841575] kernel_restart+0x4c/0xb8
[ 23.841580] __do_sys_reboot+0x178/0x280
[ 23.841584] __arm64_sys_reboot+0x2c/0x40
[ 23.841588] invoke_syscall+0x50/0x120
[ 23.841595] el0_svc_common.constprop.0+0x48/0xf0
[ 23.841598] do_el0_svc+0x24/0x38
[ 23.841601] el0_svc+0x30/0xd0
[ 23.841605] el0t_64_sync_handler+0x10c/0x138
[ 23.841609] el0t_64_sync+0x198/0x1a0
[ 23.841613] calling phy_stop() rtl8169_down:4844

So it seems that the driver's rtl_shutdown() calls rtl8169_net_suspend(),
which calls rtl8169_down(), which calls phy_stop(). The second phy_stop()
call then comes from rtl8169_close(), reached via rtl_remove_one() when
dw_pcie_host_deinit() stops the root bus.

If I compare rtl8169_close() with e.g. e1000e_close(),
e1000e_close() does have a guard:

	if (netif_device_present(netdev)) {

around the call to e1000e_down(), while rtl8169_close() does not have a
similar guard around the call to rtl8169_down().

So, I think you are right, this is probably an r8169 driver specific
problem after all.
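
To make the sequence from the two traces easier to follow, here is a minimal
user-space sketch of it. This is a toy model, not kernel code: every model_*
name is hypothetical, the PHY state machine is reduced to two states, and it
assumes that the shutdown path marks the device as not present before taking
it down. It only illustrates why a presence check around the "down" call in
the close path would avoid the second phy_stop().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy PHY states; the real phylib state machine has more. */
enum model_phy_state { MODEL_PHY_RUNNING, MODEL_PHY_HALTED };

struct model_nic {
	enum model_phy_state phy;
	bool present;	/* stands in for netif_device_present() */
	int warnings;	/* count of "called from state HALTED" hits */
};

/* Stands in for phy_stop(): warn if the PHY has already been halted. */
static void model_phy_stop(struct model_nic *nic)
{
	assert(nic != NULL);
	if (nic->phy == MODEL_PHY_HALTED) {
		nic->warnings++;
		fprintf(stderr, "WARNING: called from state HALTED\n");
		return;
	}
	nic->phy = MODEL_PHY_HALTED;
}

/* Stands in for rtl8169_down(). */
static void model_down(struct model_nic *nic)
{
	model_phy_stop(nic);
}

/*
 * Stands in for rtl_shutdown() -> rtl8169_net_suspend().
 * Assumption: the device is marked not present before going down.
 */
static void model_shutdown(struct model_nic *nic)
{
	nic->present = false;
	model_down(nic);
}

/* Stands in for the close path reached via rtl_remove_one(). */
static void model_close(struct model_nic *nic, bool guarded)
{
	/* With the guard, skip the "down" call for a device that is gone. */
	if (!guarded || nic->present)
		model_down(nic);
}

/* Replays shutdown followed by close; returns the number of warnings. */
int model_reboot_warnings(bool guarded)
{
	struct model_nic nic = {
		.phy = MODEL_PHY_RUNNING,
		.present = true,
		.warnings = 0,
	};

	model_shutdown(&nic);		/* first phy_stop() call */
	model_close(&nic, guarded);	/* second call unless guarded */
	return nic.warnings;
}
```

In this model, model_reboot_warnings(false) hits the warning once, matching
the splat above, while model_reboot_warnings(true) hits none. Whether the
same guard is the right fix for r8169 itself is for the driver maintainers
to judge.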
Kind regards,
Niklas