Re: [PATCH 6.1 00/63] 6.1.111-rc1 review

From: Georgi Djakov
Date: Tue Oct 08 2024 - 19:44:01 EST


On 25.09.24 18:42, Dan Carpenter wrote:
On Wed, Sep 18, 2024 at 03:08:13PM +0300, Georgi Djakov wrote:
Warning log:
--------
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x517f803c]
[ 0.000000] Linux version 6.1.111-rc1 (tuxmake@tuxmake)
(aarch64-linux-gnu-gcc (Debian 13.3.0-5) 13.3.0, GNU ld (GNU Binutils
for Debian) 2.43) #1 SMP PREEMPT @1726489583
[ 0.000000] Machine model: Thundercomm Dragonboard 845c
...
[ 7.841428] ------------[ cut here ]------------
[ 7.841431] WARNING: CPU: 4 PID: 492 at
drivers/interconnect/core.c:685 __icc_enable
(drivers/interconnect/core.c:685 (discriminator 7))
[ 7.841442] Modules linked in: soundwire_bus(+) venus_core(+)
qcom_camss(+) drm_dp_aux_bus bluetooth(+) qcom_stats mac80211(+)
videobuf2_dma_sg drm_display_helper i2c_qcom_geni(+) i2c_qcom_cci
camcc_sdm845(+) v4l2_mem2mem qcom_q6v5_mss(+) videobuf2_memops
reset_qcom_pdc spi_geni_qcom(+) videobuf2_v4l2 phy_qcom_qmp_usb(+)
videobuf2_common gpi(+) qcom_rng cfg80211 phy_qcom_qmp_ufs ufs_qcom(+)
coresight_stm phy_qcom_qmp_pcie stm_core rfkill slim_qcom_ngd_ctrl
qrtr pdr_interface lmh qcom_wdt slimbus icc_osm_l3 qcom_q6v5_pas(+)
icc_bwmon llcc_qcom qcom_pil_info qcom_q6v5 qcom_sysmon qcom_common
qcom_glink_smem qmi_helpers mdt_loader display_connector
drm_kms_helper drm socinfo rmtfs_mem
[ 7.841494] CPU: 4 PID: 492 Comm: (udev-worker) Not tainted 6.1.111-rc1 #1
[ 7.841497] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 7.841499] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 7.841502] pc : __icc_enable (drivers/interconnect/core.c:685
(discriminator 7))
[ 7.841505] lr : icc_disable (drivers/interconnect/core.c:708)
[ 7.841508] sp : ffff800008b23660
[ 7.841509] x29: ffff800008b23660 x28: ffff800008b23c20 x27: 0000000000000000
[ 7.841513] x26: ffffdd85da6ea1c0 x25: 0000000000000008 x24: 00000000000f4240
[ 7.841516] x23: 0000000000000000 x22: ffff46a58b7ca580 x21: 0000000000000001
[ 7.841519] x20: ffff46a58b7ca5c0 x19: ffff46a58b54a800 x18: 0000000000000000
[ 7.841522] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 7.841525] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 7.841528] x11: fefefefefefefeff x10: 0000000000000bf0 x9 : ffffdd85d8c9b0bc
[ 7.841531] x8 : ffff800008b22f58 x7 : 0000000000000000 x6 : 0000000000024404
[ 7.841535] x5 : 0000000000000000 x4 : ffff46a58b64b180 x3 : ffffdd85daa5e810
[ 7.841537] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 7.841541] Call trace:
[ 7.841542] __icc_enable (drivers/interconnect/core.c:685 (discriminator 7))
[ 7.841545] icc_disable (drivers/interconnect/core.c:708)
[ 7.841547] geni_icc_disable (drivers/soc/qcom/qcom-geni-se.c:862)
[ 7.841553] spi_geni_runtime_suspend+0x3c/0x4c spi_geni_qcom
[ 7.841561] pm_generic_runtime_suspend (drivers/base/power/generic_ops.c:28)
[ 7.841565] __rpm_callback (drivers/base/power/runtime.c:395)
[ 7.841568] rpm_callback (drivers/base/power/runtime.c:532)
[ 7.841570] rpm_suspend (drivers/base/power/runtime.c:672)
[ 7.841572] rpm_idle (drivers/base/power/runtime.c:504 (discriminator 1))
[ 7.841574] update_autosuspend (drivers/base/power/runtime.c:1662)
[ 7.841576] pm_runtime_disable_action (include/linux/spinlock.h:401
drivers/base/power/runtime.c:1703 include/linux/pm_runtime.h:599
drivers/base/power/runtime.c:1517)
[ 7.841579] devm_action_release (drivers/base/devres.c:720)
[ 7.841581] release_nodes (drivers/base/devres.c:503)
[ 7.841583] devres_release_all (drivers/base/devres.c:532)
[ 7.841585] device_unbind_cleanup (drivers/base/dd.c:531)
[ 7.841589] really_probe (drivers/base/dd.c:710)
[ 7.841592] __driver_probe_device (drivers/base/dd.c:785)
[ 7.841594] driver_probe_device (drivers/base/dd.c:815)
[ 7.841596] __driver_attach (drivers/base/dd.c:1202)
[ 7.841598] bus_for_each_dev (drivers/base/bus.c:301)
[ 7.841600] driver_attach (drivers/base/dd.c:1219)
[ 7.841602] bus_add_driver (drivers/base/bus.c:618)
[ 7.841604] driver_register (drivers/base/driver.c:246)
[ 7.841607] __platform_driver_register (drivers/base/platform.c:868)
[ 7.841609] spi_geni_driver_init+0x28/0x1000 spi_geni_qcom


So it looks like spi_geni_probe() calls geni_icc_get() which fails. It must
be with -EPROBE_DEFER otherwise we would get a printk. This could happen if
of_icc_get_from_provider() fails for example. There are two callers. These
were the only possibilities that I saw which didn't lead to a warning message.

Apologies that it took me some time to get the board and reproduce it.
The case is slightly different - geni_icc_get() is not failing, but it's
the spi_geni_grab_gpi_chan() that sometimes returns -EPROBE_DEFER and then
devres starts freeing the driver resources and it does it in reverse order,
so for this driver the order is:

[ 7.138679] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_icc_release (8 bytes)
[ 7.138751] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_icc_release (8 bytes)
[ 7.138827] geni_spi 880000.spi: DEVRES REL ffff800081443800 pm_runtime_disable_action (16 bytes)
[ 7.139494] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_pm_opp_config_release (16 bytes)
[ 7.139512] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_spi_release_controller (8 bytes)
[ 7.139516] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_clk_release (16 bytes)
[ 7.139519] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_ioremap_release (8 bytes)
[ 7.139524] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_region_release (24 bytes)
[ 7.139527] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_kzalloc_release (22 bytes)
[ 7.139530] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_pinctrl_release (8 bytes)
[ 7.139539] geni_spi 880000.spi: DEVRES REL ffff800081443800 devm_kzalloc_release (40 bytes)

The issue here is that pm_runtime_disable_action() results in a call to
spi_geni_runtime_suspend(), which attempts to suspend the device and
disable an interconnect path that devm_icc_release() has just released.

This could be easily reproduced by adding a sleep in the beginning of the
probe function of the GPI DMA driver to make the SPI driver probe defer.

The first commit that introduced this issue seems to be:
89e362c883c6 ("spi: geni-qcom: Undo runtime PM changes at driver exit time")

Here is a link to the patch i submitted to enable runtime_pm after the
driver gets all resources (including the interconnects). This approach
ensures that when devres releases resources in reverse order, it will
start with pm_runtime_disable_action(), suspending the device, and then
proceed to free the remaining resources:

https://lore.kernel.org/r/20241008231615.430073-1-djakov@xxxxxxxxxx/


The automatic cleanup tries to suspend and triggers the warning IS_ERR() warning
in __icc_enable().

if (WARN_ON(IS_ERR(path) || !path->num_nodes))

The best option is probably to disable the warning for EPROBE_DEFER. Another
two options would be to disable the warning entirely. A third option would be
to do a work-around for EPROBE_DEFER in geni_icc_get().

Please, could you take a look and give the Reported-by tag to Naresh? Or I
could send this patch if you want.

regards,
dan carpenter

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 4526ff2e1bd5..0caf8ead6573 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -682,6 +682,8 @@ static int __icc_enable(struct icc_path *path, bool enable)
if (!path)
return 0;
+ if (IS_ERR(path) && (PTR_ERR(path) == -EPROBE_DEFER))
+ return 0;
if (WARN_ON(IS_ERR(path) || !path->num_nodes))
return -EINVAL;

This change will not help as it's the !path->num_nodes that triggered the warning.

Thanks,
Georgi