[PATCH] thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430
From: Peter Ujfalusi
Date: Thu Oct 29 2020 - 06:03:27 EST
It has been observed that on OMAP4430 (ES2.0, ES2.1 and ES2.3) the enabled
notifier causes errors on the DTEMP readout values:
ti-soc-thermal 4a002260.bandgap: in range ADC val: 52
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 4
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: in range ADC val: 100
raw 100 translates to 133 Celsius on omap4-sdp, triggering shutdown due to
critical temperature.
When the notifier is disable for OMAP4430 the DTEMP values are stable:
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
Fixes: 5093402e5b44 ("thermal: ti-soc-thermal: Enable addition power management")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@xxxxxx>
---
Hi,
my omap4-sdp (Blaze) was shutting down randomly due to critical temperature with
5.10-rc1 and I have bisected it back to 5093402e5b44.
Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
but it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also have
constant and steady stream of:
thermal thermal_zone0: failed to read out thermal zone (-5)
pointing to similar issue.
Regards,
Peter
drivers/thermal/ti-soc-thermal/ti-bandgap.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
index 5e596168ba73..dcac99f327b0 100644
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -20,6 +20,7 @@
#include <linux/err.h>
#include <linux/types.h>
#include <linux/spinlock.h>
+#include <linux/sys_soc.h>
#include <linux/reboot.h>
#include <linux/of_device.h>
#include <linux/of_platform.h>
@@ -864,6 +865,17 @@ static struct ti_bandgap *ti_bandgap_build(struct platform_device *pdev)
return bgp;
}
+/*
+ * List of SoCs on which the CPU PM notifier can cause erros on the DTEMP
+ * readout.
+ * Enabled notifier on these machines results in erroneous, random values which
+ * could trigger unexpected thermal shutdown.
+ */
+static const struct soc_device_attribute soc_no_cpu_notifier[] = {
+ { .machine = "OMAP4430" },
+ { /* sentinel */ },
+};
+
/*** Device driver call backs ***/
static
@@ -1020,7 +1032,8 @@ int ti_bandgap_probe(struct platform_device *pdev)
#ifdef CONFIG_PM_SLEEP
bgp->nb.notifier_call = bandgap_omap_cpu_notifier;
- cpu_pm_register_notifier(&bgp->nb);
+ if (!soc_device_match(soc_no_cpu_notifier))
+ cpu_pm_register_notifier(&bgp->nb);
#endif
return 0;
@@ -1056,7 +1069,8 @@ int ti_bandgap_remove(struct platform_device *pdev)
struct ti_bandgap *bgp = platform_get_drvdata(pdev);
int i;
- cpu_pm_unregister_notifier(&bgp->nb);
+ if (!soc_device_match(soc_no_cpu_notifier))
+ cpu_pm_unregister_notifier(&bgp->nb);
/* Remove sensor interfaces */
for (i = 0; i < bgp->conf->sensor_count; i++) {
--
Peter
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki