We observed a random null pointer deference crash somewhere in theAs mentioned separately, this is not the problem.
thermal core (crash log below is not very helpful) when calling
mutex_lock(). It looks like we get an interrupt when this crash
happens.
Looking at the lm90 driver, per the above, I now see we are calling
hwmon_notify_event() from the lm90 interrupt handler. Looking at
hwmon_notify_event() I see that ...
hwmon_notify_event()
--> hwmon_thermal_notify()
--> thermal_zone_device_update()
--> update_temperature()
--> mutex_lock()
So although I don't completely understand the crash, it does seem
that we should not be calling hwmon_notify_event() from the
interrupt handler.
I think the problem may be that this is not a devicetree system
(or the lm90 devide does not have a devicetree node), but thermal
notification currently only works in such systems because the hwmon
subsystem uses the devicetree registration method. At the same time,
CONFIG_THERMAL_OF is obviously enabled. Unfortunately, the hwmon code
does not bail out in that situation due to another bug.