On Wed, Jul 3, 2024 at 11:03 AM AngeloGioacchino Del Regno
<angelogioacchino.delregno@xxxxxxxxxxxxx> wrote:
Il 02/07/24 23:24, Nícolas F. R. A. Prado ha scritto:
Commit da781936e7c3 ("thermal: gov_power_allocator: Allow binding
without trip points") allowed the governor to bind even when trip_max
is null. This allows a null pointer dereference to happen in the manage
callback. Add an early return to prevent it, since the governor is
expected to not do anything in this case.
Fixes: da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@xxxxxxxxxxxxx>
---
This issue was noticed by KernelCI during a boot test on the
mt8195-cherry-tomato-r2 platform with the config in [1]. The stack trace
is attached below.
[1] http://0x0.st/XaON.txt
[ 4.015786] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 4.015791] Mem abort info:
[ 4.015793] ESR = 0x0000000096000004
[ 4.015796] EC = 0x25: DABT (current EL), IL = 32 bits
[ 4.015799] SET = 0, FnV = 0
[ 4.015802] EA = 0, S1PTW = 0
[ 4.015804] FSC = 0x04: level 0 translation fault
[ 4.015807] Data abort info:
[ 4.015809] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 4.015811] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 4.015814] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 4.015818] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000109809000
[ 4.015821] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 4.015835] Modules linked in: mt8195_mt6359(+) mt6577_auxadc snd_soc_mt8195_afe mtk_scp_ipi snd_sof_utils mtk_wdt(+)
[ 4.015852] CPU: 2 PID: 13 Comm: kworker/u32:1 Not tainted 6.10.0-rc6 #1 c5d519ae8e7fec6bbe67cb8c50bfebcb89dfa54e
[ 4.015859] Hardware name: Acer Tomato (rev2) board (DT)
[ 4.015862] Workqueue: events_unbound deferred_probe_work_func
[ 4.015875] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 4.015880] pc : power_allocator_manage+0x110/0x6a0
[ 4.015888] lr : __thermal_zone_device_update+0x1dc/0x400
[ 4.015893] sp : ffff8000800eb800
[ 4.015895] x29: ffff8000800eb810 x28: 0000000000000001 x27: 0000000000000001
[ 4.015903] x26: aaaaaaaaaaaaaaab x25: ffff07a0461c15a0 x24: ffffb58530ca67c0
[ 4.015911] x23: 0000000000000000 x22: ffff07a04098fcc0 x21: ffffb58532eec848
[ 4.015918] x20: ffff8000800eb920 x19: ffff07a0461c1000 x18: 0000000000000b4b
[ 4.015926] x17: 5359534255530031 x16: ffffb585310352e4 x15: 0000000000000020
[ 4.015933] x14: 0000000000000000 x13: ffffffff00000000 x12: 0000000000000040
[ 4.015940] x11: 0101010101010101 x10: ffffffffffffffff x9 : ffffb58530ca8d78
[ 4.015948] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000001388
[ 4.015955] x5 : 0000000000000000 x4 : 0000000000000384 x3 : 0000000000000000
[ 4.015962] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 4.015970] Call trace:
[ 4.015972] power_allocator_manage+0x110/0x6a0
[ 4.015978] __thermal_zone_device_update+0x1dc/0x400
[ 4.015983] thermal_zone_device_set_mode+0x7c/0xa0
[ 4.015987] thermal_zone_device_enable+0x1c/0x28
[ 4.015991] thermal_of_zone_register+0x43c/0x498
[ 4.015996] devm_thermal_of_zone_register+0x6c/0xb8
[ 4.016001] gadc_thermal_probe+0x140/0x214
[ 4.016007] platform_probe+0x70/0xc4
[ 4.016012] really_probe+0x140/0x270
[ 4.016018] __driver_probe_device+0xfc/0x114
[ 4.016024] driver_probe_device+0x44/0x100
[ 4.016029] __device_attach_driver+0x64/0xdc
[ 4.016035] bus_for_each_drv+0xb4/0xdc
[ 4.016041] __device_attach+0xdc/0x16c
[ 4.016046] device_initial_probe+0x1c/0x28
[ 4.016052] bus_probe_device+0x44/0xac
[ 4.016057] deferred_probe_work_func+0xb0/0xc4
[ 4.016063] process_scheduled_works+0x114/0x330
[ 4.016070] worker_thread+0x1c0/0x20c
[ 4.016076] kthread+0xf8/0x108
[ 4.016081] ret_from_fork+0x10/0x20
[ 4.016090] Code: d1030294 17ffffdd f94012c0 f9401ed7 (b9400000)
[ 4.016095] ---[ end trace 0000000000000000 ]---
---
drivers/thermal/gov_power_allocator.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 45f04a25255a..1b2345a697c5 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -759,6 +759,9 @@ static void power_allocator_manage(struct thermal_zone_device *tz)
return;
}
+ if (!params->trip_max)
+ return;
+
I'm not sure that this is the right thing to do.
If you do that, allocate_power() will never be called, so the entire algo doesn't
work, making binding this completely useless (as it's going to be a noop..!).
Check what get_governor_trips() says in the documentation:
* If there is only one trip point, then that's considered to be the
* "maximum desired temperature" trip point and the governor is always
* on. If there are no passive or active trip points, then the
* governor won't do anything. In fact, its throttle function
* won't be called at all.
....and it looks like you're aware of that, as you said that in the commit
description as well.
IIUC, the problematic commit allowed the power allocator governor to
bind to a tripless thermal zone in order to prevent failing the entire
thermal zone registration.
Yes, it will be a noop in this case because in the absence of any
trips there will be nothing to do for it. Still, user space can check
the zone temperature via sysfs.
Adding a NULL pointer check before the place where the pointer in
question is dereferenced is not a bad idea at all.