Re: [PATCH v4 4/4] thermal: core: Add notifications call in the framework

From: Daniel Lezcano
Date: Mon Jul 13 2020 - 05:45:35 EST



Added Arnd in Cc.

On 13/07/2020 11:31, Marek Szyprowski wrote:
> Hi
>
> On 07.07.2020 11:15, Marek Szyprowski wrote:
>> On 06.07.2020 15:46, Daniel Lezcano wrote:
>>> On 06/07/2020 15:17, Marek Szyprowski wrote:
>>>> On 06.07.2020 12:55, Daniel Lezcano wrote:
>>>>> The generic netlink protocol is implemented but the different
>>>>> notification functions are not yet connected to the core code.
>>>>>
>>>>> These changes add the notification calls in the different
>>>>> corresponding places.
>>>>>
>>>>> Reviewed-by: Amit Kucheria <amit.kucheria@xxxxxxxxxx>
>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
>>>> This patch landed in today's linux-next 20200706 as commit 5df786e46560
>>>> ("thermal: core: Add notifications call in the framework"). Sadly it
>>>> breaks booting various Samsung Exynos based boards. Here is an example
>>>> log from Odroid U3 board:
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>> 00000010
>>>> pgd = (ptrval)
>>>> [00000010] *pgd=00000000
>>>> Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>>>> Modules linked in:
>>>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-00015-g5df786e46560
>>>> #1146
>>>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>>> PC is at kmem_cache_alloc+0x13c/0x418
>>>> LR is at kmem_cache_alloc+0x48/0x418
>>>> pc : [<c02b5cac>]    lr : [<c02b5bb8>]    psr: 20000053
>>>> ...
>>>> Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
>>>> Control: 10c5387d Table: 4000404a DAC: 00000051
>>>> Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
>>>> Stack: (0xee8f1cf8 to 0xee8f2000)
>>>> ...
>>>> [<c02b5cac>] (kmem_cache_alloc) from [<c08cd170>]
>>>> (__alloc_skb+0x5c/0x170)
>>>> [<c08cd170>] (__alloc_skb) from [<c07ec19c>]
>>>> (thermal_genl_send_event+0x24/0x174)
>>>> [<c07ec19c>] (thermal_genl_send_event) from [<c07ec648>]
>>>> (thermal_notify_tz_create+0x58/0x74)
>>>> [<c07ec648>] (thermal_notify_tz_create) from [<c07e9058>]
>>>> (thermal_zone_device_register+0x358/0x650)
>>>> [<c07e9058>] (thermal_zone_device_register) from [<c1028d34>]
>>>> (of_parse_thermal_zones+0x304/0x7a4)
>>>> [<c1028d34>] (of_parse_thermal_zones) from [<c1028964>]
>>>> (thermal_init+0xdc/0x154)
>>>> [<c1028964>] (thermal_init) from [<c0102378>]
>>>> (do_one_initcall+0x8c/0x424)
>>>> [<c0102378>] (do_one_initcall) from [<c1001158>]
>>>> (kernel_init_freeable+0x190/0x204)
>>>> [<c1001158>] (kernel_init_freeable) from [<c0ab85f4>]
>>>> (kernel_init+0x8/0x118)
>>>> [<c0ab85f4>] (kernel_init) from [<c0100114>] (ret_from_fork+0x14/0x20)
>>>>
>>>> Reverting it on top of linux-next fixes the boot issue. I will
>>>> investigate it further soon.
>>> Thanks for reporting this.
>>>
>>> Can you send the addr2line result and the code it points to?
>>
>> addr2line of c02b5cac (kmem_cache_alloc+0x13c/0x418) points to
>> mm/slub.c +2839, but I'm not sure if we can trust it. IMHO it looks
>> like some trashed memory somewhere, but I don't have time to analyze
>> it further right now...
>
> Just one more thing I've noticed. The crash happens only if the kernel
> is compiled with an old GCC (tested with arm-linux-gnueabi-gcc (Linaro GCC
> 4.9-2017.01) 4.9.4). If I compile the kernel with a newer GCC (like
> arm-linux-gnueabi-gcc (Linaro GCC 6.4-2017.11) 6.4.1 20171012), it works
> fine...
>
> This also happens with Linux next-20200710, which again got this commit.

Arnd,

are you aware of any issue with this GCC version which could explain this
kernel panic? It sounds like the problem does not appear with a more recent
version.

--
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
