Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

From: Qian Cai
Date: Sat Dec 01 2018 - 23:17:40 EST




On 11/12/18 3:39 AM, Marc Zyngier wrote:
> On Fri, 09 Nov 2018 18:41:03 +0000,
> Qian Cai <cai@xxxxxx> wrote:
>>
>>
>>
>>> On Nov 9, 2018, at 12:41 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>>
>>> On 09/11/18 17:28, Sudeep Holla wrote:
>>>> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>>>>
>>>> [...]
>>>>
>>>>>
>>>>> See bb42ca474010 and d003d029cea8 for details.
>>>>>
>>>>> Now, activating this workaround leads to lockdep being really angry,
>>>>> most likely because the cpus_read_lock is not taken, which is a change
>>>>> in behaviour...
>>>>>
>>>>> I'm trying to dig into this now.
>>>>>
>>>>
>>>> Yes we found similar issue in kernel/sched/core.c sched_init_smp
>>>> There's a fix with detailed description in -next
>>>> (Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
>>>>
>>>> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
>>>> Assert we hold the hotplug lock for _cpuslocked() operations")
>>>
>>> I indeed came to the same conclusion, but the fix is slightly less than
>>> obvious. I have the following arm64-specific crap, but it is pretty
>>> terrible:
>>>
>>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>>> index f258636273c9..9e96e9eaca9b 100644
>>> --- a/arch/arm64/kernel/time.c
>>> +++ b/arch/arm64/kernel/time.c
>>> @@ -36,6 +36,7 @@
>>> #include <linux/clocksource.h>
>>> #include <linux/clk-provider.h>
>>> #include <linux/acpi.h>
>>> +#include <linux/cpu.h>
>>>
>>> #include <clocksource/arm_arch_timer.h>
>>>
>>> @@ -69,7 +70,9 @@ void __init time_init(void)
>>> u32 arch_timer_rate;
>>>
>>> of_clk_init(NULL);
>>> + cpus_read_lock();
>>> timer_probe();
>>> + cpus_read_unlock();
>>>
>>> tick_setup_hrtimer_broadcast();
>>>
>>> Qian, can you please let me know if this helps? If it does, we'll have
>>> to think of something a bit betterâ
>> After applied the above patch, the original warning is gone but there
>> Is now a new warning.
>
> [...]
>
> Which was ful;ly expected, given that I've taken the cpu lock at some
> semi-random location. I'll try to talk to PeterZ this week to try and
> solve this.
>

Marc, did you have a chance to investigate this further? I have still seen it in
the latest mainline today. This is the only warning left on this Huawei TaiShan
2280 server now after confirmed that those GICv3 warnings were gone.