Re: Bug: Potential KCOV Race Condition in __sanitizer_cov_trace_pc Leading to Crash at kcov.c:217
From: Kun Hu
Date: Sun Jan 12 2025 - 03:42:49 EST
> On Jan 10, 2025, at 20:13, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Fri, 10 Jan 2025 at 09:14, Kun Hu <huk23@xxxxxxxxxxxxxx> wrote:
>>>> HEAD commit: dbfac60febfa806abb2d384cb6441e77335d2799
>>>> git tree: upstream
>>>> Console output: https://drive.google.com/file/d/1rmVTkBzuTt0xMUS-KPzm9OafMLZVOAHU/view?usp=sharing
>>>> Kernel config: https://drive.google.com/file/d/1m1mk_YusR-tyusNHFuRbzdj8KUzhkeHC/view?usp=sharing
>>>> C reproducer: /
>>>> Syzlang reproducer: /
>>>>
>>>> The crash in __sanitizer_cov_trace_pc at kernel/kcov.c:217 seems to be related to the handling of KCOV instrumentation when running in a preemption or IRQ-sensitive context. Specifically, the code might allow potential recursive invocations of __sanitizer_cov_trace_pc during early interrupt handling, which could lead to data races or inconsistent updates to the coverage area (kcov_area). It remains unclear whether this is a KCOV-specific issue or a rare edge case exposed by fuzzing.
>>>
>>> Hi Kun,
>>>
>>> How have you inferred this from the kernel oops?
>>> I only see a stall that may have just happened to be caught inside of
>>> __sanitizer_cov_trace_pc function since it's executed often in an
>>> instrumented kernel.
>>>
>>> Note: on syzbot we don't report stalls on instances that have
>>> perf_event_open enabled, since perf has known bugs that lead to stalls
>>> all over the kernel.
>>
>> Hi Dmitry,
>>
>> Please allow me to ask for your advice:
>>
>> We obtained new C and Syzlang reproducers after multiple rounds of reproduction. Indeed, the location of the issue has varied (BUG: soft lockup in tmigr_handle_remote, in ./kernel/time/timer_migration.c). The crash log and the C and Syzlang reproducers are provided below:
>>
>> Crash log: https://drive.google.com/file/d/16YDP6bU3Ga8OI1l7hsNFG4EdvjxuBz8d/view?usp=sharing
>> C reproducer: https://drive.google.com/file/d/1BHDc6XdXsat07yb94h6VWJ-jIIKhwPfn/view?usp=sharing
>> Syzlang reproducer: https://drive.google.com/file/d/1qo1qfr0KNbyIK909ddAo6uzKnrDPdGyV/view?usp=sharing
>>
>> Should I report the issue to the maintainer responsible for “timer_migration.c”?
>
> If it shows stalls in 2 locations, I assume it can show stalls all
> over the kernel.
>
> The only thing the reproducer is doing is perf_event_open, so I would
> assume the issue is related to perf.

Thanks, Dmitry.

Hi perf maintainers,

We reproduced the issue over multiple rounds. perf_callchain_kernel appears frequently in the call traces. Does that indicate a possible problem in how call chains are recorded or processed for performance events?

We lack the relevant technical background. Could you help us check the cause of the issue?

————
Thanks,
Kun Hu.