Re: Bug: Potential KCOV Race Condition in __sanitizer_cov_trace_pc Leading to Crash at kcov.c:217

From: Dmitry Vyukov
Date: Fri Jan 10 2025 - 07:13:50 EST


On Fri, 10 Jan 2025 at 09:14, Kun Hu <huk23@xxxxxxxxxxxxxx> wrote:
> >> HEAD commit: dbfac60febfa806abb2d384cb6441e77335d2799
> >> git tree: upstream
> >> Console output: https://drive.google.com/file/d/1rmVTkBzuTt0xMUS-KPzm9OafMLZVOAHU/view?usp=sharing
> >> Kernel config: https://drive.google.com/file/d/1m1mk_YusR-tyusNHFuRbzdj8KUzhkeHC/view?usp=sharing
> >> C reproducer: /
> >> Syzlang reproducer: /
> >>
> >> The crash in __sanitizer_cov_trace_pc at kernel/kcov.c:217 seems to be related to the handling of KCOV instrumentation when running in a preemption or IRQ-sensitive context. Specifically, the code might allow potential recursive invocations of __sanitizer_cov_trace_pc during early interrupt handling, which could lead to data races or inconsistent updates to the coverage area (kcov_area). It remains unclear whether this is a KCOV-specific issue or a rare edge case exposed by fuzzing.
> >
> > Hi Kun,
> >
> > How have you inferred this from the kernel oops?
> > I only see a stall that may have just happened to be caught inside of
> > __sanitizer_cov_trace_pc function since it's executed often in an
> > instrumented kernel.
> >
> > Note: on syzbot we don't report stalls on instances that have
> > perf_event_open enabled, since perf has known bugs that lead to stalls
> > all over the kernel.
>
> Hi Dmitry,
>
> Please allow me to ask for your advice:
>
> We obtained new C and Syzlang reproducers after multiple rounds of reproduction. Indeed, the location of the issue has varied (BUG: soft lockup in tmigr_handle_remote in ./kernel/time/timer_migration.c). The crash log, along with the C and Syzlang reproducers, is provided below:
>
> Crash log: https://drive.google.com/file/d/16YDP6bU3Ga8OI1l7hsNFG4EdvjxuBz8d/view?usp=sharing
> C reproducer: https://drive.google.com/file/d/1BHDc6XdXsat07yb94h6VWJ-jIIKhwPfn/view?usp=sharing
> Syzlang reproducer: https://drive.google.com/file/d/1qo1qfr0KNbyIK909ddAo6uzKnrDPdGyV/view?usp=sharing
>
> Should I report the issue to the maintainer responsible for “timer_migration.c”?

If it shows stalls in 2 locations, I assume it can show stalls all
over the kernel.

The only thing the reproducer is doing is perf_event_open, so I would
assume the issue is related to perf.
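
For readers unfamiliar with the syscall, here is a minimal sketch of what a
bare perf_event_open call looks like from user space. This is not the
reproducer linked above: the event type (a software CPU-clock event), the
sample period, and the helper names init_cpu_clock_attr and open_perf_event
are all assumptions chosen for illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe a software CPU-clock sampling event.
 * The values are illustrative only and are not taken from the reproducer.
 */
static void init_cpu_clock_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_SOFTWARE;
	attr->size = sizeof(*attr);
	attr->config = PERF_COUNT_SW_CPU_CLOCK;
	attr->sample_period = 100000;	/* arbitrary illustrative period */
	attr->disabled = 1;		/* created disabled; never enabled here */
	attr->exclude_kernel = 1;	/* user-space only, needs fewer privileges */
}

/*
 * Hypothetical helper: glibc provides no wrapper for perf_event_open,
 * so the call goes through syscall(2).
 * Returns a file descriptor, or -1 with errno set on failure.
 */
static int open_perf_event(struct perf_event_attr *attr)
{
	/* pid = 0: calling thread; cpu = -1: any CPU; group_fd = -1; flags = 0 */
	return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);
}
```

A caller would fill a struct perf_event_attr with init_cpu_clock_attr(),
pass it to open_perf_event(), and close() the returned descriptor when done.
The call may fail with EACCES or EPERM depending on
/proc/sys/kernel/perf_event_paranoid, so the return value should be checked.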