Re: [PATCH] perf/core: Fix cgroup events tracking
From: Chengming Zhou
Date: Wed Dec 07 2022 - 06:20:33 EST
On 2022/12/7 18:41, Ravi Bangoria wrote:
> On 06-Dec-22 8:20 AM, Chengming Zhou wrote:
>> We encountered perf warnings when using cgroup events, for example:
>> ```
>> cd /sys/fs/cgroup
>> mkdir test
>> perf stat -e cycles -a -G test
>> ```
>>
>> WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0
>> [ 91.393417] Call Trace:
>> [ 91.393772] <TASK>
>> [ 91.394080] __schedule+0x4ae/0x9f0
>> [ 91.394535] ? _raw_spin_unlock_irqrestore+0x23/0x40
>> [ 91.395145] ? __cond_resched+0x18/0x20
>> [ 91.395622] preempt_schedule_common+0x2d/0x70
>> [ 91.396163] __cond_resched+0x18/0x20
>> [ 91.396621] wait_for_completion+0x2f/0x160
>> [ 91.397137] ? cpu_stop_queue_work+0x9e/0x130
>> [ 91.397665] affine_move_task+0x18a/0x4f0
>
> nit: These timestamps can be removed from the commit log.
Ok, will remove.
>
>>
>> WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0
>> [ 91.430151] Call Trace:
>> [ 91.430490] <TASK>
>> [ 91.430793] ? ctx_sched_out+0xb7/0x1b0
>> [ 91.431274] perf_cgroup_switch+0x88/0xc0
>> [ 91.431778] __schedule+0x4ae/0x9f0
>> [ 91.432215] ? _raw_spin_unlock_irqrestore+0x23/0x40
>> [ 91.432825] ? __cond_resched+0x18/0x20
>> [ 91.433299] preempt_schedule_common+0x2d/0x70
>> [ 91.433839] __cond_resched+0x18/0x20
>> [ 91.434298] wait_for_completion+0x2f/0x160
>> [ 91.434808] ? cpu_stop_queue_work+0x9e/0x130
>> [ 91.435334] affine_move_task+0x18a/0x4f0
>>
>> The above two warnings are incomplete because I removed other
>> unimportant information. The problem is caused by the perf cgroup
>> events tracking:
>>
>> CPU0                              CPU1
>> perf_event_open()
>>   perf_event_alloc()
>>     account_event()
>>       account_event_cpu()
>>         atomic_inc(perf_cgroup_events)
>>                                   __perf_event_task_sched_out()
>>                                     if (atomic_read(perf_cgroup_events))
>>                                       perf_cgroup_switch()
>>                                         // kernel/events/core.c:849
>>                                         WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0)
>>                                         if (READ_ONCE(cpuctx->cgrp) == cgrp) // false
>>                                           return
>>                                         perf_ctx_lock()
>>                                         ctx_sched_out()
>>                                         cpuctx->cgrp = cgrp
>>                                         ctx_sched_in()
>>                                           perf_cgroup_set_timestamp()
>>                                             // kernel/events/core.c:829
>>                                             WARN_ON_ONCE(!ctx->nr_cgroups)
>>                                         perf_ctx_unlock()
>> perf_install_in_context()
>>   add_event_to_ctx()
>>     list_add_event()
>>       perf_cgroup_event_enable()
>>         ctx->nr_cgroups++
>>         cpuctx->cgrp = X
>
> IIUC, since it's a cgroup event, perf_install_in_context() will do
> cpu_function_call(cpu, __perf_install_in_context, event). Thus, the
> callchain starting with add_event_to_ctx() will be executed on CPU1,
> not on CPU0.
Right, will fix it in the next version.
>
>> As shown above, we wrongly use the percpu atomic perf_cgroup_events
>> to decide whether to call perf_cgroup_switch(), which should only be
>> called when we know this CPU has cgroup events enabled.
>>
>> Commit bd2756811766 ("perf: Rewrite core context handling") changed
>> to having only one context per CPU, so we can simply use cpuctx->cgrp
>> to check whether this CPU has cgroup events enabled.
>>
>> So the percpu atomic perf_cgroup_events is no longer needed.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
>
> Fixes: bd2756811766 ("perf: Rewrite core context handling")
>
> Otherwise looks good.
> Tested-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
Ok, will add the Fixes tag in the next version.
Thanks!
>
> Thanks,
> Ravi