Re: [PATCH] perf: Fix RCU dereference check in perf_event_comm

From: Stephane Eranian
Date: Fri May 18 2012 - 12:38:09 EST


On Mon, Mar 26, 2012 at 2:41 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> On Thu, 2012-03-22 at 13:36 +0200, Ari Savolainen wrote:
>> On 22 March 2012 at 11:53, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
>> > On Thu, 2012-03-22 at 01:43 +0200, Ari Savolainen wrote:
>> >> The warning below is printed when executing a command like
>> >> sudo perf record su - user -c "echo hello"
>> >>
>> >> It's fixed by moving the call of perf_event_comm to be protected
>> >> by the task lock.
>> >
>> > That seems like a rather poor solution since it increases the lock hold
>> > time for no explained reason.
>> >
>> >> include/linux/cgroup.h:567 suspicious rcu_dereference_check() usage!
>> >
>> >>  [<ffffffff8109be55>] lockdep_rcu_suspicious+0xe5/0x100
>> >>  [<ffffffff811131fa>] perf_event_comm+0x37a/0x4d0
>> >
>> > So where exactly is this, perf_event_comm_event() takes rcu_read_lock()
>> > so I presume its before that.
>>
>> I think the warning comes from this source-level call path:
>>
>> perf_event_comm ->
>>   perf_event_enable_on_exec ->
>>     perf_cgroup_sched_out ->
>>       perf_cgroup_from_task ->
>>         task_subsys_state ->
>>           task_subsys_state_check
>>
>> It seems that path does not take rcu_read_lock(). Where should
>> rcu_read_lock/unlock be added? In perf_cgroup_sched_out around the
>> calls of perf_cgroup_from_task? Like this:
>
> Ah, ok. So IIRC this too is not needed. As the comment near
> perf_cgroup_from_task() says, we hold explicit references to the cgroup.
>
> Ideally we'd come up with a better validation condition but all variants
> I could come up with make the code ugly and might actually generate
> worse code, the current true simply shuts it up.
>
> Stephane any thoughts?
>
I think it is okay to skip the check because we only actually dereference
the pointer once we know we have ctx.nr_cgroup > 0 or the event is a cgroup
event. In both cases we hold a refcnt on the cgroup, so it cannot
disappear behind our back.
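
To make the argument concrete, here is a minimal userspace sketch of the
pattern. It is an illustration only: every mock_* name is a hypothetical
stand-in, not a kernel structure or function, and it models only the
nr_cgroup > 0 case, not the per-event cgroup check. The point is that the
lookup merely fetches the pointer, and the dereference happens only after
the condition that implies a held reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's types. */
struct mock_cgroup {
	int refcnt;	/* > 0 while at least one event pins the cgroup */
	int id;
};

struct mock_task {
	struct mock_cgroup *cgrp;	/* only stable while pinned */
};

struct mock_ctx {
	int nr_cgroups;	/* number of cgroup events in this context */
};

/* Plays the role of perf_cgroup_from_task(): fetches, never dereferences. */
static struct mock_cgroup *mock_cgroup_from_task(const struct mock_task *task)
{
	return task->cgrp;
}

/* Returns the cgroup id, or -1 when the context has no cgroup events. */
static int mock_sched_out(const struct mock_task *task,
			  const struct mock_ctx *ctx)
{
	struct mock_cgroup *cgrp = mock_cgroup_from_task(task);

	if (ctx->nr_cgroups <= 0)
		return -1;	/* pointer is never looked at */

	/* Assumed invariant: a cgroup event implies a held reference. */
	assert(cgrp != NULL && cgrp->refcnt > 0);
	return cgrp->id;
}
```

In this model the unchecked fetch is harmless because a possibly stale
pointer is discarded on the early return, and on the other path the
reference count is what keeps the object alive.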

As you said, the alternatives would be to call perf_cgroup_from_task()
only AFTER we've made the expensive checks (which we will do again later
in the call chain), or to grab task->alloc_lock or cgroup_lock, neither
of which is cheap.

Acked-by: Stephane Eranian <eranian@xxxxxxxxxx>

> ---
>  kernel/events/core.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index a6a9ec4..e423261 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -240,7 +240,7 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
>  static inline struct perf_cgroup *
>  perf_cgroup_from_task(struct task_struct *task)
>  {
> -       return container_of(task_subsys_state(task, perf_subsys_id),
> +       return container_of(task_subsys_state_check(task, perf_subsys_id, true),
>                         struct perf_cgroup, css);
>  }
>
>