Re: [PATCH] perf_counter: Prevent oopses from per-cpu software counters
From: Ingo Molnar
Date: Thu Feb 05 2009 - 09:22:48 EST
* Paul Mackerras <paulus@xxxxxxxxx> wrote:
> Impact: oops fix
>
> Yanmin Zhang reported that using a PERF_COUNT_TASK_CLOCK software
> counter as a per-cpu counter would reliably crash the system, because
> it calls __task_delta_exec with a null pointer. And indeed, a "task
> clock" counter only makes sense as a per-task counter. Similarly,
> counting page faults, context switches or cpu migrations only makes
> sense for a per-task counter.
>
> This fixes the problem by disallowing the use of the task clock,
> page fault, context switch and cpu migration software counters as
> per-cpu counters, since they all require a task context to obtain their
> data. The only software counter that can be used as a per-cpu counter
> is the cpu clock counter (PERF_COUNT_CPU_CLOCK).
>
> In order for sw_perf_counter_init to be able to tell whether we are
> setting up a per-task or a per-cpu counter, this arranges for counter->ctx
> to be initialized earlier, in perf_counter_alloc.
>
> The other minor change this makes is to ensure that if sw_perf_counter_init
> fails, we don't try to initialize the counter as a hardware counter.
> Since the user has passed a negative event type (and it isn't raw), they
> clearly don't intend it to be interpreted as a hardware event. This
> matters now that sw_perf_counter_init can fail for valid software event
> types (because of the check that the counter is a per-task counter).
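The check described in the patch can be illustrated with a minimal userspace sketch. It assumes a simplified model that is not the kernel's code: a `struct counter_ctx` whose `task` pointer is NULL for a per-cpu counter, illustrative negative event codes, and hypothetical stand-ins `sw_counter_init()`, `hw_counter_init()` and `counter_init()`. Raw events are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative software event codes (negative, as in the patch's
 * convention). Names and values are hypothetical, not the kernel's. */
enum sw_event {
	SW_CPU_CLOCK        = -1,
	SW_TASK_CLOCK       = -2,
	SW_PAGE_FAULTS      = -3,
	SW_CONTEXT_SWITCHES = -4,
	SW_CPU_MIGRATIONS   = -5,
};

struct task { int pid; };

/* Simplified context: task == NULL means a per-cpu counter. */
struct counter_ctx { struct task *task; };

struct counter {
	int event_type;
	struct counter_ctx *ctx; /* set at allocation, before init runs */
	const char *pmu;         /* backend that accepted the counter */
};

/* Hypothetical stand-in for sw_perf_counter_init(). */
static int sw_counter_init(struct counter *c)
{
	switch (c->event_type) {
	case SW_CPU_CLOCK:
		/* Needs no task: valid per-task and per-cpu. */
		c->pmu = "cpu-clock";
		return 0;
	case SW_TASK_CLOCK:
	case SW_PAGE_FAULTS:
	case SW_CONTEXT_SWITCHES:
	case SW_CPU_MIGRATIONS:
		/* These read their data from a task; refuse per-cpu use. */
		if (c->ctx->task == NULL)
			return -EINVAL;
		c->pmu = "task-sw";
		return 0;
	default:
		return -EINVAL;
	}
}

/* Hypothetical stand-in for hardware counter setup. */
static int hw_counter_init(struct counter *c)
{
	c->pmu = "hardware";
	return 0;
}

/* A negative (non-raw) type never falls back to hardware init,
 * even when sw_counter_init() fails. */
static int counter_init(struct counter *c)
{
	assert(c->ctx != NULL); /* ctx must be known before init */
	if (c->event_type < 0)
		return sw_counter_init(c);
	return hw_counter_init(c);
}
```

In this model, initializing `SW_TASK_CLOCK` with a per-cpu context returns `-EINVAL` and leaves `pmu` unset (no hardware fallback), while `SW_CPU_CLOCK` succeeds in either kind of context.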
Hm, I don't really think that the notion that it should not be possible to
use sw counters on a per-CPU basis is valid.
You are right that "pagefaults" and "context switches" are generated by
tasks, but there is also a per-CPU and system-wide notion of 'number of
pagefaults', and people might be interested in monitoring that.
The existence and widespread use of "vmstat", with its display of the
system-wide count of "context switches" (and administrators' reliance on
that count to judge a workload), is, I think, ample proof that it makes
sense to have those counters on a per-CPU basis too.
So how about fixing these sw counters to work properly as per-CPU counters too?
Or am I missing something subtle that makes that impossible?
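One possible reading of this suggestion, sketched under stated assumptions: each software event is tallied on the CPU where it occurs, and also on the task's own count when one is attached, so a per-CPU counter never needs a task pointer, and summing over all CPUs gives the vmstat-style system-wide figure. The names (`sw_event_hit()`, `percpu_read()`, `system_read()`, `NR_CPUS_SKETCH`) are hypothetical, and this single-threaded model ignores the preemption and per-CPU-variable handling a real kernel implementation would need.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count for the model */

enum sw_kind { SW_PAGE_FAULT, SW_CONTEXT_SWITCH, SW_KIND_MAX };

/* Hypothetical per-CPU tallies, bumped wherever the event happens. */
static unsigned long cpu_events[NR_CPUS_SKETCH][SW_KIND_MAX];

struct task_counts { unsigned long events[SW_KIND_MAX]; };

/* Event site: 'task' may be NULL when no per-task counter is attached;
 * the per-CPU tally is updated either way. */
static void sw_event_hit(int cpu, struct task_counts *task, enum sw_kind kind)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	cpu_events[cpu][kind]++;
	if (task != NULL)
		task->events[kind]++;
}

/* A per-CPU counter reads its CPU's tally; no task pointer needed. */
static unsigned long percpu_read(int cpu, enum sw_kind kind)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return cpu_events[cpu][kind];
}

/* System-wide view, vmstat-style: sum the tallies over all CPUs. */
static unsigned long system_read(enum sw_kind kind)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += cpu_events[cpu][kind];
	return sum;
}
```

The design point is that the per-task count becomes an optional extra rather than the only place the event is recorded, which is what would let a per-CPU counter work without a task context.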
Ingo