[PATCH V2 1/1] perf, core: Use sample period avg as child event's initial period

From: kan . liang
Date: Wed Mar 18 2015 - 08:19:40 EST


From: Kan Liang <kan.liang@xxxxxxxxx>

In perf record frequency mode, the initial sample_period is 1 because
perf does not yet know which period matches the requested frequency. It
starts with the minimum period of 1, which triggers an interrupt almost
immediately. After that there is enough data to calculate the period for
the given frequency. However, too many very short periods such as 1 can
cause various problems and increase overhead. It is better to confine
the period of 1 to the first few period adjustments.
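
As an illustration of the calculation described above, here is a minimal
user-space sketch. It assumes the simplified relation
period = count * 10^9 / (nsec * sample_freq) and ignores the overflow
handling a real implementation needs. The name estimate_period() is
hypothetical and is not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): given that `count` events
 * were observed in `nsec` nanoseconds, estimate the period that would
 * yield `sample_freq` samples per second.
 *
 * Simplifying assumption: the intermediate products fit in 64 bits.
 */
uint64_t estimate_period(uint64_t count, uint64_t nsec, uint64_t sample_freq)
{
    uint64_t divisor = nsec * sample_freq;
    uint64_t period;

    /* No elapsed time or no target frequency: fall back to the minimum. */
    if (divisor == 0)
        return 1;

    period = (count * 1000000000ULL) / divisor;

    /* A period of 0 is not usable, so clamp to the minimum of 1. */
    return period ? period : 1;
}
```

With one event observed, the estimate stays at the minimum, which is why
the first few adjustments are made with very short periods.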

However, for some workloads a period of 1 is set frequently. For
example, run perf record on a busy loop for 10 seconds:

    perf record ./finity_busy_loop.sh 10

    while [ "A" != "B" ]
    do
        date > /dev/null
    done

The period was changed 150503 times in 10 seconds, and 22.5% of those
changes (33861) set it to 1. That is because, in inherit_event, the
period for the child event is inherited from the parent's parent event,
which usually still has the default sample_period of 1. Each child event
has to recalculate the period from 1 every time, which brings high
overhead.

This patch keeps the average sample period in the parent event. Each new
child event can use it as its initial sample period.

avg_sample_period is introduced in struct perf_event to track the
average sample period. The value is kept in the parent event, since that
is the only node every child knows. To reduce contention on the parent
event, the value is updated once per tick. We also need to get rid of
the period of 1 as early as possible, so the value is also updated on
the first period adjustment.

For each new child event, the average sample period of the parent event
is assigned to the child as its first sample period.

Each new child event takes a reference on the parent event, so the
parent will not go away until all children have gone away. It is
therefore safe for a child to access its parent.
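
The scheme described above can be modelled in user space roughly as
follows. This is only a sketch: struct model_event and the model_*()
functions are hypothetical stand-ins for struct perf_event,
perf_event_alloc() and perf_adjust_period(), C11 atomics stand in for
atomic64_t, and the from_tick flag is assumed to correspond to !disable
in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct perf_event. */
struct model_event {
    struct model_event *parent;        /* NULL for the top-level event */
    _Atomic int64_t avg_sample_period; /* only meaningful in the parent */
    int64_t sample_period;             /* period currently in use */
};

/* Models the frequency-mode part of event allocation. */
void model_event_init(struct model_event *ev, struct model_event *parent)
{
    ev->parent = parent;
    atomic_init(&ev->avg_sample_period, 0);
    ev->sample_period = 1;             /* default when nothing is known */

    if (parent)
        /* A child starts from the parent's running average. */
        ev->sample_period = atomic_load(&parent->avg_sample_period);
    else
        /* A top-level event seeds the average with its own period. */
        atomic_store(&ev->avg_sample_period, ev->sample_period);
}

/* Models the averaging step added to the period adjustment. */
void model_adjust_period(struct model_event *ev, int64_t new_period,
                         int from_tick)
{
    struct model_event *head = ev->parent ? ev->parent : ev;

    assert(new_period > 0);
    ev->sample_period = new_period;

    /* Update on a tick, or once to leave the initial value of 1. */
    if (from_tick || atomic_load(&head->avg_sample_period) == 1) {
        int64_t avg;

        avg = (atomic_load(&head->avg_sample_period) + new_period) / 2;
        atomic_store(&head->avg_sample_period, avg);
    }
}
```

In this model, a top-level event that is adjusted to 100000 outside a
tick moves the average from 1 to 50000. A second non-tick adjustment
leaves it unchanged, and a tick adjustment to 100000 moves it to 75000,
which is the period a newly created child would start with.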

After applying this patch, the share of period changes that set the
period to 1 drops to 0.1%.

Signed-off-by: Kan Liang <kan.liang@xxxxxxxxx>
---

Changes since V1
- The average sample period will be kept in parent event
- Use atomic64_t to replace local64_t
- Only update the AVG every tick or the first time

Re-send V2

include/linux/perf_event.h | 2 ++
kernel/events/core.c | 21 +++++++++++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b8f69d3..b53d88f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -426,6 +426,8 @@ struct perf_event {
struct list_head child_list;
struct perf_event *parent;

+ atomic64_t avg_sample_period;
+
int oncpu;
int cpu;

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2709063..b317e87 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2904,6 +2904,7 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
struct hw_perf_event *hwc = &event->hw;
s64 period, sample_period;
s64 delta;
+ struct perf_event *head_event = (event->parent != NULL) ? event->parent : event;

period = perf_calculate_period(event, nsec, count);

@@ -2917,6 +2918,17 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo

hwc->sample_period = sample_period;

+ /*
+ * Update the AVG sample period in first period adjustment
+ * Then update it every tick to reduce contention.
+ */
+ if (!disable || (atomic64_read(&head_event->avg_sample_period) == 1)) {
+ s64 avg_period;
+
+ avg_period = (atomic64_read(&head_event->avg_sample_period) + sample_period) / 2;
+ atomic64_set(&head_event->avg_sample_period, avg_period);
+ }
+
if (local64_read(&hwc->period_left) > 8*sample_period) {
if (disable)
event->pmu->stop(event, PERF_EF_UPDATE);
@@ -7199,8 +7211,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,

hwc = &event->hw;
hwc->sample_period = attr->sample_period;
- if (attr->freq && attr->sample_freq)
+ if (attr->freq && attr->sample_freq) {
hwc->sample_period = 1;
+ if (parent_event)
+ hwc->sample_period = atomic64_read(&parent_event->avg_sample_period);
+ else
+ atomic64_set(&event->avg_sample_period, hwc->sample_period);
+ }
hwc->last_period = hwc->sample_period;

local64_set(&hwc->period_left, hwc->sample_period);
@@ -8159,7 +8176,7 @@ inherit_event(struct perf_event *parent_event,
child_event->state = PERF_EVENT_STATE_OFF;

if (parent_event->attr.freq) {
- u64 sample_period = parent_event->hw.sample_period;
+ u64 sample_period = atomic64_read(&parent_event->avg_sample_period);
struct hw_perf_event *hwc = &child_event->hw;

hwc->sample_period = sample_period;
--
1.7.11.7
