Re: [RFC][PATCH] perf: Rewrite enabled/running timekeeping

From: Alexey Budankov
Date: Mon Sep 04 2017 - 06:46:58 EST


Hi,
On 31.08.2017 20:18, Peter Zijlstra wrote:
> On Wed, Aug 23, 2017 at 11:54:15AM +0300, Alexey Budankov wrote:
>> On 22.08.2017 23:47, Peter Zijlstra wrote:
>>> On Thu, Aug 10, 2017 at 06:57:43PM +0300, Alexey Budankov wrote:
>>>> The key thing in the patch is explicit updating of tstamp fields for
>>>> INACTIVE events in update_event_times().
>>>
>>>> @@ -1405,6 +1426,9 @@ static void update_event_times(struct perf_event *event)
>>>> event->group_leader->state < PERF_EVENT_STATE_INACTIVE)
>>>> return;
>>>>
>>>> + if (event->state == PERF_EVENT_STATE_INACTIVE)
>>>> + perf_event_tstamp_update(event);
>>>> +
>>>> /*
>>>> * in cgroup mode, time_enabled represents
>>>> * the time the event was enabled AND active
>>>
>>> But why!? I thought the whole point was to not need to do this.
>>
>> update_event_times() is not called from timer interrupt handler
>> thus it is not on the critical path which is optimized in this patch set.
>>
>> But update_event_times() is called in the context of read() syscall so
>> this is the place where we may update event times for INACTIVE events
>> instead of timer interrupt.
>>
>> Also update_event_times() is called on thread context switch out so
>> we get event times also updated when the thread migrates to other CPU.
>>
>>>
>>> The thing I outlined earlier would only need to update timestamps when
>>> events change state and at no other point in time.
>>
>> But we still may request times while event is in INACTIVE state
>> thru read() syscall and event timings need to be up-to-date.
>
> Sure, read() also updates.
>
> So the below completely rewrites timekeeping (and probably breaks
> world) but does away with the need to touch events that don't get
> scheduled.

We still need and do iterate thru all events at some points e.g. on context switches.

>
> Esp the cgroup stuff is entirely untested since I simply don't know how
> to operate that. I did run Vince's tests on it, and I think it doesn't
> regress, but I'm near a migraine so I can't really see straight atm.
>
> Vince, Stephane, could you guys have a peek?
>
> (There's a few other bits in, I'll break up into patches and write
> comments and Changelogs later, I think its can be split in some 5
> patches).
>
> The basic idea is really simple, we have a single timestamp and
> depending on the state we update enabled/running. This obviously only
> requires updates when we change state and when we need up-to-date
> timestamps (read).

I would prefer to have this rework in a FSM similar to that below,
so state transition and the corresponding tstamp, total_time_enabled
and total_time_running manipulation logic would be consolidated in
one place and adjacent lines of code.

>From the table below event->state FSM is not as simple as it may seem
on the first sight so in order to avoid regressions after rework we
better keep that in mind and explicitly implement allowed and disallowed
state transitions.

A I O E X D U

A Te+,Tr+ Te+,Tr+ Te+,Tr+ Te+,Tr+ Te+,Tr+ Te+,Tr+ ---
ts ts ts ts ts ts

I Te+,ts Te+,ts Te+,ts Te+,ts Te+,ts Te+,ts ---

O Te=0,Tr=0, Te=0,Tr=0, Te=0,Tr=0 Te=0,Tr=0 Te=0,Tr=0 Te=0,Tr=0 ---
ts ts ts ts ts ts

E Te=0,Tr=0, Te=0,Tr=0, Te=0,Tr=0 Te=0,Tr=0 Te=0,Tr=0 Te=0,Tr=0 ---
ts ts ts ts ts ts

X --- --- --- --- --- --- ---

D --- --- --- --- --- --- ---

U --- Te=0,Tr=0 Te=0,Tr=0 --- --- --- ---
ts ts

LEGEND:

U - allocation, A - ACTIVE, I - INACTIVE, O - OFF,
E - ERROR, X - EXIT, D - DEAD,

Te=0 - event->total_time_enabled = 0
Te+ - event->total_time_enabled += delta

Tr=0 - event->total_time_running = 0
Tr+ - event->total_time_running += delta

ts - event->tstamp = perf_event_time(event)

static void
perf_event_change_state(struct perf_event *event, enum perf_event_state state)
{
u64 delta = 0;
u64 now = perf_event_time(event);

delta = now - event->tstamp;
event->tstamp = now;

switch(event->state)
{
case A:
switch(state)
{
case A:
...
break;
case I:
event->total_time_enabled += delta;
event->total_time_running += delta;
event->state = state;
break;
case O:
...
break;
case E:
...
...
case I:
...
break;
...
}
}

---

Regards,
Alexey