Re: [PATCH 4/6] x86/mbm: Memory bandwidth monitoring event management

From: Peter Zijlstra
Date: Tue Mar 08 2016 - 03:49:27 EST


On Mon, Mar 07, 2016 at 11:27:26PM +0000, Luck, Tony wrote:
> >> + bytes = mbm_current->interval_bytes * MSEC_PER_SEC;
> >> + do_div(bytes, diff_time);
> >> + mbm_current->bandwidth = bytes;
> >> + mbm_current->interval_bytes = 0;
> >> + mbm_current->interval_start = cur_time;
> >> + }
> >>> +
> >> + return mbm_current;
> >> +}
> >
> > How does the above time tracking deal with the event not actually having
> > been scheduled the whole time?
>
> That's been the topic of a few philosophical debates ... what exactly are
> we trying to say when we report that a process has a "memory bandwidth"
> of, say, 1523 MBytes/s? We need to know both the amount of data moved
> and to pick an interval to measure and divide by. Does it make a difference
> whether the process voluntarily gave up the cpu for some part of the interval
> (by blocking on I/O)? Or did the scheduler time-slice it out to run other jobs?
>
> The above code gives the average bandwidth across the last interval
> (with a minimum interval size of 100ms to avoid craziness with rounding
> errors on exceptionally tiny intervals). Some folks apparently want to get
> a "rate" directly from perf. I think many folks will find the "bytes" counters
> more helpful (where they control the sample interval with '-I" flag to perf
> utility).

So why didn't any of that make it into the Changelog? This is very much
different from any other perf driver, at the very least this debate
should have been mentioned and the choice defended.

Also, why are we doing the time tracking and divisions at all? Can't we
simply report the number of bytes transferred and let userspace sort out
the rest?

Userspace is provided the time the event was enabled, the time the event
was running and it can fairly trivially obtain walltime if it so desires
and then it can compute whatever definition of bandwidth it wants to
use.