Re: [PATCH 01/14] x86/cqm: Intel Resource Monitoring Documentation

From: Shivappa Vikas
Date: Sat Dec 24 2016 - 20:51:29 EST




On Fri, 23 Dec 2016, Peter Zijlstra wrote:

On Fri, Dec 23, 2016 at 11:35:03AM -0800, Shivappa Vikas wrote:

Hello Peterz,

On Fri, 23 Dec 2016, Peter Zijlstra wrote:

On Fri, Dec 16, 2016 at 03:12:55PM -0800, Vikas Shivappa wrote:
+Continuous monitoring
+---------------------
+A new file cont_monitoring is added to perf_cgroup which helps to enable
+cqm continuous monitoring. Enabling this field would start monitoring of
+the cgroup without perf being launched. This can be used for long term
+light weight monitoring of tasks/cgroups.
+
+To enable continuous monitoring of cgroup p1.
+#echo 1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
+
+To disable continuous monitoring of cgroup p1.
+#echo 0 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
+
+To read the counters at the end of monitoring perf can be used.
+
+LAZY and NOLAZY Monitoring
+--------------------------
+LAZY:
+By default when monitoring is enabled, the RMIDs are not allocated
+immediately; they are allocated lazily at the first sched_in.
+There are 2-4 RMIDs per logical processor on each package. So if a dual
+package has 48 logical processors, there would be up to 192 RMIDs on each
+package = total of 192x2 RMIDs.
+There is a possibility that RMIDs can run out, and in that case the read
+reports an error since there was no RMID available to monitor for an
+event.
+
+NOLAZY:
+When user wants guaranteed monitoring, he can enable the 'monitoring
+mask' which is basically used to specify the packages he wants to
+monitor. The RMIDs are statically allocated at open and failure is
+indicated if RMIDs are not available.
+
+To specify monitoring on package 0 and package 1:
+#echo 0-1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_mon_mask
+
+An error is thrown if packages not online are specified.
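
For readers following the thread, the LAZY/NOLAZY distinction in the quoted documentation can be modelled with a small toy sketch. This is not the kernel implementation: every name here (Package, CgroupMonitor, open_nolazy, sched_in) is hypothetical, and the RMID arithmetic assumes the quoted "48 logical processors" means 48 per package with the upper bound of 4 RMIDs each.

```python
# Toy model of LAZY vs NOLAZY RMID allocation as described in the quoted
# documentation. All names are hypothetical; this is not kernel code.

RMIDS_PER_CPU_MAX = 4  # upper end of the "2-4 RMIDs per logical processor" range


def max_rmids_per_package(cpus_per_package, rmids_per_cpu=RMIDS_PER_CPU_MAX):
    """Upper bound on the RMIDs available on one package."""
    return cpus_per_package * rmids_per_cpu


class Package:
    """One package with a fixed pool of free RMIDs."""

    def __init__(self, pkg_id, num_rmids):
        self.pkg_id = pkg_id
        self.free = list(range(num_rmids))

    def alloc(self):
        """Return a free RMID, or None if the pool is exhausted."""
        return self.free.pop() if self.free else None

    def release(self, rmid):
        self.free.append(rmid)


class CgroupMonitor:
    """Per-cgroup monitoring state: at most one RMID per package."""

    def __init__(self, packages):
        self.packages = packages  # dict: package id -> Package (online only)
        self.rmid = {}            # dict: package id -> allocated RMID

    def open_nolazy(self, mon_mask):
        """NOLAZY: reserve an RMID on every package in mon_mask at open."""
        for pkg_id in mon_mask:
            if pkg_id not in self.packages:
                raise ValueError(f"package {pkg_id} is not online")
        taken = {}
        for pkg_id in mon_mask:
            rmid = self.packages[pkg_id].alloc()
            if rmid is None:
                # Roll back so a failed open leaves the pools unchanged.
                for done_id, done_rmid in taken.items():
                    self.packages[done_id].release(done_rmid)
                raise RuntimeError(f"no RMID available on package {pkg_id}")
            taken[pkg_id] = rmid
        self.rmid.update(taken)

    def sched_in(self, pkg_id):
        """LAZY: try to allocate at first sched_in; failure is not reported here."""
        if pkg_id not in self.rmid:
            rmid = self.packages[pkg_id].alloc()
            if rmid is not None:
                self.rmid[pkg_id] = rmid

    def read(self, pkg_id):
        """Reading fails if no RMID was ever available on this package."""
        if pkg_id not in self.rmid:
            raise RuntimeError(f"no RMID was available on package {pkg_id}")
        return self.rmid[pkg_id]
```

In this model, a NOLAZY open either reserves RMIDs on all requested packages or fails up front, while a LAZY cgroup only discovers the shortage when it is read. With 48 logical processors per package, max_rmids_per_package(48) gives the 192 figure from the documentation, and two packages give 192x2 in total.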

I very much dislike both those for adding files to the perf cgroup.
Drivers should really not do that.

Is the continuous monitoring the issue, or the interface (adding a file in
perf_cgroup)? I have not mentioned it in the documentation, but this continuous
monitoring / monitoring mask applies only to cgroups in this patch, and hence
we thought a good place for it is in the cgroup itself, because it is per
cgroup.

For task events, this won't apply, and we are thinking of providing a
prctl-based interface for the user to toggle continuous monitoring.

More fail..


I absolutely hate the second because events already have affinity.

This applies to continuous monitoring as well, when there are no events
associated. Meaning that if the monitoring mask is chosen and the user tries
to enable continuous monitoring using cgrp->cont_mon, all RMIDs are
allocated immediately. The mon_mask provides a way for the user to have
guaranteed RMIDs both for cgroups that have events and for continuous
monitoring (no perf event associated), assuming the user sets it only when
he knows he will definitely use it; otherwise there is LAZY mode.

Again, this is cgroup specific, won't apply to task events, and is needed
when there are no events associated.

So no, the problem is that a driver introduces special ABI and behaviour
that radically departs from the regular behaviour.

OK, looks like the interface is the problem. Will try to fix this. We are just
trying to have a lightweight monitoring option so that it's reasonable to
monitor for a very long time (like the lifetime of a process etc.), mainly to
avoid all the perf scheduling overhead.
Maybe a perf event attr option is a more reasonable way for the user to
select this (rather than some new interface like prctl / cgroup file)?

Thanks,
Vikas