On Thu, Jan 19, 2017 at 6:32 PM, Vikas Shivappa
<vikas.shivappa@xxxxxxxxxxxxxxx> wrote:
Resending including Thomas, also with some changes. Sorry for the spam.
Based on the feedback from Thomas and Peterz, I can think of two design
variants which target:
-Support monitoring and allocating using the same resctrl group:
the user can use a resctrl group to allocate resources and also monitor
them (with respect to tasks or CPUs).
-Also allow monitoring outside of resctrl, so that the user can
monitor subgroups which use the same CLOSID. This mode can be used
when the user wants to monitor more than just the resctrl groups.
The first design version uses and modifies perf_cgroup; the second version
builds a new interface, resmon.
The second version would require building a whole new set of tools,
deploying them and maintaining them. Users would have to run perf for certain
events and resmon (or whatever the new tool is named) for RDT. I see
it as too complex and much prefer to keep using perf.
The first version is close to the patches already sent, with some
additions/changes. This includes details of the design as per the
Thomas/Peterz feedback.
1> First Design option: without modifying resctrl, using perf
--------------------------------------------------------------------
In this design everything in the resctrl interface works like
before (the info directory and the resource group files like tasks and
schemata all remain the same).
Monitor cqm using perf
----------------------
perf can monitor individual tasks using the -t
option just like before.
# perf stat -e llc_occupancy -t PID1,PID2
The user can monitor CPU occupancy using the -C option in perf:
# perf stat -e llc_occupancy -C 5
The example below shows how the user can monitor cgroup occupancy:
# mount -t cgroup -o perf_event perf_event /sys/fs/cgroup/perf_event/
# mkdir /sys/fs/cgroup/perf_event/g1
# mkdir /sys/fs/cgroup/perf_event/g2
# echo PID1 > /sys/fs/cgroup/perf_event/g2/tasks
# perf stat -e intel_cqm/llc_occupancy/ -a -G g2
To monitor a resctrl group, the user can put the same tasks that are in
the resctrl group into a cgroup.
To monitor the tasks in p1 in example 2 below, add the tasks in resctrl
group p1 to cgroup g1:
# echo 5678 > /sys/fs/cgroup/perf_event/g1/tasks
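For completeness, the end-to-end flow could look like the following
(single L3 domain assumed; the cache mask and the PID are only
illustrative):
# mount -t resctrl resctrl /sys/fs/resctrl
# mkdir /sys/fs/resctrl/p1
# echo "L3:0=f0" > /sys/fs/resctrl/p1/schemata
# echo 5678 > /sys/fs/resctrl/p1/tasks
# echo 5678 > /sys/fs/cgroup/perf_event/g1/tasks
# perf stat -e intel_cqm/llc_occupancy/ -a -G g1
Here p1 controls the allocation for PID 5678, while g1 exists only to
give perf a handle for monitoring it.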
Introducing a new option for resctrl may complicate monitoring because
supporting both cgroup 'task groups' and resctrl 'task groups' leads to
situations where, if the groups intersect, there is no way to know which
l3_allocations contribute to which group.
ex:
p1 has tasks t1, t2, t3
g1 has tasks t2, t3, t4
The only way to get occupancy for g1 and p1 would be to allocate an RMID
for each task, which can just as well be done with the -t option.
That's simply recreating the resctrl group as a cgroup.
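Concretely, that per-task fallback would just be something like:
# perf stat -e intel_cqm/llc_occupancy/ -t PID_t1,PID_t2,PID_t3,PID_t4
and then summing the per-task numbers for p1 and for g1 respectively.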
I think that the main advantage of doing allocation first is that we
could use the context switch hook in RDT allocation and greatly simplify
the PMU side of it.
If resctrl could lift the restriction of one resctrl group per CLOSID,
then the user could create many resctrl groups the way perf cgroups are
created now. The advantage is that there won't be a cgroup hierarchy,
making things much simpler. There is also no need to optimize the perf
event context switch to make llc_occupancy work.
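For illustration only, assuming that restriction were lifted, subgroups
sharing one allocation could look like this (names and masks are made up):
# mkdir /sys/fs/resctrl/p1 /sys/fs/resctrl/p1_sub
# echo "L3:0=ff" > /sys/fs/resctrl/p1/schemata
# echo "L3:0=ff" > /sys/fs/resctrl/p1_sub/schemata
# echo PID1 > /sys/fs/resctrl/p1/tasks
# echo PID2 > /sys/fs/resctrl/p1_sub/tasks
Both groups have identical schemata, so they could share one CLOSID for
allocation while each gets its own RMID for monitoring.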
Then we only need a way to express to the perf_event_open syscall that
monitoring must happen in a resctrl group.
My first thought is to have an "rdt_monitor" file per resctrl group. A
user passes it to perf_event_open in the way cgroup fds are passed now.
We could extend the meaning of the flag PERF_FLAG_PID_CGROUP to also
cover rdt_monitor files. The syscall can figure out whether it's a cgroup
or an rdt group. The rdt_monitoring PMU would only work with rdt_monitor
groups.
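A rough user-space sketch of that call, just to make the idea concrete
(the rdt_monitor file, its path and the PMU type/config values are
assumptions of this proposal, not an existing API):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Open an llc_occupancy-style event on the proposed per-resctrl-group
 * "rdt_monitor" file.  The fd is passed in the pid argument with
 * PERF_FLAG_PID_CGROUP set, exactly how a perf_event cgroup fd is
 * passed today; the kernel side would have to tell the two apart.
 */
static int open_rdt_event(const char *rdt_group, int cpu,
			  int pmu_type, uint64_t config)
{
	struct perf_event_attr attr;
	char path[256];
	int rdt_fd, event_fd;

	snprintf(path, sizeof(path),
		 "/sys/fs/resctrl/%s/rdt_monitor", rdt_group);
	rdt_fd = open(path, O_RDONLY);	/* hypothetical file */
	if (rdt_fd < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = pmu_type;		/* e.g. the intel_cqm PMU type */
	attr.config = config;		/* e.g. llc_occupancy's config */

	event_fd = syscall(__NR_perf_event_open, &attr, rdt_fd, cpu,
			   -1, PERF_FLAG_PID_CGROUP);
	close(rdt_fd);
	return event_fd;
}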
The rdt_monitoring PMU would then be pretty dumb, having neither task
nor CPU contexts, just providing the pmu->read and pmu->event_init
functions.
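Roughly (just a shape sketch; the names are made up and the usual pmu
callbacks like add/del are omitted):

#include <linux/perf_event.h>

static void rdt_mon_event_read(struct perf_event *event)
{
	/* read the occupancy counter for the event's rdt_monitor group */
}

static int rdt_mon_event_init(struct perf_event *event)
{
	/* reject events that are not attached to an rdt_monitor file */
	return 0;
}

static struct pmu rdt_mon_pmu = {
	.task_ctx_nr	= perf_invalid_context,	/* no per-task context */
	.event_init	= rdt_mon_event_init,
	.read		= rdt_mon_event_read,
};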
Task monitoring can be done with resctrl as well by adding the PID to
a new resctrl group and opening the event on it. And, since we'd allow
CLOSIDs to be shared between resctrl groups, allocation wouldn't break.
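E.g. (rdt_monitor again being the proposed file, not an existing one):
# mkdir /sys/fs/resctrl/mon_grp
# echo PID1 > /sys/fs/resctrl/mon_grp/tasks
then give mon_grp the same schemata as the group PID1 came from (so
that, with shared CLOSIDs, its allocation stays the same), and open the
llc_occupancy event on /sys/fs/resctrl/mon_grp/rdt_monitor as sketched
above.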