Re: [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes

From: Shivappa Vikas
Date: Tue Feb 07 2017 - 15:22:47 EST




On Tue, 7 Feb 2017, Stephane Eranian wrote:

Hi,

I wanted to take a few steps back and look at the overall goals for
cache monitoring.
From the various threads and discussion, my understanding is as follows.

I think the design must ensure that the following usage models can be monitored:
- the allocations in your CAT partitions
- the allocations from a task (inclusive of children tasks)
- the allocations from a group of tasks (inclusive of children tasks)
- the allocations from a CPU
- the allocations from a group of CPUs

All cases but the first one (CAT) are natural usage, so I want to describe
the CAT case in more detail.
The goal, as I understand it, is to monitor what is going on inside
the CAT partition to detect
whether it saturates or whether it has room to "breathe". Let's take a
simple example.

Suppose, we have a CAT group, cat1:

cat1: 20MB partition (CLOSID1)
CPUs=CPU0,CPU1
TASKs=PID20

There can only be one CLOSID active on a CPU at a time. The kernel
chooses to prioritize tasks over CPU when enforcing cases with multiple
CLOSIDs.

Let's review how this works for cat1 and, for each scenario, look at
whether or not the kernel enforces the cache partition:

1. ENFORCED: PIDx with no CLOSID runs on CPU0 or CPU1
2. NOT ENFORCED: PIDx with CLOSIDx (x!=1) runs on CPU0, CPU1
3. ENFORCED: PID20 runs with CLOSID1 on CPU0, CPU1
4. ENFORCED: PID20 runs with CLOSID1 on CPUx (x!=0,1) with CPU CLOSIDx (x!=1)
5. ENFORCED: PID20 runs with CLOSID1 on CPUx (x!=0,1) with no CLOSID

Now, let's review how we could track the allocations done in cat1 using a single
RMID. There can only be one RMID active at a time per CPU. The kernel
chooses to prioritize tasks over CPU:

cat1: 20MB partition (CLOSID1, RMID1)
CPUs=CPU0,CPU1
TASKs=PID20

1. MONITORED: PIDx with no RMID runs on CPU0 or CPU1
2. NOT MONITORED: PIDx with RMIDx (x!=1) runs on CPU0, CPU1
3. MONITORED: PID20 with RMID1 runs on CPU0, CPU1
4. MONITORED: PID20 with RMID1 runs on CPUx (x!=0,1) with CPU RMIDx (x!=1)
5. MONITORED: PID20 runs with RMID1 on CPUx (x!=0,1) with no RMID

To make sense to a user, the cases where the hardware monitors MUST be
the same as the cases where the hardware enforces the cache
partitioning.

Here we see that it works using a single RMID.
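
As a rough illustration of the two lists above, here is a minimal sketch of
the "task wins over CPU" rule applied to both CLOSIDs and RMIDs. This is not
kernel code: the resolve() helper, the use of None for "no ID assigned", and
the value 2 standing in for "some other group" are assumptions made only for
this example.

```python
from typing import Optional

def resolve(task_id: Optional[int], cpu_id: Optional[int]) -> Optional[int]:
    """Return the ID that is active on the CPU: the task's ID wins over the CPU's."""
    return task_id if task_id is not None else cpu_id

CAT1 = 1    # stands for CLOSID1 in the first list and RMID1 in the second
OTHER = 2   # hypothetical: any ID belonging to a different group (the "x != 1" cases)

# Each row: (description, task CLOSID, CPU CLOSID, task RMID, CPU RMID)
cases = [
    ("PIDx with no ID on CPU0/CPU1",           None,  CAT1,  None,  CAT1),
    ("PIDx with IDx on CPU0/CPU1",             OTHER, CAT1,  OTHER, CAT1),
    ("PID20 with ID1 on CPU0/CPU1",            CAT1,  CAT1,  CAT1,  CAT1),
    ("PID20 with ID1 on CPUx with CPU IDx",    CAT1,  OTHER, CAT1,  OTHER),
    ("PID20 with ID1 on CPUx with no CPU ID",  CAT1,  None,  CAT1,  None),
]

# A case is enforced (or monitored) when the active ID is cat1's ID.
enforced = [resolve(tc, cc) == CAT1 for _, tc, cc, _, _ in cases]
monitored = [resolve(tr, cr) == CAT1 for _, _, _, tr, cr in cases]

for (desc, *_), e, m in zip(cases, enforced, monitored):
    print(f"{'ENFORCED' if e else 'NOT ENFORCED':13} "
          f"{'MONITORED' if m else 'NOT MONITORED':14} {desc}")
```

With a single RMID per CAT group, the two columns agree in all five cases,
which is the property the user needs.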

However doing so limits certain monitoring modes where a user might want to
get a breakdown per CPU of the allocations, such as with:
$ perf stat -a -A -e llc_occupancy -R cat1
(where -R points to the monitoring group in rsrcfs). Here this mode would not be
possible because the two CPUs in the group share the same RMID.

From the requirements here: https://marc.info/?l=linux-kernel&m=148597969808732

8) Can get measurements for subsets of tasks in a CAT group (to find the guys hogging the resources).

This should also apply to subsets of CPUs.

That would let you monitor CPUs that are a subset of, or different from, a CAT group. It should let you create mon groups like those in the second example you mention, alongside the control groups above.

mon0: RMID0
CPUs=CPU0

mon1: RMID1
CPUs=CPU1

mon2: RMID2
CPUs=CPU2

...
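
To make the per-CPU point concrete, here is a small sketch of why a shared
RMID rules out a per-CPU breakdown while one mon group per CPU allows it.
The helper names and the dictionary layout (CPU number mapped to RMID) are
hypothetical and chosen only for illustration; the underlying assumption is
that occupancy is read back per RMID, so CPUs sharing an RMID yield a single
combined number.

```python
from collections import defaultdict

def cpus_per_rmid(cpu_to_rmid):
    """Group CPUs by the RMID they are assigned."""
    groups = defaultdict(list)
    for cpu, rmid in sorted(cpu_to_rmid.items()):
        groups[rmid].append(cpu)
    return dict(groups)

def per_cpu_breakdown_possible(cpu_to_rmid):
    """A per-CPU breakdown needs every RMID to cover exactly one CPU."""
    return all(len(cpus) == 1 for cpus in cpus_per_rmid(cpu_to_rmid).values())

shared = {0: 1, 1: 1}        # cat1 alone: CPU0 and CPU1 both use RMID1
split = {0: 0, 1: 1, 2: 2}   # mon0, mon1, mon2: one RMID per CPU

print(cpus_per_rmid(shared), per_cpu_breakdown_possible(shared))
print(cpus_per_rmid(split), per_cpu_breakdown_possible(split))
```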



Now let's take another scenario, and suppose you have two monitoring groups
as follows:

mon1: RMID1
CPUs=CPU0,CPU1
mon2: RMID2
TASKS=PID20

If PID20 runs on CPU0, then RMID2 is activated, and thus allocations
done by PID20 are not counted towards RMID1. There is a blind spot.

Whether or not this is a problem depends on the semantics exported by
the interface for CPU mode:
1-Count all allocations from any tasks running on CPU
2-Count all allocations from tasks which are NOT monitoring themselves

If the kernel chooses 1, then there is a blind spot and the measurement
is not as accurate as it could be because of the decision to use only one RMID.
But if the kernel chooses 2, then everything works fine with a single RMID.

If the kernel treats occupancy monitoring as measuring cycles on a CPU, i.e.,
measure any activity from any thread (choice 1), then the single RMID per group
does not work.

If the kernel treats occupancy monitoring as measuring cycles in a cgroup on a
CPU, i.e., measures only when threads of the cgroup run on that CPU, then using
a single RMID per group works.
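
The blind spot can be sketched with a toy model of the mon1/mon2 scenario.
Everything here is an assumption for illustration: the trace, the PIDs other
than PID20, and the allocation counts are made up, and real occupancy is a
point-in-time reading rather than a running sum of allocations. The only
rule taken from the discussion is that a task's RMID wins over the CPU's.

```python
from collections import Counter

CPU_RMID = {0: 1, 1: 1}   # mon1: RMID1, CPUs=CPU0,CPU1
TASK_RMID = {20: 2}       # mon2: RMID2, TASKS=PID20

def active_rmid(pid, cpu):
    """Task RMID wins over CPU RMID; None if neither is assigned."""
    return TASK_RMID.get(pid, CPU_RMID.get(cpu))

# Hypothetical trace: (pid, cpu, cache lines allocated while running there)
trace = [(20, 0, 100), (30, 0, 40), (30, 1, 10), (20, 2, 5)]

# What the hardware would attribute to each RMID.
counted = Counter()
for pid, cpu, lines in trace:
    rmid = active_rmid(pid, cpu)
    if rmid is not None:
        counted[rmid] += lines

mon1_cpus = set(CPU_RMID)
# Semantic 1: all allocations from any task running on mon1's CPUs.
semantic1 = sum(n for pid, cpu, n in trace if cpu in mon1_cpus)
# Semantic 2: only allocations from tasks that are not monitoring themselves.
semantic2 = sum(n for pid, cpu, n in trace
                if cpu in mon1_cpus and pid not in TASK_RMID)

print("RMID1 counted:", counted[1])
print("semantic 1 expects:", semantic1, "blind spot:", semantic1 - counted[1])
print("semantic 2 expects:", semantic2)
```

In this model RMID1 matches what semantic 2 promises, while under semantic 1
the allocations PID20 made on CPU0 are missing from mon1.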


Agreed, there are blind spots in both. But the requirements try to follow the resctrl allocation model, as Thomas suggested,
which is aligned with monitoring real-time tasks, as I understand it.
In the above example, some tasks which do not have an RMID (say, in the root group) are the real-time tasks that are specially configured to run on a CPUx and which need to be allocated or monitored.


Hope this helps clarify the usage model and design choices.