[PATCH 2/2] x86/intel_rdt: Coordinate performance monitoring with perf

From: Reinette Chatre
Date: Tue Jul 31 2018 - 15:38:47 EST


It is possible to measure cache pseudo-locking success using performance
monitoring counters. This measurement is triggered from user space via
the resctrl debugfs interface.

At this time the usage of the performance monitoring counters is not
coordinated with other users. If any other measurement is in progress on
the system, for example using perf, then the counter and event registers
would be clobbered between the multiple users.

Now that users have access to reserve_pmc_hardware() and
release_pmc_hardware(), these functions can be used to ensure that only
one user has access to the PMC hardware at a time. perf already uses them
internally - the cache pseudo-locking debugging code needs to use them
too.
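
In sketch form the pattern is the following (simplified from the actual
hunks below - the real code jumps to an out_intr error label and reports
the failure via pr_err_ratelimited() instead of returning directly):

	local_irq_disable();
	if (!reserve_pmc_hardware()) {
		/* Another user, for example perf, owns the PMC hardware. */
		local_irq_enable();
		return -EBUSY;
	}
	/* ... program the event select MSRs and read the PMCs directly ... */
	release_pmc_hardware();
	local_irq_enable();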

Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
---
 Documentation/x86/intel_rdt_ui.txt          | 4 ----
 arch/x86/kernel/cpu/intel_rdt.h             | 2 ++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 8 ++++++++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index f662d3c530e5..a98e05d3e233 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -520,10 +520,6 @@ the pseudo-locked region:
 2) Cache hit and miss measurements using model specific precision counters if
    available. Depending on the levels of cache on the system the pseudo_lock_l2
    and pseudo_lock_l3 tracepoints are available.
-   WARNING: triggering this measurement uses from two (for just L2
-   measurements) to four (for L2 and L3 measurements) precision counters on
-   the system, if any other measurements are in progress the counters and
-   their corresponding event registers will be clobbered.
 
 When a pseudo-locked region is created a new debugfs directory is created for
 it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 4e588f36228f..280dde8c8229 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -558,5 +558,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
+extern bool reserve_pmc_hardware(void);
+extern void release_pmc_hardware(void);
 
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index f80c58f8adc3..164e9b8b070b 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -1010,6 +1010,8 @@ static int measure_cycles_perf_fn(void *_plr)
 	}
 
 	local_irq_disable();
+	if (!reserve_pmc_hardware())
+		goto out_intr;
 	/*
 	 * Call wrmsr direcly to avoid the local register variables from
 	 * being overwritten due to reordering of their assignment with
@@ -1066,6 +1068,7 @@ static int measure_cycles_perf_fn(void *_plr)
 		l3_miss = native_read_pmc(3);
 	}
 	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	release_pmc_hardware();
 	local_irq_enable();
 	/*
 	 * On BDW we count references and misses, need to adjust. Sometimes
@@ -1083,6 +1086,11 @@ static int measure_cycles_perf_fn(void *_plr)
 		trace_pseudo_lock_l3(l3_hits, l3_miss);
 	}
 
+	goto out;
+
+out_intr:
+	local_irq_enable();
+	pr_err_ratelimited("Failed to reserve performance monitoring regs\n");
 out:
 	plr->thread_done = 1;
 	wake_up_interruptible(&plr->lock_thread_wq);
--
2.17.0