Re: [PATCH 00/32] 2nd Iteration of Cache QoS Monitoring support.

From: David Carrillo-Cisneros
Date: Fri Apr 29 2016 - 17:10:33 EST


It compiles on top of peterz/queue perf/core.

On Fri, Apr 29, 2016 at 2:06 PM Vikas Shivappa
<vikas.shivappa@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On Thu, 28 Apr 2016, David Carrillo-Cisneros wrote:
>
> > This series introduces the next iteration of kernel support for the
> > Cache QoS Monitoring (CQM) technology available in Intel Xeon processors.
>
> Which kernel version does this compile on?
>
> Thanks,
> Vikas
>
> >
> > One of the main limitations of the previous version is the inability
> > to simultaneously monitor:
> > 1) a CPU event and any other event on that CPU.
> > 2) cgroup events for cgroups in the same descendancy line.
> > 3) cgroup events and any thread event of a cgroup in the same
> > descendancy line.
> >
> > Another limitation is that monitoring for a cgroup was enabled/disabled by
> > the existence of a perf event for that cgroup. Since the event
> > llc_occupancy measures changes in occupancy rather than total occupancy,
> > in order to read meaningful llc_occupancy values, an event should be
> > enabled for a long enough period of time. The overhead in context switches
> > caused by the perf events is undesired in some sensitive scenarios.
> >
> > This series of patches addresses the shortcomings mentioned above and
> > adds some other improvements. The main changes are:
> > - No more potential conflicts between different events. The new
> > version builds a hierarchy of RMIDs that captures the dependency
> > between monitored cgroups. llc_occupancy for a cgroup is the sum of
> > the llc_occupancies for that cgroup's RMID and all other RMIDs in the
> > cgroup's subtree (both monitored cgroups and threads).
> >
> > - A cgroup integration that allows monitoring a cgroup without
> > creating a perf event, decreasing the context switch overhead.
> > Monitoring is controlled by a boolean cgroup subsystem attribute
> > in each perf cgroup, that is:
> >
> > echo 1 > cgroup_path/perf_event.cqm_cont_monitoring
> >
> > starts CQM monitoring whether or not there is a perf_event
> > attached to the cgroup. Setting the attribute to 0 makes
> > monitoring dependent on the existence of a perf_event.
> > A perf_event is always required in order to read llc_occupancy.
> > This cgroup integration uses Intel's PQR code and is intended to
> > be used by upcoming versions of Intel's CAT.
> >
> > - A more stable rotation algorithm: The new algorithm uses SLOs that
> > guarantee:
> > - A minimum of enabled time for monitored cgroups and
> > threads.
> > - A maximum time disabled before error is introduced by
> > reusing dirty RMIDs.
> > - A minimum rate at which RMID recycling must progress.
> >
> > - Reduced impact of stealing/rotation of RMIDs: The new algorithm
> > credits the residual occupancy held by limbo RMIDs to the
> > former owner of the limbo RMID, decreasing the error introduced
> > by RMID rotation.
> > It also allows a limbo RMID to be reused by its former owner when
> > appropriate, decreasing the potential error of reusing dirty RMIDs
> > and allowing progress even if most limbo RMIDs do not
> > drop occupancy fast enough.
> >
> > - Elimination of pmu::count: perf generic's perf_event_count()
> > performs a quick add of atomic types. The introduction of
> > pmu::count in the previous CQM series to read occupancy for thread
> > events changed the behavior of perf_event_count() by performing a
> > potentially slow IPI and write/read to MSR. It also made pmu::read
> > behave differently depending on whether the event was a
> > cpu/cgroup event or a thread. This patch series removes the custom
> > pmu::count from CQM and provides a consistent behavior for all
> > calls of perf_event_read.
> >
> > - Added error return for pmu::read: Reads to CQM events may fail
> > due to stealing of RMIDs, even after successfully adding an event
> > to a PMU. This patch series expands pmu::read with an int return
> > value and propagates the error to callers that can fail
> > (i.e. perf_read).
> > The ability of pmu::read to fail is consistent with the recent
> > changes that allow perf_event_read to fail for transactional
> > reading of event groups.
> >
> > - Introduces the field pmu_event_flags, which contains flags set by
> > the PMU to signal variations on the default behavior to perf's
> > generic code. In this series, three flags are introduced:
> > - PERF_CGROUP_NO_RECURSION: Signals generic code not to add
> > events of the cgroup ancestors of a cgroup.
> > - PERF_INACTIVE_CPU_READ_PKG: Signals generic code that
> > this CPU event can be read on any CPU in its event::cpu's
> > package, even if the event is not active.
> > - PERF_INACTIVE_EV_READ_ANY_CPU: Signals generic code that
> > this event can be read in any CPU in any package in the
> > system even if the event is not active.
> > Using the above flags takes advantage of CQM's hardware ability to
> > read llc_occupancy even when the associated perf event is not
> > running on a CPU.
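
To make the RMID hierarchy in the first item above concrete, here is a rough
userspace sketch of how a cgroup's llc_occupancy could be computed as the sum
over its own RMID and every RMID in its subtree. It is only an illustration of
the summation rule, not code from the patches: MonitoredNode and
subtree_occupancy are hypothetical names, and the byte counts are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MonitoredNode:
    """Hypothetical stand-in for a monitored cgroup or thread owning one RMID."""
    name: str
    rmid_occupancy: int  # bytes attributed directly to this node's RMID
    children: List["MonitoredNode"] = field(default_factory=list)


def subtree_occupancy(node: MonitoredNode) -> int:
    """Sum the node's own RMID occupancy and that of every RMID below it."""
    total = 0
    stack = [node]
    while stack:
        current = stack.pop()
        total += current.rmid_occupancy
        stack.extend(current.children)
    return total


# Illustrative tree: a parent cgroup containing a monitored child cgroup,
# which in turn contains a monitored thread. Values are invented.
thread = MonitoredNode("thread", 4096)
child = MonitoredNode("child_cgroup", 8192, [thread])
parent = MonitoredNode("parent_cgroup", 16384, [child])
```

Reading the parent here would report its own 16384 bytes plus everything held
by the child cgroup and the thread, which is why monitoring a descendant no
longer conflicts with monitoring its ancestor.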
> >
> > This patch series also updates the perf tool to fix error handling and to
> > better handle the idiosyncrasies of snapshot and per-pkg events.
> >
> > David Carrillo-Cisneros (31):
> > perf/x86/intel/cqm: temporarily remove MBM from CQM and cleanup
> > perf/x86/intel/cqm: remove check for conflicting events
> > perf/x86/intel/cqm: remove all code for rotation of RMIDs
> > perf/x86/intel/cqm: make read of RMIDs per package (Temporal)
> > perf/core: remove unused pmu->count
> > x86/intel,cqm: add CONFIG_INTEL_RDT configuration flag and refactor
> > PQR
> > perf/x86/intel/cqm: separate CQM PMU's attributes from x86 PMU
> > perf/x86/intel/cqm: prepare for next patches
> > perf/x86/intel/cqm: add per-package RMIDs, data and locks
> > perf/x86/intel/cqm: basic RMID hierarchy with per package rmids
> > perf/x86/intel/cqm: (I)state and limbo prmids
> > perf/x86/intel/cqm: add per-package RMID rotation
> > perf/x86/intel/cqm: add polled update of RMID's llc_occupancy
> > perf/x86/intel/cqm: add preallocation of anodes
> > perf/core: add hooks to expose architecture specific features in
> > perf_cgroup
> > perf/x86/intel/cqm: add cgroup support
> > perf/core: adding pmu::event_terminate
> > perf/x86/intel/cqm: use pmu::event_terminate
> > perf/core: introduce PMU event flag PERF_CGROUP_NO_RECURSION
> > x86/intel/cqm: use PERF_CGROUP_NO_RECURSION in CQM
> > perf/x86/intel/cqm: handle inherit event and inherit_stat flag
> > perf/x86/intel/cqm: introduce read_subtree
> > perf/core: introduce PERF_INACTIVE_*_READ_* flags
> > perf/x86/intel/cqm: use PERF_INACTIVE_*_READ_* flags in CQM
> > sched: introduce the finish_arch_pre_lock_switch() scheduler hook
> > perf/x86/intel/cqm: integrate CQM cgroups with scheduler
> > perf/core: add perf_event cgroup hooks for subsystem attributes
> > perf/x86/intel/cqm: add CQM attributes to perf_event cgroup
> > perf,perf/x86,perf/powerpc,perf/arm,perf/*: add int error return to
> > pmu::read
> > perf,perf/x86: add hook perf_event_arch_exec
> > perf/stat: revamp error handling for snapshot and per_pkg events
> >
> > Stephane Eranian (1):
> > perf/stat: fix bug in handling events in error state
> >
> > arch/alpha/kernel/perf_event.c | 3 +-
> > arch/arc/kernel/perf_event.c | 3 +-
> > arch/arm64/include/asm/hw_breakpoint.h | 2 +-
> > arch/arm64/kernel/hw_breakpoint.c | 3 +-
> > arch/metag/kernel/perf/perf_event.c | 5 +-
> > arch/mips/kernel/perf_event_mipsxx.c | 3 +-
> > arch/powerpc/include/asm/hw_breakpoint.h | 2 +-
> > arch/powerpc/kernel/hw_breakpoint.c | 3 +-
> > arch/powerpc/perf/core-book3s.c | 11 +-
> > arch/powerpc/perf/core-fsl-emb.c | 5 +-
> > arch/powerpc/perf/hv-24x7.c | 5 +-
> > arch/powerpc/perf/hv-gpci.c | 3 +-
> > arch/s390/kernel/perf_cpum_cf.c | 5 +-
> > arch/s390/kernel/perf_cpum_sf.c | 3 +-
> > arch/sh/include/asm/hw_breakpoint.h | 2 +-
> > arch/sh/kernel/hw_breakpoint.c | 3 +-
> > arch/sparc/kernel/perf_event.c | 2 +-
> > arch/tile/kernel/perf_event.c | 3 +-
> > arch/x86/Kconfig | 6 +
> > arch/x86/events/amd/ibs.c | 2 +-
> > arch/x86/events/amd/iommu.c | 5 +-
> > arch/x86/events/amd/uncore.c | 3 +-
> > arch/x86/events/core.c | 3 +-
> > arch/x86/events/intel/Makefile | 3 +-
> > arch/x86/events/intel/bts.c | 3 +-
> > arch/x86/events/intel/cqm.c | 3847 +++++++++++++++++++++---------
> > arch/x86/events/intel/cqm.h | 519 ++++
> > arch/x86/events/intel/cstate.c | 3 +-
> > arch/x86/events/intel/pt.c | 3 +-
> > arch/x86/events/intel/rapl.c | 3 +-
> > arch/x86/events/intel/uncore.c | 3 +-
> > arch/x86/events/intel/uncore.h | 2 +-
> > arch/x86/events/msr.c | 3 +-
> > arch/x86/include/asm/hw_breakpoint.h | 2 +-
> > arch/x86/include/asm/perf_event.h | 41 +
> > arch/x86/include/asm/pqr_common.h | 74 +
> > arch/x86/include/asm/processor.h | 4 +
> > arch/x86/kernel/cpu/Makefile | 4 +
> > arch/x86/kernel/cpu/pqr_common.c | 43 +
> > arch/x86/kernel/hw_breakpoint.c | 3 +-
> > arch/x86/kvm/pmu.h | 10 +-
> > drivers/bus/arm-cci.c | 3 +-
> > drivers/bus/arm-ccn.c | 3 +-
> > drivers/perf/arm_pmu.c | 3 +-
> > include/linux/perf_event.h | 91 +-
> > kernel/events/core.c | 170 +-
> > kernel/sched/core.c | 1 +
> > kernel/sched/sched.h | 3 +
> > kernel/trace/bpf_trace.c | 5 +-
> > tools/perf/builtin-stat.c | 43 +-
> > tools/perf/util/counts.h | 19 +
> > tools/perf/util/evsel.c | 44 +-
> > tools/perf/util/evsel.h | 8 +-
> > tools/perf/util/stat.c | 35 +-
> > 54 files changed, 3746 insertions(+), 1337 deletions(-)
> > create mode 100644 arch/x86/events/intel/cqm.h
> > create mode 100644 arch/x86/include/asm/pqr_common.h
> > create mode 100644 arch/x86/kernel/cpu/pqr_common.c
> >
> > --
> > 2.8.0.rc3.226.g39d4020
> >
> >