Re: [PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide
From: Marcelo Tosatti
Date: Wed Mar 25 2015 - 18:40:16 EST
On Thu, Mar 12, 2015 at 04:16:07PM -0700, Vikas Shivappa wrote:
> This patch adds a description of Cache allocation technology, overview
> of kernel implementation and usage of CAT cgroup interface.
>
> Signed-off-by: Vikas Shivappa <vikas.shivappa@xxxxxxxxxxxxxxx>
> ---
> Documentation/cgroups/rdt.txt | 183 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 183 insertions(+)
> create mode 100644 Documentation/cgroups/rdt.txt
>
> diff --git a/Documentation/cgroups/rdt.txt b/Documentation/cgroups/rdt.txt
> new file mode 100644
> index 0000000..98eb4b8
> --- /dev/null
> +++ b/Documentation/cgroups/rdt.txt
> @@ -0,0 +1,183 @@
> + RDT
> + ---
> +
> +Copyright (C) 2014 Intel Corporation
> +Written by vikas.shivappa@xxxxxxxxxxxxxxx
> +(based on contents and format from cpusets.txt)
> +
> +CONTENTS:
> +=========
> +
> +1. Cache Allocation Technology
> + 1.1 What is RDT and CAT ?
> + 1.2 Why is CAT needed ?
> + 1.3 CAT implementation overview
> + 1.4 Assignment of CBM and CLOS
> + 1.5 Scheduling and Context Switch
> +2. Usage Examples and Syntax
> +
> +1. Cache Allocation Technology(CAT)
> +===================================
> +
> +1.1 What is RDT and CAT
> +-----------------------
> +
> +CAT is a part of Resource Director Technology (RDT), also called Platform
> +Shared Resource Control, which provides support for controlling platform
> +shared resources like cache. Currently cache is the only resource that is
> +supported in RDT.
> +More information can be found in the Intel SDM section 17.15.
> +
> +Cache Allocation Technology provides a way for the Software (OS/VMM)
> +to restrict cache allocation to a defined 'subset' of cache which may
> +be overlapping with other 'subsets'. This feature is used when
> +allocating a line in cache, i.e. when pulling new data into the cache.
> +The programming of the h/w is done via programming MSRs.
> +
> +The different cache subsets are identified by CLOS identifier (class
> +of service) and each CLOS has a CBM (cache bit mask). The CBM is a
> +contiguous set of bits which defines the amount of cache resource that
> +is available for each 'subset'.
> +
> +1.2 Why is CAT needed
> +---------------------
> +
> +The CAT enables more cache resources to be made available for higher
> +priority applications based on guidance from the execution
> +environment.
> +
> +The architecture also allows dynamically changing these subsets during
> +runtime to further optimize the performance of the higher priority
> +application with minimal degradation to the low priority app.
> +Additionally, resources can be rebalanced for system throughput
> +benefit. (Refer to Section 17.15 in the Intel SDM)
> +
> +This technique may be useful in managing large computer systems with a
> +large LLC. Examples may be large servers running instances of
> +webservers or database servers. In such complex systems, these subsets
> +can be used for more careful placing of the available cache
> +resources.
> +
> +The CAT kernel patch would provide a basic kernel framework for users
> +to be able to implement such cache subsets.
> +
> +1.3 CAT implementation Overview
> +-------------------------------
> +
> +Kernel implements a cgroup subsystem to support cache allocation.
> +
> +Each cgroup has a CLOSid <-> CBM(cache bit mask) mapping.
> +A CLOS (Class of service) is represented by a CLOSid. The CLOSid is internal
> +to the kernel and not exposed to the user. Each cgroup would have one CBM
> +and would just represent one cache 'subset'.
> +
> +The cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
> +cgroup never fail. When a child cgroup is created it inherits the
> +CLOSid and the CBM from its parent. When a user changes the default
> +CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> +used before. Changing the 'cbm' may fail with -ENOSPC once the
> +kernel runs out of CLOSids it can support.
> +User can create as many cgroups as he wants but having different CBMs
> +at the same time is restricted by the maximum number of CLOSids
> +(multiple cgroups can have the same CBM).
> +Kernel maintains a CLOSid<->cbm mapping which keeps reference counter
> +for each cgroup using a CLOSid.
> +
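A toy model of the CLOSid accounting described above, as a minimal sketch and not the patch's code. The class name ClosTable and its get/put methods are made up for illustration; the only behaviour taken from the text is that cgroups with the same CBM share a CLOSid, each CLOSid is reference counted, and allocation fails with ENOSPC when the table is full.

```python
import errno


class ClosTable:
    """Toy CLOSid <-> CBM table with a reference count per CLOSid."""

    def __init__(self, max_closids):
        self.max_closids = max_closids
        self.cbm = {}       # closid -> cbm
        self.refcount = {}  # closid -> number of cgroups using it

    def get(self, cbm):
        """Return a CLOSid for cbm, reusing an existing entry if one matches."""
        for closid, existing in self.cbm.items():
            if existing == cbm:
                self.refcount[closid] += 1
                return closid
        # No match: take the lowest free CLOSid, if any is left.
        for closid in range(self.max_closids):
            if closid not in self.cbm:
                self.cbm[closid] = cbm
                self.refcount[closid] = 1
                return closid
        raise OSError(errno.ENOSPC, "no free CLOSid")

    def put(self, closid):
        """Drop one reference and free the CLOSid once it is unused."""
        self.refcount[closid] -= 1
        if self.refcount[closid] == 0:
            del self.refcount[closid]
            del self.cbm[closid]
```

With a table of two CLOSids, two cgroups using 0xffff share one entry, a third mask fails with ENOSPC, and it succeeds again once an entry is released with put().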
> +The tasks in the cgroup would get to fill the LLC cache represented by
> +the cgroup's 'cbm' file.
> +
> +Root directory would have all available bits set in 'cbm' file by
> +default.
> +
> +1.4 Assignment of CBM,CLOS
> +--------------------------
> +
> +The 'cbm' needs to be a subset of the parent node's 'cbm'.
> +Any contiguous subset of these bits (with a minimum of 2 bits) may be
> +set to indicate the cache mapping desired. The 'cbm' between 2
> +directories can overlap. The 'cbm' would represent the cache 'subset'
> +of the CAT cgroup. For example, on a system with a 16-bit max cbm,
> +if the directory has the least significant 4 bits set in its 'cbm'
> +file (meaning the 'cbm' is just 0xf), it would be allocated the right
> +quarter of the last level cache, which means the tasks belonging to
> +this CAT cgroup can use the right quarter of the cache to fill. If it
> +has the most significant 8 bits set, it would be allocated the left
> +half of the cache (8 bits out of 16 represents 50%).
> +
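The rules in this section (subset of the parent, contiguous, at least 2 bits) can be sketched as a small check. This is an illustration only; the function names cbm_is_valid and cbm_fraction are hypothetical and do not come from the patch.

```python
def cbm_is_valid(cbm, parent_cbm, min_bits=2):
    """Check a CBM against the rules stated in section 1.4."""
    if cbm <= 0:
        return False
    # Every bit set in cbm must also be set in the parent's cbm.
    if cbm & ~parent_cbm:
        return False
    # Shift out trailing zeros; a contiguous run then looks like 0b0..01..1,
    # so adding 1 clears all of its bits.
    run = cbm // (cbm & -cbm)
    if run & (run + 1):
        return False
    return run.bit_length() >= min_bits


def cbm_fraction(cbm, cbm_len):
    """Fraction of the cache that a CBM covers on a cbm_len-bit system."""
    return bin(cbm).count("1") / cbm_len
```

For the 16-bit example in the text, cbm_fraction(0xf, 16) gives 0.25 and cbm_fraction(0xff00, 16) gives 0.5, while 0xfff is rejected under a parent mask of 0xff.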
> +The cache portion defined in the CBM file is available to all tasks
> +within the cgroup to fill and these tasks are not allowed to allocate
> +space in other parts of the cache.
Is there a reason to expose the hardware interface rather
than ratios to userspace?
Say, I'd like to allocate 20% of L3 cache to cgroup A,
80% to cgroup B.
Well, you'd have to expose the shared percentages between
any two cgroups (that information is there in the
cbm bitmaps, but not in "ratios").
One problem I see with exposing cbm bitmasks is that on hardware
updates that change cache size or bitmask length, userspace must
recalculate the bitmaps.
Another is that it's vendor dependent, while ratios (plus shared
information for two given cgroups) are not.
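
To illustrate the recalculation problem, here is a rough sketch of turning percentages into bitmasks. It assumes disjoint (non-shared) partitions packed from the least significant bit upward and the 2-bit minimum from the text above. The function name ratios_to_cbms and its parameters are hypothetical, not part of any proposed interface.

```python
def ratios_to_cbms(percentages, cbm_len, min_bits=2):
    """Map percentages (summing to at most 100) to disjoint contiguous CBMs."""
    if sum(percentages) > 100:
        raise ValueError("percentages exceed 100")
    masks = []
    shift = 0
    for pct in percentages:
        # Round to whole bits; the granularity depends on cbm_len.
        nbits = max(min_bits, round(pct * cbm_len / 100))
        if shift + nbits > cbm_len:
            raise ValueError("not enough cbm bits for these ratios")
        masks.append(((1 << nbits) - 1) << shift)
        shift += nbits
    return masks
```

The same 20%/80% request gives 0xf and 0xffff0 with a 20-bit mask, but 0x7 and 0xfff8 with a 16-bit mask (3 of 16 bits, so 18.75% rather than 20%). That is the per-hardware recomputation userspace would otherwise have to do itself.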
> +
> +1.5 Scheduling and Context Switch
> +---------------------------------
> +
> +During context switch kernel implements this by writing the
> +CLOSid (internally maintained by kernel) of the cgroup to which the
> +task belongs to the CPU's IA32_PQR_ASSOC MSR. The MSR is only written
> +when there is a change in the CLOSid for the CPU in order to minimize
> +the latency incurred during context switch.
> +
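The "only written when there is a change" behaviour can be modelled as below. This is a sketch under assumptions: the per-CPU cached CLOSid, its initial value of 0, and the names Cpu, write_pqr_assoc and context_switch are all illustrative and not taken from the patch.

```python
class Cpu:
    """Toy CPU that remembers the CLOSid last written to IA32_PQR_ASSOC."""

    def __init__(self):
        self.cached_closid = 0  # assumed initial value
        self.msr_writes = 0     # counts simulated MSR writes

    def write_pqr_assoc(self, closid):
        # Stand-in for the real MSR write.
        self.msr_writes += 1
        self.cached_closid = closid


def context_switch(cpu, next_closid):
    """Write the MSR only if the incoming task's CLOSid differs."""
    if cpu.cached_closid != next_closid:
        cpu.write_pqr_assoc(next_closid)
```

Scheduling tasks with CLOSids 0, 0, 1, 1, 1, 2, 1 in that order on one CPU costs three writes in this model instead of seven.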
> +2. Usage examples and syntax
> +============================
> +
> +To check if CAT was enabled on your system
> +
> +dmesg | grep -i intel_rdt
> +should output : intel_rdt: cbmlength:xx, Closs:xx
> +the length of cbm and CLOS should depend on the system you use.
> +
> +
> +Following would mount the cache allocation cgroup subsystem and create
> +2 directories. Please refer to Documentation/cgroups/cgroups.txt on
> +details about how to use cgroups.
> +
> + cd /sys/fs/cgroup
> + mkdir rdt
> + mount -t cgroup -ordt rdt /sys/fs/cgroup/rdt
> + cd rdt
> +
> +Create 2 rdt cgroups
> +
> + mkdir group1
> + mkdir group2
> +
> +Following are some of the Files in the directory
> +
> + ls
> + rdt.cbm
> + tasks
> +
> +Say if the cache is 2MB and cbm supports 16 bits, then setting the
> +below allocates the 'right 1/4th(512KB)' of the cache to group2
> +
> +Edit the CBM for group2 to set the least significant 4 bits. This
> +allocates 'right quarter' of the cache.
> +
> + cd group2
> + /bin/echo 0xf > rdt.cbm
> +
> +
> +Edit the CBM for group2 to set the least significant 8 bits. This
> +allocates the right half of the cache to 'group2'.
> +
> + cd group2
> + /bin/echo 0xff > rdt.cbm
> +
> +Assign tasks to the group2
> +
> + /bin/echo PID1 > tasks
> + /bin/echo PID2 > tasks
> +
> + Meaning now threads
> + PID1 and PID2 get to fill the 'right half' of
> + the cache as they belong to cgroup group2.
> +
> +Create a group under group2
> +
> + mkdir group21
> + cd group21
> + cat rdt.cbm
> + 0xff - inherits parent's mask.
> +
> + /bin/echo 0xfff > rdt.cbm - throws error as the mask has to be a subset of the parent's mask
> +
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/