[PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

From: Parav Pandit
Date: Tue Jan 05 2016 - 14:01:22 EST


Added documentation for the rdma controller, for use in legacy mode and
with the new unified hierarchy.

Signed-off-by: Parav Pandit <pandit.parav@xxxxxxxxx>
---
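As an illustration of the "device resource=value" line format documented
below, here is a minimal sketch of a script that computes the remaining
headroom per resource from a limit file and a usage file. The helper name
rdma_headroom is hypothetical and not part of this patch. The sample data is
copied from the usage examples in rdma.txt rather than read from a live
cgroup, since the interface files are only proposed here.

```shell
#!/bin/sh
# Sample contents copied from the usage examples in this patch. On a real
# system these would presumably be read from rdma.resource.verb.limit and
# rdma.resource.verb.usage of one cgroup (assumption: proposed interface).
limit='mlx4_0 mr=100 qp=10 ah=2
ocrdma1 mr=120 qp=20 cq=10'
usage='mlx4_0 mr=95 qp=8 ah=2
ocrdma1 mr=0 qp=20 cq=10'

# rdma_headroom LIMIT_TEXT USAGE_TEXT
# Hypothetical helper: prints "<device> <resource> <limit - usage>" for
# every resource that appears in both inputs, in usage-file order.
rdma_headroom() {
    printf '%s\n---\n%s\n' "$1" "$2" | awk '
        $0 == "---" { in_usage = 1; next }   # separator between the inputs
        {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")           # kv[1] = name, kv[2] = value
                key = $1 " " kv[1]           # lines are keyed by device name
                if (!in_usage)
                    lim[key] = kv[2]
                else if (key in lim)
                    print key, lim[key] - kv[2]
            }
        }'
}

headroom=$(rdma_headroom "$limit" "$usage")
printf '%s\n' "$headroom"
```

A headroom of 0 means the cgroup has reached its configured limit for that
resource, so a further allocation would be expected to fail and show up in
the corresponding failcnt file.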
Documentation/cgroup-legacy/rdma.txt | 129 +++++++++++++++++++++++++++++++++++
Documentation/cgroup.txt | 79 +++++++++++++++++++++
2 files changed, 208 insertions(+)
create mode 100644 Documentation/cgroup-legacy/rdma.txt

diff --git a/Documentation/cgroup-legacy/rdma.txt b/Documentation/cgroup-legacy/rdma.txt
new file mode 100644
index 0000000..70626c5
--- /dev/null
+++ b/Documentation/cgroup-legacy/rdma.txt
@@ -0,0 +1,129 @@
+ RDMA Resource Controller
+ ------------------------
+
+Contents
+--------
+
+1. Overview
+ 1-1. What is RDMA resource controller?
+ 1-2. Why is RDMA resource controller needed?
+ 1-3. How is RDMA resource controller implemented?
+2. Usage Examples
+
+1. Overview
+
+1-1. What is RDMA resource controller?
+-------------------------------------
+
+The RDMA resource controller allows a user to limit the RDMA/IB specific
+resources that a given set of processes can use. These processes are grouped
+using the RDMA resource controller.
+
+The RDMA resource controller currently allows two different types of resource
+pools:
+(a) RDMA IB specification level verb resources defined by the IB stack
+(b) HCA vendor device specific resources
+
+The RDMA resource controller allows a maximum of 64 resources in a resource
+pool, which is an internal construct of the rdma cgroup explained in a later
+part of this document.
+
+1-2. Why is RDMA resource controller needed?
+--------------------------------------------
+
+Currently user space applications can easily take away all the rdma device
+specific resources such as AH, CQ, QP, MR etc. As a result, other applications
+in other cgroups or kernel space ULPs may not even get a chance to allocate
+any rdma resources. This leads to service unavailability.
+
+Therefore an RDMA resource controller is needed through which the resource
+consumption of processes can be limited. Through this controller the various
+rdma resources described by the IB uverbs layer and any HCA vendor driver can
+be accounted.
+
+1-3. How is RDMA resource controller implemented?
+------------------------------------------------
+
+The rdma cgroup allows limits to be configured for resources. These resources
+are not defined by the rdma controller. Instead they are defined by the IB
+stack and, optionally, by HCA device drivers.
+This provides great flexibility, allowing the IB stack to define new resources
+without any changes to the rdma cgroup.
+The rdma cgroup maintains resource accounting per cgroup, per device, per
+resource type using a resource pool structure. Each such resource pool is
+limited to 64 resources by the rdma cgroup, which can be extended later if
+required.
+
+This resource pool object is linked to the cgroup css. Typically there
+are 0 to 4 resource pool instances per cgroup, per device in most use cases,
+but nothing prevents having more. At present hundreds of RDMA devices per
+single cgroup may not be handled optimally; however, there is no known use
+case for such a configuration either.
+
+Since RDMA resources can be allocated from any process and can be freed by any
+of the child processes which share the address space, rdma resources are
+always owned by the creator cgroup css. This allows process migration from one
+cgroup to another without the major complexity of transferring resource
+ownership, because such ownership is not really present due to the shared
+nature of rdma resources. Linking resources around the css also ensures that
+cgroups can be deleted after processes have migrated. This allows process
+migration with active resources too, though that's not the primary use case.
+
+Finally, the mapping of the resource owner pid to its cgroup is maintained
+using a simple hash table to perform quick look-ups at resource
+charging/uncharging time.
+
+A resource pool object is created in the following situations.
+(a) The user sets a limit and no previous resource pool exists for the device
+of interest for the cgroup.
+(b) No resource limits were configured, but the IB/RDMA stack tries to
+charge a resource. A default resource pool is created so that resources
+charged while applications run without limits are uncharged correctly even
+if limits are enforced later; otherwise the usage count would drop below
+zero. Using a default pool instead of any sort of time markers simplifies
+the design.
+
+A resource pool is destroyed if it is of the default type (not created
+by an administrative operation) and its last resource is being
+deallocated. A resource pool created by an administrative operation is not
+deleted, as it's expected to be used in the near future.
+
+If the user tries to delete all the resource limits of a device while
+resources are still active, the RDMA cgroup just marks the pool as a
+default pool with maximum limits for each resource; otherwise it deletes
+the resource pool.
+
+2. Usage Examples
+-----------------
+
+(a) List available RDMA verb level resources:
+
+cat /sys/fs/cgroup/rdma/1/rdma.resource.verb.list
+#Output:
+mlx4_0 uctx ah pd mr srq qp flow
+
+(b) Configure resource limit:
+echo mlx4_0 mr=100 qp=10 ah=2 > /sys/fs/cgroup/rdma/2/rdma.resource.verb.limit
+echo ocrdma1 mr=120 qp=20 cq=10 > /sys/fs/cgroup/rdma/2/rdma.resource.verb.limit
+
+(c) Query resource limit:
+cat /sys/fs/cgroup/rdma/2/rdma.resource.verb.limit
+#Output:
+mlx4_0 mr=100 qp=10 ah=2
+ocrdma1 mr=120 qp=20 cq=10
+
+(d) Query current usage:
+cat /sys/fs/cgroup/rdma/2/rdma.resource.verb.usage
+#Output:
+mlx4_0 mr=95 qp=8 ah=2
+ocrdma1 mr=0 qp=20 cq=10
+
+(e) Delete resource limit:
+echo mlx4_0 remove > /sys/fs/cgroup/rdma/1/rdma.resource.verb.limit
+
+(f) List available HCA HW specific resources (optional):
+cat /sys/fs/cgroup/rdma/1/rdma.resource.hw.list
+vendor1 hw_qp hw_cq hw_timer
+
+(g) Configure hw specific resource limit:
+echo vendor1 hw_qp=56 > /sys/fs/cgroup/rdma/2/rdma.resource.hw.limit
diff --git a/Documentation/cgroup.txt b/Documentation/cgroup.txt
index 983ba63..57eb59c 100644
--- a/Documentation/cgroup.txt
+++ b/Documentation/cgroup.txt
@@ -47,6 +47,8 @@ CONTENTS
5-3. IO
5-3-1. IO Interface Files
5-3-2. Writeback
+ 5-4. RDMA
+ 5-4-1. RDMA Interface Files
6. Namespace
6-1. Basics
6-2. The Root and Views
@@ -1017,6 +1019,83 @@ writeback as follows.
total available memory and applied the same way as
vm.dirty[_background]_ratio.

+5-4. RDMA
+
+The "rdma" controller regulates the distribution of RDMA resources.
+This controller implements both RDMA/IB verb level and RDMA HCA
+driver level resource distribution.
+
+5-4-1. RDMA Interface Files
+
+ rdma.resource.verb.list
+
+ A read-only file that exists for all cgroups and describes
+ which verb specific resources of a given device can be
+ distributed and accounted.
+
+ Lines are keyed by device name and are not ordered.
+ Each line contains space separated resource names that can be
+ distributed.
+
+ An example for mlx4_0 device follows.
+
+ mlx4_0 ah cq pd mr qp flow srq
+
+ rdma.resource.verb.limit
+ A read-write file that exists for all cgroups and describes
+ the currently configured verb resource limits for an RDMA/IB device.
+
+ Lines are keyed by device name and are not ordered.
+ Each line contains space separated resource names and their configured
+ limits that can be distributed.
+
+ An example for mlx4 and ocrdma devices follows.
+
+ mlx4_0 mr=1000 qp=104 ah=2
+ ocrdma1 mr=900 qp=89 cq=10
+
+ rdma.resource.verb.usage
+ A read-only file that describes current resource usage.
+ It exists for all cgroups including root.
+
+ An example for mlx4 and ocrdma devices follows.
+
+ mlx4_0 mr=1000 qp=102 ah=2
+ ocrdma1 mr=900 qp=79 cq=10
+
+ rdma.resource.verb.failcnt
+ A read-only file that describes the resource allocation failure
+ count for a given resource type of a particular device.
+ It exists for all cgroups including root.
+
+ An example for mlx4 and ocrdma devices follows.
+
+ mlx4_0 mr=0 qp=1 ah=1
+ ocrdma1 mr=2 qp=1 cq=1
+
+ rdma.resource.hw.list
+
+ A read-only file that exists for all cgroups and describes
+ which HCA hardware specific resources of a given device can be
+ distributed and accounted.
+
+ rdma.resource.hw.limit
+ A read-write file that exists for all cgroups and describes
+ the currently configured HCA hardware resource limits for an RDMA/IB device.
+
+ Lines are keyed by device name and are not ordered.
+ Each line contains space separated resource names and their configured
+ limits that can be distributed.
+
+ rdma.resource.hw.usage
+ A read-only file that describes current HCA hardware resource usage.
+ It exists for all cgroups including root.
+
+ rdma.resource.hw.failcnt
+ A read-only file that describes the HCA hardware resource
+ allocation failure count for a given resource type of
+ a particular device.
+ It exists for all cgroups including root.

6. Namespace

--
1.8.3.1
