Re: RFC rdma cgroup
From: Tejun Heo
Date: Tue Nov 24 2015 - 10:48:08 EST
Hello, chiming in late.
On Wed, Oct 28, 2015 at 01:59:15PM +0530, Parav Pandit wrote:
> Design guidelines:
> -----------------------
> 1. There will be new rdma cgroup for accounting rdma resources
> (instead of extending device cgroup).
> Rationale: RDMA tracks different type of resources and it functions
> differently than device cgroup. Though device cgroup could have been
> extended for more generic nature, community feels that its better to
> create RDMA cgroup, which might have more features than just resource
> limit enforcement in future.
Yeap, it should definitely be separate from device cgroup.
> 2. RDMA cgroup will allow resource accounting, limit enforcement on
> per cgroup, per rdma device basis (instead of resource limiting across
> all devices).
> Rationale: this give granular control when multiple devices exist in the system.
>
> 3. Resources are not defined by the RDMA cgroup. Resources are defined
> by RDMA/IB subsystem and optionally by HCA vendor device drivers.
> Rationale: This allows rdma cgroup to remain constant while RDMA/IB
> subsystem can evolve without the need of rdma cgroup update. A new
> resource can be easily added by the RDMA/IB subsystem without touching
> rdma cgroup.
I'm *extremely* uncomfortable with this. Drivers for these sorts of
higher-end devices tend to pull a lot of stunts, for better or worse,
and my gut feeling is that letting low-level drivers run free with
resource definition is highly likely to lead to an unmanageable mess
in the long run. I'd strongly urge you to gather consensus on what the
resources should be across the board.
> Design:
> ---------
> 8. Typically each RDMA cgroup will have 0 to 4 RDMA devices. Therefore
> each cgroup will have 0 to 4 verbs resource pool and optionally 0 to 4
> hw resource pool per such device.
> (Nothing stops to have more devices and pools, but design is around
> this use case).
Heh, 4 seems like an arbitrary number. idk, it feels weird to bake
a number like 4 into the design.
> 9. Resource pool object is created in following situations.
> (a) administrative operation is done to set the limit and no previous
> resource pool exist for the device of interest for the cgroup.
> (b) no resource limits were configured, but IB/RDMA subsystem tries to
> charge the resource. so that when applications are running without
> limits and later on when limits are enforced, during uncharging, it
> correctly uncharges them, otherwise usage count will drop to negative.
> This is done using default resource pool.
> Instead of implementing any sort of time markers, default pool
> simplifies the design.
So, the usual way to deal with this is that the root cgroup is exempt
from accounting and each resource tracks which cgroup it was charged to
and uncharges that cgroup on release. IOW, associate on charge and
maintain the association till release.
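To make that concrete, here is a rough toy model in Python. It is a
userspace sketch, not kernel code: the names Cgroup, Resource, charge()
and uncharge() are made up for illustration, and it ignores hierarchy
and locking. It only shows that recording the owner at charge time
keeps the usage count from going negative when a limit is configured
after resources were already charged.

```python
class Cgroup:
    """Toy stand-in for a cgroup (hypothetical, not a kernel structure)."""

    def __init__(self, name, parent=None, limit=None):
        self.name = name
        self.parent = parent
        self.limit = limit  # None means no limit has been configured
        self.usage = 0

    @property
    def is_root(self):
        return self.parent is None


class Resource:
    """A charged resource remembers the cgroup it was charged to."""

    def __init__(self, owner):
        self.owner = owner


def charge(cgroup):
    """Charge one unit to cgroup; the root cgroup is exempt from accounting."""
    if not cgroup.is_root:
        if cgroup.limit is not None and cgroup.usage >= cgroup.limit:
            raise RuntimeError(f"{cgroup.name}: limit reached")
        cgroup.usage += 1
    # Associate on charge: the resource carries its owner from here on.
    return Resource(cgroup)


def uncharge(resource):
    """Uncharge the cgroup recorded at charge time and drop the association."""
    owner = resource.owner
    if not owner.is_root:
        owner.usage -= 1
    resource.owner = None


root = Cgroup("root")
a = Cgroup("a", parent=root)

res = charge(a)   # charged while no limit is configured
a.limit = 1       # limit is set later by the administrator
uncharge(res)     # still uncharges "a"; a.usage is back to 0
```

No default pool or time marker is needed in this model, because the
charge is always recorded against a concrete cgroup up front.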
For interface details, please refer to the following documentation.
https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup.txt?h=for-4.5
Thanks.
--
tejun