Re: [PATCHv12 1/3] rdmacg: Added rdma cgroup controller

From: Parav Pandit
Date: Wed Sep 07 2016 - 03:55:22 EST


Hi Matan,

On Thu, Sep 1, 2016 at 2:14 PM, Christoph Hellwig <hch@xxxxxx> wrote:
> On Thu, Sep 01, 2016 at 10:25:40AM +0300, Matan Barak wrote:
>> Well, if I recall, the reason doing so last time was in order to allow
>> flexible updating of ib_core independently, which is obviously not a good
>> reason (to say the least).
>>
>> Since the new ABI will probably define new object types (all recent
>> proposals go this way), the current approach could lead to either trying to
>> map new objects to existing cgroup resource types, which could lead to some
>> weird non 1:1 mapping, or having a split definitions - such that each
>> driver will declare its objects both in the cgroups mechanism and in its
>> driver dispatch table.

>> Even worse than that, drivers could simply ignore the cgroups support while
>> implementing their own resource types and get a very broken containers
>> support.
If drivers are broken due to ignorance of not-calling cgroup APIs,
that should be ok.
That particular driver should fix it.
If the resource creation using uverbs is using well defined rdma level
resource, uverbs level will make sure to honor cgroup limits without
involving hw drivers anyway.

RDMA Verb level resource is charged/uncharged by RDMA core.
RDMA HW level resource is charged/uncharged by RDMA HW driver using
well defined API and resource by cgroup core.
This scheme ensures that there is 1:1 mapping.

I don't think current definition of resource type is carved out on stone.
They can be extended as we forward.
As many of us agree that, they should be well defined and it should be
agreed by cgroup and rdma community.

For example, today we have RDMA_VERB_xxx resources.
New well defined RDMA HW resources can be defined in rdma_cgroup.h
file as RDMA_HW_xx in same enum table.

>
> Sorry guys, but arbitrary extensibility for something not finished is the
> worst idea ever. We have two options here:
>
> a) delay cgroups support until the grand rewrite is done
> b) add it now and deal with the consequences later
>
Can we do (b) now and differ adding any HW resources to cgroup until
they are clearly called out.
Architecture and APIs are already in place to support this.

> That being said, adding random non-Verbs hardwasre to the RDMA core is
> the second worst idea ever.

We can differ adding HW resource to core and cgroup until they are
clearly defined.
In that case current architecture still holds good.

> Guess I need to catch up with the
> discussion and start using the flame thrower.

Matan,
Can you please point us to the new RFC/ABI email thread which
describes the design, partitioning of code between core vs hw drivers
etc.