Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems

From: Moger, Babu
Date: Fri Jun 14 2024 - 17:29:44 EST


Hi Reinette,

On 6/14/2024 11:46 AM, Reinette Chatre wrote:
Hi Babu,

On 6/14/24 9:27 AM, Moger, Babu wrote:
Hi Reinette,

On 6/13/24 15:32, Reinette Chatre wrote:
Hi Babu,

On 6/13/24 12:17 PM, Moger, Babu wrote:
I may be little bit out of sync here. Also, sorry to come back late in the
series.

Looking at the series again, I see this approach adds lots of code.
Look at this structure.


@@ -187,10 +196,12 @@ struct rdt_resource {
       bool            alloc_capable;
       bool            mon_capable;
       int            num_rmid;
-    enum resctrl_scope    scope;
+    enum resctrl_scope    ctrl_scope;
+    enum resctrl_scope    mon_scope;
       struct resctrl_cache    cache;
       struct resctrl_membw    membw;
-    struct list_head    domains;
+    struct list_head    ctrl_domains;
+    struct list_head    mon_domains;
       char            *name;
       int            data_width;
       u32            default_ctrl;

There are two scope fields.
There are two domains fields.

These are very confusing and very hard to maintain. Also, I am not sure if
these fields are useful for anything other than SNC feature. This approach
adds quite a bit of code for no specific advantage.

Why don't we just split the RDT_RESOURCE_L3 resource
into separate resources, one for control, one for monitoring.
We already have "control" only resources (MBA, SMBA, L2). Lets create new
"monitor" only resource. I feel it will be much cleaner approach.

Tony has already tried that approach and showed that it is much simpler.

v15-RFC :
https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@xxxxxxxxx/

What do you think?


Some highlights of my thoughts in response to that series, but the whole
thread
may be of interest to you:
https://lore.kernel.org/lkml/78c88c6d-2e8d-42d1-a6f2-1c5ac38fb258@xxxxxxxxx/
https://lore.kernel.org/lkml/59944211-d34a-4ba3-a1de-095822c0b3f0@xxxxxxxxx/


Went through the thread, in summary:

The main concerns are related to duplication of code and data structures.

The solutions are

a) Split the domains.
This is what this series is doing now. This creates members like
ctrl_scope, mon_scope, ctrl_domains etc.. These fields are added to all
the resources (MBA, SMBA and L2). Then there is additional domain header.


b) Split the resource.
   Split RDT_RESOURCE_L3 into two, one for "monitor" and one for "control".
   There will be one domain structure for "monitor" and  one for "control"

Both these approaches have code and data duplication. So, there is no
difference that way.

Could you please elaborate where code and data duplication of (a) is?

We have ctrl_scope, mon_scope, ctrl_domains. mon_domains. Only one resource, RDT_RESOURCE_L3 is going to use these fields. Rest of the resources don't need these fields. But these fields are part of all the resources.

I am not too worried about the size of the patch. But, I don't foresee these fields will be used anytime soon in these resources(MBA. L3. SMBA). Why add it now? In future we may have to cleanup all these anyways.



But complexity wise, approach (a) adds quite bit of complexity. Doesn't it?

"complex" is a subjective term. Could you please elaborate what is complex
about this? Is your concern about the size of the patch? To me that is
not a concern when considering the end result of how the resctrl structures
are organized.


For me, solution (b) looks simple and easy. Eventually, we may have to
restructure these data structures anyways. I feel it is the right direction.


I understand that it is tempting to look for smallest patch possible but we
really need to ensure that any work integrates well into resctrl. Doing
so may end up with larger patches but in the end it makes the data structures
and code easier to understand. I specifically find the duplication of structures
troublesome since that requires developers to always be on high alert of
what code is being worked on and what flows the particular code participates in
since the duplication results in members of structures be invalid based on which
code flow is used. To me this is an unnecessary burden on developers and against
your original goal of making resctrl easier to maintain.

Reinette

--
- Babu Moger