Re: resctrl2 - status

From: Moger, Babu
Date: Fri Sep 08 2023 - 17:35:17 EST


Hi Tony,


On 9/8/2023 1:51 PM, Luck, Tony wrote:
Can you try this out on an AMD system? I think I covered most of the
existing AMD resctrl features, but I have no machine to test the code
on, so very likely there are bugs in these code paths.

I'd like to make any needed changes now, before I start breaking this
into reviewable bite-sized patches to avoid too much churn.
I tried your latest code briefly. Unfortunately, I could
not get it to work on my AMD system.

# git branch -a
next
* resctrl2_v65
# uname -r
6.5.0+
# lsmod | grep rdt
rdt_show_ids 12288 0
rdt_mbm_local_bytes 12288 0
rdt_mbm_total_bytes 12288 0
rdt_llc_occupancy 12288 0
rdt_l3_cat 16384 0

# lsmod | grep mbe
amd_mbec 16384 0

I could not get rdt_l3_mba to load:

# modprobe rdt_l3_mba
modprobe: ERROR: could not insert 'rdt_l3_mba': No such device

I don't see any data for the default group either.

# mount -t resctrl resctrl /sys/fs/resctrl/

# cd /sys/fs/resctrl/mon_data/mon_L3_00

# cat mbm_summary
n/a n/a /
Babu,

Thanks a bunch for taking this for a quick spin. There are several bits of
good news there. Several modules loaded automatically, as expected.
Nothing went "OOPS" and crashed the system.

Here's the code in the rdt_l3_mba module that can cause it to fail
to load with "No such device":

if (!boot_cpu_has(X86_FEATURE_RDT_A)) {
	pr_debug("No RDT allocation support\n");
	return -ENODEV;
}

Shouldn't this be (or something similar):

if (!rdt_cpu_has(X86_FEATURE_MBA))
	return -ENODEV;

mba_features = cpuid_ebx(0x10);

if (!(mba_features & BIT(3))) {
	pr_debug("No RDT MBA allocation\n");
	return -ENODEV;
}

I assume the first test must have succeeded (the same code is in rdt_l3_cat,
and that loaded OK), so it must be the second. How does AMD enumerate MBA
support?
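For comparison, here is a user-space sketch of the vendor split, assuming the bit positions that the upstream kernel's scattered CPUID table uses for X86_FEATURE_MBA: Intel reports MBA in CPUID.(EAX=0x10, ECX=0):EBX bit 3, while AMD reports it in CPUID.(EAX=0x80000008):EBX bit 6. If that assumption holds, a check that reads only leaf 0x10 could explain the -ENODEV on AMD. The helper names are hypothetical, and the code needs an x86 GCC or Clang toolchain for <cpuid.h>.

```c
#include <cpuid.h>
#include <stdbool.h>
#include <stdint.h>

enum cpu_vendor { VENDOR_OTHER, VENDOR_INTEL, VENDOR_AMD };

/* Identify the vendor from the EBX part of the CPUID leaf 0 signature. */
enum cpu_vendor vendor_from_ebx(uint32_t ebx)
{
	if (ebx == signature_INTEL_ebx)		/* "Genu" */
		return VENDOR_INTEL;
	if (ebx == signature_AMD_ebx)		/* "Auth" */
		return VENDOR_AMD;
	return VENDOR_OTHER;
}

/*
 * Assumed MBA enumeration (mirrors the kernel's scattered CPUID table):
 *   Intel: CPUID.(EAX=0x10, ECX=0):EBX bit 3
 *   AMD:   CPUID.(EAX=0x80000008, ECX=0):EBX bit 6
 */
uint32_t mba_leaf(enum cpu_vendor v)
{
	return v == VENDOR_AMD ? 0x80000008u : 0x10u;
}

/* Test the vendor-specific MBA bit in a raw EBX value. */
bool mba_bit_set(enum cpu_vendor v, uint32_t ebx)
{
	switch (v) {
	case VENDOR_INTEL:
		return (ebx >> 3) & 1u;
	case VENDOR_AMD:
		return (ebx >> 6) & 1u;
	default:
		return false;
	}
}

/* Query the running CPU. */
bool cpu_has_mba(void)
{
	unsigned int eax, ebx, ecx, edx;
	enum cpu_vendor v;

	if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
		return false;
	v = vendor_from_ebx(ebx);
	if (v == VENDOR_OTHER)
		return false;

	/* __get_cpuid_count() returns 0 if the leaf is above the CPU's maximum. */
	if (!__get_cpuid_count(mba_leaf(v), 0, &eax, &ebx, &ecx, &edx))
		return false;
	return mba_bit_set(v, ebx);
}
```

A caller would just test cpu_has_mba(); the decode helpers take raw register values so they can be checked without the hardware.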

It's less obvious what causes the mbm_summary file to show no
data. The rdt_mbm_local_bytes and rdt_mbm_total_bytes modules
loaded OK, so I'm looking for the right CPUID bits to detect memory bandwidth
monitoring.
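On the monitoring side, the existing kernel reads the same CPUID leaf 0xF bits on both vendors: subleaf 0 EDX bit 1 for L3 monitoring, and subleaf 1 EDX bits 0, 1 and 2 for LLC occupancy, total and local bandwidth, all gated by CPUID.(EAX=7, ECX=0):EBX bit 12. The sketch below decodes them in user space so they can be compared with whatever the new modules test; the struct and function names are hypothetical. If these bits are set on the test machine, the empty mbm_summary output more likely comes from somewhere other than feature detection.

```c
#include <cpuid.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded L3 monitoring capabilities from CPUID leaf 0xF. */
struct l3_mon_caps {
	bool l3_mon;		/* leaf 0xF, subleaf 0, EDX bit 1 */
	bool llc_occupancy;	/* leaf 0xF, subleaf 1, EDX bit 0 */
	bool mbm_total;		/* leaf 0xF, subleaf 1, EDX bit 1 */
	bool mbm_local;		/* leaf 0xF, subleaf 1, EDX bit 2 */
	uint32_t max_rmid;	/* leaf 0xF, subleaf 1, ECX */
};

/* Decode raw register values; subleaf 1 is ignored without L3 monitoring. */
struct l3_mon_caps decode_l3_mon(uint32_t edx0, uint32_t ecx1, uint32_t edx1)
{
	struct l3_mon_caps c = { 0 };

	c.l3_mon = (edx0 >> 1) & 1u;
	if (!c.l3_mon)
		return c;
	c.llc_occupancy = edx1 & 1u;
	c.mbm_total = (edx1 >> 1) & 1u;
	c.mbm_local = (edx1 >> 2) & 1u;
	c.max_rmid = ecx1;
	return c;
}

/* Read the running CPU; returns false if resource monitoring is absent. */
bool read_l3_mon(struct l3_mon_caps *out)
{
	unsigned int eax, ebx, ecx, edx, edx0;

	/* CPUID.(EAX=7, ECX=0):EBX bit 12: resource monitoring present at all. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || !((ebx >> 12) & 1u))
		return false;
	if (!__get_cpuid_count(0xf, 0, &eax, &ebx, &ecx, &edx0))
		return false;
	if (!__get_cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx))
		return false;
	*out = decode_l3_mon(edx0, ecx, edx);
	return true;
}
```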

I am still not sure whether resctrl2 will address all the current gaps in resctrl1. We should probably put all the issues on the table before we go that route.

One of the main issues for AMD is the coupling of LLC domains.

For example, AMD hardware supports 16 CLOSIDs per LLC domain, but the Linux design assumes there are 16 CLOSIDs in total for the whole system. Today we can create only 16 CLOSIDs, irrespective of how many domains there are.

In reality, we should be able to create "16 x number of LLC domains" CLOSIDs in the system. This is more evident on AMD, but the same problem applies to Intel systems with multiple sockets.
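To make the arithmetic concrete, here is a toy model (not resctrl code; every name is hypothetical) contrasting one system-wide pool of 16 CLOSIDs with an independent 16-entry pool per LLC domain. With 4 domains, the global pool still yields 16 groups, while the per-domain pools yield 16 x 4 = 64. The model assumes each group is confined to a single domain; a group whose tasks span several domains would need a CLOSID in each of them.

```c
#include <stdint.h>

#define NUM_CLOSID	16	/* CLOSIDs per LLC domain, as in the AMD example */
#define MAX_DOMAINS	8	/* hypothetical upper bound for this sketch */
#define ALL_FREE	((1u << NUM_CLOSID) - 1)

/* Global model: a single free map shared by the whole system. */
struct global_closids {
	uint32_t free_map;		/* bit i set => CLOSID i is free */
};

/* Per-domain model: every LLC domain owns an independent free map. */
struct domain_closids {
	int num_domains;
	uint32_t free_map[MAX_DOMAINS];
};

/* Take the lowest free CLOSID from a map, or return -1 if none is left. */
static int alloc_from(uint32_t *map)
{
	for (int i = 0; i < NUM_CLOSID; i++) {
		if (*map & (1u << i)) {
			*map &= ~(1u << i);
			return i;
		}
	}
	return -1;
}

void global_init(struct global_closids *g)
{
	g->free_map = ALL_FREE;
}

int global_alloc(struct global_closids *g)
{
	return alloc_from(&g->free_map);
}

void domain_init(struct domain_closids *d, int num_domains)
{
	if (num_domains < 0)
		num_domains = 0;
	if (num_domains > MAX_DOMAINS)
		num_domains = MAX_DOMAINS;
	d->num_domains = num_domains;
	for (int i = 0; i < num_domains; i++)
		d->free_map[i] = ALL_FREE;
}

/* The same CLOSID number can be handed out again in a different domain. */
int domain_alloc(struct domain_closids *d, int dom)
{
	if (dom < 0 || dom >= d->num_domains)
		return -1;
	return alloc_from(&d->free_map[dom]);
}

/* How many groups fit before the global pool is exhausted? */
int global_capacity(void)
{
	struct global_closids g;
	int n = 0;

	global_init(&g);
	while (global_alloc(&g) >= 0)
		n++;
	return n;
}

/* How many single-domain groups fit across all per-domain pools? */
int domain_capacity(int num_domains)
{
	struct domain_closids d;
	int n = 0;

	domain_init(&d, num_domains);
	for (int dom = 0; dom < d.num_domains; dom++)
		while (domain_alloc(&d, dom) >= 0)
			n++;
	return n;
}
```

The key design point is that a CLOSID number is only meaningful within its domain, so CLOSID 0 in domain 0 and CLOSID 0 in domain 1 can belong to different groups.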

My 2 cents. Hope to discuss more in our upcoming meeting.

thanks