Re: [PATCH v4 0/8] Support L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE)

From: Moger, Babu
Date: Fri May 02 2025 - 20:53:54 EST


Hi Reinette,

Thanks for the quick turnaround.

On 5/2/2025 4:20 PM, Reinette Chatre wrote:
Hi Babu,

On 4/21/25 3:43 PM, Babu Moger wrote:
# Linux Implementation

The feature adds the following interface files when the resctrl "io_alloc" feature is
supported on the L3 resource:

/sys/fs/resctrl/info/L3/io_alloc: Report the feature status. Enable/disable the
feature by writing to the interface.

/sys/fs/resctrl/info/L3/io_alloc_cbm: List the Capacity Bit Masks (CBMs) available
for I/O devices when io_alloc feature is enabled.
Configure the CBM by writing to the interface.

# Examples:

a. Check if io_alloc feature is available
# mount -t resctrl resctrl /sys/fs/resctrl/

# cat /sys/fs/resctrl/info/L3/io_alloc
disabled

b. Enable the io_alloc feature.

# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
# cat /sys/fs/resctrl/info/L3/io_alloc
enabled

c. Check the CBM values for the io_alloc feature.

# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
L3:0=ffff;1=ffff

d. Change the CBM value for domain 1:
# echo L3:1=FF > /sys/fs/resctrl/info/L3/io_alloc_cbm

# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
L3:0=ffff;1=00ff

e. Disable the io_alloc feature and exit.

# echo 0 > /sys/fs/resctrl/info/L3/io_alloc
# cat /sys/fs/resctrl/info/L3/io_alloc
disabled

# umount /sys/fs/resctrl/
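
For illustration only, the io_alloc_cbm lines shown in the examples above can be parsed and re-emitted with a few lines of Python. This is a minimal user-space sketch, not kernel code: the helper names parse_cbm_line() and format_cbm_line() are hypothetical, and the 4-digit mask width is assumed from the sample output.

```python
def parse_cbm_line(line):
    """Parse 'L3:0=ffff;1=00ff' (or '0=ffff;1=00ff') into (resource, {domain: mask}).

    Hypothetical helper; the resource prefix is treated as optional.
    """
    line = line.strip()
    resource = None
    if ":" in line:
        resource, line = line.split(":", 1)
    domains = {}
    for entry in line.split(";"):
        dom, mask = entry.split("=")
        domains[int(dom)] = int(mask, 16)  # masks are hexadecimal, case-insensitive
    return resource, domains


def format_cbm_line(resource, domains, width=4):
    """Format masks the way the examples print them: zero-padded lowercase hex."""
    body = ";".join(f"{dom}={mask:0{width}x}" for dom, mask in sorted(domains.items()))
    return f"{resource}:{body}" if resource else body


# Mirror step (d): start from the default masks and narrow domain 1 to 0xFF.
resource, domains = parse_cbm_line("L3:0=ffff;1=ffff")
domains[1] = int("FF", 16)
print(format_cbm_line(resource, domains))
```

The same parser accepts the form without the resource name, which is the variant discussed further down in this thread.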


From what I can tell the interface when CDP is enabled will look
as follows:

# mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
# cat /sys/fs/resctrl/info/L3CODE/io_alloc
disabled
# cat /sys/fs/resctrl/info/L3DATA/io_alloc
not supported
"io_alloc" can thus be enabled for L3CODE but not for L3DATA.
This is unexpected considering the feature is called
"L3 Smart *Data* Cache Injection Allocation Enforcement".

I understand that the interface evolved into this because the
"code" allocation of CDP uses the CLOSID required by SDCIAE but I think
leaking implementation details like this to the user interface can
cause confusion.

Since there is no distinction between code and data in these
IO allocations, what do you think of connecting the io_alloc and
io_alloc_cbm files within L3CODE and L3DATA so that the user can
read/write from either with a read showing the same data and
user able to write to either? For example,

# mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
# cat /sys/fs/resctrl/info/L3CODE/io_alloc
disabled
# cat /sys/fs/resctrl/info/L3DATA/io_alloc
disabled
# echo 1 > /sys/fs/resctrl/info/L3CODE/io_alloc
# cat /sys/fs/resctrl/info/L3CODE/io_alloc
enabled
# cat /sys/fs/resctrl/info/L3DATA/io_alloc
enabled
# cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
0=ffff;1=ffff
# cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
0=ffff;1=ffff
# echo 1=FF > /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
# cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
0=ffff;1=00ff
# cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
0=ffff;1=00ff
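
The behaviour proposed in the session above amounts to both info directories being views onto one piece of state. A minimal Python model of that idea follows; it is only a sketch of the semantics, the class names IoAllocState and InfoDir are made up for illustration, and it says nothing about how the kernel side would be structured.

```python
class IoAllocState:
    """Single io_alloc state shared by every view (hypothetical model)."""

    def __init__(self, domains, full_mask=0xFFFF):
        self.enabled = False
        self.cbm = {dom: full_mask for dom in domains}


class InfoDir:
    """One info directory (e.g. L3CODE or L3DATA) exposing the shared state."""

    def __init__(self, state):
        self._state = state

    def read_io_alloc(self):
        return "enabled" if self._state.enabled else "disabled"

    def write_io_alloc(self, value):
        self._state.enabled = bool(int(value))

    def read_io_alloc_cbm(self):
        return ";".join(f"{d}={m:04x}" for d, m in sorted(self._state.cbm.items()))

    def write_io_alloc_cbm(self, text):
        dom, mask = text.strip().split("=")
        self._state.cbm[int(dom)] = int(mask, 16)


state = IoAllocState(domains=[0, 1])
l3code, l3data = InfoDir(state), InfoDir(state)

l3code.write_io_alloc("1")          # echo 1 > info/L3CODE/io_alloc
l3data.write_io_alloc_cbm("1=FF")   # echo 1=FF > info/L3DATA/io_alloc_cbm
print(l3data.read_io_alloc(), l3code.read_io_alloc_cbm())
```

A write through either view is visible through the other, which is the property the example session relies on.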

I agree. There is no right or wrong here. It can be done the way you described above. But I am not sure it will clear up the confusion.

We have already added this text to the user doc (the spec says the same):

"On AMD systems, the io_alloc feature is supported by the L3 Smart
Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
io_alloc is determined by the highest CLOSID supported by the resource.
When CDP is enabled, io_alloc routes I/O traffic using the highest
CLOSID allocated for the instruction cache (L3CODE)."

Don't you think this text might clear up the confusion? We can also add examples if that makes it even clearer.
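
As one possible example of why the highest CLOSID ends up on the L3CODE side, here is a small Python sketch. It assumes the usual resctrl CDP layout in which CLOSID n uses hardware index 2n for data and 2n+1 for code, and it assumes 16 hardware CLOSIDs purely for illustration; the function names are hypothetical and not taken from the patches.

```python
def config_index(closid, conf_type):
    """Hardware configuration index for a CLOSID (assumed CDP layout)."""
    if conf_type == "code":
        return closid * 2 + 1
    if conf_type == "data":
        return closid * 2
    return closid  # CDP disabled: one index per CLOSID


def highest_closid(num_hw_closids, cdp_enabled):
    """Highest CLOSID visible to resctrl; CDP halves the usable count."""
    usable = num_hw_closids // 2 if cdp_enabled else num_hw_closids
    return usable - 1


NUM_HW_CLOSIDS = 16  # assumption for this example

# Without CDP the io_alloc CLOSID is simply the last one.
print(highest_closid(NUM_HW_CLOSIDS, cdp_enabled=False))

# With CDP the last CLOSID is 7; its code half lands on hardware index 15,
# its data half on 14.
top = highest_closid(NUM_HW_CLOSIDS, cdp_enabled=True)
print(config_index(top, "code"), config_index(top, "data"))
```

Under these assumptions only the code half of the top CLOSID coincides with the highest hardware index, which is consistent with the documentation text quoted above.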

(Note in above I removed the resource name from io_alloc_cbm to match
what was discussed during previous version:
https://lore.kernel.org/lkml/251c8fe1-603f-4993-a822-afb35b49cdfa@xxxxxxx/ )
What do you think?

Yes, I remember: "Kept the resource name while printing the CBM for io_alloc, so we don't have to change show_doms() just for this feature and it is consistent across all the schemata display."

I added the note in here.
https://lore.kernel.org/lkml/784fbc61e02e9a834473c3476ee196ef6a44e338.1745275431.git.babu.moger@xxxxxxx/

I will change it if you feel strongly about it. We will have to change show_doms() to handle this.


---
v4: The "io_alloc" interface will report "enabled/disabled/not supported"
instead of 0 or 1.

Updated resctrl_io_alloc_closid_get() to verify the max closid availability
using closids_supported().

Updated the documentation for "shareable_bits" and "bit_usage".

NOTE: io_alloc is about a specific CLOS. rdt_bit_usage_show() is not designed to
handle bit_usage for a specific CLOS. It's about the overall system. So, we cannot
really tell the user which CLOS is shared across both hardware and software.

"bit_usage" is not about CLOS but how the resource is used. Per the doc:

"bit_usage":
Annotated capacity bitmasks showing how all
instances of the resource are used.

The key here is the CBM, not CLOS. For each bit in the *CBM* "bit_usage" shows
how that portion of the cache is used with the legend documented in
Documentation/arch/x86/resctrl.rst.

Consider a system with the following allocations:
# cat /sys/fs/resctrl/schemata
L3:0=0ff0

This is CLOS 0.

# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
0=ff00

This is CLOS 15.


Then "bit_usage" will look like:

# cat /sys/fs/resctrl/info/L3/bit_usage
0=HHHHXXXXSSSS0000

It is confusing here. To make it clear we may have to print all the CLOSes in each domain.

# cat /sys/fs/resctrl/info/L3/bit_usage
DOM0=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15=HHHHXXXXSSSS0000;
DOM1=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15=HHHHXXXXSSSS0000


"bit_usage" shows how the cache is being used. It shows that the portion of cache represented
by first four bits of CBM is unused, portion of cache represented by bits 4 to 7 of CBM is
only used by software, portion of cache represented by bits 8 to 11 of CBM is shared between
software and hardware, portion of cache represented by bits 12 to 15 is only used by hardware.
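
The per-bit reading described above can be reproduced with a short Python sketch. It is a simplification under stated assumptions: only the "0", "S", "H" and "X" symbols are modelled (exclusive and pseudo-locked regions are ignored), the 16-bit CBM length comes from the example, and the function name bit_usage() is just illustrative, not the kernel's rdt_bit_usage_show().

```python
def bit_usage(sw_mask, hw_mask, cbm_len=16):
    """Annotate each CBM bit, most significant bit first.

    '0' unused, 'S' used by software only, 'H' used by hardware only,
    'X' used by both software and hardware.
    """
    out = []
    for bit in range(cbm_len - 1, -1, -1):
        sw = bool(sw_mask & (1 << bit))
        hw = bool(hw_mask & (1 << bit))
        if sw and hw:
            out.append("X")
        elif hw:
            out.append("H")
        elif sw:
            out.append("S")
        else:
            out.append("0")
    return "".join(out)


# schemata: L3:0=0ff0 (software), io_alloc_cbm: 0=ff00 (I/O, i.e. hardware)
print("0=" + bit_usage(sw_mask=0x0FF0, hw_mask=0xFF00))
```

With the masks from the example this yields the same string as the bit_usage output shown above, since the result depends only on the union of software masks and the hardware mask, not on which CLOS contributed each bit.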

This is something we need to discuss.

Looking at implementation in patch #5 the "io_alloc_cbm" bits of CBM are presented
as software bits, since "io_alloc_cbm" represents IO from devices it should be "hardware" bits
(hw_shareable), no?

Yes, it is. But the logic is a bit different there.

It loops through all the CLOSes in the domain. So, it will print the same thing again, as shown below.

# cat bit_usage
0=HHHHXXXXSSSS0000

It tells the user that all the CLOSes in domain 0 have this sharing property, which is not correct.

To make it clear we really need to print every CLOS here. What do you think?

Thanks
Babu