Re: [PATCH v13 00/32] x86,fs/resctrl telemetry monitoring
From: Moger, Babu
Date: Wed Dec 17 2025 - 13:23:27 EST
Hi Tony,
On 12/16/2025 6:28 PM, Luck, Tony wrote:
On Wed, Nov 05, 2025 at 09:33:56AM -0600, Moger, Babu wrote:
Hi Tony,
On 10/29/2025 1:59 PM, Luck, Tony wrote:
I took a stab at applying the AET patches on top of Babu's v10
SDCIAE series https://lore.kernel.org/all/cover.1761090859.git.babu.moger@xxxxxxx/
There are only a couple of easy to resolve conflicts.
I pushed a branch with v6.18-rc3 + SDCIAE + AET here:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git sdciae-aet
I ran your code on my AMD system. It appears to be working fine.
I don't have hardware that supports these features. It would be helpful
to list the resctrl interface files (the files in the info directory and
the files in each group) and show an example of how they look on a system
that supports these features. This could go in the cover letter or in
resctrl.rst. It would also help with reviewing the code.
Reinette reminded me that I didn't provide the examples that I promised.
Here's what I plan to add to the v17 cover letter. Is this enough
detail?
-Tony
Examples:
--------
As with other resctrl monitoring features, first create CTRL_MON or MON
directories and assign the tasks of interest to the group.
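A minimal sketch of that setup step, assuming resctrl is already mounted
at /sys/fs/resctrl and the caller has the needed privileges. The helper
name make_mon_group and the group name "test" are only illustrative. The
resctrl root is passed as an argument so the helper can be tried against
a scratch directory first.

```shell
#!/bin/sh
# make_mon_group ROOT NAME PID...
# Hypothetical helper: create a group directory under ROOT and move each
# PID into it by writing to the group's "tasks" file, one PID per write.
make_mon_group() {
    grp_root=$1
    grp_name=$2
    shift 2
    mkdir -p "$grp_root/$grp_name" || return 1
    for grp_pid in "$@"; do
        echo "$grp_pid" > "$grp_root/$grp_name/tasks" || return 1
    done
}
```

For example, "make_mon_group /sys/fs/resctrl test $$" would place the
current shell in a CTRL_MON group named "test".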
Energy events:
-------------
There are two events associated with energy consumption in the core.
The "core_energy" event reports out directly in Joules. To compute
power just take the difference between two samples and divide by the
time between them. E.g.
$ cat core_energy; sleep 10; cat core_energy
Please use the full path (/sys/fs/resctrl/test/mon_data/xx/<file_name>).
Otherwise looks good to me.
Thanks
Babu
94499439.510380
94499607.019680
$ bc -q
scale=3
(94499607.019680 - 94499439.510380) / 10
16.750
So 16.75 Watts in this example.
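The same calculation could be wrapped in a small helper. This is only a
sketch: the function names joules_to_watts and sample_power are made up
for illustration, the event file is assumed to hold a single decimal
number in Joules as in the output above, and awk is used in place of bc
(awk rounds the result, whereas bc with scale=3 truncates it).

```shell
#!/bin/sh
# joules_to_watts START END SECONDS
# Average power over the interval: (END - START) / SECONDS.
joules_to_watts() {
    awk -v s="$1" -v e="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", (e - s) / t }'
}

# sample_power FILE SECONDS
# Read an energy counter twice, SECONDS apart (SECONDS must be non-zero),
# and print the average power in Watts.
sample_power() {
    energy_file=$1
    interval=$2
    e_start=$(cat "$energy_file") || return 1
    sleep "$interval"
    e_end=$(cat "$energy_file") || return 1
    joules_to_watts "$e_start" "$e_end" "$interval"
}
```

With the two samples shown above,
"joules_to_watts 94499439.510380 94499607.019680 10" prints 16.75.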
Note that different runs of the same workload may report different
energy consumption. This happens when cores shift to different
voltage/frequency profiles due to overall system load.
The "activity" event reports energy usage in a manner independent
of voltage and frequency. This may be useful for developers to
assess how modifications to a program (e.g. linking against a library
optimized to use AVX instructions) affect energy consumption. So
read the "activity" at the start and end of program execution and
compute the difference.
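A rough sketch of that start/end measurement follows. The helper name
event_delta is hypothetical, and the format of the "activity" file is an
assumption: it is treated here as a single decimal number, like the
core_energy output shown earlier.

```shell
#!/bin/sh
# event_delta FILE COMMAND [ARGS...]
# Hypothetical helper: read the counter in FILE, run COMMAND, read the
# counter again, and print the difference (after - before).
event_delta() {
    ev_file=$1
    shift
    ev_before=$(cat "$ev_file") || return 1
    # Send the command's stdout to stderr so only the delta is printed
    # on stdout.
    "$@" >&2
    ev_after=$(cat "$ev_file") || return 1
    awk -v b="$ev_before" -v a="$ev_after" \
        'BEGIN { printf "%.6f\n", a - b }'
}
```

Comparing the delta reported for two builds of the same program, run in
the same monitoring group, gives the kind of before/after comparison
described above.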
Perf events:
-----------
The other telemetry events largely duplicate events available using
"perf", but avoid the overhead of reading the perf counters on every
context switch.
This may be a significant improvement when monitoring highly multi-threaded
applications. E.g. to find the ratio of reference cycles to core cycles:
$ cat unhalted_core_cycles unhalted_ref_cycles
1312249223146571
1660157011698276
$ { run application here }
$ cat unhalted_core_cycles unhalted_ref_cycles
1313573565617233
1661511224019444
$ bc -q
scale = 3
(1661511224019444 - 1660157011698276) / (1313573565617233 - 1312249223146571)
1.022
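The bc expression above divides the change in unhalted_ref_cycles by the
change in unhalted_core_cycles. A sketch of the same arithmetic as a
reusable helper is below. The name delta_ratio is made up for
illustration, and awk is used instead of bc, so the result is rounded
rather than truncated.

```shell
#!/bin/sh
# delta_ratio NUM_BEFORE NUM_AFTER DEN_BEFORE DEN_AFTER
# Print (NUM_AFTER - NUM_BEFORE) / (DEN_AFTER - DEN_BEFORE) to 3 places.
# The counter values must stay below 2^53 to be exact in awk's doubles.
delta_ratio() {
    awk -v n0="$1" -v n1="$2" -v d0="$3" -v d1="$4" 'BEGIN {
        d = d1 - d0
        if (d == 0) {
            print "denominator counter did not change" > "/dev/stderr"
            exit 1
        }
        printf "%.3f\n", (n1 - n0) / d
    }'
}
```

With the samples above,
"delta_ratio 1660157011698276 1661511224019444 1312249223146571 1313573565617233"
prints 1.023, where bc with scale = 3 truncates the same quotient to
1.022.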