Re: [PATCH v2 00/13] Tool and hwmon PMUs
From: Ian Rogers
Date: Tue Oct 01 2024 - 00:56:55 EST
On Sun, Sep 29, 2024 at 12:21 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Fri, Sep 27, 2024 at 11:09:49AM -0700, Ian Rogers wrote:
> > On Fri, Sep 27, 2024 at 10:22 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Sep 26, 2024 at 12:47:26PM -0700, Ian Rogers wrote:
> > > > On Fri, Sep 13, 2024 at 7:34 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Sep 12, 2024 at 3:50 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu, Sep 12, 2024 at 12:03:27PM -0700, Ian Rogers wrote:
> > > > > > > Rather than have fake and tool PMUs being special flags in an evsel,
> > > > > > > create special PMUs. This allows, for example, duration_time to also
> > > > > > > be tool/duration_time/. Once adding events to the tools PMU is just
> > > > > > > adding to an array, add events for nearly all the expr literals like
> > > > > > > num_cpus_online. Rather than create custom logic for finding and
> > > > > > > describing the tool events use json and add a notion of common json
> > > > > > > for the tool events.
> > > > > > >
> > > > > > > Following the convention of the tool PMU, create a hwmon PMU that
> > > > > > > exposes hwmon data for reading. For example, the following shows
> > > > > > > reading the CPU temperature and 2 fan speeds alongside the uncore
> > > > > > > frequency:
> > > > > > > ```
> > > > > > > $ perf stat -e temp_cpu,fan1,hwmon_thinkpad/fan2/,tool/num_cpus_online/ -M UNCORE_FREQ -I 1000
> > > > > > > 1.001153138 52.00 'C temp_cpu
> > > > > > > 1.001153138 2,588 rpm fan1
> > > > > > > 1.001153138 2,482 rpm hwmon_thinkpad/fan2/
> > > > > > > 1.001153138 8 tool/num_cpus_online/
> > > > > > > 1.001153138 1,077,101,397 UNC_CLOCK.SOCKET # 1.08 UNCORE_FREQ
> > > > > > > 1.001153138 1,012,773,595 duration_time
> > > > > > > ...
> > > > > > > ```
> > > > > > >
> > > > > > > Additional data on the hwmon events is in perf list:
> > > > > > > ```
> > > > > > > $ perf list
> > > > > > > ...
> > > > > > > hwmon:
> > > > > > > ...
> > > > > > > temp_core_0 OR temp2
> > > > > > > [Temperature in unit coretemp named Core 0. crit=100'C,max=100'C crit_alarm=0'C. Unit:
> > > > > > > hwmon_coretemp]
> > > > > > > ...
> > > > > > > ```
> > > > > > >
> > > > > > > v2: Address Namhyung's review feedback. Rebase dropping 4 patches
> > > > > > > applied by Arnaldo, fix build breakage reported by Arnaldo.
> > > > > > >
> > > > > > > Ian Rogers (13):
> > > > > > > perf pmu: Simplify an asprintf error message
> > > > > > > perf pmu: Allow hardcoded terms to be applied to attributes
> > > > > > > perf parse-events: Expose/rename config_term_name
> > > > > > > perf tool_pmu: Factor tool events into their own PMU
> > > > > > > perf tool_pmu: Rename enum perf_tool_event to tool_pmu_event
> > > > > > > perf tool_pmu: Rename perf_tool_event__* to tool_pmu__*
> > > > > > > perf tool_pmu: Move expr literals to tool_pmu
> > > > > > > perf jevents: Add tool event json under a common architecture
> > > > > > > perf tool_pmu: Switch to standard pmu functions and json descriptions
> > > > > > > perf tests: Add tool PMU test
> > > > > > > perf hwmon_pmu: Add a tool PMU exposing events from hwmon in sysfs
> > > > > > > perf test: Add hwmon "PMU" test
> > > > > > > perf docs: Document tool and hwmon events
> > > > > >
> > > > > > For patch 1-10,
> > > > > >
> > > > > > Acked-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > >
> > > > I thought the plan was for 1 to 10 to be in v6.12 and the remaining 3
> > > > to be in perf-tools-next/v6.13. I'm not seeing any of the series in
> > > > perf-tools so should everything be going in perf-tools-next?
> > >
> > > Ok, I'll pick up the tools_pmu changes first.
>
> It doesn't apply cleanly anymore. Please rebase.
>
> > >
> > > And I think it'd be much easier for me if you break the hwmon change
> > > like with basic PMU enabling and unit/alias support.
> >
> > I'd kept the hwmon PMU as a single addition on purpose - testing and
> > documentation are follow up patches. Typically a new driver would be a
> > single commit, and so I think this is the LKML norm. For example:
> > https://lore.kernel.org/lkml/20190326151753.19384-3-shameerali.kolothum.thodi@xxxxxxxxxx/
> >
> > Having multiple commits where things are only partially working means
> > bisects will be broken. It also means changes I have on top of this
> > can end up conflicting with what you're doing. I agree this means we
> > have a big patch when the new thing is added, I think this is normal
> > in the case of a driver - which to some extent this is.
>
> I think it depends, and of course I want bisectable patches. A
> standalone driver with well-known pattern of implementation could come
> in a single commit. But this is new and I'm not familiar with the hwmon
> interfaces and behaviors. So I'm asking you to split the minimal code
> that can run with perf stat from the full-fledged version. That would
> help maintainers understand and maintain the code better.
I consider the code to be in its minimal state. The first 10 patches
lay ground work in tool PMU for an API hwmon PMU will follow, patch 12
adds the testing separated from the main driver and patch 13 adds the
documentation.
In patch 11 the added APIs are:
perf_pmus__read_hwmon_pmus
perf_pmu__is_hwmon
hwmon_pmu__have_event
hwmon_pmu__num_events
hwmon_pmu__for_each_event
hwmon_pmu__check_alias
hwmon_pmu__config_terms
hwmon_pmu__exit
evsel__hwmon_pmu_open
evsel__hwmon_pmu_read
If we read hwmon PMUs without allowing the events to be programmed
then the hwmon code would just be empty and I see no value in such a
patch - I'd be fighting all of the C compilers unused variable and
function warnings.
If we have a hwmon PMU we need the perf_pmu__is_hwmon test as we lack
C++ virtual methods.
There are 3 event functions exactly corresponding to the same
functions in perf_pmu. The reading code store data in a hashmap so
these functions query or iterate the hashmap.
The check_alias and config_terms functions set up the perf_event_attr
and without them the generic code won't work. They are copies of the
generic code but with support for most terms removed as things like
call-graph have no meaning on a hwmon event.
The exit function is the usual house keeping.
The evsel open and read functions are needed as trying to open a
configured event with perf_event_open will fail and break tests like
tools/perf/tests/shell/stat_all_pmu.sh. If you open the event you
can't read it as if the contents were the binary contents of a
perf_event_open file, it has to be read as text so we need the read
function.
The driver is essentially a hashmap. hwmon files are of the form
<type><number>_<item>, so an event is a type and number with the
different items associated with the event held in the hashmap's value.
We can find an event like "temp1" from the type and number but we may
prefer the event "temp_cpu" where "cpu" is read from the label item
file. Finding such labelled events is a linear search and the label
file is read when the event is created in the hashmap for simplicity.
We could make the driver support the "temp1" name and then the
"temp_cpu" name but the line savings would be minimal, we'd have a new
hwmon driver that would have different rules around event alias names
and the like, basically it'd be a whole heap of work that would in the
next patch just be thrown away.
I've described how the driver works in the paragraph above but in the
code I added a generous amount of kerneldoc that already says this. I
could remove the kerneldoc and add it as an additional patch, but I
don't see why. I also don't see how this differs from how existing
drivers are added except that the pmu API is cleaned up prior to use
while drivers tend to be using a stable API.
Thanks,
Ian