Re: [PATCH 04/12] x86/cqm: Add Per pkg rmid support\

From: Thomas Gleixner
Date: Mon Jan 16 2017 - 13:45:51 EST


On Fri, 6 Jan 2017, Vikas Shivappa wrote:

> Subject : [PATCH 04/12] x86/cqm: Add Per pkg rmid support

Can you please be a bit more careful about the subject lines. 'Per' wants
to be 'per' and pkg really can be written as package. There is no point in
cryptic abbreviations for no value. Finaly RMID wants to be all uppercase
as you use it in the text below.

> Patch introduces a new cqm_pkgs_data to keep track of the per package

Again. We know that this is a patch.....

> free list, limbo list and other locking structures.

So free list and limbo list are locking structures, interesting.

> The corresponding rmid structures in the perf_event is

s/structures/structure/

> changed to hold an array of u32 RMIDs instead of a single u32.
>
> The RMIDs are not assigned at the time of event creation and are
> assigned in lazy mode at the first sched_in time for a task, thereby
> rmid is never allocated if a task is not scheduled on a package. This
> helps better usage of RMIDs and its scales with the increasing
> sockets/packages.
>
> Locking:
> event list - perf init and terminate hold mutex. spin lock is held to
> gaurd from the mbm hrtimer.
> per pkg free and limbo list - global spin lock. Used by
> get_rmid,put_rmid, perf start, terminate

Locking documentation wants to be in the code not in some random changelog.

> Tests: RMIDs available increase by x times where x is number of sockets
> and the usage is dynamic so we save more.

What means: 'Tests:'? Is there a test in this patch? If yes, I seem to be
missing it.

> Patch is based on David Carrillo-Cisneros <davidcc@xxxxxxxxxx> patches
> in cqm2 series.

We document such attributions with a tag:

Originally-From: David Carrillo-Cisneros <davidcc@xxxxxxxxxx>

That way tools can pick it up.

> static cpumask_t cqm_cpumask;
>
> +struct pkg_data **cqm_pkgs_data;

Why is this global? There is no user outside of cqm.c AFAICT.

> -/*
> - * We use a simple array of pointers so that we can lookup a struct
> - * cqm_rmid_entry in O(1). This alleviates the callers of __get_rmid()
> - * and __put_rmid() from having to worry about dealing with struct
> - * cqm_rmid_entry - they just deal with rmids, i.e. integers.
> - *
> - * Once this array is initialized it is read-only. No locks are required
> - * to access it.
> - *
> - * All entries for all RMIDs can be looked up in the this array at all
> - * times.

So this comment was actually valuable. Sure, it does not match the new
implementation, but the basic principle is still the same. The comment
wants to be updated and not just removed.

> *
> * We expect to be called with cache_mutex held.

You rename cache_mutex to cache lock, but updating the comments is
optional, right?

> -static u32 __get_rmid(void)
> +static u32 __get_rmid(int domain)

unsigned int domain please. The domain cannot be negative.

> {
> + struct list_head *cqm_flist;
> struct cqm_rmid_entry *entry;

Please keep the variables as a reverse fir tree:

> struct cqm_rmid_entry *entry;
> + struct list_head *cqm_flist;

>
> - lockdep_assert_held(&cache_mutex);
> + lockdep_assert_held(&cache_lock);

> -static void __put_rmid(u32 rmid)
> +static void __put_rmid(u32 rmid, int domain)
> {
> struct cqm_rmid_entry *entry;
>
> - lockdep_assert_held(&cache_mutex);
> + lockdep_assert_held(&cache_lock);
>
> - WARN_ON(!__rmid_valid(rmid));
> - entry = __rmid_entry(rmid);
> + WARN_ON(!rmid);

What's wrong with __rmid_valid() ?

> + entry = __rmid_entry(rmid, domain);
>
> + kfree(cqm_pkgs_data);
> }
>
> +

Random white space noise.

> static bool is_cqm_event(int e)
> @@ -420,10 +349,11 @@ static void __intel_mbm_event_init(void *info)
> {
> struct rmid_read *rr = info;
>
> - update_sample(rr->rmid, rr->evt_type, 1);
> + if (__rmid_valid(rr->rmid[pkg_id]))

Where the heck is pkg_id coming from?

Ahh:

#define pkg_id topology_physical_package_id(smp_processor_id())

That's crap. It's existing crap, but nevertheless crap.

First of all it's irritating as 'pkg_id' looks like a variable and in this
case it would be at least file global, which makes no sense at all.

Secondly, we really want to move this to the logical package id as that is
actually guaranteing that the package id is smaller than
topology_max_packages(). The physical package id has no such guarantee.

And then please keep this local so its simple to read and parse:

unsigned int pkgid = topology_logical_package_id(smp_processor_id());

> + update_sample(rr->rmid[pkg_id], rr->evt_type, 1);
> }

> @@ -444,7 +374,7 @@ static int intel_cqm_setup_event(struct perf_event *event,
> struct perf_event **group)
> {
> struct perf_event *iter;
> - u32 rmid;
> + u32 *rmid, sizet;

What's wrong with size? It's not confusable with size_t, right?

> +void alloc_needed_pkg_rmid(u32 *cqm_rmid)

Again: Why is this global?

And what's the meaning of needed in the function name? If the RMID wouldn't
be needed then the function would not be called, right?

> +{
> + unsigned long flags;
> + u32 rmid;
> +
> + if (WARN_ON(!cqm_rmid))
> + return;
> +
> + if (cqm_rmid[pkg_id])
> + return;
> +
> + raw_spin_lock_irqsave(&cache_lock, flags);
> +
> + rmid = __get_rmid(pkg_id);
> + if (__rmid_valid(rmid))
> + cqm_rmid[pkg_id] = rmid;
> +
> + raw_spin_unlock_irqrestore(&cache_lock, flags);
> +}
> +
> static void intel_cqm_event_start(struct perf_event *event, int mode)
> {
> struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
> - u32 rmid = event->hw.cqm_rmid;
> + u32 rmid;
>
> if (!(event->hw.cqm_state & PERF_HES_STOPPED))
> return;
>
> event->hw.cqm_state &= ~PERF_HES_STOPPED;
>
> + alloc_needed_pkg_rmid(event->hw.cqm_rmid);
> +
> + rmid = event->hw.cqm_rmid[pkg_id];

The allocation can fail. What's in event->hw.cqm_rmid[pkg_id] then? Zero or
what? And why is any of those values fine?

Can you please add comments so reviewers and readers do not have to figure
out every substantial detail themself?

> state->rmid = rmid;
> wrmsr(MSR_IA32_PQR_ASSOC, rmid, state->closid);

> +static inline void
> + cqm_event_free_rmid(struct perf_event *event)

Yet another coding style variant from your unlimited supply of coding
horrors.

> +{
> + u32 *rmid = event->hw.cqm_rmid;
> + int d;
> +
> + for (d = 0; d < cqm_socket_max; d++) {
> + if (__rmid_valid(rmid[d]))
> + __put_rmid(rmid[d], d);
> + }
> + kfree(event->hw.cqm_rmid);
> + list_del(&event->hw.cqm_groups_entry);
> +}
> static void intel_cqm_event_destroy(struct perf_event *event)
> {
> struct perf_event *group_other = NULL;
> @@ -737,16 +693,11 @@ static void intel_cqm_event_destroy(struct perf_event *event)
> * If there was a group_other, make that leader, otherwise
> * destroy the group and return the RMID.
> */
> - if (group_other) {
> + if (group_other)
> list_replace(&event->hw.cqm_groups_entry,
> &group_other->hw.cqm_groups_entry);

Please keep the brackets. See:

http://lkml.kernel.org/r/alpine.DEB.2.20.1609101416420.32361@nanos

> - } else {
> - u32 rmid = event->hw.cqm_rmid;
> -
> - if (__rmid_valid(rmid))
> - __put_rmid(rmid);
> - list_del(&event->hw.cqm_groups_entry);
> - }
> + else
> + cqm_event_free_rmid(event);
> }
>
> raw_spin_unlock_irqrestore(&cache_lock, flags);

> +static int pkg_data_init_cpu(int cpu)

cpus are unsigned int

> +{
> + struct cqm_rmid_entry *ccqm_rmid_ptrs = NULL, *entry = NULL;

What are the pointer intializations for?

> + int curr_pkgid = topology_physical_package_id(cpu);

Please use logical packages.

> + struct pkg_data *pkg_data = NULL;
> + int i = 0, nr_rmids, ret = 0;

Crap. 'i' is used in the for() loop below and therefor initialized exactly
there and not at some random other place. ret is a completely pointless
variable and the initialization is even more pointless. There is exactly
one code path using it (fail) and you can just return -ENOMEM from there.

> + if (cqm_pkgs_data[curr_pkgid])
> + return 0;
> +
> + pkg_data = kzalloc_node(sizeof(struct pkg_data),
> + GFP_KERNEL, cpu_to_node(cpu));
> + if (!pkg_data)
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&pkg_data->cqm_rmid_free_lru);
> + INIT_LIST_HEAD(&pkg_data->cqm_rmid_limbo_lru);
> +
> + mutex_init(&pkg_data->pkg_data_mutex);
> + raw_spin_lock_init(&pkg_data->pkg_data_lock);
> +
> + pkg_data->rmid_work_cpu = cpu;
> +
> + nr_rmids = cqm_max_rmid + 1;
> + ccqm_rmid_ptrs = kzalloc(sizeof(struct cqm_rmid_entry) *
> + nr_rmids, GFP_KERNEL);
> + if (!ccqm_rmid_ptrs) {
> + ret = -ENOMEM;
> + goto fail;
> + }
> +
> + for (; i <= cqm_max_rmid; i++) {
> + entry = &ccqm_rmid_ptrs[i];
> + INIT_LIST_HEAD(&entry->list);
> + entry->rmid = i;
> +
> + list_add_tail(&entry->list, &pkg_data->cqm_rmid_free_lru);
> + }
> +
> + pkg_data->cqm_rmid_ptrs = ccqm_rmid_ptrs;
> + cqm_pkgs_data[curr_pkgid] = pkg_data;
> +
> + /*
> + * RMID 0 is special and is always allocated. It's used for all
> + * tasks that are not monitored.
> + */
> + entry = __rmid_entry(0, curr_pkgid);
> + list_del(&entry->list);
> +
> + return 0;
> +fail:
> + kfree(ccqm_rmid_ptrs);
> + ccqm_rmid_ptrs = NULL;

And clearing the local variable has which value?

> + kfree(pkg_data);
> + pkg_data = NULL;
> + cqm_pkgs_data[curr_pkgid] = NULL;

It never got set, so why do you need to clear it? Just because you do not
trust the compiler?

> + return ret;
> +}
> +
> +static int cqm_init_pkgs_data(void)
> +{
> + int i, cpu, ret = 0;

Again, pointless 0 initialization.

> + cqm_pkgs_data = kzalloc(
> + sizeof(struct pkg_data *) * cqm_socket_max,
> + GFP_KERNEL);

Using a simple local variable to calculate size first would spare this new
variant of coding style horror.

> + if (!cqm_pkgs_data)
> + return -ENOMEM;
> +
> + for (i = 0; i < cqm_socket_max; i++)
> + cqm_pkgs_data[i] = NULL;

Surely kzalloc() is not reliable enough, so make sure that the pointers are
really NULL. Brilliant stuff.

> +
> + for_each_online_cpu(cpu) {
> + ret = pkg_data_init_cpu(cpu);
> + if (ret)
> + goto fail;
> + }

And that's protected against CPU hotplug in which way?

Aside of that. What allocates the package data for packages which come
online _AFTER_ this function is called? Nothing AFAICT.

What you really need is a CPU hotplug callback for the prepare stage, which
is doing the setup of this race free and also handling packages which come
online after this.

> diff --git a/arch/x86/events/intel/cqm.h b/arch/x86/events/intel/cqm.h
> new file mode 100644
> index 0000000..4415497
> --- /dev/null
> +++ b/arch/x86/events/intel/cqm.h
> @@ -0,0 +1,37 @@
> +#ifndef _ASM_X86_CQM_H
> +#define _ASM_X86_CQM_H
> +
> +#ifdef CONFIG_INTEL_RDT_M
> +
> +#include <linux/perf_event.h>
> +
> +/**
> + * struct pkg_data - cqm per package(socket) meta data
> + * @cqm_rmid_free_lru A least recently used list of free RMIDs
> + * These RMIDs are guaranteed to have an occupancy less than the
> + * threshold occupancy

You certainly did not even try to run kernel doc on this.

@var: Explanation

is the proper format. Can you spot the difference?

Also the multi line comments are horribly formatted:

@var: This is a multiline commend which is necessary because
it needs a lot of text......

> + * @cqm_rmid_entry - The entry in the limbo and free lists.

And this is incorrect as well. You are not even trying to make stuff
consistently wrong.

> + * @delayed_work - Work to reuse the RMIDs that have been freed.
> + * @rmid_work_cpu - The cpu on the package on which work is scheduled.

Also the formatting wants to be tabular:

* @var: Text
* @long_var: Text
* @really_long_var: Text

Sigh,

tglx