Re: [PATCH 4/6] x86/mbm: Memory bandwidth monitoring event management

From: Vikas Shivappa
Date: Thu Mar 10 2016 - 17:50:09 EST




On Mon, 7 Mar 2016, Peter Zijlstra wrote:

On Tue, Mar 01, 2016 at 03:48:26PM -0800, Vikas Shivappa wrote:

A lot of the scheduling code was taken from Tony's patch, and 3-4 lines of
change were added in intel_cqm_event_read. This change was made because the
timer is no longer added on every context switch.

Is this here to confuse people or is there some actual information in
it?

Will remove the comment.


+/*
+ * MBM Counter is 24bits wide. MBM_CNTR_MAX defines max counter
+ * value
+ */
+#define MBM_CNTR_MAX 0xffffff

#define MBM_CNTR_WIDTH 24
#define MBM_CNTR_MAX ((1U << MBM_CNTR_WIDTH) - 1)


Will fix

#define QOS_L3_OCCUP_EVENT_ID (1 << 0)
+/*
+ * MBM Event IDs as defined in SDM section 17.15.5
+ * Event IDs are used to program EVTSEL MSRs before reading mbm event counters
+ */
+enum mbm_evt_type {
+ QOS_MBM_TOTAL_EVENT_ID = 0x02,
+ QOS_MBM_LOCAL_EVENT_ID,
+ QOS_MBM_TOTAL_BW_EVENT_ID,
+ QOS_MBM_LOCAL_BW_EVENT_ID,
+};

QOS_L3_*_EVENT_ID is a define, these are an enum. Rather inconsistent.


Will be changing all of them to #define, and we are also removing the bw events.
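
Something like this, roughly (assuming the bw event IDs are indeed dropped):

/*
 * MBM Event IDs as defined in SDM section 17.15.5
 */
#define QOS_MBM_TOTAL_EVENT_ID	0x02
#define QOS_MBM_LOCAL_EVENT_ID	0x03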

struct rmid_read {
u32 rmid;

Hole, you could've filled with the enum (which ends up being an int I
think).

atomic64_t value;
+ enum mbm_evt_type evt_type;
};
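
If the enum were kept, moving it up would fill that hole, e.g. roughly:

struct rmid_read {
	u32 rmid;
	enum mbm_evt_type evt_type;	/* sits in the 4-byte hole before value */
	atomic64_t value;
};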

+static bool is_mbm_event(int e)

You had an enum type, you might as well use it.

The enum will be gone.


+{
+ return (e >= QOS_MBM_TOTAL_EVENT_ID && e <= QOS_MBM_LOCAL_BW_EVENT_ID);
+}


+static struct sample *update_sample(unsigned int rmid,
+ enum mbm_evt_type evt_type, int first)
+{
+ ktime_t cur_time;
+ struct sample *mbm_current;
+ u32 vrmid = rmid_2_index(rmid);
+ u64 val, bytes, diff_time;
+ u32 eventid;
+
+ if (evt_type & QOS_MBM_LOCAL_EVENT_MASK) {
+ mbm_current = &mbm_local[vrmid];
+ eventid = QOS_MBM_LOCAL_EVENT_ID;
+ } else {
+ mbm_current = &mbm_total[vrmid];
+ eventid = QOS_MBM_TOTAL_EVENT_ID;
+ }
+
+ cur_time = ktime_get();
+ wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
+ rdmsrl(MSR_IA32_QM_CTR, val);
+ if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+ return mbm_current;

+ val &= MBM_CNTR_MAX;

+ if (val < mbm_current->prev_msr)
+ bytes = MBM_CNTR_MAX - mbm_current->prev_msr + val + 1;
+ else
+ bytes = val - mbm_current->prev_msr;

Would not something like:

shift = 64 - MBM_CNTR_WIDTH;

bytes = (val << shift) - (prev << shift);
bytes >>= shift;

be less obtuse? (and consistent with how every other perf update
function does it).

Will fix.
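
i.e. replacing the if/else above with something along these lines (the shift
width follows from the MBM_CNTR_WIDTH define suggested earlier):

	u64 shift = 64 - MBM_CNTR_WIDTH;

	bytes = (val << shift) - (mbm_current->prev_msr << shift);
	bytes >>= shift;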


What guarantee is there we didn't wrap multiple times? Doesn't that
deserve a comment?


This is taken care of in the next patch, 0006. I have put a comment there, at the definition of the timers, that h/w guarantees the counter won't overflow within 1s, but I can add another comment here in patch 0006.


+ bytes *= cqm_l3_scale;
+
+ mbm_current->total_bytes += bytes;
+ mbm_current->interval_bytes += bytes;
+ mbm_current->prev_msr = val;
+ diff_time = ktime_ms_delta(cur_time, mbm_current->interval_start);

Here we do a / 1e6

+
+ /*
+ * The b/w measured is really the most recent/current b/w.
+ * We wait till enough time has passed to avoid
+ * arithmetic rounding problems. Having it at >=100ms,
+ * such errors would be <=1%.
+ */
+ if (diff_time > 100) {

This could well be > 100e6 instead, avoiding the above division most of
the time.

+ bytes = mbm_current->interval_bytes * MSEC_PER_SEC;
+ do_div(bytes, diff_time);
+ mbm_current->bandwidth = bytes;
+ mbm_current->interval_bytes = 0;
+ mbm_current->interval_start = cur_time;
+ }
+
+ return mbm_current;
+}

How does the above time tracking deal with the event not actually having
been scheduled the whole time?

Will be removing the bw events, so that should address all three comments above.



+static void init_mbm_sample(u32 rmid, enum mbm_evt_type evt_type)
+{
+ struct rmid_read rr = {
+ .value = ATOMIC64_INIT(0),
+ };
+
+ rr.rmid = rmid;
+ rr.evt_type = evt_type;

That's just sad.. put those two in the struct init as well.

Will fix
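
i.e. roughly:

	struct rmid_read rr = {
		.rmid	  = rmid,
		.evt_type = evt_type,
		.value	  = ATOMIC64_INIT(0),
	};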

thanks,
vikas

+ /* on each socket, init sample */
+ on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
+}