Re: [PATCH] x86/resctrl: fix an imbalance in domain_remove_cpu

From: Qian Cai
Date: Tue Dec 10 2019 - 07:11:41 EST




> On Dec 10, 2019, at 2:44 AM, Ryan Chen <yu.chen.surf@xxxxxxxxx> wrote:
>
> Could you elaborate a little more on the failure symptom?
> If I understand correctly, the error you described was due to
> r->mon_capable set to false while is_mbm_enabled() returns true?

Yes.

> Which means on this platform rdt_mon_features is non zero?

No idea. I did add some debug code that indicated resctrl_online_cpu() found 3 resources in for_each_capable_rdt_resource(r). Only the first one set r->mon_capable.

> And in get_rdt_mon_resources() it will invoke rdt_get_mon_l3_config(),
> however the only possible failure to do not set r->mon_capable is that it
> failed in dom_data_init() due to kcalloc() failure? Then the logic in

Very likely. Should be easy to confirm.

> get_rdt_resources() is that it will ignore the return error if rdt allocate
> feature is supported on this platform? If

Yes.

> this is the case, the r->mon_capable
> is not an indicator for whether the overflow thread has been created, right?

I am not sure about that.

> Can we simply remove the check of r->mon_capable in domain_add_cpu() and
> invoke domain_setup_mon_state() directly?

That should work too, but it is so perfect align with the r->alloc_capable check above, so I am not sure it is a good idea to break it.

> Humm, it looks like there are two places within this function
> invoked cancel_delayed_work(&d->mbm_over),
> why not adding the check for both of them?

Because I am not sure about the second one. It was never executed due to âcpu != d->mbm_work_cpuâ even after offlining all CPUs except cpu 0 and never cause anything wrong yet, so I could not test it yet, but I can see why it might need a similar check too if d->mbm_work_cpu is non-zero and could trigger the same imbalance.