Re: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

From: Reinette Chatre
Date: Wed Mar 11 2020 - 14:03:27 EST


Hi Sai,

On 3/11/2020 10:33 AM, Sai Praneeth Prakhya wrote:
> On Wed, 2020-03-11 at 10:19 -0700, Reinette Chatre wrote:
>> On 3/10/2020 7:46 PM, Sai Praneeth Prakhya wrote:
>>> On Tue, 2020-03-10 at 15:18 -0700, Reinette Chatre wrote:
>>>> On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:
>>>>> .mum_resctrlfs = 0,
>>>>> .filename = RESULT_FILE_NAME,
>>>>> - .mask = ~(long_mask << n) & long_mask,
>>>>> - .span = cache_size * n / count_of_bits,
>>>>> .num_of_runs = 0,
>>>>> - .setup = cqm_setup,
>>>>> + .setup = cqm_setup
>>>>> };
>>>>> + int ret;
>>>>> + char schemata[64];
>>>>> + unsigned long long_mask;
>>>>>
>>>>> - if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
>>>>> - sprintf(benchmark_cmd[1], "%lu", param.span);
>>>>> + ret = remount_resctrlfs(1);
>>>>> + if (ret)
>>>>> + return ret;
>>>>
>>>> Here resctrl is remounted, followed by some changes to the root
>>>> group's schemata. That is followed by a call to resctrl_val(), which
>>>> attempts to remount resctrl again and so will undo all the
>>>> configuration done in between.
>>>
>>> No, it wouldn't because mum_resctrlfs is 0. When resctrl FS is already
>>> mounted and mum_resctrlfs is 0, then remount_resctrlfs() is a noop.
>>>
>>
>> I missed that. Thank you.
>>
>> fyi ... when I tried these tests I encountered the following error
>> related to unmounting:
>>
>> [SNIP]
>> ok Write schema "L3:1=7fff" to resctrl FS
>> ok Write schema "L3:1=ffff" to resctrl FS
>> ok Write schema "L3:1=1ffff" to resctrl FS
>> ok Write schema "L3:1=3ffff" to resctrl FS
>> # Unable to umount resctrl: Device or resource busy
>> # Results are displayed in (Bytes)
>> ok CQM: diff within 5% for mask 1
>> # alloc_llc_cache_size: 2883584
>> # avg_llc_occu_resc: 2973696
>> ok CQM: diff within 5% for mask 3
>> [SNIP]
>>
>> This seems to originate from resctrl_val(), which forces an unmount; if
>> that fails, the error is not propagated.
>
> Yes, that's right and it's a good test. I didn't encounter this issue during
> my testing because I wasn't using resctrl FS from other terminals (I think you
> were using resctrl FS from other terminal and hence resctrl_test was unable to
> unmount it).

I was not explicitly testing for this but this may have been the case.

As a side note ... could remount_resctrlfs() be called consistently? It
is currently called with true/false in some places and 1/0 in others.
Since its parameter type is boolean, using true/false seems most
appropriate.
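
To illustrate what I mean, here is a rough, standalone sketch of the
behaviour described above with a bool parameter. It is not the selftest
source: resctrl_mounted, mount_calls and the _sketch suffix are made up
for illustration, and I am assuming the helper (re)mounts in every case
other than "already mounted and no remount requested".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for system state; the real helper inspects the mount table. */
static bool resctrl_mounted;
static int mount_calls;

/* Sketch only: models the semantics discussed in this thread. */
static int remount_resctrlfs_sketch(bool mum_resctrlfs)
{
	/* Already mounted and no remount requested: nothing to do. */
	if (resctrl_mounted && !mum_resctrlfs)
		return 0;

	/* Assumed: every other case ends with a fresh mount. Here we only record it. */
	resctrl_mounted = true;
	mount_calls++;
	return 0;
}
```

With this shape, callers would read remount_resctrlfs_sketch(true) or
remount_resctrlfs_sketch(false), which makes the intent clearer than
passing 1 or 0.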

> I think the error should not be propagated because unmounting resctrl FS
> shouldn't stop us from checking the results. If measuring values reports an
> error then we shouldn't check for results.

This sounds right. It is inconsistent though ... the CQM test unmounts
resctrl after it is run but the CAT test does not. Looking closer, the
CAT test seems to leave its artifacts around in resctrl, and these should
be cleaned up.
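
To check that we mean the same thing about error handling, here is a
small sketch of the policy: a failed unmount is reported but does not
stop the result check, while a failed measurement does. The function
name and its parameters are hypothetical and only stand in for the
return values of the real steps.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Sketch only. measure_ret and umount_ret are hypothetical stand-ins for
 * the return values of the measurement step and the unmount step.
 */
static int finish_test_sketch(int measure_ret, int umount_ret,
			      bool *results_checked)
{
	*results_checked = false;

	/* Cleanup problem: tell the user, but keep going. */
	if (umount_ret)
		fprintf(stderr, "# Unable to umount resctrl\n");

	/* A failed measurement leaves nothing valid to check. */
	if (measure_ret)
		return measure_ret;

	*results_checked = true;
	return 0;
}
```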

I am not sure about the expectations here. Unmounting resctrl after a
test is run is indeed the easiest way to clean up and may be ok. It may
come as a surprise to the user though. Perhaps there can be a snippet in
the README that warns people about this?

Thank you very much

Reinette