Re: [PATCH v2 4/9] perf affinity: Add infrastructure to save/restore affinity

From: Alexey Budankov
Date: Wed Oct 23 2019 - 14:08:55 EST


On 23.10.2019 20:19, Andi Kleen wrote:
> On Wed, Oct 23, 2019 at 07:16:13PM +0300, Alexey Budankov wrote:
>>
>> On 23.10.2019 17:52, Andi Kleen wrote:
>>> On Wed, Oct 23, 2019 at 04:30:49PM +0200, Jiri Olsa wrote:
>>>> On Wed, Oct 23, 2019 at 06:02:35AM -0700, Andi Kleen wrote:
>>>>> On Wed, Oct 23, 2019 at 11:59:11AM +0200, Jiri Olsa wrote:
>>>>>> On Sun, Oct 20, 2019 at 10:51:57AM -0700, Andi Kleen wrote:
>>>>>>
>>>>>> SNIP
>>>>>>
>>>>>>> +}
>>>>>>> diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..e56148607e33
>>>>>>> --- /dev/null
>>>>>>> +++ b/tools/perf/util/affinity.h
>>>>>>> @@ -0,0 +1,15 @@
>>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>>> +#ifndef AFFINITY_H
>>>>>>> +#define AFFINITY_H 1
>>>>>>> +
>>>>>>> +struct affinity {
>>>>>>> + unsigned char *orig_cpus;
>>>>>>> + unsigned char *sched_cpus;
>>>>>>
>>>>>> why not use cpu_set_t directly?
>>>>>
>>>>> Because it's too small in glibc (only 1024 CPUs) and perf already
>>>>> supports more.
>>>>
>>>> nice, we're using it all over the place.. how about using bitmap_alloc?
>>>
>>> Okay.
>>>
>>> The other places is mainly perf record from Alexey's recent affinity changes.
>>> These probably need to be fixed.
>>>
>>> +Alexey
>>
>> Despite the issue indeed looks generic for stat and record modes,
>> have you already observed record startup overhead somewhere in your setups?
>> I would, first, prefer to reproduce the overhead, to have stable use case
>> for evaluation and then, possibly, improvement.
>
> What I meant the cpu_set usages you added in
>
> commit 9d2ed64587c045304efe8872b0258c30803d370c
> Author: Alexey Budankov <alexey.budankov@xxxxxxxxxxxxxxx>
> Date: Tue Jan 22 20:47:43 2019 +0300
>
> perf record: Allocate affinity masks
>
> need to be fixed to allocate dynamically, or at least use MAX_NR_CPUs to
> support systems with >1024CPUs. That's an independent functionality
> problem.

Oh, it is clear now. Thanks for pointing this out. To move from
cpu_set_t to the new custom struct affinity type, its API needs to be
extended with mask operations similar to the ones cpu_set_t provides:
CPU_ZERO(), CPU_SET(), CPU_EQUAL(), CPU_OR().

For example, it could provide affinity__mask_zero(), affinity__mask_set(),
affinity__mask_equal() and affinity__mask_or(). The collecting part of
record could then also be moved to the struct affinity type and overcome
the >1024 CPUs limitation.
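
To make the proposal a bit more concrete, here is a rough sketch of what
such helpers might look like on top of a heap-allocated bitmap. This is
only an illustration: struct affinity_mask, its fields, and the
alloc/free/isset helpers are hypothetical names that are not part of the
patch, and the real code would more likely build on bitmap_alloc() as
Jiri suggested.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MASK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical dynamically sized CPU mask; not the struct from the patch. */
struct affinity_mask {
	unsigned long *bits;	/* heap-allocated bitmap, one bit per CPU */
	size_t nr_longs;	/* number of longs backing the bitmap */
	size_t nr_cpus;		/* number of valid CPU bits */
};

/* Allocate a zeroed mask for nr_cpus CPUs; returns 0 or -1 on failure. */
static int affinity__mask_alloc(struct affinity_mask *m, size_t nr_cpus)
{
	if (nr_cpus == 0)
		return -1;
	m->nr_longs = (nr_cpus + MASK_BITS_PER_LONG - 1) / MASK_BITS_PER_LONG;
	m->bits = calloc(m->nr_longs, sizeof(unsigned long));
	if (!m->bits)
		return -1;
	m->nr_cpus = nr_cpus;
	return 0;
}

static void affinity__mask_free(struct affinity_mask *m)
{
	free(m->bits);
	m->bits = NULL;
	m->nr_longs = 0;
	m->nr_cpus = 0;
}

/* Counterpart of CPU_ZERO(). */
static void affinity__mask_zero(struct affinity_mask *m)
{
	memset(m->bits, 0, m->nr_longs * sizeof(unsigned long));
}

/* Counterpart of CPU_SET(); returns -1 if cpu is out of range. */
static int affinity__mask_set(struct affinity_mask *m, size_t cpu)
{
	if (cpu >= m->nr_cpus)
		return -1;
	m->bits[cpu / MASK_BITS_PER_LONG] |= 1UL << (cpu % MASK_BITS_PER_LONG);
	return 0;
}

/* Counterpart of CPU_ISSET(); out-of-range CPUs read as not set. */
static bool affinity__mask_isset(const struct affinity_mask *m, size_t cpu)
{
	if (cpu >= m->nr_cpus)
		return false;
	return (m->bits[cpu / MASK_BITS_PER_LONG] &
		(1UL << (cpu % MASK_BITS_PER_LONG))) != 0;
}

/* Counterpart of CPU_EQUAL(); masks of different sizes are never equal. */
static bool affinity__mask_equal(const struct affinity_mask *a,
				 const struct affinity_mask *b)
{
	if (a->nr_cpus != b->nr_cpus)
		return false;
	return memcmp(a->bits, b->bits,
		      a->nr_longs * sizeof(unsigned long)) == 0;
}

/* Counterpart of CPU_OR(): dst = a | b. dst may alias a or b. */
static void affinity__mask_or(struct affinity_mask *dst,
			      const struct affinity_mask *a,
			      const struct affinity_mask *b)
{
	size_t i;

	/* Mixing masks of different sizes is treated as a programming error. */
	assert(dst->nr_cpus == a->nr_cpus && a->nr_cpus == b->nr_cpus);
	for (i = 0; i < dst->nr_longs; i++)
		dst->bits[i] = a->bits[i] | b->bits[i];
}
```

Since the size is chosen at allocation time, a mask allocated for, say,
2048 CPUs can hold CPU 1500 without depending on the fixed size of
glibc's cpu_set_t.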

~Alexey

>
> I haven't seen any large enough perf record usage to run
> into the IPI problems for record.
>
> -Andi
>