Re: [PATCH v1] perf record: fix binding of AIO user space buffers to nodes

From: Alexey Budankov
Date: Thu Mar 12 2020 - 13:55:01 EST

On 12.03.2020 20:12, Arnaldo Carvalho de Melo wrote:
> Em Thu, Mar 12, 2020 at 07:09:56PM +0300, Alexey Budankov escreveu:
>>
>> On 12.03.2020 17:31, Arnaldo Carvalho de Melo wrote:
>>> Em Thu, Mar 12, 2020 at 03:21:45PM +0300, Alexey Budankov escreveu:
>>>>
>>>> Correct the maxnode value passed to the mbind() syscall: it must be
>>>> the number of node mask bits to analyze plus 1. Allocate the node mask
>>>> dynamically, sized by the node index of the CPU being profiled.
>>>>
>>>> Fixes: c44a8b44ca9f ("perf record: Bind the AIO user space buffers to nodes")
>>>> Signed-off-by: Alexey Budankov <alexey.budankov@xxxxxxxxxxxxxxx>
>>>> ---
>>>> tools/perf/util/mmap.c | 21 +++++++++++++++------
>>>> 1 file changed, 15 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>>>> index 3b664fa673a6..6d604cd67a95 100644
>>>> --- a/tools/perf/util/mmap.c
>>>> +++ b/tools/perf/util/mmap.c
>>>> @@ -98,20 +98,29 @@ static int perf_mmap__aio_bind(struct mmap *map, int idx, int cpu, int affinity)
>>>>  {
>>>>  	void *data;
>>>>  	size_t mmap_len;
>>>> -	unsigned long node_mask;
>>>> +	unsigned long *node_mask;
>>>> +	unsigned long node_index;
>>>> +	int err = 0;
>>>>
>>>>  	if (affinity != PERF_AFFINITY_SYS && cpu__max_node() > 1) {
>>>>  		data = map->aio.data[idx];
>>>>  		mmap_len = mmap__mmap_len(map);
>>>> -		node_mask = 1UL << cpu__get_node(cpu);
>>>> -		if (mbind(data, mmap_len, MPOL_BIND, &node_mask, 1, 0)) {
>>>> -			pr_err("Failed to bind [%p-%p] AIO buffer to node %d: error %m\n",
>>>> -				data, data + mmap_len, cpu__get_node(cpu));
>>>> +		node_index = cpu__get_node(cpu);
>>>> +		node_mask = bitmap_alloc(node_index + 1);
>>>> +		if (!node_mask) {
>>>> +			pr_err("Failed to allocate node mask for mbind: error %m\n");
>>>>  			return -1;
>>>>  		}
>>>> +		set_bit(node_index, node_mask);
>>>> +		if (mbind(data, mmap_len, MPOL_BIND, node_mask, node_index + 1 + 1/*nr_bits + 1*/, 0)) {
>>>
>>> ^^^^^^^^^^^^^^
>>> Leftover?
>>
>> I put it there intentionally, to document the kernel's behavior for the
>> mbind() syscall, which currently differs from what the man page [1] documents:
>>
>> "nodemask points to a bit mask of nodes containing up to maxnode bits.
>> The bit mask size is rounded to the next multiple of sizeof(unsigned
>> long), but the kernel will use bits only up to maxnode. A NULL value
>> of nodemask or a maxnode value of zero specifies the empty set of
>> nodes. If the value of maxnode is zero, the nodemask argument is
>> ignored. Where a nodemask is required, it must contain at least one
>> node that is on-line, allowed by the thread's current cpuset context
>> (unless the MPOL_F_STATIC_NODES mode flag is specified), and contains
>> memory."
>
> Ok, I will add the above as a comment above that line.

Thanks!
~Alexey
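
For readers following the thread, here is a minimal sketch of the arithmetic behind the "node_index + 1 + 1" argument. It assumes, as the patch description states, that maxnode must be the number of mask bits to analyze plus 1. The helper names (mask_longs, node_mask_set, node_mask_alloc, mbind_maxnode) are hypothetical and are not the perf or kernel bitmap API; the sketch does not call mbind() itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: unsigned longs needed to hold nbits bits. */
static size_t mask_longs(unsigned long nbits)
{
	return (nbits + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
}

/* Hypothetical helper: set the bit for node_index in an existing mask. */
static void node_mask_set(unsigned long *mask, unsigned long node_index)
{
	assert(mask != NULL);
	mask[node_index / BITS_PER_ULONG] |= 1UL << (node_index % BITS_PER_ULONG);
}

/*
 * Hypothetical helper: allocate a zeroed mask large enough for bits
 * 0..node_index and set only the bit for node_index.
 * Returns NULL on allocation failure; the caller frees the mask.
 */
static unsigned long *node_mask_alloc(unsigned long node_index)
{
	unsigned long *mask = calloc(mask_longs(node_index + 1), sizeof(*mask));

	if (mask)
		node_mask_set(mask, node_index);
	return mask;
}

/*
 * maxnode as computed in the patch: (node_index + 1) bits are needed to
 * cover bit node_index, plus 1 more per the behavior described in the
 * thread (assumption: maxnode is "bits to analyze plus 1").
 */
static unsigned long mbind_maxnode(unsigned long node_index)
{
	return node_index + 1 + 1;
}
```

Under these assumptions, a CPU on node 3 gets a one-word mask with value 8 (bit 3 set) and a maxnode of 5, whereas the original code always passed a maxnode of 1 regardless of the node.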