Re: [PATCH] KVM: selftests: Run dirty_log_perf_test on specific cpus

From: Sean Christopherson
Date: Wed Aug 17 2022 - 13:25:44 EST


On Wed, Aug 17, 2022, Vipin Sharma wrote:
> Add command line options to run the vcpus and the main process on the
> specific cpus on a host machine. This is useful as it provides
> options to analyze performance based on the vcpus and dirty log worker
> locations, like on the different numa nodes or on the same numa nodes.

The two options should probably be separate patches, they are related but still
two very distinct changes.

> Signed-off-by: Vipin Sharma <vipinsh@xxxxxxxxxx>
> Suggested-by: David Matlack <dmatlack@xxxxxxxxxx>
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Suggested-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
>
> This is based on the discussion at
> https://lore.kernel.org/lkml/20220801151928.270380-1-vipinsh@xxxxxxxxxx/

This can and should be captured in the changelog proper:

Link: https://lore.kernel.org/lkml/20220801151928.270380-1-vipinsh@xxxxxxxxxx

> @@ -348,12 +353,74 @@ static void run_test(enum vm_guest_mode mode, void *arg)
> perf_test_destroy_vm(vm);
> }
>
> +static int parse_num(const char *num_str)
> +{
> + int num;
> + char *end_ptr;
> +
> + errno = 0;
> + num = (int)strtol(num_str, &end_ptr, 10);
> + TEST_ASSERT(num_str != end_ptr && *end_ptr == '\0',
> + "Invalid number string.\n");
> + TEST_ASSERT(errno == 0, "Conversion error: %d\n", errno);

Is the paranoia truly necessary? What happens if parse_cpu_list() simply uses
atoi() and is passed garbage?

> +
> + return num;
> +}
> +
> +static int parse_cpu_list(const char *arg)
> +{
> + char delim[2] = ",";
> + char *cpu, *cpu_list;
> + int i = 0, cpu_num;
> +
> + cpu_list = strdup(arg);
> + TEST_ASSERT(cpu_list, "Low memory\n");

Heh, probably a little less than "low". Just be literal and let the user figure
out why the allocation failed instead.

TEST_ASSERT(cpu_list, "strdup() allocation failed\n");

> +
> + cpu = strtok(cpu_list, delim);
> + while (cpu) {
> + cpu_num = parse_num(cpu);
> + TEST_ASSERT(cpu_num >= 0, "Invalid cpu number: %d\n", cpu_num);
> + vcpu_to_lcpu_map[i++] = cpu_num;
> + cpu = strtok(NULL, delim);
> + }
> +
> + free(cpu_list);

The tokenization and parsing is nearly identical between parse_cpu_list() and
assign_dirty_log_perf_test_cpu(). The code can be made into a common helper by
passing in the destination, e.g.

static int parse_cpu_list(const char *arg, cpu_set_t *cpuset, int *vcpu_map)
{
const char delim[] = ",";
char *cpustr, *cpu_list;
int i = 0, cpu;

TEST_ASSERT(!!cpuset ^ !!vcpu_map);

cpu_list = strdup(arg);
TEST_ASSERT(cpu_list, "Low memory\n");

cpustr = strtok(cpu_list, delim);
while (cpustr) {
cpu = atoi(cpustr);
TEST_ASSERT(cpu >= 0, "Invalid cpu number: %d\n", cpu);
if (vcpu_map)
vcpu_to_lcpu_map[i++] = cpu_num;
else
CPU_SET(cpu_num, cpuset);
cpu = strtok(NULL, delim);
}

free(cpu_list);

return i;
}

> @@ -383,6 +450,26 @@ static void help(char *name)
> backing_src_help("-s");
> printf(" -x: Split the memory region into this number of memslots.\n"
> " (default: 1)\n");
> + printf(" -c: Comma separated values of the logical CPUs which will run\n"
> + " the vCPUs. Number of values should be equal to the number\n"
> + " of vCPUs.\n\n"
> + " Example: ./dirty_log_perf_test -v 3 -c 22,43,1\n"
> + " This means that the vcpu 0 will run on the logical cpu 22,\n"
> + " vcpu 1 on the logical cpu 43 and vcpu 2 on the logical cpu 1.\n"
> + " (default: No cpu mapping)\n\n");
> + printf(" -d: Comma separated values of the logical CPUs on which\n"
> + " dirty_log_perf_test will run. Without -c option, all of\n"
> + " the vcpus and main process will run on the cpus provided here.\n"
> + " This option also accepts a single cpu. (default: No cpu mapping)\n\n"
> + " Example 1: ./dirty_log_perf_test -v 3 -c 22,43,1 -d 101\n"
> + " Main application thread will run on logical cpu 101 and\n"
> + " vcpus will run on the logical cpus 22, 43 and 1\n\n"
> + " Example 2: ./dirty_log_perf_test -v 3 -d 101\n"
> + " Main application thread and vcpus will run on the logical\n"
> + " cpu 101\n\n"
> + " Example 3: ./dirty_log_perf_test -v 3 -d 101,23,53\n"
> + " Main application thread and vcpus will run on logical cpus\n"
> + " 101, 23 and 53.\n");
> puts("");
> exit(0);
> }

> @@ -455,6 +550,13 @@ int main(int argc, char *argv[])
> }
> }
>

I wonder if we should make -c and -d mutually exclusive. Tweak -c to include the
application thread, i.e. TEST_ASSERT(nr_lcpus == nr_vcpus+1) and require 1:1 pinning
for all tasks. E.g. allowing "-c ..., -d 0,1,22" seems unnecessary.

> + if (nr_lcpus != -1) {
> + TEST_ASSERT(nr_lcpus == nr_vcpus,
> + "Number of vCPUs (%d) are not equal to number of logical cpus provided (%d).",
> + nr_vcpus, nr_lcpus);
> + p.vcpu_to_lcpu = vcpu_to_lcpu_map;
> + }
> +
> TEST_ASSERT(p.iterations >= 2, "The test should have at least two iterations");
>
> pr_info("Test iterations: %"PRIu64"\n", p.iterations);