Re: [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs

From: Dongli Zhang
Date: Tue Sep 28 2021 - 13:37:18 EST




On 9/27/21 12:22 PM, Sean Christopherson wrote:
> On Fri, Sep 24, 2021, Dongli Zhang wrote:
>> The nr_cpus = CPU_COUNT(&possible_mask) is the number of available CPUs in
>> possible_mask. As a result, the "cpu = i % nr_cpus" may always return CPU
>> that is not available in possible_mask.
>>
>> Suppose the server has 8 CPUs. The below Failure is encountered immediately
>> if the task is bound to CPU 5 and 6.
>
> /facepalm
>
>> ==== Test Assertion Failure ====
>> rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
>> pid=10127 tid=10127 errno=4 - Interrupted system call
>> 1 0x00000000004018e5: main at rseq_test.c:227
>> 2 0x00007fcc8fc66bf6: ?? ??:0
>> 3 0x0000000000401959: _start at ??:?
>> Only performed 4 KVM_RUNs, task stalled too much?
>>
>> Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>> ---
>> tools/testing/selftests/kvm/rseq_test.c | 18 +++++++++++++++++-
>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
>> index c5e0dd664a7b..41df5173970c 100644
>> --- a/tools/testing/selftests/kvm/rseq_test.c
>> +++ b/tools/testing/selftests/kvm/rseq_test.c
>> @@ -10,6 +10,7 @@
>> #include <signal.h>
>> #include <syscall.h>
>> #include <sys/ioctl.h>
>> +#include <sys/sysinfo.h>
>> #include <asm/barrier.h>
>> #include <linux/atomic.h>
>> #include <linux/rseq.h>
>> @@ -43,6 +44,18 @@ static bool done;
>>
>> static atomic_t seq_cnt;
>>
>> +static int get_max_cpu_idx(void)
>> +{
>> + int nproc = get_nprocs_conf();
>> + int i, max = -ENOENT;
>> +
>> + for (i = 0; i < nproc; i++)
>> + if (CPU_ISSET(i, &possible_mask))
>> + max = i;
>> +
>> + return max;
>> +}
>> +
>> static void guest_code(void)
>> {
>> for (;;)
>> @@ -61,10 +74,13 @@ static void *migration_worker(void *ign)
>> {
>> cpu_set_t allowed_mask;
>> int r, i, nr_cpus, cpu;
>> + int max_cpu_idx;
>>
>> CPU_ZERO(&allowed_mask);
>>
>> - nr_cpus = CPU_COUNT(&possible_mask);
>> + max_cpu_idx = get_max_cpu_idx();
>> + TEST_ASSERT(max_cpu_idx >= 0, "Invalid possible_mask");
>
> I feel like this should be a KSFT_SKIP condition, not an assert.
>
>> + nr_cpus = max_cpu_idx + 1;
>>
>> for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
>> cpu = i % nr_cpus;
>
> This is still flawed, e.g. if the max CPU is 1023, but the task is pinned to _just_
> CPU 1023, then the assert at the end will likely still fail because the migration
> helper is effectively only running 1/1024 loops.
>
> It probably also makes sense to grab the min CPU to reduce the pain if the task
> is affined to a small subset.
>
> As an aside, _which_ CPUs the task is affined to seems to matter, e.g. some
> combinations of CPUs on my system don't fail, even with 100x iterations. Don't
> think there's anything the test can do about that, just an interesting data point
> that suggests pinning while running tests may be a bad idea.

I agree sometimes the failure is expected and reasonable. E.g., I am able to
randomly reproduce the failure if I reduce NR_TASK_MIGRATIONS to 1000.

>
> Anyways, something like this?
>
> ---
> tools/testing/selftests/kvm/rseq_test.c | 44 ++++++++++++++++++++-----
> 1 file changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
> index 060538bd405a..befd64c27152 100644
> --- a/tools/testing/selftests/kvm/rseq_test.c
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -10,6 +10,7 @@
> #include <signal.h>
> #include <syscall.h>
> #include <sys/ioctl.h>
> +#include <sys/sysinfo.h>
> #include <asm/barrier.h>
> #include <linux/atomic.h>
> #include <linux/rseq.h>
> @@ -39,6 +40,7 @@ static __thread volatile struct rseq __rseq = {
>
> static pthread_t migration_thread;
> static cpu_set_t possible_mask;
> +static int min_cpu, max_cpu;
> static bool done;
>
> static atomic_t seq_cnt;
> @@ -60,16 +62,17 @@ static void sys_rseq(int flags)
> static void *migration_worker(void *ign)
> {
> cpu_set_t allowed_mask;
> - int r, i, nr_cpus, cpu;
> + int r, i, cpu;
>
> CPU_ZERO(&allowed_mask);
>
> - nr_cpus = CPU_COUNT(&possible_mask);
> -
> - for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
> - cpu = i % nr_cpus;
> - if (!CPU_ISSET(cpu, &possible_mask))
> - continue;
> + for (i = 0, cpu = -1; i < NR_TASK_MIGRATIONS; i++) {
> + do {
> + if (cpu < min_cpu || cpu > max_cpu)
> + cpu = min_cpu;
> + else
> + cpu++;
> + } while (!CPU_ISSET(cpu, &possible_mask));
>
> CPU_SET(cpu, &allowed_mask);

I see. The idea is to search within between min_cpu and max_cpu for each
NR_TASK_MIGRATION iteration to find the next cpu.

Perhaps a linked list is more suitable to here (when there are 1024 cpus and the
task is bound to both 1 and 1022) ... to pre-save the possible cpus in a list
and to only move to next cpu in the list for each iteration.

However, I think min_cpu/max_cpu is good enough for selttests case.

Would you please let me know if you would like to send above with my
Reported-by, or if you would like me to send with your Suggested-by.

Thank you very much!

Dongli Zhang