Thanks for cleaning this up.
July 16, 2022 7:45 AM, "Gavin Shan" <gshan@xxxxxxxxxx> wrote:
In rseq_test, there are two threads, which are thread group leader
and migration worker. The migration worker relies on sched_setaffinity()
to force migration on the thread group leader.
It may be clearer to describe it as a vCPU thread and a migration worker
thread. The meat of this test is to catch a regression in KVM.
Unfortunately, we have
s/we have/the test has the/
wrong parameter (0) passed to sched_getaffinity().
wrong PID
It's actually
forcing migration on the migration worker instead of the thread group
leader.
What's missing is _why_ the migration worker is getting moved around by
the call. Perhaps instead it is better to state what a PID of 0 implies,
for those of us who haven't read their manpages in a while ;-)
It also means migration can happen on the thread group leader
at any time, which eventually leads to failure as the following logs
show.
host# uname -r
5.19.0-rc6-gavin+
host# # cat /proc/cpuinfo | grep processor | tail -n 1
processor : 223
host# pwd
/home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
host# for i in `seq 1 100`; \
do echo "--------> $i"; ./rseq_test; done
--------> 1
--------> 2
--------> 3
--------> 4
--------> 5
--------> 6
==== Test Assertion Failure ====
rseq_test.c:265: rseq_cpu == cpu
pid=3925 tid=3925 errno=4 - Interrupted system call
1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
2 0x0000ffffb044affb: ?? ??:0
3 0x0000ffffb044b0c7: ?? ??:0
4 0x0000000000401a6f: _start at ??:?
rseq CPU = 4, sched CPU = 27
This fixes the issue by passing correct parameter, tid of the group
thread leader, to sched_setaffinity().
Kernel commit messages should have an imperative tone:
Fix the issue by ...
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
With the comments on the commit message addressed:
Reviewed-by: Oliver Upton <oliver.upton@xxxxxxxxx>