On 7/17/22 7:48 AM, oliver.upton@xxxxxxxxx wrote:
July 16, 2022 7:45 AM, "Gavin Shan" <gshan@xxxxxxxxxx> wrote:
In rseq_test, there are two threads, which are thread group leader
and migration worker. The migration worker relies on sched_setaffinity()
to force migration on the thread group leader.
It may be clearer to describe it as a vCPU thread and a migration worker
thread. The meat of this test is to catch a regression in KVM.
Unfortunately, we have
s/we have/the test has the/
wrong parameter (0) passed to sched_getaffinity().
wrong PID
Yep, it's much clearer to describe it as vCPU thread and migration worker.
It's actually
forcing migration on the migration worker instead of the thread group
leader.
What's missing is _why_ the migration worker is getting moved around by
the call. Perhaps instead it is better to state what a PID of 0 implies,
for those of us who haven't read their manpages in a while ;-)
Yes, it's good idea. I will have something like below in next revision :)
In rseq_test, there are two threads, which are vCPU thread and migration
worker separately. Unfortunately, the test has the wrong PID passed to
sched_setaffinity() in the migration worker. It forces migration on the
migration worker because zeroed PID represents the calling thread, which
is the migration worker itself. It means the vCPU thread is never enforced
to migration and it can migrate at any time, which eventually leads to
failure as the following logs show.
:
:
Fix the issue by passing correct parameter, TID of the vCPU thread, to
sched_setaffinity() in the migration worker.
It also means migration can happen on the thread group leader
at any time, which eventually leads to failure as the following logs
show.
host# uname -r
5.19.0-rc6-gavin+
host# # cat /proc/cpuinfo | grep processor | tail -n 1
processor : 223
host# pwd
/home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
host# for i in `seq 1 100`; \
do echo "--------> $i"; ./rseq_test; done
--------> 1
--------> 2
--------> 3
--------> 4
--------> 5
--------> 6
==== Test Assertion Failure ====
rseq_test.c:265: rseq_cpu == cpu
pid=3925 tid=3925 errno=4 - Interrupted system call
1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
2 0x0000ffffb044affb: ?? ??:0
3 0x0000ffffb044b0c7: ?? ??:0
4 0x0000000000401a6f: _start at ??:?
rseq CPU = 4, sched CPU = 27
This fixes the issue by passing correct parameter, tid of the group
thread leader, to sched_setaffinity().
Kernel commit messages should have an imperative tone:
Fix the issue by ...
Ok. I've been having my style for long time. Actually, the style was
shared by some one when I worked for IBM long time ago. I will bear
it in mind to use imperative expression since now on :)
All your comments will be fixed in next revision, but I would delay
the posting a bit to see Sean or Paolo have more comments. In that
case, I can fix all of them at once.
Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
Signed-off-by: Gavin Shan <gshan@xxxxxxxxxx>
With the comments on the commit message addressed:
Reviewed-by: Oliver Upton <oliver.upton@xxxxxxxxx>