I'm currently debugging MP emulation under the user-mode port. I have what I
believe is a legitimate race, but one which is unlikely on real hardware. So,
what I'd like to know is, should I report them (or fix them) and expect that
the report to be taken seriously (or the patch accepted), or should I just
work around them in my arch code?
Before I describe the race, I like to check something which I thought was
subtle, and which I may have gotten wrong. This snippet from schedule()
confused me for a while:
switch_to(prev, next, prev);
__schedule_tail(prev);
After staring at the i386 code for switch_to, I concluded that switch_to is
supposed to not return in this context until the process has been rescheduled,
and when it does, prev has been changed to the task which caused the
rescheduling. If this is true, then the race is as follows:
"processor 1"
process "a" exits and calls schedule
"processor 2"
process "a"s parent waits on "a"
"a" is disposed of
"processor 1"
process "b" is scheduled, returning into schedule with prev set to "a"
__schedule_tail dereferences prev, getting bogus values since "a" has
been completely freed.
This is emulating two processors on a single physical processor, hence the
single timeline for the two.
This isn't likely on hardware because the lag between the process "a" schedule
and the process "b" calling __schedule_tail is too short to allow "a"s parent
on another processor to get scheduled and free up "a"s memory.
I'd appreciate any comments anyone has on this.
Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:14 EST