Re: [tip:sched/core] sched: Fix race in task_group()

From: Stefan Bader
Date: Thu Oct 18 2012 - 06:23:37 EST


On 18.10.2012 10:27, cwillu wrote:
> On Tue, Jul 24, 2012 at 8:21 AM, tip-bot for Peter Zijlstra
> <peterz@xxxxxxxxxxxxx> wrote:
>> Commit-ID: 8323f26ce3425460769605a6aece7a174edaa7d1
>> Gitweb: http://git.kernel.org/tip/8323f26ce3425460769605a6aece7a174edaa7d1
>> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> AuthorDate: Fri, 22 Jun 2012 13:36:05 +0200
>> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>> CommitDate: Tue, 24 Jul 2012 13:58:20 +0200
>>
>> sched: Fix race in task_group()
>>
>> Stefan reported a crash on a kernel before a3e5d1091c1 ("sched:
>> Don't call task_group() too many times in set_task_rq()"); he
>> found the reason to be that the multiple task_group()
>> invocations in set_task_rq() returned different values.
>>
>> Looking at all that I found a lack of serialization and plain
>> wrong comments.
>>
>> The below tries to fix it using an extra pointer which is
>> updated under the appropriate scheduler locks. It's not pretty,
>> but I can't really see another way given how all the cgroup
>> stuff works.
>>
>> Reported-and-tested-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>> Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>
> I just finished bisecting a crash on boot to this commit; booting with
> "noautogroup" makes it boot again.
>
> 3.5.4 is the latest -stable that still boots, and none of the 3.6-rc
> releases boot at all.
>
> Photo of the bug (3.6.0next is 3.6 + btrfs's for-linus):
> https://lh5.googleusercontent.com/-0DY-YYhgvzs/UHdB-BQdzMI/AAAAAAAAAEg/QhY9rgxnv98/s811/2012-10-11
>

At a very quick glance, I wonder whether there might be a case where sched_fork
goes into set_task_cpu with a CPU different from the current one before
task_group.sched_task_group has been set to something valid...
