Re: [RFC][PATCH] sched: Fix race in task_group()

From: Stefan Bader
Date: Wed Jun 27 2012 - 08:52:13 EST


On 27.06.2012 14:40, Hillf Danton wrote:
> The patch went through three versions. The first,
>
> On Fri, Jun 22, 2012 at 7:36 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> Reported-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>> ---
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 32157b9..77437d4 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1246,6 +1246,9 @@ struct task_struct {
>> const struct sched_class *sched_class;
>> struct sched_entity se;
>> struct sched_rt_entity rt;
>> +#ifdef CONFIG_CGROUP_SCHED
>> + struct task_struct *sched_task_group;
>> +#endif
>>
>
> The second,
>
>>> On 26.06.2012 15:48, Peter Zijlstra wrote:
>>> Here's one that's actually compile tested (with the right CONFIG_foo
>>> enabled) and I fixed the autogroup lockdep splat.
>>>
>>> ---
>>> Subject: sched: Fix race in task_group()
>>> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>> Date: Fri, 22 Jun 2012 13:36:05 +0200
>>>
>>> Reported-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
>>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>>> ---
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -1246,6 +1246,9 @@ struct task_struct {
>>> const struct sched_class *sched_class;
>>> struct sched_entity se;
>>> struct sched_rt_entity rt;
>>> +#ifdef CONFIG_CGROUP_SCHED
>>> + struct task_group *sched_task_group;
>>> +#endif
>>>
>
> And the third, https://lkml.org/lkml/2012/6/26/331
>
> From d751ab1f1e532f32412d99b71a1bfea3e5282d07 Mon Sep 17 00:00:00 2001
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Fri, 22 Jun 2012 13:36:00 +0200
> Subject: [PATCH] sched: Fix race in task_group()
>
> Stefan reported a crash on a kernel before a3e5d1091c1 ("sched: Don't
> call task_group() too many times in set_task_rq()"), he found the reason
> to be that the multiple task_group() invocations in set_task_rq()
> returned different values.
>
> Looking at all that I found a lack of serialization and plain wrong
> comments.
>
> The below tries to fix it using an extra pointer which is updated under
> the appropriate scheduler locks. It's not pretty, but I can't really see
> another way given how all the cgroup stuff works.
>
> Reported-and-tested-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> [backported to apply to 3.0 and 3.2]
> Signed-off-by: Stefan Bader <stefan.bader@xxxxxxxxxxxxx>
> ---
> include/linux/init_task.h | 12 +++++++++++-
> include/linux/sched.h | 5 ++++-
> kernel/sched.c | 32 ++++++++++++++++++--------------
> 3 files changed, 33 insertions(+), 16 deletions(-)
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 56de5c1..1fd9884 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1242,6 +1242,9 @@ struct task_struct {
> const struct sched_class *sched_class;
> struct sched_entity se;
> struct sched_rt_entity rt;
> +#ifdef CONFIG_CGROUP_SCHED
> + struct task_struct *sched_task_group;
> +#endif
>
> where sched_task_group was defined to be task_struct twice (in the first
> and the third versions) and to be task_group once.
>
> Before backport, feel free to respin with the final define determined.
>
The second version is correct. I just messed up updating my backport, failing to
notice that change (and trying to be clever by not going through re-applying
and failing again).
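
For anyone following along, here is a rough userspace sketch of why the
field has to be a struct task_group pointer and what caching it buys. It is
not the kernel code: the reduced structs, lookup_group(),
sched_move_task_sketch() and task_group_sketch() are hypothetical names made
up for illustration, and the locking from the real patch is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_group. */
struct task_group {
	int id;
};

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct task_struct {
	/* Models the cgroup-derived group, which can change at any time. */
	struct task_group *cgroup_group;
	/* Cached copy; note the type is struct task_group *, as in v2. */
	struct task_group *sched_task_group;
};

/* Models the cgroup lookup that the old task_group() repeated. */
static struct task_group *lookup_group(const struct task_struct *p)
{
	return p->cgroup_group;
}

/*
 * Models the one place that refreshes the cached pointer. In the real
 * patch this runs under the scheduler locks; no locking is modelled here.
 */
static void sched_move_task_sketch(struct task_struct *p)
{
	assert(p != NULL);
	p->sched_task_group = lookup_group(p);
}

/* Readers only use the cached pointer, so repeated calls agree. */
static struct task_group *task_group_sketch(const struct task_struct *p)
{
	return p->sched_task_group;
}
```

With this shape, a change to cgroup_group is not visible to
task_group_sketch() until sched_move_task_sketch() runs, so two back-to-back
reads in something like set_task_rq() cannot disagree.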

-Stefan

