Re: [PATCH] Fix: membarrier: racy access to p->mm in membarrier_global_expedited()
From: Mathieu Desnoyers
Date: Mon Jan 28 2019 - 17:46:21 EST
----- On Jan 28, 2019, at 5:39 PM, paulmck paulmck@xxxxxxxxxxxxx wrote:
> On Mon, Jan 28, 2019 at 05:07:07PM -0500, Mathieu Desnoyers wrote:
>> Jann Horn identified a racy access to p->mm in the global expedited
>> command of the membarrier system call.
>>
>> The suggested fix is to hold the task_lock() around the accesses to
>> p->mm and to the mm_struct membarrier_state field to guarantee the
>> existence of the mm_struct.
>>
>> Link: https://lore.kernel.org/lkml/CAG48ez2G8ctF8dHS42TF37pThfr3y0RNOOYTmxvACm4u8Yu3cw@xxxxxxxxxxxxxx
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
>> Tested-by: Jann Horn <jannh@xxxxxxxxxx>
>> CC: Jann Horn <jannh@xxxxxxxxxx>
>> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> CC: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>> CC: Ingo Molnar <mingo@xxxxxxxxxx>
>> CC: Andrea Parri <parri.andrea@xxxxxxxxx>
>> CC: Andy Lutomirski <luto@xxxxxxxxxx>
>> CC: Avi Kivity <avi@xxxxxxxxxxxx>
>> CC: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
>> CC: Boqun Feng <boqun.feng@xxxxxxxxx>
>> CC: Dave Watson <davejwatson@xxxxxx>
>> CC: David Sehr <sehr@xxxxxxxxxx>
>> CC: H. Peter Anvin <hpa@xxxxxxxxx>
>> CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> CC: Maged Michael <maged.michael@xxxxxxxxx>
>> CC: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
>> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>> CC: Paul Mackerras <paulus@xxxxxxxxx>
>> CC: Russell King <linux@xxxxxxxxxxxxxxx>
>> CC: Will Deacon <will.deacon@xxxxxxx>
>> CC: stable@xxxxxxxxxxxxxxx # v4.16+
>> CC: linux-api@xxxxxxxxxxxxxxx
>> ---
>> kernel/sched/membarrier.c | 27 +++++++++++++++++++++------
>> 1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
>> index 76e0eaf4654e..305fdcc4c5f7 100644
>> --- a/kernel/sched/membarrier.c
>> +++ b/kernel/sched/membarrier.c
>> @@ -81,12 +81,27 @@ static int membarrier_global_expedited(void)
>>
>> 		rcu_read_lock();
>> 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> -		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
>> -				   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
>> -			if (!fallback)
>> -				__cpumask_set_cpu(cpu, tmpmask);
>> -			else
>> -				smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> +		/*
>> +		 * Skip this CPU if the runqueue's current task is NULL or if
>> +		 * it is a kernel thread.
>> +		 */
>> +		if (p && READ_ONCE(p->mm)) {
>> +			bool mm_match;
>> +
>> +			/*
>> +			 * Read p->mm and access membarrier_state while holding
>> +			 * the task lock to ensure existence of mm.
>> +			 */
>> +			task_lock(p);
>> +			mm_match = p->mm && (atomic_read(&p->mm->membarrier_state) &
>
> Are we guaranteed that this p->mm will be the same as the one loaded via
> READ_ONCE() above? Either way, wouldn't it be better to READ_ONCE() it a
> single time and use the same value everywhere?
The first "READ_ONCE()" above is _outside_ of the task_lock() critical section.
Those two accesses _can_ load two different pointers, and this is why we
need to re-read the p->mm pointer within the task_lock() critical section to
ensure existence of the mm_struct that we use.
If we move the READ_ONCE() into the task_lock() critical section, we end up
uselessly taking the lock before we can skip kernel threads.
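As a rough sketch (reusing the identifiers from the patch, not code from the
patch itself), that rejected ordering would look like the following, where
even a kernel thread with p->mm == NULL pays for the lock:

	task_lock(p);
	mm_match = p->mm && (atomic_read(&p->mm->membarrier_state) &
			     MEMBARRIER_STATE_GLOBAL_EXPEDITED);
	task_unlock(p);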
If we keep the READ_ONCE() outside the task_lock() and reuse that single
loaded value, then p->mm can be updated between the READ_ONCE() and the
access to the mm_struct content, which is racy: nothing guarantees the
existence of the mm_struct that the stale pointer refers to. The
interleaving sketched below illustrates the problem.
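To make the race concrete, here is a hedged sketch of the problematic
interleaving (the exit path is simplified; the relevant step is that
exit_mm() clears p->mm under task_lock() before the last mm reference
is dropped):

	membarrier caller                      exiting task
	-----------------                      ------------
	mm = READ_ONCE(p->mm);  /* non-NULL */
	                                       task_lock(p);
	                                       p->mm = NULL;
	                                       task_unlock(p);
	                                       mmput(mm);  /* mm can be freed */
	atomic_read(&mm->membarrier_state);    /* use-after-free */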
Or am I missing your point?
Thanks,
Mathieu
>
> Thanx, Paul
>
>> +					MEMBARRIER_STATE_GLOBAL_EXPEDITED);
>> +			task_unlock(p);
>> +			if (mm_match) {
>> +				if (!fallback)
>> +					__cpumask_set_cpu(cpu, tmpmask);
>> +				else
>> +					smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> +			}
>> 		}
>> 		rcu_read_unlock();
>> 	}
>> --
>> 2.17.1
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com