Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next

From: Qian Cai
Date: Thu Jan 30 2020 - 22:33:18 EST

> On Jan 30, 2020, at 8:48 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 30, 2020 at 02:39:38PM +0100, Marco Elver wrote:
>> On Wed, 29 Jan 2020 at 19:40, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>>> It's probably not terrible to put a READ_ONCE() there; we just need to
>>> make sure the compiler doesn't do something stupid (it is known to do
>>> stupid when 'volatile' is present).
>>
>> Maybe we need to optimize READ_ONCE().
>
> I think recent compilers have gotten better at volatile. In part because
> of our complaints.
>
>> 'if (data_race(..))' would also work here and has no cost.
>
> Right, that might be the best option.
>

OK, I'll send a patch for that.
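For anyone weighing the two options above, here is a minimal userspace sketch of the difference. It assumes GCC/Clang's __typeof__, and the names READ_ONCE_SKETCH(), DATA_RACE_SKETCH(), struct qnode and the two helpers are hypothetical simplifications, not the kernel's <linux/compiler.h> definitions. READ_ONCE() forces a single volatile load; data_race() keeps the plain load and only tells KCSAN that the race is intended, which is why it costs nothing in non-KCSAN builds.

```c
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins, NOT the kernel definitions:
 *  - READ_ONCE_SKETCH() does one volatile load, so the compiler may not
 *    tear, fuse or repeat it.
 *  - DATA_RACE_SKETCH() just evaluates its argument; the kernel's
 *    data_race() likewise leaves the access plain and only hides it
 *    from KCSAN.
 */
#define READ_ONCE_SKETCH(x)    (*(const volatile __typeof__(x) *)&(x))
#define DATA_RACE_SKETCH(expr) (expr)

struct qnode {
	struct qnode *next;
};

/* The "put a READ_ONCE() there" option. */
static int has_next_marked(struct qnode *node)
{
	return READ_ONCE_SKETCH(node->next) != NULL;
}

/* The "if (data_race(..))" option: plain load, annotated as intended. */
static int has_next_annotated(struct qnode *node)
{
	return DATA_RACE_SKETCH(node->next) != NULL;
}
```

In kernel code the annotated form in osq_wait_next() would presumably read `if (data_race(node->next))`; which form to prefer is the trade-off Peter and Marco describe above.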

BTW, I have another one to report. I can't see how load tearing on this
read would cause any real issue.

[ 519.240629] BUG: KCSAN: data-race in osq_lock / osq_unlock

[ 519.249088] write (marked) to 0xffff8bb2f133be40 of 8 bytes by task 421 on cpu 38:
[ 519.257427] osq_unlock+0xa8/0x170 kernel/locking/osq_lock.c:219
[ 519.261571] __mutex_lock+0x4b3/0xd20
[ 519.265972] mutex_lock_nested+0x31/0x40
[ 519.270639] memcg_create_kmem_cache+0x2e/0x190
[ 519.275922] memcg_kmem_cache_create_func+0x40/0x80
[ 519.281553] process_one_work+0x54c/0xbe0
[ 519.286308] worker_thread+0x80/0x650
[ 519.290715] kthread+0x1e0/0x200
[ 519.294690] ret_from_fork+0x27/0x50


void osq_unlock(struct optimistic_spin_queue *lock)
{
	struct optimistic_spin_node *node, *next;
	int curr = encode_cpu(smp_processor_id());

	/*
	 * Fast path for the uncontended case.
	 */
	if (likely(atomic_cmpxchg_release(&lock->tail, curr,
					  OSQ_UNLOCKED_VAL) == curr))
		return;

	/*
	 * Second most likely case.
	 */
	node = this_cpu_ptr(&osq_node);
	next = xchg(&node->next, NULL);	/* <-- (marked) write reported by KCSAN */
	if (next) {
		WRITE_ONCE(next->locked, 1);
		return;
	}

	next = osq_wait_next(lock, node, NULL);
	if (next)
		WRITE_ONCE(next->locked, 1);
}
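To make the racing write concrete, here is a minimal C11 userspace analogue of the "second most likely case" above. struct spin_node and unlock_handoff() are hypothetical names; atomic_exchange() stands in for xchg(), and a release store stands in for WRITE_ONCE(next->locked, 1), because in C11 the exchange alone does not guarantee that the successor's acquire load sees the critical section (the kernel relies on xchg() being fully ordered instead).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace mirror of struct optimistic_spin_node. */
struct spin_node {
	_Atomic(struct spin_node *) next;
	atomic_int locked;
};

/*
 * Atomically detach our successor (the marked write in the report) and,
 * if there was one, hand it the lock. Returns the successor, or NULL if
 * the caller must take the slow path (osq_wait_next() in the kernel).
 */
static struct spin_node *unlock_handoff(struct spin_node *node)
{
	/* Analogue of xchg(&node->next, NULL). */
	struct spin_node *next = atomic_exchange(&node->next, NULL);

	if (next)
		/* Release store pairs with the waiter's acquire load of ->locked. */
		atomic_store_explicit(&next->locked, 1, memory_order_release);
	return next;
}
```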


[ 519.301232] read to 0xffff8bb2f133be40 of 8 bytes by task 196 on cpu 12:
[ 519.308705] osq_lock+0x1e2/0x340 kernel/locking/osq_lock.c:157
[ 519.312762] __mutex_lock+0x277/0xd20
[ 519.317167] mutex_lock_nested+0x31/0x40
[ 519.321838] memcg_create_kmem_cache+0x2e/0x190
[ 519.327120] memcg_kmem_cache_create_func+0x40/0x80
[ 519.332751] process_one_work+0x54c/0xbe0
[ 519.337508] worker_thread+0x80/0x650
[ 519.341922] kthread+0x1e0/0x200
[ 519.345889] ret_from_fork+0x27/0x50


for (;;) {
	if (prev->next == node &&	/* <-- plain read reported by KCSAN */
	    cmpxchg(&prev->next, node, NULL) == node)
		break;

	/*
	 * We can only fail the cmpxchg() racing against an unlock(),
	 * in which case we should observe @node->locked becoming
	 * true.
	 */
	if (smp_load_acquire(&node->locked))
		return true;

	cpu_relax();

	/*
	 * Or we race against a concurrent unqueue()'s step-B, in which
	 * case its step-C will write us a new @node->prev pointer.
	 */
	prev = READ_ONCE(node->prev);
}
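The reasoning behind "load tearing would not cause any real issue" can be sketched in userspace C11: the read of prev->next is only a cheap filter in front of the cmpxchg(), which re-validates the pointer atomically. struct osq_node and try_unlink() are hypothetical names, and a relaxed atomic load stands in for the kernel's plain read, since a genuinely racy plain read is undefined behaviour in C11.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace mirror of struct optimistic_spin_node. */
struct osq_node {
	_Atomic(struct osq_node *) next;
};

/*
 * Mirrors: if (prev->next == node && cmpxchg(&prev->next, node, NULL) == node)
 * The first load only filters; the compare-and-swap is what actually
 * decides whether @node gets unlinked from @prev.
 */
static int try_unlink(struct osq_node *prev, struct osq_node *node)
{
	struct osq_node *expected = node;

	if (atomic_load_explicit(&prev->next, memory_order_relaxed) != node)
		return 0;
	return atomic_compare_exchange_strong(&prev->next, &expected, NULL);
}
```

If the filter misreads, the quoted loop just falls through to the smp_load_acquire(&node->locked) check and retries after cpu_relax(); the unlink only happens when the compare-and-swap succeeds.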


[ 519.352420] Reported by Kernel Concurrency Sanitizer on:
[ 519.358492] CPU: 12 PID: 196 Comm: kworker/12:1 Tainted: G W L 5.5.0-next-20200130+ #3
[ 519.368317] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 519.377627] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func