Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next

From: Qian Cai
Date: Mon Jan 27 2020 - 22:13:45 EST

> On Jan 23, 2020, at 4:36 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Jan 22, 2020 at 11:38:51PM +0100, Marco Elver wrote:
>
>> If possible, decode and get the line numbers. I have observed a data
>> race in osq_lock before, however, this is the only one I have recently
>> seen in osq_lock:
>>
>> read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
>> osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143
>>
>> 	while (!READ_ONCE(node->locked)) {
>> 		/*
>> 		 * If we need to reschedule bail... so we can block.
>> 		 * Use vcpu_is_preempted() to avoid waiting for a preempted
>> 		 * lock holder:
>> 		 */
>> -->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
>> 			goto unqueue;
>>
>> 		cpu_relax();
>> 	}
>>
>> where
>>
>> static inline int node_cpu(struct optimistic_spin_node *node)
>> {
>> -->	return node->cpu - 1;
>> }
>>
>>
>> write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
>> osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99
>>
>> bool osq_lock(struct optimistic_spin_queue *lock)
>> {
>> 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
>> 	struct optimistic_spin_node *prev, *next;
>> 	int curr = encode_cpu(smp_processor_id());
>> 	int old;
>>
>> 	node->locked = 0;
>> 	node->next = NULL;
>> -->	node->cpu = curr;
>>
>
> Yeah, that's impossible. This store happens before the node is
> published, so no matter how the load in node_cpu() is shattered, it must
> observe the right value.
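
FWIW, here is a minimal userspace sketch of the ordering argument above,
using C11 atomics and pthreads rather than the kernel primitives. The
names spin_node, tail and run_publish_demo() are made up for
illustration. The real code publishes an encoded CPU number with xchg()
on lock->tail, not a pointer, so treat this as a model of the
"initialise, then publish" pattern only. It covers a single publication
of the node and nothing else.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct optimistic_spin_node. */
struct spin_node {
	int cpu;	/* plain field, written before publication */
};

static struct spin_node node;
static _Atomic(struct spin_node *) tail;	/* NULL until published */

/* Writer: initialise the node with a plain store, then publish it. */
static void *writer(void *arg)
{
	(void)arg;
	node.cpu = 2;	/* CPU 1, encoded as cpu + 1 */
	/* Release: the plain store above is ordered before the pointer store. */
	atomic_store_explicit(&tail, &node, memory_order_release);
	return NULL;
}

/* Reader: wait for publication, then read the field with a plain load. */
static void *reader(void *arg)
{
	struct spin_node *prev;

	do {
		prev = atomic_load_explicit(&tail, memory_order_acquire);
	} while (!prev);

	*(int *)arg = prev->cpu - 1;	/* mirrors node_cpu() */
	return NULL;
}

/* Run one reader and one writer; return the CPU number the reader saw. */
static int run_publish_demo(void)
{
	pthread_t w, r;
	int seen = -1;

	atomic_store(&tail, NULL);
	pthread_create(&r, NULL, reader, &seen);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Calling run_publish_demo() from a test program built with -pthread
should return the decoded CPU number, because the reader cannot see the
pointer without also seeing the store that preceded it.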

Marco, any thoughts on what could be done about this? The worry is that
too many false positives like this will undermine the tool's usefulness
as a general debug option.