On 12/23/24 2:47 AM, Bibo Mao wrote:Yes, you are right. After communicating with HW guys, WRITE_ONCE will not surpass over READ_ONCE, WRITE_ONCE can only surpass WRITE_ONCE.
We ported pv spinlock to old linux kernel on LoongArch platform, there isWhat old kernel are you using? Race condition like that should be hard to reproduce. Do you hit this bug only once?
error with some stress tests. The error report is something like this forThe pv_hash_entry structure is supposed to be cacheline aligned. The use of READ_ONCE/WRITE_ONCE will ensure that compiler won't rearrange the ordering. If the CPU can rearrange read/write ordering on the same cacheline like that, it may be some advance optimization technique that I don't quite understand. Can you check if the buffer returned by alloc_large_system_hash() is really properly aligned?
short:
kernel BUG at kernel/locking/qspinlock_paravirt.h:261!
Oops - BUG[#1]:
CPU: 1 PID: 6613 Comm: pidof Not tainted 4.19.190+ #43
Hardware name: Loongson KVM, BIOS 0.0.0 02/06/2015
ra: 9000000000509cfc do_task_stat+0x29c/0xaf0
ERA: 9000000000291308 __pv_queued_spin_unlock_slowpath+0xf8/0x100
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 00000000 (PPLV0 -PIE -PWE)
...
Call Trace:
[<9000000000291308>] __pv_queued_spin_unlock_slowpath+0xf8/0x100
[<9000000000509cf8>] do_task_stat+0x298/0xaf0
[<9000000000502570>] proc_single_show+0x60/0xe0
The problem is that memory accessing is out of order on LoongArch
platform, there is contension between pv_unhash() and pv_hash().
CPU0 pv_unhash: CPU1 pv_hash:
for_each_hash_entry(he, offset, hash) { for_each_hash_entry(he, offset, hash) {
if (READ_ONCE(he->lock) == lock) { struct qspinlock *old = NULL;
node = READ_ONCE(he->node);
WRITE_ONCE(he->lock, NULL);
On LoongArch platform which is out of order, the execution order may be
switched like this:
WRITE_ONCE(he->lock, NULL);if (try_cmpxchg(&he->lock, &old, lock)) {
WRITE_ONCE(he->node, node);
return &he->lock;
CPU1 pv_hash() is executing and watch that lock is set with NULL. Write
he->node with node of new lock.
node = READ_ONCE(he->node);READ_ONCE(he->node) on CPU0 will return node of new lock rather than itself.
This change is abuse usage about try_cmpxchg(), the origin code maybe has no problem, I am testing again.
Here READ_ONCE/WRITE_ONCE is replaced with try_cmpxchg().
Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
---
kernel/locking/qspinlock_paravirt.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index dc1cb90e3644..cc4eabce092d 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -240,9 +240,10 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
struct pv_node *node;
for_each_hash_entry(he, offset, hash) {
- if (READ_ONCE(he->lock) == lock) {
+ struct qspinlock *old = lock;
+
+ if (try_cmpxchg(&he->lock, &old, NULL)) {
node = READ_ONCE(he->node);
- WRITE_ONCE(he->lock, NULL);
return node;
}
}
Anyway, this change isn't quite right as suggested by Akira.
Cheers,
Longman
base-commit: 48f506ad0b683d3e7e794efa60c5785c4fdc86fa