Re: [RFC][PATCH 03/16] sched: Wrap rq::lock access

From: Subhra Mazumdar
Date: Tue Mar 19 2019 - 22:32:18 EST

Next message: Pkshih: "Re: [PATCH] rtl8723ae: Make rtl8723e_dm_refresh_rate_adaptive_mask static"
Previous message: kbuild test robot: "(.init.text+0x134): multiple definition of `plat_irq_setup'"
In reply to: Julien Desfossez: "Re: [RFC][PATCH 03/16] sched: Wrap rq::lock access"
Next in thread: Julien Desfossez: "Re: [RFC][PATCH 03/16] sched: Wrap rq::lock access"

On 3/18/19 8:41 AM, Julien Desfossez wrote:

The case where we try to acquire the lock on 2 runqueues belonging to 2
different cores requires the rq_lockp wrapper as well; otherwise we
frequently deadlock in there.

This fixes the crash reported in
1552577311-8218-1-git-send-email-jdesfossez@xxxxxxxxxxxxxxxx

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 76fee56..71bb71f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2078,7 +2078,7 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
 		raw_spin_lock(rq_lockp(rq1));
 		__acquire(rq2->lock);	/* Fake it out ;) */
 	} else {
-		if (rq1 < rq2) {
+		if (rq_lockp(rq1) < rq_lockp(rq2)) {
 			raw_spin_lock(rq_lockp(rq1));
 			raw_spin_lock_nested(rq_lockp(rq2), SINGLE_DEPTH_NESTING);
 		} else {
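
For readers following the thread, here is a minimal userspace sketch of the ordering rule the hunk above switches to. It is only an illustration under stated assumptions, not the kernel code: struct core, struct rq, first_lock() and the pthread mutexes are hypothetical stand-ins, and only the names rq_lockp() and double_rq_lock() are borrowed from the patch. The likely reason the old comparison deadlocks is that, once sibling runqueues share one core-wide lock, ordering by runqueue address no longer implies a consistent order of the locks themselves, so two CPUs can take the same pair of locks in opposite order. Ordering by the address of the lock that is actually taken avoids that.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins: one lock per core, shared by its runqueues. */
struct core {
	pthread_mutex_t lock;
};

struct rq {
	struct core *core;
};

/* Return the lock that really protects this runqueue. */
static pthread_mutex_t *rq_lockp(struct rq *rq)
{
	return &rq->core->lock;
}

/* Return whichever of the two locks has the lower address. */
static pthread_mutex_t *first_lock(struct rq *rq1, struct rq *rq2)
{
	pthread_mutex_t *l1 = rq_lockp(rq1);
	pthread_mutex_t *l2 = rq_lockp(rq2);

	return ((uintptr_t)l1 < (uintptr_t)l2) ? l1 : l2;
}

/* Take both runqueue locks in a global order based on lock address. */
static void double_rq_lock(struct rq *rq1, struct rq *rq2)
{
	pthread_mutex_t *l1 = rq_lockp(rq1);
	pthread_mutex_t *l2 = rq_lockp(rq2);

	if (l1 == l2) {
		/* Both runqueues sit on the same core: one lock covers both. */
		pthread_mutex_lock(l1);
		return;
	}

	if ((uintptr_t)l1 < (uintptr_t)l2) {
		pthread_mutex_lock(l1);
		pthread_mutex_lock(l2);
	} else {
		pthread_mutex_lock(l2);
		pthread_mutex_lock(l1);
	}
}

/* Release whatever double_rq_lock() acquired. */
static void double_rq_unlock(struct rq *rq1, struct rq *rq2)
{
	pthread_mutex_t *l1 = rq_lockp(rq1);
	pthread_mutex_t *l2 = rq_lockp(rq2);

	pthread_mutex_unlock(l1);
	if (l1 != l2)
		pthread_mutex_unlock(l2);
}
```

With this scheme, double_rq_lock(a, b) and double_rq_lock(b, a) always take the lower-addressed lock first, regardless of where the runqueue structures themselves live in memory.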

With this fix and my previous NULL pointer fix, my stress tests are surviving. I
re-ran my 2 DB instance setup on a 44-core, 2-socket system, putting each DB
instance in a separate core scheduling group. The numbers look much worse now.

users  baseline  %stdev  %idle  core_sched  %stdev  %idle
16     1         0.3     66     -73.4%      136.8   82
24     1         1.6     54     -95.8%      133.2   81
32     1         1.5     42     -97.5%      124.3   89

I also notice that if I enable a bunch of debug configs related to mutexes,
spinlocks, lockdep etc. (which I did earlier to debug the deadlock), it opens up
a can of worms with multiple crashes.
