Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue

From: Ramon Fried
Date: Thu Jun 04 2020 - 12:04:29 EST


On Thu, Apr 23, 2020 at 11:55 PM <zanussi@xxxxxxxxxx> wrote:
>
> From: Zhang Xiao <xiao.zhang@xxxxxxxxxxxxx>
>
> v4.19.115-rt49-rc1 stable review patch.
> If anyone has any objections, please let me know.
>
> -----------
>
>
> The kernel bugzilla has the following race condition reported:
>
> CPU0 CPU1 CPU2
> ------------------------------------------------
> test_set SCHED
> test_set RUN
> if SCHED
> add_list
> raise
> clear RUN
> <softirq>
> test_set RUN
> test_clear SCHED
> ->func
> test_set SCHED
> tasklet_try_unlock ->0
> test_clear SCHED
> test_set SCHED
> ->func
> tasklet_try_unlock ->1
> test_set RUN
> if SCHED
> add list
> raise
> clear RUN
> test_set RUN
> if SCHED
> add list
> raise
> clear RUN
>
> As a result the tasklet is enqueued on both CPUs and run on both CPUs. Due
> to the nature of the list used here, it is possible that further
> (different) tasklets, which are enqueued after this double-enqueued
> tasklet, are scheduled on CPU2 but invoked on CPU1. It is also possible
> that these tasklets won't be invoked at all, because during the second
> enqueue process the t->next pointer is set to NULL - dropping everything
> from the list.
>
> This race will trigger one or two of the WARN_ON() in
> tasklet_action_common().
> The problem is that the tasklet may be invoked multiple times and clear
> SCHED bit on each invocation. This makes it possible to enqueue the
> very same tasklet on different CPUs.
>
> Current RT-devel is using the upstream implementation which does not
> re-run tasklets if they have SCHED set again and so it does not clear
> the SCHED bit multiple times on a single invocation.
>
> Introduce the CHAINED flag. The tasklet will only be enqueued if the
> CHAINED flag has been set successfully.
> If it is possible to exchange the flags (CHAINED | RUN) -> 0 then the
> tasklet won't be re-run. Otherwise the possible SCHED flag is removed
> and the tasklet is re-run again.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=61451
> Not-signed-off-by: Zhang Xiao <xiao.zhang@xxxxxxxxxxxxx>
> [bigeasy: patch description]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>
> Signed-off-by: Tom Zanussi <zanussi@xxxxxxxxxx>
> ---
> include/linux/interrupt.h | 5 ++++-
> kernel/softirq.c | 6 +++++-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 97d9ba26915e..a3b5edb26bc5 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -579,12 +579,15 @@ enum
> {
> TASKLET_STATE_SCHED, /* Tasklet is scheduled for execution */
> TASKLET_STATE_RUN, /* Tasklet is running (SMP only) */
> - TASKLET_STATE_PENDING /* Tasklet is pending */
> + TASKLET_STATE_PENDING, /* Tasklet is pending */
> + TASKLET_STATE_CHAINED /* Tasklet is chained */
> };
>
> #define TASKLET_STATEF_SCHED (1 << TASKLET_STATE_SCHED)
> #define TASKLET_STATEF_RUN (1 << TASKLET_STATE_RUN)
> #define TASKLET_STATEF_PENDING (1 << TASKLET_STATE_PENDING)
> +#define TASKLET_STATEF_CHAINED (1 << TASKLET_STATE_CHAINED)
> +#define TASKLET_STATEF_RC (TASKLET_STATEF_RUN | TASKLET_STATEF_CHAINED)
>
> #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
> static inline int tasklet_trylock(struct tasklet_struct *t)
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 25bcf2f2714b..73dae64bfc9c 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -947,6 +947,10 @@ static void __tasklet_schedule_common(struct tasklet_struct *t,
> * is locked before adding it to the list.
> */
> if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> + if (test_and_set_bit(TASKLET_STATE_CHAINED, &t->state)) {
> + tasklet_unlock(t);
> + return;
> + }
> t->next = NULL;
> *head->tail = t;
> head->tail = &(t->next);
> @@ -1040,7 +1044,7 @@ static void tasklet_action_common(struct softirq_action *a,
> again:
> t->func(t->data);
>
> - while (!tasklet_tryunlock(t)) {
> + while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) != TASKLET_STATEF_RC) {
> /*
> * If it got disabled meanwhile, bail out:
> */
> --
> 2.17.1
>

Hi, This patch introduced a regression in our kernel
(v4.19.124-rt53-rebase), It occurs when we're jumping to crush kernel
using kexec, in the initialization of the emmc driver.
I'm still debugging the root cause, but I thought of mentioning this
in the mailing list if you have any idea why this could occur.
The issue doesn't happen on normal boot, only when I specifically
crash the kernel into the crash kernel.
Thanks,
Ramon.

[ 0.546142] macb 2b00000.eth1 eth1: Cadence GEM rev 0x00070200 at
0x02b00000 irq 16 (00:28:f8:b7:a0:40)
[ 0.548041] sdhci: Secure Digital Host Controller Interface driver
[ 0.548042] sdhci: Copyright(c) Pierre Ossman
[ 0.548044] sdhci-pltfm: SDHCI platform and OF driver helper
[ 0.580588] mmc0: SDHCI controller on 2200000.sdhci [2200000.sdhci]
using ADMA 64-bit
[ 0.605753] hm, tasklet state: 00000008
[ 0.605757] ------------[ cut here ]------------
[ 0.605763] WARNING: CPU: 0 PID: 1 at ../kernel/softirq.c:1061
0xa80000080282649c
[ 0.605771] CPU: 0 PID: 1 Comm: swapper Not tainted
4.19.124.eyeq5_ssnt-local-rt53-07819-g3519e8cf4110 #4
[ 0.605774] Stack : 0000000000000000 0000000000000018
a80000080340bd40 0000000002d40000
[ 0.605780] a80000080340be70 0000000000000000
0000000000000000 000000000000007e
[ 0.605785] 0000000000000000 0000000000000001
0000000000000001 0000000002d30000
[ 0.605790] 0000000000000000 a800000802b2a688
0000000002d30000 0000000002d30000
[ 0.605795] 0000000000000000 0000000000000000
a80000080282649c a800000802b6cf28
[ 0.605799] 0000000000000009 0000000000000425
0000000000000000 0000000000000007
[ 0.605804] 0000000002d30000 0000000002d30000
0000000000000000 a800000802d20000
[ 0.605809] a800000803430000 a80000080340bd40
a800000802d284c0 a80000080280a6c4
[ 0.605813] 0000000000000000 0000000000000000
0000000000000000 0000000000000000
[ 0.605818] 0000000000000000 a80000080280a6c8
0000000000000000 a800000802823b0c
[ 0.605823] ...
[ 0.605825] Call Trace:
[ 0.605830] [<a800000802b2a688>] 0xa800000802b2a688
[ 0.605833] [<a80000080282649c>] 0xa80000080282649c
[ 0.605835] [<a80000080280a6c4>] 0xa80000080280a6c4
[ 0.605838] [<a80000080280a6c8>] 0xa80000080280a6c8
[ 0.605840] [<a800000802823b0c>] 0xa800000802823b0c
[ 0.605842] [<a80000080282649c>] 0xa80000080282649c
[ 0.605845] [<a800000802b30394>] 0xa800000802b30394
[ 0.605847] [<a8000008029cd2ec>] 0xa8000008029cd2ec
[ 0.605850] [<a800000802805490>] 0xa800000802805490
[ 0.605851]
[ 0.605854] ---[ end trace 293afee709e91e0b ]---
[ 0.607548] stn8500_serial 800000.uart: detected port #0
[ 0.607576] stn8500_serial 800000.uart: uartclk 125000000
[ 0.607605] 800000.uart: ttyST0 at MMIO 0x800000 (irq = 46,
base_baud = 7812500) is a stn8500
[ 0.610303] stn8500_serial 900000.uart: detected port #1
[ 0.610324] stn8500_serial 900000.uart: uartclk 125000000
[ 0.611564] 900000.uart: ttyST1 at MMIO 0x900000 (irq = 46,
base_baud = 7812500) is a stn8500
[ 0.614645] stn8500_serial a00000.uart: detected port #2
[ 0.614891] stn8500_serial a00000.uart: uartclk 125000000
[ 0.614919] a00000.uart: ttyST2 at MMIO 0xa00000 (irq = 46,
base_baud = 7812500) is a stn8500
[ 0.614930] stn8500_serial a00000.uart: console setup on port #2 :
uartclk 125000000 115207-n-8-n
[ 0.615943] console [ttyST2] enabled
[ 0.615945] bootconsole [stn8500] disabled
[ 0.620959] Freeing unused kernel memory: 1324K
[ 0.620965] This architecture does not have kernel memory protection.
[ 0.620967] Run /init as init process
[ 0.710641] random: dd: uninitialized urandom read (512 bytes read)