Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue

From: Sebastian Andrzej Siewior
Date: Tue Jun 09 2020 - 11:47:50 EST


On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> >
> > Hi, This patch introduced a regression in our kernel
> > (v4.19.124-rt53-rebase), It occurs when we're jumping to crush kernel
> > using kexec, in the initialization of the emmc driver.
> > I'm still debugging the root cause, but I thought of mentioning this
> > in the mailing list if you have any idea why this could occur.
> > The issue doesn't happen on normal boot, only when I specifically
> > crash the kernel into the crash kernel.
> > Thanks,
> > Ramon.
>
> I'm not very familiar with crashing the kernel into the crash kernel.
> Can you explain in enough detail how to set things up to reproduce this
> and how to trigger it? Does it happen every time?
>
> >From looking at the backtrace, it's hitting the WARN_ON() in the
> cmpxchg() loop below, because TASKLET_STATE is just
> TASKLET_STATE_CHAINED.
>
> It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> this cmpxchg(), but TASKLET_STATE_RUN can be independently turned off
> elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> happens and this loop is hit, you could loop until loops runs out and
> hit this warning.

But clearing TASKLET_STATE_RUN independently happens by the task, that
set it / part of tasklet_schedule().
tasklet_tryunlock() does a cmpxchg() with only the RUN bit so it won't
work if the additional CHAINED bit is set.

The tasklet itself (which may run on another CPU) sets the RUN bit at the
begin and clears it at the end via cmpxchg() together with the CHAINED
bit.

I've been staring at it for sometime and I don't see how this can
happen.

Sebastian