Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue

From: Tom Zanussi
Date: Tue Jun 09 2020 - 12:17:57 EST


Hi Sebastian,

On Tue, 2020-06-09 at 17:47 +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> > >
> > > Hi, this patch introduced a regression in our kernel
> > > (v4.19.124-rt53-rebase). It occurs when we're jumping to the crash
> > > kernel using kexec, during initialization of the eMMC driver.
> > > I'm still debugging the root cause, but I thought I'd mention it
> > > on the mailing list in case you have any idea why this could occur.
> > > The issue doesn't happen on normal boot, only when I specifically
> > > crash the kernel into the crash kernel.
> > > Thanks,
> > > Ramon.
> >
> > I'm not very familiar with crashing the kernel into the crash kernel.
> > Can you explain in enough detail how to set things up to reproduce
> > this and how to trigger it? Does it happen every time?
> >
> > From looking at the backtrace, it's hitting the WARN_ON() in the
> > cmpxchg() loop below, because the tasklet state is just
> > TASKLET_STATE_CHAINED.
> >
> > It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> > this cmpxchg(), but TASKLET_STATE_RUN can be independently turned off
> > elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> > happens and this loop is hit, you could loop until the loop counter
> > (loops) runs out and hit this warning.
>
> But when TASKLET_STATE_RUN is cleared independently, that is done by
> the task that set it, as part of tasklet_schedule().
> tasklet_tryunlock() does a cmpxchg() with only the RUN bit, so it
> won't work if the additional CHAINED bit is set.
>
> The tasklet itself (which may run on another CPU) sets the RUN bit at
> the beginning and clears it at the end via cmpxchg() together with the
> CHAINED bit.
>
> I've been staring at it for some time and I don't see how this can
> happen.
>

I did find a problem with the patch when configured as !SMP, since in
that case the RUN flag is never set (I'll send a patch for that
shortly), but that wouldn't be the case here.

It would help to be able to reproduce it, but I haven't been able to
yet.

Tom

> Sebastian