Re: workqueue list corruption

From: Tejun Heo
Date: Mon Jun 05 2017 - 15:42:48 EST


Hello,

On Sun, Jun 04, 2017 at 12:30:03PM -0700, Cong Wang wrote:
> On Tue, Apr 18, 2017 at 8:08 PM, Samuel Holland <samuel@xxxxxxxxxxxx> wrote:
> > Representative backtraces follow (the warnings come in sets). I have
> > kernel .configs and extended netconsole output from several occurrences
> > available upon request.
> >
> > WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0x89/0xb0
> > list_add corruption. prev->next should be next (ffff99f135016a90), but
> > was ffffd34affc03b10. (prev=ffffd34affc03b10).

So, while trying to move a work item from the delayed list to the
pending list, the pending list's last item's next pointer no longer
points to the list head. Instead it points back at the item itself
(prev->next == prev above), which looks like it was re-initialized.
That could be a premature free followed by reuse.
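To show why the dump points at re-initialization rather than random scribbling: in the warning, prev->next equals prev itself (both ffffd34affc03b10), which is exactly what INIT_LIST_HEAD() leaves behind. Below is a minimal userspace sketch, assuming a simplified struct list_head and a validity check modeled on (not copied from) the CONFIG_DEBUG_LIST checks in lib/list_debug.c. demo_reinit_corruption() is a hypothetical helper that re-initializes the pending tail and then appends another item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD(): the node points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Modeled on the CONFIG_DEBUG_LIST checks: inserting between prev and
 * next is only valid if the two nodes still point at each other.
 */
static bool list_add_valid(struct list_head *prev, struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption. next->prev should be prev (%p), "
			"but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption. prev->next should be next (%p), "
			"but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	return true;
}

/* Append @new at the tail of @head; refuses (returns false) if corrupt. */
static bool list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (!list_add_valid(prev, head))
		return false;
	head->prev = new;
	new->next = head;
	new->prev = prev;
	prev->next = new;
	return true;
}

/*
 * Hypothetical scenario: the last pending item is freed early and its
 * memory is re-initialized by a new owner, then another item is appended.
 * Returns true if the add is rejected with prev->next == prev, the same
 * shape as "but was X. (prev=X)" in the reported warning.
 */
static bool demo_reinit_corruption(void)
{
	struct list_head pending, last, incoming;

	init_list_head(&pending);
	bool ok = list_add_tail(&last, &pending);
	assert(ok && pending.prev == &last);

	/* The re-user calls INIT_LIST_HEAD() on the freed memory. */
	init_list_head(&last);

	bool added = list_add_tail(&incoming, &pending);
	return !added && last.next == &last;
}
```

The first check (next->prev == prev) still passes in this scenario because the head's prev pointer is untouched; only the re-initialized tail's next pointer is wrong, which matches the single prev->next warning in the report.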

If this is reproducible, it'd help a lot to update move_linked_works()
to check list validity directly and print out the work function of
the corrupt work item. There's no guarantee that the re-user is the
one which did the premature free, but the fact that we're likely
seeing INIT_LIST_HEAD() rather than random corruption is encouraging,
so there's some chance that doing so would point us to the culprit,
or at least pretty close to it.
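A rough userspace sketch of the kind of instrumentation suggested above. struct work, report() and move_work_checked() are hypothetical stand-ins: the real change would operate on struct work_struct inside move_linked_works() and would print the callback symbol (e.g. with a %pf-style format) instead of a raw address. The sketch checks both the entry being moved and the destination list's tail before doing the equivalent of list_move_tail().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

typedef void (*work_func_t)(void *data);

/* Hypothetical, trimmed-down work item: only linkage and callback. */
struct work {
	struct list_head entry;
	work_func_t func;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = new;
	new->next = head;
	new->prev = prev;
	prev->next = new;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Report a broken work item together with its callback address. */
static void report(const char *what, struct work *w)
{
	fprintf(stderr, "%s: work %p func %p\n", what, (void *)w, (void *)w->func);
}

/*
 * Hypothetical checked variant of list_move_tail(&work->entry, head).
 * Returns the corrupt work item without moving anything, or NULL after
 * a successful move.
 */
static struct work *move_work_checked(struct work *work, struct list_head *head)
{
	assert(work != NULL && head != NULL);

	struct list_head *e = &work->entry;
	struct list_head *tail = head->prev;

	/* The item being moved must still be linked to its neighbours. */
	if (e->prev->next != e || e->next->prev != e) {
		report("moved work has broken linkage", work);
		return work;
	}
	/* The destination tail must point back at the head. */
	if (tail != head && tail->next != head) {
		struct work *bad = container_of(tail, struct work, entry);
		report("destination tail does not point back at head", bad);
		return bad;
	}
	list_del(e);
	list_add_tail(e, head);
	return NULL;
}
```

Refusing the move on failure is a deliberate choice in this sketch: it leaves the corrupt linkage in place so it can still be inspected, rather than splicing more items onto a broken list.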

Thanks.

--
tejun