Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

From: Thomas Gleixner
Date: Fri Nov 10 2017 - 16:30:09 EST


On Fri, 10 Nov 2017, Linus Torvalds wrote:

> On Wed, Nov 8, 2017 at 9:19 PM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> >
> > Yes it's accessing the list. Here is the faddr2line output.
>
> Ok, so it's a corrupted timer list. Which is not a big surprise.
>
> It's
>
>     next->pprev = pprev;
>
> in __hlist_del(), and the trapping instruction decodes as
>
>     mov %rdx,0x8(%rax)
>
> with %rax having the value dead000000000200, which is just LIST_POISON2.
>
> So we've deleted that entry twice - LIST_POISON2 is what hlist_del()
> sets pprev to after already deleting it once.
>
> Although in this case it might not be hlist_del(), because
> detach_timer() also sets entry->next to LIST_POISON2.
>
> Which is pretty bogus, we are supposed to use LIST_POISON1 for the
> "next" pointer. Oh well. Nobody cares, except for the list entry
> debugging code, which isn't run on the hlist cases.
>
> Adding Thomas Gleixner to the cc. It should not be possible to delete
> the same timer twice.

Right, it shouldn't.

Fengguang, can you please enable:

CONFIG_DEBUG_OBJECTS
CONFIG_DEBUG_OBJECTS_TIMERS

and try to reproduce? Debugobjects should hopefully catch that.
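
For reference, the double detach described above can be modelled in user
space roughly as follows. This is only an illustrative sketch, not the
kernel code: sketch_add_head() and sketch_detach() are hypothetical names,
the struct is a simplified stand-in for struct hlist_node, and the poison
value is simply the one from the oops, assuming a 64-bit build. Instead of
faulting, the second detach reports the address the store would have hit.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct hlist_node. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

/* Value taken from the oops above; assumes a 64-bit build. */
#define LIST_POISON2 ((struct hlist_node *)(uintptr_t)0xdead000000000200ULL)

/* Insert n at the head of the list. */
static void sketch_add_head(struct hlist_node *n, struct hlist_node **head)
{
	n->next = *head;
	if (*head)
		(*head)->pprev = &n->next;
	*head = n;
	n->pprev = head;
}

/*
 * Hypothetical model of a detach that poisons only ->next.
 * Returns 0 on success. If ->next already holds LIST_POISON2, it
 * returns the address that the store "next->pprev = pprev" would
 * write to, instead of actually faulting on it.
 */
static uintptr_t sketch_detach(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	/* Second detach: ->next is poison, so the store below would trap. */
	if (next == LIST_POISON2)
		return (uintptr_t)next + offsetof(struct hlist_node, pprev);

	*pprev = next;
	if (next)
		next->pprev = pprev;
	n->next = LIST_POISON2;
	return 0;
}
```

On a 64-bit build, detaching the same node a second time returns
dead000000000208, i.e. the 0x8(%rax) store with %rax == LIST_POISON2.
Note that the sketch checks for the poison before touching the list,
whereas real code would already have written through the stale pprev
by the time it faults.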

Thanks,

tglx