Re: Suggestions on how to debug kernel crashes where printk and gdb both does not work
From: Dongliang Mu
Date: Mon Jun 14 2021 - 10:42:41 EST
On Mon, Jun 14, 2021 at 10:25 PM Pavel Skripkin <paskripkin@xxxxxxxxx> wrote:
>
> On Mon, 14 Jun 2021 22:19:10 +0800
> Dongliang Mu <mudongliangabcd@xxxxxxxxx> wrote:
>
> > On Mon, Jun 14, 2021 at 9:34 PM Pavel Skripkin <paskripkin@xxxxxxxxx>
> > wrote:
> > >
> > > On Mon, 14 Jun 2021 21:22:43 +0800
> > > Dongliang Mu <mudongliangabcd@xxxxxxxxx> wrote:
> > >
> > > > Dear kernel developers,
> > > >
> > > > I was trying to debug the crash - memory leak in hwsim_add_one [1]
> > > > recently. However, I ran into a frustrating issue: my
> > > > breakpoints and printk/pr_alert calls in functions that are
> > > > certain to be executed do not work. The stack trace is below.
> > > > I am writing to ask for suggestions on how to debug such
> > > > cases.
> > > >
> > > > Thanks very much. Looking forward to your reply.
> > > >
> > >
> > > Hi, Dongliang!
> > >
> > > This bug is not similar to others on the dashboard. I spent some
> > > time debugging it a week ago. The main problem here is that the
> > > memory allocation happens at boot time:
> > >
> > > > [<ffffffff84359255>] kernel_init+0xc/0x1a7 init/main.c:1447
> > >
> >
> > Oh, nice catch. No wonder my debugging does not work. :(
> >
> > > and reproducer simply tries to
> > > free this data. You can use ftrace to look at it. Smth like this:
> > >
> > > $ echo 'hwsim_*' > $TRACE_DIR/set_ftrace_filter
> >
> > Thanks for your suggestion.
> >
> > Do you have any conclusions about this case? If you have found out the
> > root cause and started writing patches, I will turn my focus to other
> > cases.
>
> No, I had some busy days and I have nothing about this bug for now.
> I've just traced the reproducer execution and that's all :)
>
> I guess some error handling paths are broken, but I'm not sure.

In the beginning, I agreed with you. However, after manually checking
hwsim_probe (initialization) and hwsim_remove (cleanup), I am no longer
so sure: the cleanup looks correct to me. I would like to debug further,
but I am stuck because neither printk nor breakpoints work.

There is another issue: the cleanup function also prints nothing and
never hits the breakpoint. I don't quite understand this, since the
cleanup does not run at boot time.

Any ideas?
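
For reference, here is a minimal sketch of how I understand the ftrace
setup you suggested, so that the cleanup path can be traced without
printk or breakpoints. The helper name hwsim_trace_setup is hypothetical
(just for illustration), and the default path /sys/kernel/tracing is an
assumption; on some systems tracefs is under /sys/kernel/debug/tracing.

```shell
#!/bin/sh
# Sketch only: hwsim_trace_setup is a hypothetical helper, not an existing tool.
# $1 = tracefs directory (assumed to be /sys/kernel/tracing if omitted).
hwsim_trace_setup() {
    trace_dir="${1:-/sys/kernel/tracing}"

    # Pause tracing while the configuration is changed.
    echo 0 > "$trace_dir/tracing_on" || return 1
    # Restrict tracing to functions whose names match the glob.
    echo 'hwsim_*' > "$trace_dir/set_ftrace_filter" || return 1
    # Select the function tracer.
    echo function > "$trace_dir/current_tracer" || return 1
    # Resume tracing with the new filter in place.
    echo 1 > "$trace_dir/tracing_on" || return 1
}
```

The idea would be to call hwsim_trace_setup "$TRACE_DIR" as root, run the
reproducer, and then read $TRACE_DIR/trace to see which hwsim_* functions
were actually entered. This only covers code that runs after the setup;
for the allocation at boot, I believe the ftrace= and ftrace_filter=
kernel command line parameters would be needed instead, but I have not
tried that here.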
>
>
> >
> > BTW, I only found another possible memory leak after some manual code
> > review [1]. However, it is not the root cause for this crash.
> >
> > [1] https://lkml.org/lkml/2021/6/10/1297
> >
> > >
> > > would work.
> > >
> > >
> > > With regards,
> > > Pavel Skripkin
>
>
>
>
> With regards,
> Pavel Skripkin