Re: frequent lockups in 3.18rc4

From: Dave Young
Date: Thu Nov 20 2014 - 04:55:20 EST


On 11/19/14 at 09:41am, Don Zickus wrote:
> On Tue, Nov 18, 2014 at 05:02:54PM -0500, Dave Jones wrote:
> > On Tue, Nov 18, 2014 at 04:55:40PM -0500, Don Zickus wrote:
> >
> > > > So here we mangle CPU3 in and lose the backtrace for cpu0, which might
> > > > be the real interesting one ....
> > >
> > > Can you provide another dump? The hope is we get something not mangled?
> >
> > Working on it..
> >
> > > The other option we have done in RHEL is panic the system and let kdump
> > > capture the memory. Then we can analyze the vmcore for the stack trace
> > > cpu0 stored in memory to get a rough idea where it might be if the cpu
> > > isn't responding very well.
> >
> > I don't know if it's because of the debug options I typically run with,
> > or that I'm perpetually cursed, but I've never managed to get kdump to
> > do anything useful. (The last time I tried it was actively harmful in
> > that not only did it fail to dump anything, it wedged the machine so
> > it didn't reboot after panic).
> >
> > Unless there's some magic step missing from the documentation at
> > http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
> > then I'm not optimistic it'll be useful.
>
> Well, I don't know when the last time you ran it, but I know the RH kexec
> folks have started pursuing a Fedora-first package patch rule a couple of
> years ago to ensure Fedora had a working kexec/kdump solution.

It started from Fedora 17, I think for Fedora pre F17 kdump support is very
limited, it is becoming better.

>
> As for the wedging part, it was a common problem to have the kernel hang
> while trying to boot the second kernel (and before console output
> happened). So the problem makes sense and is unfortunate. I would
> encourage you to try again. :-)

In fedora we will have more such issues than RHEL because the kernel is updated
frequestly. There's ocasinaly new problems in upstream kernel, such as kaslr
feature in X86.

Problem for Fedora is it is not by default enabled, so user need explictly
specify kerenl cmdline for crashkernel reservation and enable kdump serivce.

There's very few bugs reported from Fedora user. So I guess it is not well tested
in Fedora community. Since Dave bring up this issue I think it's at least a good
news to us that someone is using it. We can address the problem case by case then.

Probably a good way to get more testing is to add kdump anaconda addon by default
at installation phase so user can choose to enable kdump or not.

Thanks
Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/