Re: source line numbers with x86_64 modules? [Was: Re: [patch]measurements, numbers about CONFIG_OPTIMIZE_INLINING=y impact]

From: Theodore Tso
Date: Sat Jan 10 2009 - 16:16:37 EST


On Sat, Jan 10, 2009 at 01:21:06PM -0500, Mike Snitzer wrote:
> > In practice i rarely see bugfixes that were debugged via kdump. Normal
> > oops based fixes outnumber kdump based fixes by a ratio of 1:100 or worse
> > - and kdump is readily available these days - just nobody configures it.
>
> So you're telling me RedHat doesn't rely on kdump at enterprise
> customer installations? I find that hard to believe. Few enterprise
> customers allow defects to be debugged on-site, sometimes collecting a
> crash dump is all you can hope for to make progress. I have to
> believe you know this fairly well; if not with direct experience then
> through your co-workers? Or am I living in Ingo's version of Linux
> hell where kdump is actually useful?

In my experience, there are very few combinations of kernel version
and hardware for which kdump works. I've talked to the people who have
to make kdump work, and every 12-18 months, when a new set of
enterprise kernels comes out, they have to go and fix kdump so it works
again for the set of hardware that they care about, and for the kernel
version involved.
Part of the problem is one which has infected nearly every single RAS
technology out there, from kdump to Systemtap, which is that the people
who architect and fund these RAS technologies delude themselves into
thinking that they only have to worry about making it work for
enterprise kernels and enterprise users, and to hell with everyone
else --- specifically, kernel developers, who don't matter since
they aren't enterprise users. Heck, until July of last year,
Systemtap wouldn't even ***compile*** out of the box on a
non-enterprise distribution like Ubuntu or Debian. And I still have
yet to make kdump work on a Thinkpad, although I've tried.

Since pretty much no one uses these RAS technologies except enterprise
users, and no one bothers to make them easy for kernel developers to
use, kernel developers have developed alternate mechanisms for
debugging the Linux kernel --- and they don't involve using Systemtap
or kdump, because in practice, those tools don't work for them at all,
or it's too hard to make them work.

And this becomes a vicious cycle; since no one bothers to spend
time making RAS technologies work for everyday use by kernel
developers, bitrot inevitably sets in, and so the RAS developers get
no help from other kernel developers, who are busy fixing their own
problems via different means; and so the RAS developers hunker down,
and spend even more time fixing the bitrot and complaining that no one
helps them or takes them seriously, and the problem gets worse and
worse and worse --- until now there are people who are busily
developing alternatives to Systemtap, just because too many RAS
architects and developers had their priorities wrong, and forgot
to focus on everyday kernel developers instead of just enterprise
users.

It's very sad, and it means a lot of investment gets wasted, and work
is getting duplicated as a result.

Oh, well.

- Ted