Re: stable? quality assurance?

From: Martin Steigerwald
Date: Sat Sep 04 2010 - 13:12:47 EST



Hi Ted,

I wanted to answer this for a long time...

On Sunday 11 July 2010, Ted Ts'o wrote:
> On Sun, Jul 11, 2010 at 09:18:41AM +0200, Martin Steigerwald wrote:
> > I still actually *use* my machines for something else than hunting
> > patches for kernel bugs and on kernel.org it is written "Latest
> > *Stable* Kernel" (accentuation from me). I know of the argument that
> > one should use a distro kernel for machines that are for production
> > use. But frankly, does that justify to deliver in advance known crap
> > to the distributors? What impact do partly grave bugs reported on
> > bugzilla have on the release decision?
>
> So I tend to use -rc3, -rc4, and -rc5 kernels on my laptops, and when
> I find bugs, I report them and I help fix them. If more people did
> that, then the 2.6.X.0 releases would be more stable. But kernel
> development is a volunteer effort, so it's up to the volunteers to
> test and fix bugs during the rc4, -rc5 and -rc6 time frame. But if
> the work tails off, because the developers are busily working on new
> features for the new release, then past a certain point, delaying the
> release reaches a point of diminishing returns. This is why we do
> time-based releases.

It certainly helps kernel quality if people test release candidates and
report bugs, but I think you at least partly missed my point. I wrote in
my initial mail:

> 2.6.34 was a disaster for me: bug #15969 - patch was available before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as
> well as most important two complete lockups - well maybe just

So two of the three bugs I experienced had in fact already been reported
by testers who actually tested rc kernels. One even had a patch before
2.6.34 was released. The third one is [Bug 16376], random freezes
(possibly Radeon DRM KMS), which I am currently bisecting.

So for these two bugs, testing rc kernels clearly did not help to raise
the quality of the *release* kernel.

I now understand that deferring a stable kernel release can cause a lot of
pain. But I still wonder why at least the patch from bug 15969 was not
merged prior to release. I ask not to assign blame, but to possibly find
ways to improve the process. I can't check bugzilla right now due to too
many MySQL connections on the server (already reported, and presumably
already known to the admins anyway), but AFAIR the patch was available
and also tested well before the release.

So my question still stands: can anything be improved so that as many
bugfix patches as possible get from Bugzilla into the stable kernel? At
least for critical bugs such as "does not boot" or "only garbage on screen
after booting".

I can accept that bug 15788 would have been missed by that, but this bug
was not that important - it was just the tip of the iceberg.

> It is possible to do other types of release strategies, but look at
> Debian Obsolete^H^H^H^H^H^H^H^H Stable if you want to see what happens
> if you insist on waiting until all release blockers are fixed (and
> even with Debian, past a certain point the release engineer will still
> just reclassify bugs as no longer being release blockers --- after the
> stable release has slipped for months or years past the original
> projected release date.)

I made a suggestion on how to improve the development process while still
holding to time-based releases in my other mail to this thread today.

> So if you and others like you are willing to help, then the quality of
> the Linux kernels can continue to improve. But simply complaining
> about it is not likely to solve things, since threatening to not be
> willing to upgrade kernels is generally not going to motivate many, if
> not most, of the volunteers who work on stabilizing the kernel.

I do, but I need to balance this. I have already spent quite a few hours
bisecting the freeze bug mentioned above, and it might take several more
weeks to nail it down.

And it was not a threat at all. I just have to balance how much
instability I can take on systems that I use for my daily stuff.

> > I am willing to risk some testing and do bug reports, but these are
> > still production machines, I do not have any spare test machines, and
> > there needs to be some balance, i.e. the kernels should basically
> > work.
>
> So you want the latest and greatest new features in a brand-new kernel
> release, but you're not willing to pay for test machines, and you're
> not willing to pay for a distribution support... The fact that you
> are willing to do some testing is appreciated, but remember, there's
> no such thing as a free lunch. Linux may be a very good bargain (look
> at how much Oracle has increased its support contracts for Solaris!),
> but it's still not a free lunch. At the end of the day, you get what
> you put into it.

Ted, I think there is no need to attack me like that. All of the bugs
occurred on my laptop, which I use for work *and* private purposes. Most
of the time I spent on these bugs was my spare, volunteer time as well.
And we are still a small company.

When I apply what you wrote above, the only sane thing would be to use a
distro kernel and be done with it - which means less testing of recent
kernels. Even then, that freeze, which is likely Radeon KMS related, could
still have slipped into the Debian stable kernel, considering that no one
has posted to the bug report that they were able to reproduce the bug.

Then I'd just accept the slower turn-around cycles with in-kernel or
userspace software suspend and be done with compiling TuxOnIce kernels.

But I am not there yet, because compiling TuxOnIce kernels worked pretty
well from 2.6.11 to 2.6.33, and I want to help as well as I can.
Hopefully things will be calmer again once I have bisected the Radeon KMS
related freeze bug - although there is another weird, possibly difficult
to track bug left. Maybe I just had a lot of bad luck with 2.6.34, and
things will calm down after tracking down those two bugs. The Radeon KMS
stuff has been a big change as well.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
