Re: Reviving the concept of a stable series (was Re: starting with 2.7)

From: Dave Jones
Date: Tue Jan 04 2005 - 13:26:17 EST


On Tue, Jan 04, 2005 at 05:08:46AM -0800, William Lee Irwin III wrote:
> On Mon, Jan 03, 2005 at 05:20:42PM -0500, Bill Davidsen wrote:
> > If the -rc process were in place, new feature freeze until the big green
> > bugs were fixed just before the next release, that actually might be
> > most of a solution.
> > No one but akpm can accurately assess how well fixes come back from
> > vendors, but I suspect that the kernel is moving too fast and vendors
> > "pick one" and stabilize that, by which time the kernel.org is
> > generations down the road. It's possible that some fixes are then
> > rediffed against the current kernel and fed, but I have zero information
> > on that happening or not.
>
> It does happen. I can't give a good estimate of how often. Someone at a
> distro may be able to help here, though it's unclear what this specific
> point is useful for.

Pull up a chair, this is going to be a long one.

When we shipped Fedora Core 3, we drew a line in the sand, and
decided that 2.6.9 was the kernel we were going to ship with.
It happened to coincide nicely with the final release date, and
everyone was happy.

Post release, the myriad of users filled RH bugzilla diligently
with their many reports of interesting failures. Upstream had
now started charging ahead with what was to be 2.6.10.

The delta between 2.6.9 -> 2.6.10 was around 4000 changesets.
Cherry picking csets to backport to 2.6.9 at this rate of
change is nigh on impossible. You /will/ miss stuff.
In the absence of a 2.6.9.1, we chose to use Alan's -ac
patches as a base to pick up most of the interesting meat,
and then cherry pick anything else which people had noticed
go past, or in some cases, after investigation into a
bug report.

So now we're at our 2.6.9-ac+a few dozen 2.6.10 csets
and all is happy with the world. Except for the regressions.
As an example, folks upgrading from Fedora Core 2, with its
2.6.8 kernel, found that ACPI no longer switched off their
machines. Much investigation went into
trying to pin this down. Kudos to Len Brown and team for
spending many an hour staring into bug reports on this
issue, but ultimately the cause was never found.
It was noted by several of our users seeing this problem
that 2.6.10 no longer exhibits this flaw. Yet our
2.6.9-ac+backports+every-2.6.10-acpi-cset also was broken.
It's likely Fedora will get a 2.6.10 based update before
the fault is ever really found for a 2.6.9 backport.

This is just one example of a regression that crept in
unnoticed, and got fixed almost by accident. (If it
was intentionally fixed, we'd know which patches
we needed to backport 8-)

For distro kernels to be the 'stable' branch, we *rely*
on help from various subsystem maintainers to continue
to bugfix old kernels, despite it being unsexy.
I admit it's painful, and given the option, replying
"just use 2.6.10-bk6" is a much easier alternative,
but with thousands of changes going into the tree each
month, it's not feasible for a distro to ship updates
on that basis without something happening to deal with
regressions.

As for stuff going back upstream.. You may be surprised how
many bugs our 2.6.9-ac-many-backports hybrid has turned up
which turned out to be just as relevant on 2.6.10.
Here's the patchcount in our current trees..

Fedora Core 2: 245
Fedora Core 3: 63
Rawhide: 76

FC2 is our 2.6.9 hybrid (the FC3 kernel got backported to FC2
as an update), FC3 is a rebase to 2.6.10-ac2.
rawhide (FC4-to-be) is 2.6.10-bk6.

Note we still have 63 patches in FC3. Out of those, just over
a dozen are 'features' that we added. The majority of the
rest are real bugfixes, currently languishing in out-of-tree
repositories for projects like NFS, s390, e1000 updates etc..
Note also that when FC3 first shipped, before we started
backporting 2.6.10 bits, the patchcount was around 40 or so,
so in the 2.6.9->2.6.10 rebase, we 'grew' around 13 patches.
Each time I rebase to a new upstream, I want to get back to
(or better than) the original patchcount where possible.
When this doesn't happen, it means we're accumulating stuff
that isn't making its way upstream fast enough.

So, of those 182 patches we dropped in our 2.6.10 rebase..
Some of them were upstream backports, but some of them were
patches we pushed upstream that we now get to drop on a rebase.
So the push/pull ecosystem is working out pretty well in this regard.
Whilst I'd like to get even more of this stuff upstream,
it's the job of those out-of-tree pool maintainers to push
their work, not mine.
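
For anyone wanting to check the bookkeeping, the 182 figure is just
the FC2 hybrid patchcount minus the FC3 rebase patchcount. A minimal
sketch, using only the counts quoted above; the dictionary keys and
the helper name are made up for illustration and are not part of any
Fedora tooling:

```python
# Patch counts as quoted in this mail; the labels are illustrative only.
patch_counts = {
    "FC2 (2.6.9 hybrid)": 245,
    "FC3 (2.6.10-ac2 rebase)": 63,
    "Rawhide (2.6.10-bk6)": 76,
}


def patches_dropped(before: int, after: int) -> int:
    """Net number of patches no longer carried after a rebase."""
    return before - after


# Going from the 2.6.9 hybrid to the 2.6.10-ac2 rebase.
dropped = patches_dropped(
    patch_counts["FC2 (2.6.9 hybrid)"],
    patch_counts["FC3 (2.6.10-ac2 rebase)"],
)
print(dropped)  # 182
```

Note this is a net figure: it doesn't say how many of the dropped
patches were upstream backports versus fixes we pushed upstream.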

Dave
