Re: Process for severe early stable bugs?

From: Willy Tarreau
Date: Sat Dec 08 2018 - 02:33:19 EST


Hi Laura,

On Fri, Dec 07, 2018 at 04:33:10PM -0800, Laura Abbott wrote:
> The latest file system corruption issue (nominally fixed by
> ffe81d45322c ("blk-mq: fix corruption with direct issue"), later
> fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch
> list")) brought a lot of rightfully concerned users asking about
> release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to
> 4.19.3 on Nov 23. When the issue started getting visibility,
> users were left with the option of running known EOL 4.18.x
> kernels or running a 4.19 series that could corrupt their
> data. Admittedly, the risk of running the EOL kernel was pretty
> low given how recent it was, but it's still not a great look
> to tell people to run something marked EOL.
>
> I'm wondering if there's anything we can do to make things easier
> on kernel consumers. Bugs will certainly happen but it really
> makes it hard to push the "always run the latest stable" narrative
> if there isn't a good fallback when things go seriously wrong. I
> don't actually have a great proposal for a solution here other than
> retroactively bringing back 4.18 (which I don't think Greg would
> like) but I figured I should at least bring it up.

This type of problem may happen once in a while but is fortunately
extremely rare, so I guess it can be addressed with unusual methods.

For my use cases, I always make sure that the last two LTS branches
work fine. Since there is significant maintenance overlap between LTS
branches, I can quickly switch to 4.14.x (or even 4.9.x) if this
happens. In our products we make sure that our toolchain is built
with support for the previous kernel as well, "just in case". We've
never switched back and probably never will, but it helps us a lot
when comparing strange behaviours between two kernels.

I think that if your distro is functionally and technically compatible
with the previous LTS branch, it could be an acceptable escape route for
users who are concerned about both their data and their security.
After all, previous LTS branches are there for those who can't
upgrade. In my opinion this situation perfectly qualifies.

But it requires some preparation, as I mentioned. It might be that
some components in the distro rely on features from the very latest
kernels. At the very least, it may be worth checking whether such
dependencies exist and what would be lost in such a fallback, so that
users can be warned.

Just my two cents,
Willy