Re: [ANNOUNCE] linux-stable security tree

From: Willy Tarreau
Date: Mon Apr 11 2016 - 17:17:25 EST


Hi Sasha,

On Mon, Apr 11, 2016 at 04:38:17PM -0400, Sasha Levin wrote:
> > How are you
> > going to judge which driver fixes to take and which not to? Why not
> > take them all if they fix bugs?
>
> Because some fixes introduce bug on their own? Take a look at how many
> commits in the stable tree have a "Fixes:" tag that points to a commit
> that's also in the stable tree.

I'm using stable trees myself in the load balancing products we ship at
work. I've met a single bug during the whole 3.10 lifetime, and it was
caused by one of our out-of-tree patches that was applied at the wrong
place after an update. I'd generally say that -stable quality is very
good, if not excellent. Several people review the patches before they
get merged, and several build and even boot them. It's not that random.
Look, one patch was just dropped from 3.14.64 because it failed a build
test in one environment. This one will never hit end users.

> Look at the opposite side of this question: why would anyone take a commit
> that fixes a bug he doesn't care about? Are the benefits really worth it
> considering the risks?

That's exactly what most people do. I don't update to each and every kernel.
When I see Xen, LVM, DRM and audio changes, I know I don't need them in my
products. But when I see network fixes, I study them and often decide that
it's worth upgrading. Sometimes I pick a single fix from the queue because I
can't wait for the next release. Many of Greg's kernels more or less focus on
certain topics, probably due to the way he deals with his mailbox and patch
storms, so it's often easy to quickly decide whether you need to update or
not.

> [snip]
>
> >>> Define "important". Now go and look at the tty bug we fixed that people
> >>> only realized was "important" 1 1/2 years later and explain if you
> >>> would, or would not have, taken that patch in this tree.
> >>
> >> Probably not, but I would have taken it after it received a CVE number.
> >>
> >> Same applies to quite a few commits that end up in stable - no one thinks
> >> they're stable material at first until someone points out it's crashing
> >> his production boxes for the past few months.
> >
> > Yes, but those are rare, what you are doing here is suddenly having to
> > judge if a bug is a "security" issue or not. You are now in the
> > position of trying to determine "can this be exploited or not", for
> > every commit, and that's a very hard call, as is seen by this specific
> > issue.

Especially for networking stuff or things related to local resource usage,
which some people consider a local DoS risk while others consider it
irrelevant to their servers since they have no local users.

> The stable stuff isn't rare as you might think, even more: the amount of
> actual CVE fixes that are not in the stable tree might surprise you.

I would personally not be surprised, since Ben used to feed me a lot of
fixes I had never seen before. What is unclear to me is whether your
tree will contain only a selection of patches that are already in the
respective branches, or backports of security fixes that we can pick
from to feed our stable branches and limit the risk of missing them.
*This* could actually be useful to everyone, starting with our users.

(...)
> This is actually what happens now; projects get to the point they don't
> want to update their whole kernel tree anymore so that just freezes because
> they don't want to re-validate the whole thing over and over, but they
> still cherry pick upstream and out-of-tree commits that they care about.
>
> If they added a handful of security commits to cherry pick and carefully
> review their security will be much better than what happens now.

Actually, I do think that for end users it's a regression. People will
start reusing outdated kernels which only contain the most critical known
fixes, but will still suffer from memory leaks, deadlocks, kernel panics,
data corruption, etc.: in fact, every single bug that doesn't have a CVE
attached to it, which means 99% of the bugs that bring a system down in
production. It makes me think of people who only pick security fixes
from openssl and not the regular batch of missing null checks, and who
complain all the time that their systems are unstable while they simply
don't apply the fixes.

You see, when I started the "hotfix" tree for kernel 2.4 11 years ago,
I intended to pick only the most critical fixes; they would fit in
just a README and could be counted on one hand. One year later there
were 150, simply because everything becomes critical for *some* workload.

I *do* think that having a central reference for fixes that come with
a reproducer (hence many security fixes) can be useful, as it would
offer an opportunity to better test backports when they become
tricky: it often takes much more time to set up a test with a
reproducer than to backport and adjust the fix (though not always).
But when it comes to security issues, the reporter often cares
about the quality of the backport and helps there.

Just my two cents,
Willy