Re: [PATCH v1] perf evlist: Force adding default events only to core PMUs

From: Linus Torvalds
Date: Tue May 28 2024 - 16:33:52 EST


On Tue, 28 May 2024 at 13:03, Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> But you've traded a fix for one set of users with a fix for another.

No.

You don't seem to understand what "regression" means.

The "no regressions" rule is that we do not introduce NEW bugs.

It *literally* came about because we had an endless dance of "fix two
bugs, introduce one new one", and that then resulted in a system that
you cannot TRUST.

We had years of this with suspend/resume in particular, where the
argument was always exactly "this fixed many more systems" when
something else broke.

But because random other things kept breaking, it meant that people
couldn't upgrade the kernel and feel safe about it.

Your old laptop might no longer work, because somebody had deemed that
all those *new* laptops were more important.

So I introduced that "no regressions" rule something like two decades
ago, because people need to be able to update their kernel without
fear of something they relied on suddenly stopping to work.

The fact that the "suddenly stopped working" may be a minority DOES
NOT MATTER ONE WHIT.

Stability matters.

It's MUCH MUCH better to have legacy bad behavior that you never dealt
with correctly, than to introduce *new* bugs that hit something that
used to work.

So something that "perf" has never done correctly is simply not an issue.

You deal with that by saying "that has never worked properly before either".

You might even document it along with (hopefully) possible workarounds.

The whole "one step forward, two steps back" is absolutely fine if you
are line dancing.

But we're not line dancing.

We take it slow and steady, and if you can't fix something without
breaking something else, then that thing simply does not get fixed.

And there are always exceptions. Sometimes something may need to be
broken because it's an acute security issue.

And if it takes a year for somebody to find a regression, and in the
meantime others have started relying on the new behavior, then at some
point it's a "yes, that's a regression, but it wasn't reported in a
timely enough manner".

And sometimes the use-case is basically a museum machine and we retire
support for it because such a machine should use museum software too.

So it's not like it's some long-term absolute guarantee. But it
absolutely *is* the #1 rule in the kernel.

A fix that breaks something else is not a fix at all.

And in this case, when the regression was noted within days, and
within the merge window, there's simply no discussion about it.

Linus