Re: [PATCH] proc: revert /proc/*/cmdline rewrite

From: Linus Torvalds
Date: Sun Jul 14 2019 - 14:13:20 EST


On Sun, Jul 14, 2019 at 2:52 AM Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
>
> The problem is that I can't even drag this trivia in out of _fear_ that
> it is userspace observable:
>
> https://marc.info/?t=155863429700002&r=1&w=4
> [PATCH] elf: fix "start_code" evaluation

Oh, we should do things like this all the time.

"Observable" isn't a problem per se. It only turns into a problem when
it actually breaks things.

We should always strive for understandable - and maintainable - code.

"Make it as simple as possible, but no simpler".

Of course, if you can prove that some change isn't observable, then
that is always the safer change.

Because any observable change _can_ (and admittedly surprisingly
often) does end up showing that yes, somebody out there depended on
some particularly subtle observable detail.

But often "observable" doesn't mean "breakage", and it's not that
uncommon that we do things that have slightly different semantics in
order to clean stuff up (or fix actual bugs that cause problems).

The important thing when something observable _does_ cause actual
breakage is that it gets fixed, and that people don't try to make
excuses for it. In particular the "but I fixed a bug" is _not_ an
excuse for causing some user load to break, because that was just
another bug.

Of course, it's also perfectly valid to say "ok, this could be
improved, but changing it changes observable behavior, and it's not
worth my time to worry about whether something can break".

So there should certainly be a *worry* about breakage (and the pain
that breakage can cause) when making cleanups etc. But as long as you
stand behind your cleanup and know that you may have to fix it up if
somebody reports an issue with it, it's all good.

IOW, it's a balancing thing. Do you think the cleanup is worth the
"this may come back to bite me" problem?

> and yet the patch which did a regression and an infoleak continues
> to be papered over and for which the only justification was
> "simplify and clarify".

See above. "Simplify and clarify" is a good excuse in general.

What is *not* a good excuse is then if somebody doesn't stand up and
say "oh, my bad, I screwed up, and here's the fix for the breakage".

In this case it took a year for people to report problems, which shows
that at least the breakage wasn't obvious.

And I'd rather fix it by cleaning up *more* and making the rules
simpler and easier to understand.

Don't get me wrong - reverting is often a good strategy too.

I will revert very aggressively when close to a release, for example,
when we just don't have time to try to figure things out. Or if the
breakage is large enough that it hinders people from testing and
working on other things.

Or if the original developer is not responsive and there isn't
somebody around that goes "ok, that can be fixed by xyz.."

Then "let's just revert" is the right thing to do. It can be the
simplest thing, when you just don't have the resources to do anything
else, or it's not just worth it.

Linus