Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code
From: Ingo Molnar
Date: Sat Mar 12 2016 - 11:18:22 EST
* Toshi Kani <toshi.kani@xxxxxxx> wrote:
> On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > >
> > > * Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > >
> > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > handler that initializes PAT as part of MTRR initialization.
> > > >
> > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > > simply skips PAT init, which causes PAT left enabled without
> > > > initialization. [...]
> > >
> > > What practical effects does this have to the user? Does the kernel
> > > crash?
> >
> > Btw., I find this omission _highly_ annoying: describing what negative
> > effects a bug _causes in practice_ is the most important part of a
> > changelog. How on earth can an experienced contributor omit such an
> > important component from a patch description?
> >
> > Most readers of changelogs couldn't care less about technical details of
> > how the bug is fixed (of course others will read it so it's nice to have
> > too), but what symptoms a bug causes, how serious is it, whether it
> > should be backported are like super important compared to everything else
> > you wrote - and both the description and the changelogs are totally
> > silent on those topics ...
> >
> > I've seen this in other PAT patches - please try to improve this.
>
> My apology. I agree the importance of describing the negative effect of the
> issue. This case is complicated to describe thoroughly, but here is a
> summary.
The new changelog looks very good, thanks!
> The issue was reported as a regression caused by 'commit 9cd25aac1f44
> ("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this
> patchset is to fix this regression.
> https://lkml.org/lkml/2016/3/3/828
So one thing that matters more than anything else in the changelog, the title!
Right now the title is:
x86/mtrr: Refactor PAT initialization code
... that's a nice title for a true refactoring of the code, but this isn't really
that, the purpose of this fix is to fix a bad Xorg crash for Qemu users.
The principle you need to remember is that readers of your changelogs will be
_very happy_ about 'negative' phrases like:
bad bug
Xorg crash
boot failure
kernel crash
NULL dereference
I.e. the 'best' title for a bug fix is to characterize it in the most negative
truthful fashion in the changelog. It sounds a bit counterintuitive but it's true.
So in this case the best changelog title would be something like:
x86/pat: Fix Xorg crashes in Qemu sessions
People will absolutely _love_ such titles, because:
- users who are trying to find mysterious Xorg failures can grep for it and
might find it before it hits a stable kernel they are using
- maintainers (like me) are able to see it at a glance that this fix should go
to Linus more urgently than other fixes. (and definitely more urgently than
feature patches.)
- stable kernel maintainers and distro backporters can see it immediately at a
glance that they really want this fix.
So by being intentionally and maximally negative in the title, you are being very
helpful to your fellow developers and users!
Now consider the original title:
x86/mtrr: Refactor PAT initialization code
99% of people will glance over such a title, which is not good. Furhermore,
maintainers like me will get _annoyed_ at such titles, because this neutrally
formulated title, while very polite, actively hides the important detail that
these patches fix real negative bugs for real users.
Okay?
And please also note that in the Linux kernel no-one ever 'blames' other people
for bugs. Bugs are part of the human condition and they happen all the time as
long as they are not introduced by carelessness. So in the typical case you cannot
possibly socially embarrass any good kernel developer by reporting and fixing a
bug he introduced. The typical reaction you will get is 'oh great, one bug less to
worry about!', so socially you can be absolutely honest and 'impolite' about the
negative effects of bugs.
> The negative effects of the issue were two failures in Xorg on qemu32 env,
> which was triggered by the fact that its virtual CPU does not support MTRR.
> https://lkml.org/lkml/2016/3/4/775
> #1. copy_process() failed in the check in reserve_pfn_range()
> #2. error path in copy_process() then hit WARN_ON_ONCE in untrack_pfn().
Yeah, it's nice to quote actual crash signatures as well (in a short form) -
because people hitting the crashes often do a google search and might find the fix
based on such patterns.
Thanks!
Ingo