Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code

From: Luis R. Rodriguez
Date: Mon Mar 14 2016 - 18:50:57 EST

On Fri, Mar 11, 2016 at 11:34:26AM -0700, Toshi Kani wrote:
> On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > >
> > > * Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > >
> > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > handler that initializes PAT as part of MTRR initialization.
> > > >
> > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > > simply skips PAT init, which causes PAT left enabled without
> > > > initialization. [...]
> > >
> > > What practical effects does this have to the user? Does the kernel
> > > crash?
> >
> > Btw., I find this omission _highly_ annoying: describing what negative
> > effects a bug _causes in practice_ is the most important part of a
> > changelog. How on earth can an experienced contributor omit such an
> > important component from a patch description?
> >
> > Most readers of changelogs couldn't care less about technical details of
> > how the bug is fixed (of course others will read it so it's nice to have
> > too), but what symptoms a bug causes, how serious is it, whether it
> > should be backported are like super important compared to everything else
> > you wrote - and both the description and the changelogs are totally
> > silent on those topics ...
> >
> > I've seen this in other PAT patches - please try to improve this.
> My apology. I agree the importance of describing the negative effect of the
> issue. This case is complicated to describe thoroughly, but here is a
> summary.
> The issue was reported as a regression caused by 'commit 9cd25aac1f44
> ("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this
> patchset is to fix this regression.

Huh, interesting.

> The negative effects of the issue were two failures in Xorg on qemu32 env,
> which was triggered by the fact that its virtual CPU does not support MTRR.
>  #1. copy_process() failed in the check in reserve_pfn_range()
>  #2. error path in copy_process() then hit WARN_ON_ONCE in untrack_pfn().
> These negative effects are also caused by two different bugs, but they can
> be dealt in lower priority. Fixing the pat_init() issue will avoid Xorg
> hitting these cases.

Was it confirmed that these patches fix this issue? Mentioning this in the
commit log would also help. Tested-by, etc.

Joe at Stratus also hit this issue but on a system where MTRR is enabled. He
sent his report only to me as he thought it was caused by the ioremap_wc()
changes and his driver was one that got it. In his case though he modified the
driver significantly, and upon inspection of that code saw how it used a
secondary backup PCI device for failover for a framebuffer device... The changes
to the driver in place are rather complex though and as such it made no sense
to further review unless he moved his changes upstream. It is still worth
noting this issue has been seeing elsehwere, but the root cause is still not
known. The error Joe got is:

x86/PAT: Xorg:37506 map pfn expected mapping type uncached-minus for [mem 0x9f000000-0x9f7fffff], got write-combining

Even though the driver is custom (and actually I even saw another unrelated
proprietary driver loaded) I figured its worth noting others have seen this
error without MTRR being disabled.

The second thread you referred to seems to say that if you built-in the code
the error does not come up. What the hell. Joe, can you try building your
driver built-in to see if you also see this go away? Even though I don't
want to support your custom hacked up driver I do want to know if your
issue goes away with built-in as well.

> When the CPU does not support MTRR, MTRR does not call pat_init(), but
> leaves PAT enabled. This pat_init() issue is a long-standing issue, but
> manifested as issue #1 (and then hit issue #2) with the commit because the
> memtype now tracks cache attribute with 'page_cache_mode'. A WC map request
> is tracked as WC in memtype, but sets a PTE as UC (pgprot) per
> __cachemode2pte_tbl[].

Can you think of anything else other than having MTRR disabled that could
cause this same ending result?

> This caused an error in reserve_pfn_range() when it
> was called from track_pfn_copy(), which obtained pgprot from a PTE. It
> converts pgprot to page_cache_mode, which does not necessarily result in
> the original page_cache_mode since __cachemode2pte_tbl[] redirects multiple
> types to UC. This is a separate issue in reserve_pfn_range().

Do you have a fix in mind for that already too?

> If PAT is set to disabled properly, the code bypasses the memtype check.
> So, #1 is a non-issue as a result (although it should be fixed).
> This pat_init() issue existed before commit 9cd25aac1f44, but we used
> pgprot in memtype. Hence, we did not have issue #1 before. But WC request
> resulted in WT in effect because WC pgrot is actually WT when PAT is not
> initialized. This is not how the code was designed to work. When PAT is set
> to disabled properly, WC gets converted to UC. The use of WT can result in
> a system crash if the target range does not support WT, but fortunately
> people did not run into such issue before.

And since the ioremap_wc() crusade ended via commit 2baa891e42d84, and both
were merged on v4.3 it mean that we'd likely see more of these issues now as
more driver are using PAT since v4.3.