Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code

From: Toshi Kani
Date: Tue Mar 29 2016 - 19:24:27 EST


On Tue, 2016-03-29 at 15:12 -0700, Luis R. Rodriguez wrote:
> On Tue, Mar 29, 2016 at 2:46 PM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > On Tue, 2016-03-29 at 10:14 -0700, Luis R. Rodriguez wrote:
> > > On Fri, Mar 18, 2016 at 2:35 PM, Toshi Kani <toshi.kani@xxxxxxx>
> > > wrote:
Â:
> > >
> > > Do we really need UC for the fan?
> >
> > When you say "we", are you referring Xen guests?ÂÂXen guests do not
> > need to control the fan, so they do not need UC set in MTRRs.
> >
> > In general, yes, MMIO registers need UC when they need to be accessed.
>
> Curious, what does a BIOS do for fan control when MTRRs are disabled?

You mean, when the kernel modified the MTRR setup and disabled them. ÂBIOS
would assume the original setup and still access the registers. ÂThis may
lead to undefined behavior and may result in a system crash.

> Also what if a BIOS just set MSR_MTRRdefType to uncachable only ?

Many BIOSes actually set the default type to UC. ÂMTRRs then cover regular
memory with WB.

> Wouldn't that help simplify the BIOS when systems are known as not
> wanting to deal with reading MTRRs on the kernel front, even if its
> just to read the setup ?

Nope.

> I'm trying to determine exactly why a BIOS cannot simply enable use an
> alternative for what it needs for fan control and let the kernel live
> without any MTRR code at run time as an option. Although the
> documentation says that the same "procedure" is needed for PAT setup,
> I see it possible to split the skeleton of the code and have each
> peace of code live separately and compartmentalized, they'd just have
> respective calls on the skeleton of the procedure.

I agree that the MTRR rendezvous handler can be improved for PAT, but I do
not see a compelling reason to make such change now. ÂWith my fix, I think
the code works reasonably for Xen.

> > > What is the default for PAT?
> >
> > There is no such thing as the default for PAT.
> >
> > > Can't
> > > the same be used so that we way by default all ranges match what is
> > > also the default by PAT? Would that really break fan control ? If we
> > > have a match should't we be able to not have to worry about MTRRs at
> > > all in-kernel even on bare metal?
> >
> > We do not need to know about BIOS impl, such as fan control, etc.ÂÂThe
> > point is that if BIOS sets MTRRs, then the kernel keeps their setup.
>
> Right, if the kernel no longer uses it directly it seems like an
> aweful lot of code to keep updating simply for a BIOS requirement, I'm
> trying to see if we can have the option to live without this
> requirement.

Please be aware of the hibernation case. I think this procedure involves
setting MTRRs back to the original setup.

> > If (virtual) BIOS does not enable MTRRs, the kernel keeps them
> > disabled.ÂÂWe just need not to mess with the setup.
>
> Sure, thanks! I'm trying to see if we can have a similar option on bare
> metal.
>
> > > Another option, which I've alluded to on the Xen thread is skipping
> > > over the MTRR space from the e820 map. Is that not possible ? This
> > > could be last resort... but which I'm hinting more for the Xen side
> > > of things if we *really* need get_mtrr() on the Xen guest side of
> > > things...
> >
> > There is no MTRR space in the e820 map since they are MSRs.ÂÂSince Xen
> > guests disable MTRRs, I do not think you have any issue here...
>
> Xen seems to clip the e820 map given to a guest in certain MTRR
> conditions, see init_e820(), this calls
> machine_specific_memory_setup() which later clips MTRR if
> mtrr_top_of_ram(). This is an Intel check that trims the e820 map if
> MTRRs were found to be enabled and the default MTRR is not write-back.
> If returns the address of the first non write-back variable MTRR, it
> uses clip_to_limit() to limit the exposed memory [0], notice how
> clip_to_limit() is also used to generally limit exposed memory through
> the opt_mem boot parameter as well. Its not exactly clear why that's
> done, but this looks very similar to the Linux MTRR cleanup -- see
> x86_get_mtrr_mem_range().
>
> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/e820.c

It looks to me that the code makes sure all E820_RAM ranges in the e820
table are covered by WB entries of MTRRs. ÂIf not, it trims the e820 table.

I suppose it tries to react on a case when someone modified MTRRs and
resulted in mismatch with the e820 table. ÂI'd think you do not need this
code as long as you do not modify the MTRR setup.

Thanks,
-Toshi