Re: runtime regression with "x86/mm/pat: Emulate PAT when it is disabled"

From: Toshi Kani
Date: Tue Mar 08 2016 - 10:46:29 EST


On Mon, 2016-03-07 at 22:28 -0500, Paul Gortmaker wrote:
> [Re: runtime regression with "x86/mm/pat: Emulate PAT when it is
> disabled"] On 07/03/2016 (Mon 18:35) Toshi Kani wrote:
>
> > On Mon, 2016-03-07 at 17:56 -0700, Toshi Kani wrote:
> > > On Mon, 2016-03-07 at 18:53 -0500, Paul Gortmaker wrote:
> > > > [Re: runtime regression with "x86/mm/pat: Emulate PAT when it is
> > > > disabled"] On 07/03/2016 (Mon 16:38) Toshi Kani wrote:
> > > >
> > > > > On Mon, 2016-03-07 at 16:08 -0500, Paul Gortmaker wrote:
> > > > > > [dropping oe list and lkml since attaching dmesg files.]
> > > > > >
> > > >
> > > > [...]
> > > >
> > > > > > > Yes, please send me full dmesg files.  Since I do not know
> > > > > > > your original state, the diff does not give me the whole
> > > > > > > picture.
> > > > > >
> > > > > > Attached.
> > > > >
> > > > > > Thanks for the dmesg files!  As I suspected, there is no message
> > > > > > from pat_init() in both cases.  That is, you are missing the
> > > > > following message, which shows how PAT is configured to support
> > > > > cache attributes.
> > > > >
> > > > > # dmesg | grep PAT
> > > > > [0.000000] x86/PAT: Configuration [0-7]:
> > > > > > WB  WC  UC- UC  WB  WC  UC- WT
> > > >
> > > > Interesting...
> > > >
> > > > >
> > > > > It may have seemed working before, but you did not have WC
> > > > > > configured to PAT without calling pat_init().  There was not a
> > > > > > proper check in place to detect this error before.  Can you
> > > > > > please check your code to see what caused this skip of
> > > > > > pat_init()?  If you have a git tree, I can take a look as well.
> > > >
> > > > You already have git copies of what I'm running, since it is
> > > > > vanilla mainline commits.  No code changes at this end
> > > > > whatsoever.  I did the bisect on vanilla mainline.  All I took from
> > > > yocto was their ".config"
> > > >
> > > > > To recap, v4.1-rc5-21-g9dac62909451 works,
> > > > > v4.1-rc5-22-g9cd25aac1f44 fails, and v4.5-rc6 also fails.  If
> > > > > pat_init() isn't called then this is a bug in current mainline.
> > > > > I'll have a look later myself and see if I can trace out how we
> > > > > expect to get to pat_init() and how that might be skipped
> > > > > inadvertently unless someone beats me to it.
> > >
> > > > Oh, I see.  Can you send me the ".config" file?
> >
> > And also an output of /proc/cpuinfo, please?
>
> Host?  Guest?  Both?

Guest.


> > I think I know what's going on.  I noticed that you have the following
> > message in your dmesg files.
> >
> >  [    0.000000] MTRR: Disabled
> >
> > MTRR is set to disabled when your CPU is Intel but does not support
> > MTRR.
>
> I've run the test on a modern expensive xeon, a 4-5 year old xeon, and
> on an old pentium dual core (the cheaper dumbed down core2-duo that
> doesn't support virtualization) from around 2007.  In all cases the
> result was the same.  Perhaps that is because the qemu launch script
> appears to set the CPU type regardless?  (it uses "-cpu qemu32" but I
> confess that I do not know exactly what silicon that tries to emulate).

This is a matter of how qemu emulates CPU features.  There is no such
Intel CPU that supports PAT w/o MTRR.  This is why the current code assumes
this dependency.

> >  Perhaps, QEMU does not emulate MTRR?
>
> I will be the 1st to admit that I am not a seasoned qemu user, so I've
> no idea if the above is true.  I still prefer testing on real hardware,
> even if that comes across as "old school".  :)

We can check it with /proc/cpuinfo on a guest.
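
A minimal sketch of that check, assuming the guest exposes the usual
"flags" line in /proc/cpuinfo.  The helper name has_cpu_flag is made up
for illustration; "mtrr" and "pat" are the standard x86 flag names.

```shell
#!/bin/sh
# has_cpu_flag FLAG [FILE]: succeed if FLAG appears on a "flags" line.
# FILE defaults to /proc/cpuinfo; the helper name is hypothetical.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Keep only the "flags" lines, then match the flag as a whole word
    # so that e.g. "pse" does not match "pse36".
    grep -E '^flags[[:space:]]*:' "$file" | grep -qw -- "$flag"
}

# Report both flags that matter for this issue.
for f in mtrr pat; do
    if has_cpu_flag "$f"; then
        echo "$f: present"
    else
        echo "$f: missing"
    fi
done
```

If "pat" shows up as present while "mtrr" is missing, the guest matches
the case suspected in this thread.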

> > pat_init() is not called when MTRR is disabled. ÂI think this
> > dependency is wrong, and it needs to be fixed.
> >
> > This issue has been there for a long time, and you have been running
> > essentially as PAT disabled in the past.  The commit in question simply
> > detected this issue.
>
> OK, that sounds good -- in that it seems we are finally getting to the
> bottom of what happened here.  Any thoughts on why built-in vs. modular
> somehow managed to mask the issue?

No idea.  I do not think uvesafb being built-in vs. module has anything to
do with it.  But we need to verify your /proc/cpuinfo to be sure.  I will
work on the PAT fix once this issue is confirmed.

For the time being, please use the "nopat" boot option to work around this
issue.  This keeps the PAT state consistent as disabled in your env.
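
To confirm the option actually reached the guest kernel, something like
the following could be used.  It is only a sketch: cmdline_has_opt is a
hypothetical helper name, and it assumes /proc/cmdline is readable in
the guest.

```shell
#!/bin/sh
# cmdline_has_opt OPT [FILE]: succeed if OPT is a standalone boot parameter.
# FILE defaults to /proc/cmdline; the helper name is hypothetical.
cmdline_has_opt() {
    opt=$1
    file=${2:-/proc/cmdline}
    # Put each parameter on its own line and look for an exact match.
    tr -s '[:space:]' '\n' < "$file" | grep -qx -- "$opt"
}

if cmdline_has_opt nopat; then
    echo "nopat is set"
else
    echo "nopat is not set"
fi
```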

Thanks,
-Toshi