Re: Fwd: [PATCH] x86: Use larger chunks in mtrr_cleanup
From: Luis R. Rodriguez
Date: Tue Mar 29 2016 - 13:08:07 EST
On Wed, Mar 16, 2016 at 1:20 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> On Thu, Nov 05, 2015 at 11:43:59AM -0800, Luis R. Rodriguez wrote:
>> On Thu, Nov 5, 2015 at 11:14 AM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>> > On Mon, Sep 14, 2015 at 7:46 AM, Stuart Hayes <stuart.w.hayes@xxxxxxxxx> wrote:
>> >>
>> >> Booting with 'disable_mtrr_cleanup' works, but the system I am working with
>> >> isn't actually failing--it just gets ugly error messages. And the BIOS on the
>> >> system I am working with had set up the MTRRs correctly.
>> >
>> > Please post boot log and /proc/mtrr for:
>> > 1. without your patch
>> > 2. without your patch and with disable_mtrr_cleanup in boot command line.
>> > 3. with your patch.
>>
>> Stuart,
>>
>> to provide some context -- I reached out to Yinghai as he wrote the
>> original mtrr cleanup code. The commit logs seem to read that a crash
>> was possible on systems with > 4 GiB RAM with some types of BIOSes...
>> The cleanup code seems to trigger when variable MTRRs do not exist
>> that are UC, or when all varible MTRRs that exist are just UC + WB
>> (Yinghai correct me if I'm wrong). The commit log in question
>> (95ffa2438d0e9 "x86: mtrr cleanup for converting continuous to
>> discrete layout, v8") was not very clear about the cause of the crash
>> -- but suppose the issue here was the BIOS on some systems might want
>> to create some UC variable MTRRs early on and there was no UC MTRRs
>> available, and I can only guess the cleanup exists as hack for those
>> BIOSes. Even if that was the case -- its still not clear *why* the
>> crash would happen but I suppose a driver mishap can happen without UC
>> guarantees for some devices the BIOS may want to enable UC MTRR on.
>>
>> To be able to determine what we do upstream we need to understand the
>> above first. We also need to understand if the cleanup might also be
>> implicated by userspace drivers using /proc/mtrr, or if a proprietary
>> driver exists that does use mtrr_add() directly even though PAT has
>> been available for ages and all drivers are now properly converted.
>>
>> With clear answers to the above we'll be able to determine what the
>> right course of action should be for this patch. For instance I'm
>> inclined to strive to disable the complex cleanup code if we don't
>> need it anymore, but if we do need it your patch makes sense. If the
>> patch makes sense then though are we going to have to keep updating
>> the segment size *every time* as systems grow? That seems rather
>> silly. And if PAT is prevalent why are vendors adding MTRRs still? The
>> cleanup seems complex and a major hack for a fix for some BIOSes, I'd
>> much rather identify the exact issue and only have a fix to address
>> that case.
>
> I never heard back... so let's take this up on the other thread I just
> raised.
Although we never got an answer, it seems now its clear, at least
Toshi has provided a confirmation that the cleanup is no longer needed
given that "cleanup was needed to allocate more free slots to MTRRs.
We do not need to care about the number of free slots as long as the
kernel does not insert any new entry to MTRRs" [0]. So instead of
enhancing the cleanup code for larger systems we should now just
remove this cleanup code completely now.
[0] http://lkml.kernel.org/r/1458336958.6393.544.camel@xxxxxxx
Luis