Re: Intel's response Linux/MTRR/8GB Memory Support / Why doesn't the kernel realize the BIOS has problems and re-map appropriately?

From: Jesse Barnes
Date: Mon Jun 04 2007 - 11:54:32 EST


On Monday, June 4, 2007 8:48:37 Ray Lee wrote:
> On 6/4/07, Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> wrote:
> > On Sunday, June 3, 2007 2:15:06 Matt Keenan wrote:
> > > Justin Piszcz wrote:
> > > > On Sat, 2 Jun 2007, Andi Kleen wrote:
> > > >>> I feel, having a silent/transparent workaround is not a good idea.
> > > >>> With that
> > > >>
> > > >> If enough RAM is chopped off users will notice. They tend to
> > > >> complain when they miss RAM. I don't like panic very much because
> > > >> for many users it will be a show stopper (even when they are not
> > > >> blessed with "quiet" boots like some distributions do)
> > > >>
> > > >> The message in dmesg could be also emphasized a bit with a little
> > > >> ASCII art (but no <blink> tag in there)
> > > >>
> > > >> The problem I'm more worried about is if the system will be really
> > > >> stable --- could it be that the memory controller is still
> > > >> misconfigured and cause other stability issues? (we've had such
> > > >> cases in the past). Also I'm not sure we can handle the case of
> > > >> the MTRR wrong not at the end of memory but at the hole sanely.
> > > >>
> > > >> -Andi
> > > >
> > > > So far I have been booting with mem=8832M and have run stress/loaded
> > > > the memory subsystem pretty good; what other tests should I run?
> > > >
> > > > It'd be nice if we could pose some sort of solution/warning for the
> > > > future so other people do not have to experience the same problems.
> > > >
> > > > What are the next steps?
> > >
> > > Wouldn't it be possible for the e820/MTRR set up code detect the
> > > problem and suggest a mem=xxxx that would fix the problem (while also
> > > complaining that the BIOS is broken)?
> >
> > Yes, that should be fairly easy, though as Andi points out, if there are
> > holes in the MTRR setup, things get a little trickier (I had an earlier
> > patch to deal with this, but ended up with too many early boot issues).
> >
> > Maybe what Venki suggested would be best: just detect the condition and
> > panic, with a string telling the user to use mem=xxx (we can figure that
> > out) and/or upgrade their BIOS.
>
> Ick. Systems that used to boot fine would then panic on a kernel
> upgrade. That's rather rude for a condition that's merely an
> optimization (using all memory), rather than one of correctness. A
> panic seems entirely inappropriate.

No, existing kernels would have been so slow as to be nearly unusable on
machines with this problem. Reducing the amount of available memory
automatically might work in most cases, but as Venki pointed out, people will
have to check their logs to notice that anything is wrong.

But I don't have a strong preference, maybe just a boot time message (with
suitably obnoxious ascii art) would be sufficient.

Jesse
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/