Re: [PATCH] x86/pci: make pci_mem_start to be aligned only -v4

From: Ingo Molnar
Date: Sat Apr 18 2009 - 05:23:42 EST



* Yinghai Lu <yinghai@xxxxxxxxxx> wrote:

> Linus Torvalds wrote:
> >
> > On Thu, 16 Apr 2009, Yinghai Lu wrote:
> >> please check.
> >>
> >> [PATCH] x86/pci: make pci_mem_start to be aligned only -v4
> >
> > I like the approach. That said, I think that rather than do the "modify
> > the e820 array" thing, why not just do it in the in the resource tree, and
> > do it at "e820_reserve_resources_late()" time?
> >
> > IOW, something like this.
> >
> > TOTALLY UNTESTED! The point is to take all RAM resources we haev, and
> > _after_ we've added all the resources we've seen in the E820 tree, we then
> > _also_ try to add fake reserved entries for any "round up to X" at the end
> > of the RAM resources.
> >
> > NOTE! I really didn't want to use "reserve_region_with_split()". I didn't
> > want to recurse into any conflicting resources, I really wanted to just do
> > the other failure cases.
> >
> > THIS PATCH IS NOT MEANT TO BE USED. Just a rough "almost like this" kind
> > of thing. That includes the rough draft of how much to round things up to
> > based on where the end of RAM region is etc. I'm really throwing this out
> > more as a "wouldn't this be a readable way to handle any missing reserved
> > entries" kind of thing..
> >
> > Linus
> >
> > ---
> > arch/x86/kernel/e820.c | 34 ++++++++++++++++++++++++++++++++++
> > 1 files changed, 34 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index ef2c356..e8b8d33 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -1370,6 +1370,23 @@ void __init e820_reserve_resources(void)
> > }
> > }
> >
> > +/* How much should we pad RAM ending depending on where it is? */
> > +static unsigned long ram_alignment(resource_size_t pos)
> > +{
> > + unsigned long mb = pos >> 20;
> > +
> > + /* To 64kB in the first megabyte */
> > + if (!mb)
> > + return 64*1024;
> > +
> > + /* To 1MB in the first 16MB */
> > + if (mb < 16)
> > + return 1024*1024;
> > +
> > + /* To 32MB for anything above that */
> > + return 32*1024*1024;
> > +}
> > +
> > void __init e820_reserve_resources_late(void)
> > {
> > int i;
> > @@ -1381,6 +1398,23 @@ void __init e820_reserve_resources_late(void)
> > insert_resource_expand_to_fit(&iomem_resource, res);
> > res++;
> > }
> > +
> > + /*
> > + * Try to bump up RAM regions to reasonable boundaries to
> > + * avoid stolen RAM
> > + */
> > + for (i = 0; i < e820.nr_map; i++) {
> > + struct e820entry *entry = &e820_saved.map[i];
> > + resource_size_t start, end;
> > +
> > + if (entry->type != E820_RAM)
> > + continue;
> > + start = entry->addr + entry->size;
> > + end = round_up(start, ram_alignment(start));
> > + if (start == end)
> > + continue;
> > + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer");
> > + }
> > }
> >
> > char *__init default_machine_specific_memory_setup(void)
>
> except need to change
> > + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer");
> ==> > + reserve_region_with_split(&iomem_resource, start, end - 1, "RAM buffer");
>
> it will make sure dynmical allocating code will not use those range.
>
> and could make e820_setup_gap much simple.
>
> ---
> arch/x86/kernel/e820.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/e820.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/e820.c
> +++ linux-2.6/arch/x86/kernel/e820.c
> @@ -635,14 +635,12 @@ __init void e820_setup_gap(void)
> #endif
>
> /*
> - * See how much we want to round up: start off with
> - * rounding to the next 1MB area.
> + * e820_reserve_resources_late will protect stolen RAM
> + * so just round it to 1M
> */
> round = 0x100000;
> - while ((gapsize >> 4) > round)
> - round += round;
> - /* Fun with two's complement */
> - pci_mem_start = (gapstart + round) & -round;
> +
> + pci_mem_start = roundup(gapstart, round);
>
> printk(KERN_INFO
> "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
>
> Ingo, can you put those two patches in tip?

I think the point would be to explore the possibility to have no
'gap' logic at all - we should extend the e820 table with Linus's
scheme to add 'RAM buffer' entries.

That way, if you search for a sufficient size hole, it will be
correctly aligned straight away - with no rounding/gap logic at all.
(Maybe add a debug warning to catch when this is not the case.)

Am i missing something?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/