Re: [PATCH -mm 11/14] bootmem: respect goal more likely

From: Johannes Weiner
Date: Thu Jun 05 2008 - 12:42:17 EST


Hi,

Yasunori Goto <y-goto@xxxxxxxxxxxxxx> writes:

> I gotcha! :-)
>
> max -= PFN_DOWN(bdata->node_boot_start);
> start -= PFN_DOWN(bdata->node_boot_start);
> + fallback -= PFN_DOWN(bdata->node_boot_start);
>
> if (bdata->last_success > start) {
> - /* Set goal here to trigger a retry on failure */
> - start = goal = ALIGN(bdata->last_success, step);
> + fallback = start; -------------------------------- (*)
> + start = ALIGN(bdata->last_success, step);
> }
>
> (*) is root cause. "fallback" is set as 0, because start is index of bitmap
> at here. When normal zone is allocated first, and DMA zone is required by
> alloc_bootmem_low() later and first page is free yet, fallback is set as 0.
>
> + if (fallback) {
> + start = ALIGN(fallback, step);
> + fallback = 0;
> + goto find_block;
> + }
> +
>
> As a result, this retry code is skipped, and alloc_bootmem_low() fails.
> So, when I change here from fallback to a retry_flag, my box can boot
> up.

Ah, you got there too :-)

Here is the original Email I wrote but waited to send out until after I
slept:

---

Hi Yasunori,

here is hopefully the fix to your bootup problem. Let me explain:

This is the last but one bootmem allocation on your box:

Jun 5 11:50:43 localhost kernel: bootmem::alloc_bootmem_core nid=0 size=40000 [16 pages] align=80 goal=100000000 limit=0
Jun 5 11:50:43 localhost kernel: bootmem::__reserve nid=0 start=1020588 end=1020598 flags=1

(->last_success is at 1020588 now)

And this is the last one:

Jun 5 11:50:43 localhost kernel: bootmem::alloc_bootmem_core nid=0 size=8000 [2 pages] align=80 goal=0 limit=ffffffff

Goal is zero. So sidx is set to zero as well. last_success is bigger
than sidx, so fallback gets the value of sidx and sidx is updated to
last_succes. But now sidx (1020588) is bigger than the limit (3ffff)
and we fail the first search iteration because of that. We should
fall back to the original sidx stored in fallback now but take a look
at what value fallback must have for this branch to be taken...

I am pretty sure this is the bug you encountered. I spotted it in the
code but your logfile proves my theory.

Hannes

--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -444,7 +444,11 @@ static void * __init alloc_bootmem_core(
midx = max - PFN_DOWN(bdata->node_boot_start);

if (bdata->last_success > sidx) {
- fallback = sidx;
+ /*
+ * Handle the valid case of sidx being zero and still
+ * catch the fallback below.
+ */
+ fallback = sidx + 1;
sidx = ALIGN(bdata->last_success, step);
}

@@ -493,7 +497,7 @@ find_block:
}

if (fallback) {
- sidx = ALIGN(fallback, step);
+ sidx = ALIGN(fallback - 1, step);
fallback = 0;
goto find_block;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/