Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()
From: Johannes Weiner
Date: Tue Apr 15 2008 - 07:51:40 EST
Hi,
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:
> On Tue, 15 Apr 2008 00:28:34 -0700 "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> wrote:
>
>> On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
>> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> wrote:
>> >
>> > > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
>> > > <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > > >
>> > > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>> > > >
>> > > > > Johannes Weiner <hannes@xxxxxxxxxxxx> writes:
>> > > > >
>> > > > > > Make free_bootmem() look up the node holding the specified address
>> > > > > > range which lets it work transparently on single-node and multi-node
>> > > > > > configurations.
>> > > > >
>> > > > > Acked-by: Andi Kleen <andi@xxxxxxxxxxxxxx>
>> > > > >
>> > > > > This is far better than the original change it replaces and which
>> > > > > I also objected to in review.
>> > > > >
>> > > >
>> > > > So... do we think these two patches are sufficiently safe and important for
>> > > > 2.6.25?
>> > >
>> > > the patch is wrong
>> > >
>> >
>> > The last I saw was this:
>> >
>> >
>> > On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@xxxxxxxxxxxx> wrote:
>> >
>> > > Hi,
>> > >
>> > > "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> writes:
>> > >
>> > > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@xxxxxxxxxxxx> wrote:
>> > > > ...
>> >
>> > > >
>> > > > could have chance that bootmem with reserved_early that is crossing
>> > > > the nodes.
>> > >
>> > > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
>> > > nodes, so I don't see where this chance could come from.
>> >
>> > Is that what you're referring to?
>> >
>> > Was Johannes observation incorrect? If so, why?
>>
>> my patch with free_bootmem will make sure free_bootmem_core only free
>> bootmem in the bdata scope.
>> so free_bootmem can handle the cross_node bootmem that is done by
>> reserve_early ( done in another patch, is dropped by you because took
>> Jonannes).
>>
>> in setup_arch for x86_64 there is one free_bootmem that is used when
>> ramdisk is falled out of ram map. that could be crossed by bootloader
>> and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
>> memory.
>>
>> anyway that is extrem case, but my patch could handle that.
Has this case ever occured? If this could become real, I have no
objections to implement a way to handle it (why would I?), but until now
you just said that in some time in the future, this could be useful.
>>
>> I wonder if any regression caused by my previous patch? maybe on other platform?
>>
>
> Not that I'm aware of.
It papers over buggy usage of free_bootmem(). If its arguments are
bogus, it will just return again where it BUG()ed out before. The pages
might be never marked free and therefor never reach the buddy allocator.
> I restored mm-make-reserve_bootmem-can-crossed-the-nodes.patch. Johannes,
> can you please check 2.6.28-rc8-mm2, see if it looks OK?
I object to the way it is implemented. If it is really needed, that
should be done properly:
- remove the double loop over the area on the likely succeeding
path and unroll the reserving on the unlikely path as it was
done before. Better to punish exceptional branches than
the working paths.
- make reserve_bootmem_core be strict with its arguments. If
you want to iterate over the bdata list, you should not just
throw every item at the _core functions and let them work it
out for themselves. The correct parameters should be
calculated in advance and then passed to a strict
_bootmem_core() function that BUG()s on failure.
But still, Yinghai, you never brought in practical reasons for this
whole thing. You talked about extreme and theoretical cases and I don't
think that this justifies breaking API or pessimizing code at all.
Hannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/