Re: divide error in select_task_rq_fair()
From: Yinghai Lu
Date: Sat Nov 13 2010 - 20:16:52 EST
On Thu, Nov 11, 2010 at 10:28 AM, Myron Stowe <myron.stowe@xxxxxx> wrote:
> On Fri, 2010-11-05 at 07:17 +0100, Eric Dumazet wrote:
>> On Thursday, 04 November 2010 at 20:00 -0600, Bjorn Helgaas wrote:
>>
>> > Is that going to help you debug the problem? The solution is not going
>> > to be something like "set NR_CPUS=x". If NR_CPUS is too small, the
>> > machine should still *boot*, even if we can't use all the CPUs in the
>> > box.
>> >
>>
>> Yes, it will help to understand the layout of cpu / domains and make
>> appropriate changes.
>>
>> Alternative is you send me such a machine :=)
>
> I opened a BZ on this issue as it seems to be a regression -
> https://bugzilla.kernel.org/show_bug.cgi?id=22662
>
> I also bisected the kernel, as indicated in the BZ, with the results
> below. Reverting 50f2d7f682f9c0ed58191d0982fe77888d59d162 re-enabled
> booting on the box in question (an HP dl980g7). Let me know what
> further info you need or which patches to test to debug this.
>
> Thanks,
>
> commit 50f2d7f682f9c0ed58191d0982fe77888d59d162
> Author: Nikanth Karthikesan <knikanth@xxxxxxx>
> Date: Thu Sep 30 17:34:10 2010 +0530
>
> x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA
>
> commit d9c2d5ac6af87b4491bff107113aaf16f6c2b2d9 "x86, numa: Use near(er)
> online node instead of roundrobin for NUMA" changed NUMA initialization on
> Intel to choose the nearest online node or first node. Fake NUMA would be
> better off with round-robin initialization, instead of all CPUs on the
> first node. Change the choice of first node back to round-robin.
>
> For testing NUMA kernel behaviour without cpusets and NUMA aware
> applications, it would be better to have cpus in different nodes, rather
> than all in a single node. With cpusets, migration of tasks scenarios
> cannot be tested.
>
> I guess having it round-robin shouldn't affect the use cases for all cpus
> on the first node.
>
> The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to
> be the case, which was changed by commit d9c2d5ac6. It changed from
> roundrobin to nearer or first node. And I couldn't find any reason for
> this change in its changelog.
>
> Signed-off-by: Nikanth Karthikesan <knikanth@xxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
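
For anyone following the thread, the two placement policies the quoted changelog contrasts can be sketched roughly as below. This is an illustrative userspace sketch only, not the code in arch/x86/mm/numa_64.c; the names assign_round_robin, assign_first_node and the cpu_to_node array are hypothetical and chosen just for this example. The guard against a zero node count is an assumption of the sketch and is not a claim about the cause of the divide error discussed here.

```c
/* Hypothetical helper: place each CPU on the next node in turn. */
static void assign_round_robin(int *cpu_to_node, int ncpus, int nnodes)
{
    int next = 0;

    /* Assumption for this sketch: treat "no nodes" as a single node
     * so the modulo below never divides by zero. */
    if (nnodes <= 0)
        nnodes = 1;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        cpu_to_node[cpu] = next;
        next = (next + 1) % nnodes;   /* wrap back to node 0 */
    }
}

/* Hypothetical helper: the alternative policy, every CPU on node 0. */
static void assign_first_node(int *cpu_to_node, int ncpus)
{
    for (int cpu = 0; cpu < ncpus; cpu++)
        cpu_to_node[cpu] = 0;
}
```

With 8 CPUs and 4 fake nodes, assign_round_robin fills the map as 0,1,2,3,0,1,2,3, while assign_first_node leaves every entry at 0.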
Please check:
http://lkml.org/lkml/2010/11/13/176
Yinghai