Re: [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug

From: Lai Jiangshan
Date: Fri Jan 16 2015 - 03:03:14 EST


On 01/16/2015 01:22 PM, Yasuaki Ishimatsu wrote:
> Hi Lai,
>
> Thank you for posting the patch set.
>
> I'll try it next Monday, so please wait a while.
>

I think testing would just be a waste of effort before the maintainer makes
a decision. (Discussion and ideas are welcome, though.)

Before TJ decides, maybe you could start with this:

"Make numa code maintain the mapping to the best of its knowledge and
invoke notification callbacks when it changes."
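A minimal userspace sketch of the approach in the quote above, assuming a plain array stands in for the NUMA code's CPU-to-node map: the owner of the map updates it and calls every registered callback only when an entry actually changes. All names here (set_cpu_node(), register_cpu_node_cb(), NO_NODE, record_change()) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS      64
#define MAX_CALLBACKS 4
#define NO_NODE       (-1) /* plays the role of NUMA_NO_NODE */

typedef void (*cpu_node_cb)(int cpu, int old_node, int new_node);

static int cpu_node_map[MAX_CPUS];           /* owned by the "NUMA code" */
static cpu_node_cb callbacks[MAX_CALLBACKS]; /* subscribers, e.g. workqueue */
static size_t nr_callbacks;

/* Start with no known node for any CPU and no subscribers. */
static void cpu_node_map_init(void)
{
	for (int cpu = 0; cpu < MAX_CPUS; cpu++)
		cpu_node_map[cpu] = NO_NODE;
	nr_callbacks = 0;
}

/* Subscribe to mapping changes; returns 0 on success, -1 if full. */
static int register_cpu_node_cb(cpu_node_cb cb)
{
	if (nr_callbacks == MAX_CALLBACKS)
		return -1;
	callbacks[nr_callbacks++] = cb;
	return 0;
}

/* Record the best current knowledge for @cpu and notify subscribers
 * only if the mapping really changed. */
static void set_cpu_node(int cpu, int node)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	int old_node = cpu_node_map[cpu];

	if (old_node == node)
		return;
	cpu_node_map[cpu] = node;
	for (size_t i = 0; i < nr_callbacks; i++)
		callbacks[i](cpu, old_node, node);
}

/* Example subscriber: remembers the last change it was told about. */
static int nr_events, last_cpu = -1, last_old = NO_NODE, last_new = NO_NODE;

static void record_change(int cpu, int old_node, int new_node)
{
	nr_events++;
	last_cpu = cpu;
	last_old = old_node;
	last_new = new_node;
}
```

Setting the same node twice produces no second notification, so subscribers only rebuild state on real changes.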

Thanks,
Lai

> Thanks,
> Yasuaki Ishimatsu
>
>
> (2015/01/14 17:54), Lai Jiangshan wrote:
>> Hi all,
>>
>> These patches are unchangelogged, uncompiled, unbooted and untested.
>> They are just shit; I half wish they had stayed unsent or been blocked.
>>
>> The patches include two alternative solutions:
>>
>> Shit_A:
>> workqueue: reset pool->node and unhash the pool when the node is
>> offline
>> update wq_numa when cpu_present_mask changed
>>
>> kernel/workqueue.c | 107 +++++++++++++++++++++++++++++++++++++++++------------
>> 1 file changed, 84 insertions(+), 23 deletions(-)
>>
>>
>> Shit_B:
>> workqueue: reset pool->node and unhash the pool when the node is
>> offline
>> workqueue: remove wq_numa_possible_cpumask
>> workqueue: directly update attrs of pools when cpu hot[un]plug
>>
>> kernel/workqueue.c | 135 +++++++++++++++++++++++++++++++++++++++--------------
>> 1 file changed, 101 insertions(+), 34 deletions(-)
>>
>>
>> Patch 1 is the same in both solutions: it resets pool->node and unhashes
>> the pool. TJ suggested it, and I found it to be a good first step toward
>> fixing the bug.
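To illustrate what "reset pool->node and unhash the pool" could mean, here is a simplified userspace sketch, assuming a linked list stands in for the workqueue's pool hash table and an integer key stands in for the pool attributes. An unhashed pool keeps working for its current users but can no longer be found for reuse, so a later lookup has to create a fresh pool. The struct layout and function names are hypothetical, not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_NODE (-1) /* plays the role of NUMA_NO_NODE */

/* Hypothetical, heavily simplified stand-in for a worker pool. */
struct pool {
	int key;           /* stands in for the pool attrs used for lookup */
	int node;          /* NUMA node the pool is associated with */
	bool hashed;       /* still findable for reuse? */
	struct pool *next; /* lookup-list linkage */
};

static struct pool *pool_list; /* stands in for the pool hash table */

static void hash_pool(struct pool *pool)
{
	assert(!pool->hashed);
	pool->hashed = true;
	pool->next = pool_list;
	pool_list = pool;
}

/* Return a reusable pool for @key, or NULL if a new one must be made. */
static struct pool *find_pool(int key)
{
	for (struct pool *p = pool_list; p; p = p->next)
		if (p->key == key)
			return p;
	return NULL;
}

/* @node went offline: reset the node of its pools and unlink them
 * from the lookup list. The pools are not freed; existing users keep
 * them, but nobody new can pick up a pool with stale node info. */
static void node_offline(int node)
{
	struct pool **pp = &pool_list;

	assert(node != NO_NODE);
	while (*pp) {
		struct pool *p = *pp;

		if (p->node == node) {
			*pp = p->next;     /* unhash */
			p->next = NULL;
			p->hashed = false;
			p->node = NO_NODE; /* reset pool->node */
		} else {
			pp = &p->next;
		}
	}
}
```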
>>
>> The remaining patches handle wq_numa_possible_cpumask, which is where the
>> two solutions diverge.
>>
>> Solution_A uses present_mask rather than possible_cpumask. It adds
>> wq_numa_notify_cpu_present_set/cleared() to be notified of changes to
>> cpu_present_mask. However, no such notifications exist yet, so I fake
>> one (wq_numa_check_present_cpumask_changes()) to imitate them. I hope
>> the memory people will add a real one.
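The fake notification in Solution_A can be pictured as poll-and-diff: keep a copy of the last present mask seen, compare it with the current one, and fire a "set" or "cleared" callback for each CPU whose bit changed. This sketch assumes a 64-bit word in place of cpu_present_mask; its function names only echo the ones in the mail and are hypothetical, and the callbacks merely count events instead of rebuilding pools.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64 /* one bit per CPU in a uint64_t */

static uint64_t saved_present; /* last present mask we acted on */
static int nr_set, nr_cleared; /* event counters for the demo callbacks */
static int last_set_cpu = -1, last_cleared_cpu = -1;

/* Hypothetical callbacks: a real consumer would update pool attrs here. */
static void on_cpu_present_set(int cpu)
{
	nr_set++;
	last_set_cpu = cpu;
}

static void on_cpu_present_cleared(int cpu)
{
	nr_cleared++;
	last_cleared_cpu = cpu;
}

/* Fire one callback per CPU whose presence changed since the previous
 * call, then remember the new mask for the next comparison. */
static void check_present_mask_changes(uint64_t present)
{
	uint64_t added = present & ~saved_present;
	uint64_t removed = saved_present & ~present;

	assert((added & removed) == 0); /* a CPU cannot both appear and leave */
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (added & bit)
			on_cpu_present_set(cpu);
		if (removed & bit)
			on_cpu_present_cleared(cpu);
	}
	saved_present = present;
}
```

The weakness of polling is that a change is only seen when the check runs, which is why a real notification from the code that owns the mask is preferable.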
>>
>> Solution_B uses online_mask rather than possible_cpumask. This solution
>> removes more of the coupling between the NUMA code and the workqueue
>> code; it depends only on cpumask_of_node(node).
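Roughly, Solution_B derives a node's usable CPUs on demand from cpumask_of_node(node) and the online mask instead of caching a per-node possible mask. The sketch assumes 64-bit words in place of cpumasks, with the hypothetical arrays node_cpus[] and online_cpus standing in for cpumask_of_node() and cpu_online_mask; falling back to the workqueue-wide mask when a node has no usable CPU is an assumption about how a caller would handle that case.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 4

/* Hypothetical stand-ins: node_cpus[n] for cpumask_of_node(n),
 * online_cpus for cpu_online_mask. One bit per CPU. */
static uint64_t node_cpus[MAX_NODES];
static uint64_t online_cpus;

/* CPUs a pool for @node may use, given the workqueue's allowed CPUs.
 * Nothing is cached, so there is no private per-node table to keep in
 * sync with the NUMA code. */
static uint64_t pool_cpus_for_node(uint64_t wq_cpus, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	uint64_t cpus = wq_cpus & node_cpus[node] & online_cpus;

	/* No usable CPU on this node: fall back to the workqueue-wide mask. */
	return cpus ? cpus : wq_cpus;
}
```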
>>
>> Patch 2 of Solution_B removes wq_numa_possible_cpumask, which adds
>> overhead on CPU hot[un]plug; patch 3 reduces this overhead.
>>
>> Thanks,
>> Lai
>>
>>
>> Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
>> Cc: Tejun Heo <tj@xxxxxxxxxx>
>> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
>> Cc: "Gu, Zheng" <guz.fnst@xxxxxxxxxxxxxx>
>> Cc: tangchen <tangchen@xxxxxxxxxxxxxx>
>> Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>>