Re: [PATCH] workqueue: update numa affinity when node hotplug

From: Kamezawa Hiroyuki
Date: Tue Mar 03 2015 - 01:56:24 EST

On 2015/03/03 1:28, Tejun Heo wrote:

On Mon, Mar 02, 2015 at 05:41:05PM +0900, Kamezawa Hiroyuki wrote:
Let me start by explaining the current behavior.

- cpu-id is determined when a new processor (lapicid/x2apicid) is found.
cpu-id<->nodeid relationship is _not_ recorded.

Is this something from the firmware side or is it just that we aren't
maintaining the association right now?

I think it's just not maintained.

- node-id is determined when a new pxm (firmware info) is found.
pxm<->nodeid relationship is recorded.

By this, there are 2 cases of cpu<->nodeid change.

Case A) On x86, cpus on memory-less nodes are all tied to existing nodes (round robin).
When memory hot-add happens and a new node comes up, cpus are moved to the newly added node
based on pxm.

Ah, okay, so the firmware doesn't provide proximity information at all
for memory-less nodes so we end up putting all of them somewhere
random and when memory is added to one of the memory-less nodes, the
mapping information changes?

With a memory-less node, the proximity domain for processors is given but ignored.
When memory (node) hotplug happens, the information is revisited and the cpuid<->nodeid
relationship is updated.
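
To make Case A concrete, here is a rough toy model in C. It is only a sketch, not kernel code: all names (node_has_memory, cpu_fw_node, cpu_to_node_map, rebuild_cpu_to_node) are made up for illustration, and it assumes the firmware pxm has already been translated to a node id.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4
#define MAX_CPUS  8

/* Toy state, hypothetical names (not the kernel's data structures). */
static bool node_has_memory[MAX_NODES]; /* true once memory is present on the node */
static int  cpu_fw_node[MAX_CPUS];      /* node the firmware pxm points at */
static int  cpu_to_node_map[MAX_CPUS];  /* node actually used for the cpu */

/* Return the next node with memory after 'prev', wrapping around; -1 if none. */
static int next_rr_node(int prev)
{
	for (int i = 1; i <= MAX_NODES; i++) {
		int n = (prev + i) % MAX_NODES;

		if (node_has_memory[n])
			return n;
	}
	return -1;
}

/*
 * Recompute cpu -> node for cpus 0..nr_cpus-1.
 * A cpu whose firmware node has memory uses that node.
 * A cpu on a memory-less node is spread round robin over nodes with memory,
 * i.e. the firmware information is ignored for it.
 */
static void rebuild_cpu_to_node(int nr_cpus)
{
	int rr = -1;

	assert(nr_cpus >= 0 && nr_cpus <= MAX_CPUS);
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int fw = cpu_fw_node[cpu];

		assert(fw >= 0 && fw < MAX_NODES);
		if (node_has_memory[fw]) {
			cpu_to_node_map[cpu] = fw;
		} else {
			rr = next_rr_node(rr);
			cpu_to_node_map[cpu] = rr;
		}
	}
}
```

In this model, calling rebuild_cpu_to_node() again after setting node_has_memory[] for a formerly memory-less node moves its cpus from the round-robin node to their firmware node, which is the cpu<->nodeid change described above.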

Am I understanding it correctly? If so, it's super weird tho. Why
wouldn't there be proximity information for a memless node? Not
having memory doesn't mean it's at the same distance from all existing

Firmware gives pxm for a memory-less node but it's ignored.
I'm not sure why the current implementation works this way.

Case B) When adding a node after removing another node, if their pxms were different from
each other, the cpu<->node relationship changes.

I don't get this either. Does proximity relationship actually change?
Or is it that we're assigning different IDs to the same thing? Isn't
proximity pretty much hardwired to how the system is architected to
begin with?

relationship between proximity domain and lapic id doesn't change.
relationship between lapic-id and cpu-id changes.

pxm <-> memory address : no change
pxm <-> lapicid : no change
pxm <-> node id : no change
lapicid <-> cpu id : change.

I personally think the proper fix is building a persistent cpu-id <-> lapicid relationship,
as is done for pxm, rather than creating a band-aid.
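
As a rough illustration of the difference, here is a toy sketch in C. It is not the kernel implementation: the names (cpu_apicid, cpu_present, add_cpu_first_free, add_cpu_persistent, remove_cpu, reset_cpus) are hypothetical, and it assumes cpu ids are handed out as the lowest id that is not present.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS   4
#define NO_APICID (-1)

/* Toy state, hypothetical names (not the kernel's data structures). */
static int  cpu_apicid[NR_CPUS];  /* lapicid bound to each cpu id, or NO_APICID */
static bool cpu_present[NR_CPUS]; /* true while the cpu is plugged in */

static void reset_cpus(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_apicid[cpu] = NO_APICID;
		cpu_present[cpu] = false;
	}
}

/* Hot-remove: the cpu_apicid[] entry is intentionally left in place. */
static void remove_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_present[cpu] = false;
}

/* Non-persistent: take the lowest cpu id that is not present, overwriting
 * whatever lapicid was bound to it before. Returns -1 if no id is free. */
static int add_cpu_first_free(int apicid)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cpu_present[cpu]) {
			cpu_present[cpu] = true;
			cpu_apicid[cpu] = apicid;
			return cpu;
		}
	}
	return -1;
}

/* Persistent: a lapicid seen before always gets its old cpu id back;
 * a new lapicid takes the lowest id that was never bound. */
static int add_cpu_persistent(int apicid)
{
	int free_cpu = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_apicid[cpu] == apicid) {
			assert(!cpu_present[cpu]); /* same lapicid added twice */
			cpu_present[cpu] = true;
			return cpu;
		}
		if (free_cpu < 0 && cpu_apicid[cpu] == NO_APICID)
			free_cpu = cpu;
	}
	if (free_cpu < 0)
		return -1;
	cpu_apicid[free_cpu] = apicid;
	cpu_present[free_cpu] = true;
	return free_cpu;
}
```

With add_cpu_first_free(), removing two cpus and re-adding only the second lapicid gives it the first cpu's old id, so lapicid <-> cpu id changes (and with it cpu <-> node). With add_cpu_persistent(), the same sequence returns the original cpu id.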

Oh if this is possible, I agree that's the right direction too.

Implementation is a bit complicated now :(.



