Re: [workqueue/driver-core PATCH v2 4/5] driver core: Attach devices on CPU local to device node
From: Alexander Duyck
Date: Thu Oct 11 2018 - 11:50:43 EST
On 10/11/2018 3:45 AM, Greg KH wrote:
> On Wed, Oct 10, 2018 at 04:08:40PM -0700, Alexander Duyck wrote:
>> This change makes it so that we call the asynchronous probe routines on a
>> CPU local to the device node. By doing this we should be able to improve
>> our initialization time significantly as we can avoid having to access the
>> device from a remote node which may introduce higher latency.
>
> This is nice in theory, but what kind of real numbers does this show?
> There's a lot of added complexity here, and what is the benefit?
> Benchmarks or bootcharts that we can see would be great to have, thanks.
>
> greg k-h
In the case of persistent memory init the cost of probing from the wrong
node is pretty significant. On my test system with 3TB of memory per
node, just matching the initialization CPU up to the memory node dropped
initialization time from about 39 seconds down to about 26 seconds per
node.
We are already starting to see code like this pop up in subsystems
anyway. For example, the PCI subsystem already has logic similar to what
I am adding here[1]. I'm hoping that by placing this change in the core
device code we can start consolidating things so we don't have
individual drivers or subsystems each implementing their own
NUMA-specific init logic.
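For reference, the PCI logic at [1] boils down to the sketch below. This
is a paraphrase for illustration, not the verbatim source (the real code
also handles cases like hotplug being disabled): pick any online CPU on
the device's home node and run the synchronous probe there via
work_on_cpu(), falling back to a local probe when no node affinity
exists or no suitable CPU is online.

```c
/* Sketch of the NUMA-local probe dispatch in drivers/pci/pci-driver.c.
 * Paraphrased for illustration; see [1] for the actual code.
 */
static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
			  const struct pci_device_id *id)
{
	int error, node, cpu;
	struct drv_dev_and_id ddi = { drv, dev, id };

	/* No node affinity, or already on the right node: probe locally. */
	node = dev_to_node(&dev->dev);
	if (node < 0 || node == numa_node_id())
		return local_pci_probe(&ddi);

	/* Otherwise run the probe on an online CPU of the device's node. */
	get_online_cpus();
	cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);
	put_online_cpus();

	return error;
}
```

This patch series is essentially about lifting that same pattern into
the driver core so the async probe path gets it for free.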
This is likely going to become more of an issue in the future as we now
have CPUs like the AMD Ryzen Threadripper out there that have people
starting to discuss NUMA in the consumer space.
- Alex
[1]
https://elixir.bootlin.com/linux/v4.19-rc7/source/drivers/pci/pci-driver.c#L331