Re: [driver-core PATCH v6 9/9] libnvdimm: Schedule device registration on node local to the device

From: Bart Van Assche
Date: Tue Nov 27 2018 - 15:33:06 EST

On Tue, 2018-11-27 at 11:34 -0800, Dan Williams wrote:
> On Tue, Nov 27, 2018 at 10:04 AM Alexander Duyck
> wrote:
> >
> > On Mon, 2018-11-26 at 18:21 -0800, Dan Williams wrote:
> > > On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> > > wrote:
> > > >
> > > > Force the device registration for nvdimm devices to be closer to the actual
> > > > device. This is achieved by using either the NUMA node ID of the region, or
> > > > of the parent. By doing this we can have everything above the region based
> > > > on the region, and everything below the region based on the nvdimm bus.
> > > >
> > > > By guaranteeing NUMA locality I see an improvement of as high as 25% for
> > > > per-node init of a system with 12TB of persistent memory.
> > > >
> > >
> > > It seems the speed-up is achieved with just patches 1, 2, and 9 from
> > > this series, correct? I wouldn't want to hold up that benefit while
> > > the driver-core bits are debated.
> >
> > Actually patch 6 ends up impacting things for persistent memory as
> > well. The problem is that all the async calls to add interfaces only do
> > anything if the driver is already loaded. So there are cases such as
> > the X86_PMEM_LEGACY_DEVICE case where the memory regions end up still
> > being serialized because the devices are added before the driver.
> Ok, but is the patch 6 change generally useful outside of the
> libnvdimm case? Yes, local hacks like MODULE_SOFTDEP are terrible for
> global problems, but what I'm trying to tease out if this change
> benefits other async probing subsystems outside of libnvdimm, SCSI
> perhaps? Bart can you chime in with the benefits you see so it's clear
> to Greg that the driver-core changes are a generic improvement?

Hi Dan,

For SCSI, asynchronous probing is really important because when scanning SAN
LUNs there is plenty of potential for concurrency due to the network delay.
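
As a rough user-space illustration of why per-LUN latency leaves room for
concurrency (a sketch only, not kernel code and not how the SCSI midlayer is
implemented), the Python snippet below compares scanning LUNs one at a time
with scanning them concurrently. The names probe_lun, NUM_LUNS and
NETWORK_DELAY are hypothetical, and the delay is simulated with a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_LUNS = 16          # hypothetical number of LUNs to scan
NETWORK_DELAY = 0.01   # hypothetical per-LUN round-trip delay, in seconds


def probe_lun(lun):
    """Stand-in for probing one LUN: the time is dominated by waiting."""
    time.sleep(NETWORK_DELAY)
    return lun


def scan_sequential(luns):
    """Probe the LUNs one after another; the delays add up."""
    start = time.perf_counter()
    found = [probe_lun(lun) for lun in luns]
    return found, time.perf_counter() - start


def scan_concurrent(luns):
    """Probe all LUNs at once; the delays overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(luns)) as pool:
        found = list(pool.map(probe_lun, luns))
    return found, time.perf_counter() - start


if __name__ == "__main__":
    luns = list(range(NUM_LUNS))
    _, t_seq = scan_sequential(luns)
    _, t_con = scan_concurrent(luns)
    print(f"sequential: {t_seq:.3f}s  concurrent: {t_con:.3f}s")
```

The sequential scan takes at least NUM_LUNS * NETWORK_DELAY, while the
concurrent scan is bounded by roughly one delay plus scheduling overhead.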

I think the following quote provides the information you are looking for:

"This patch reduces the time needed for loading the scsi_debug kernel
module with parameters delay=0 and max_luns=256 from 0.7s to 0.1s. In
other words, this specific test runs about seven times faster."


Best regards,