Re: [PATCH v2 0/3] Add NUMA-node-aware synchronous probing to driver core

From: Jinhui Guo

Date: Mon Jan 26 2026 - 04:19:27 EST


On Fri Jan 23, 2026 17:04:27 -0800, Dan Williams wrote:
> Jinhui Guo wrote:
> > Hi all,
> >
> > ** Overview **
> >
> > This patchset introduces NUMA-node-aware synchronous probing.
> >
> > Drivers can initialize and allocate memory on the device’s local
> > node without scattering kmalloc_node() calls throughout the code.
> > NUMA-aware probing was added to PCI drivers in 2005 and has
> > benefited them ever since.
> >
> > The asynchronous probe path already supports NUMA-node-aware
> > probing via async_schedule_dev() in the driver core. Since NUMA
> > affinity is orthogonal to sync/async probing, this patchset adds
> > NUMA-node-aware support to the synchronous probe path.
> >
> > ** Background **
> >
> > The idea arose from a discussion with Bjorn and Danilo about a
> > PCI-probe issue [1]:
> >
> > when PCI devices on the same NUMA node are probed asynchronously,
> > pci_call_probe() calls work_on_cpu(), pins every probe worker to
> > the same CPU inside that node, and forces the probes to run serially.
> >
> > Testing three NVMe devices on the same NUMA node of an AMD EPYC 9A64
> > 2.4 GHz processor (all on CPU 0):
> >
> > nvme 0000:01:00.0: CPU: 0, COMM: kworker/0:1, probe cost: 53372612 ns
> > nvme 0000:02:00.0: CPU: 0, COMM: kworker/0:2, probe cost: 49532941 ns
> > nvme 0000:03:00.0: CPU: 0, COMM: kworker/0:3, probe cost: 47315175 ns
> >
> > Since the driver core already provides NUMA-node-aware asynchronous
> > probing, we can extend the same capability to the synchronous probe
> > path. This solves the issue and lets other drivers benefit from
> > NUMA-local initialization as well.
>
> I like that from a global benefit perspective, but not necessarily from
> a regression perspective. Is there a minimal fix to PCI to make its
> current workqueue unbound, then if that goes well come back and move all
> devices into this scheme?

Hi Dan,

Thank you for your time, and apologies for the delayed reply.

I understand your concern about stability and the desire to limit regression
risk in PCI. However, I believe extending the driver core's existing
NUMA-node-aware asynchronous probe path is the better solution:

1. The asynchronous path already uses async_schedule_dev() with queue_work_node()
to bind workers to the device's NUMA node, and this has caused no side effects
for driver probing.
2. I initially submitted a PCI-only fix [1], but handling asynchronous probing
inside the PCI core proved difficult. Using current_is_async() works but feels
fragile. After discussions with Bjorn and Danilo [2][3], moving the solution
into the driver core makes distinguishing async from sync probing
straightforward. Testing shows minimal impact on synchronous probe time.
3. If you prefer a PCI-only approach, we could add a flag in struct device_driver
(default false) that PCI sets during registration. This limits the new path to
PCI devices while others retain existing behavior. The extra code is ~10 lines
and can be removed once confidence is established.
4. I'm committed to supporting this: I'll include "Fixes:" tags for any fallout
and provide patches within a month of any report. Since the logic mirrors the
core async helper, the risk should be low, but I'll take full responsibility
regardless.
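
To make point 1 concrete, here is a simplified sketch (not the exact kernel
source) of the mechanism the async path already relies on: the driver core
resolves the device's node with dev_to_node() and the async machinery queues
the worker there via queue_work_node(). A synchronous variant would reuse the
same node lookup; the helper name below is invented for illustration:

```c
#include <linux/workqueue.h>
#include <linux/device.h>

/* Worker body; in the real path the driver's ->probe() runs from here. */
static void probe_work_fn(struct work_struct *work)
{
	/* ... invoke the bus probe for the device ... */
}

/*
 * Hypothetical helper: queue probe work on a CPU of the device's
 * local NUMA node. queue_work_node() expects an unbound workqueue,
 * and falls back sensibly when node is NUMA_NO_NODE.
 */
static bool schedule_probe_on_dev_node(struct device *dev,
				       struct work_struct *work)
{
	int node = dev_to_node(dev);	/* NUMA_NO_NODE if unknown */

	return queue_work_node(node, system_unbound_wq, work);
}
```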
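
And for point 3, a rough sketch of the ~10-line opt-in, with the flag name
invented here for illustration: a field in struct device_driver that only the
PCI core sets at driver registration, so every other bus keeps the current
probe behavior until confidence is established:

```c
/* In include/linux/device/driver.h (hypothetical field name): */
struct device_driver {
	/* ... existing fields ... */
	bool probe_on_dev_node;	/* opt in to node-local sync probing */
};

/* In the PCI core's driver registration (sketch, not a full patch): */
int __pci_register_driver(struct pci_driver *drv, struct module *owner,
			  const char *mod_name)
{
	/* Only PCI opts in; all other buses default to false. */
	drv->driver.probe_on_dev_node = true;
	/* ... existing registration logic ... */
}
```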

Please let me know if you have other concerns.

[1] https://lore.kernel.org/all/20251230142736.1168-1-guojinhui.liam@xxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/20251231165503.GA159243@bhelgaas/
[3] https://lore.kernel.org/all/DFFXIZR1AGTV.2WZ1G2JAU0HFQ@xxxxxxxxxx/

Best Regards,
Jinhui