Re: [PATCH] nvme-pci: do not set the NUMA node of device if it has none
From: Keith Busch
Date: Wed Jul 26 2023 - 18:26:06 EST

On Wed, Jul 26, 2023 at 09:32:33PM +0200, Pratyush Yadav wrote:
> On Wed, Jul 26 2023, Keith Busch wrote:
> > Could you send the output of:
> >
> > numactl --hardware
>
> $ numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
> node 0 size: 245847 MB
> node 0 free: 245211 MB
> node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
> node 1 size: 245932 MB
> node 1 free: 245328 MB
> node distances:
> node 0 1
> 0: 10 21
> 1: 21 10
>
> >
> > and then with and without your patch:
> >
> > for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
> > cat /proc/irq/$i/{smp,effective}_affinity_list; \
> > done
>
> Without my patch:
>
> $ for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
> > cat /proc/irq/$i/{smp,effective}_affinity_list; \
> > done

Hm, I wonder if there's something wrong with my script. All the CPUs
should be accounted for in the smp_affinity_list, assuming it captured
all the vectors of the nvme device, but both examples are missing half
the CPUs. It looks like you have 32 vectors. Does that sound right?

This does show the effective affinity is indeed always on node 0 without
your patch. I don't see why, though: the "group_cpus_evenly()" function
that spreads the interrupts doesn't know anything about the device the
resource is being grouped for, so it shouldn't even take its NUMA node
into consideration. It's just supposed to ensure all CPUs have a shared
resource, preferring to not share across NUMA nodes.

I'll emulate a similar CPU topology with similar nvme vector count and
see if I can find anything suspicious. I'm a little concerned we may
have the same problem for devices that have an associated NUMA node that
your patch isn't addressing.

> 41
> 40
> 33
> 33
> 44
> 44
> 9
> 9
> 32
> 32
> 2
> 2
> 6
> 6
> 11
> 11
> 1
> 1
> 35
> 35
> 39
> 39
> 13
> 13
> 42
> 42
> 46
> 46
> 41
> 41
> 46
> 46
> 15
> 15
> 5
> 5
> 43
> 43
> 0
> 0
> 14
> 14
> 8
> 8
> 12
> 12
> 7
> 7
> 10
> 10
> 47
> 47
> 38
> 38
> 36
> 36
> 3
> 3
> 34
> 34
> 45
> 45
> 5
> 5
>
> With my patch:
>
> $ for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
> > cat /proc/irq/$i/{smp,effective}_affinity_list; \
> > done
> 9
> 9
> 15
> 15
> 5
> 5
> 23
> 23
> 38
> 38
> 52
> 52
> 21
> 21
> 36
> 36
> 13
> 13
> 56
> 56
> 44
> 44
> 42
> 42
> 31
> 31
> 48
> 48
> 5
> 5
> 3
> 3
> 1
> 1
> 11
> 11
> 28
> 28
> 18
> 18
> 34
> 34
> 29
> 29
> 58
> 58
> 46
> 46
> 54
> 54
> 59
> 59
> 32
> 32
> 7
> 7
> 56
> 56
> 62
> 62
> 49
> 49
> 57
> 57
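
The one-liner quoted above prints two unlabelled lines per vector, which
makes it hard to tell which value is smp_affinity_list, which is
effective_affinity_list, and which node each CPU belongs to. A labelled
variant might make the comparison easier. The following is only a
sketch, not something posted in this thread: the function names
(expand_cpulist, cpu_node, report_irqs) and the NODE_DIR variable are
made up for illustration, and it assumes bash plus the usual
/sys/devices/system/node/nodeN/cpulist files.

```shell
#!/bin/bash
# Hypothetical helper, not from the thread. NODE_DIR is an illustrative
# override so the node lookup can be pointed at a different directory.
: "${NODE_DIR:=/sys/devices/system/node}"

# Expand a kernel CPU list such as "0-3,8" into "0 1 2 3 8".
expand_cpulist() {
    local list=$1 part lo hi cpu out=()
    local IFS=,
    for part in $list; do
        lo=${part%-*}          # "0-3" -> 0, "8" -> 8
        hi=${part#*-}          # "0-3" -> 3, "8" -> 8
        for ((cpu = lo; cpu <= hi; cpu++)); do
            out+=("$cpu")
        done
    done
    IFS=' '
    echo "${out[*]}"
}

# Print the NUMA node whose cpulist contains the given CPU, or "?".
cpu_node() {
    local cpu=$1 dir c
    for dir in "$NODE_DIR"/node[0-9]*; do
        [ -r "$dir/cpulist" ] || continue
        for c in $(expand_cpulist "$(cat "$dir/cpulist")"); do
            if [ "$c" = "$cpu" ]; then
                echo "${dir##*/node}"
                return 0
            fi
        done
    done
    echo "?"
}

# One labelled line per interrupt whose /proc/interrupts entry matches $1.
report_irqs() {
    local dev=$1 irq smp eff
    for irq in $(grep "$dev" /proc/interrupts | sed 's/^ *//' | cut -d: -f1); do
        smp=$(cat "/proc/irq/$irq/smp_affinity_list")
        eff=$(cat "/proc/irq/$irq/effective_affinity_list")
        # Node lookup uses the first CPU of the effective list.
        echo "irq $irq smp=$smp effective=$eff node=$(cpu_node "${eff%%[-,]*}")"
    done
}

# Only run the report when a device name is passed on the command line.
if [ -n "${1:-}" ]; then
    report_irqs "$1"
fi
```

Saved under any name and run with "nvme0" as its only argument, this
would print one line per vector, so a vector whose effective CPU sits on
a different node than expected stands out without counting lines.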