Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the idlest CPU

From: Yu Chen
Date: Thu Sep 07 2017 - 04:32:06 EST


On Thu, Sep 07, 2017 at 07:54:09AM +0200, Thomas Gleixner wrote:
> On Thu, 7 Sep 2017, Yu Chen wrote:
> > On Wed, Sep 06, 2017 at 10:03:58AM +0200, Thomas Gleixner wrote:
> > > Can you please apply the debug patch below, boot the machine and right
> > > after login provide the output of
> > >
> > > # cat /sys/kernel/debug/tracing/trace
> > >
> > kworker/0:2-303 [000] .... 9.135467: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 34
> > kworker/0:2-303 [000] .... 9.135476: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 35
> > kworker/0:2-303 [000] .... 9.135484: msi_domain_alloc_irqs: dev: 0000:bb:00.0 nvec 1 virq 36
>
> <SNIP>
>
> > kworker/0:2-303 [000] .... 9.762268: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 331
> > kworker/0:2-303 [000] .... 9.762278: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 332
> > kworker/0:2-303 [000] .... 9.762288: msi_domain_alloc_irqs: dev: 0000:bb:00.3 nvec 1 virq 333
>
> That's 300 vectors.
>
> > bb:00.[0-3] Ethernet controller: Intel Corporation Device 37d0 (rev 03)
> >
> > -+-[0000:b2]-+-00.0-[b3-bc]----00.0-[b4-bc]--+-00.0-[b5-b6]----00.0
> > | | +-01.0-[b7-b8]----00.0
> > | | +-02.0-[b9-ba]----00.0
> > | | \-03.0-[bb-bc]--+-00.0
> > | | +-00.1
> > | | +-00.2
> > | | \-00.3
> >
> > and they are using i40e driver, the vectors should be reserved by:
> > i40e_probe() ->
> > i40e_init_interrupt_scheme() ->
> > i40e_init_msix() ->
> > i40e_reserve_msix_vectors() ->
> > pci_enable_msix_range()
> >
> > # ls /sys/kernel/debug/irq/irqs
> > 0 10 11 13 142 184 217 259 292 31 33
> > 337 339 340 342 344 346 348 350 352 354 356
> > 358 360 362 364 366 368 370 372 374 376 378
> > 380 382 384 386 388 390 392 394 4 6 7 9
> > 1 109 12 14 15 2 24 26 3 32 335
> > 338 34 341 343 345 347 349 351 353 355 357
> > 359 361 363 365 367 369 371 373 375 377 379
> > 381 383 385 387 389 391 393 395 5 67 8
>
> Out of these 300 interrupts exactly 8 randomly selected ones are actively
> used. And the other 292 interrupts are just there because it might need
> them in the future when the 32 CPU machine gets magically upgraded to 4096
> cores at runtime?
>
Hmm, I guess the 292 vectors remain unused because the network devices
have not been brought up (say, ifconfig up has not been invoked), so
request_irq() has not been called for these vectors? My impression is
that when I once borrowed some fiber cables to connect the platform,
the number of active IRQs from i40e rose a lot, although I don't have
these expensive cables now...
> Can the i40e people @intel please fix this waste of resources and sanitize
> their interrupt allocation scheme?
>
> Please switch it over to managed interrupts so the affinity spreading
> happens in a sane way and the interrupts are properly managed on CPU
> hotplug.
OK. I think the i40e driver currently reserves its vectors via
pci_enable_msix_range() and does not pass any affinity hint down to
the low-level IRQ code, so managed interrupts are not enabled there
(although later on the driver uses irq_set_affinity_hint() to spread
the IRQs).
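
If I understand the managed interrupt interface correctly, the
conversion might look roughly like the sketch below. This is not taken
from i40e and is neither compiled nor tested:
example_alloc_managed_vectors() and max_queue_vecs are made-up names,
and keeping a single non-queue vector out of the spreading via
pre_vectors is only my assumption about what the driver would need.

```c
#include <linux/pci.h>
#include <linux/interrupt.h>

/*
 * Hypothetical helper (not from i40e): allocate MSI-X vectors as managed
 * interrupts so that the IRQ core spreads the queue vectors across CPUs
 * and handles them on CPU hotplug.
 */
static int example_alloc_managed_vectors(struct pci_dev *pdev,
					 unsigned int max_queue_vecs)
{
	/* Assumption: one non-queue vector that must not be spread. */
	struct irq_affinity affd = {
		.pre_vectors = 1,
	};
	int nvec;

	/* Need at least the non-queue vector plus one queue vector. */
	nvec = pci_alloc_irq_vectors_affinity(pdev, 2, max_queue_vecs + 1,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
	if (nvec < 0)
		return nvec;

	/*
	 * pci_irq_vector(pdev, i) returns the Linux IRQ number of vector i,
	 * to be passed to request_irq() when the queue is brought up.
	 */
	return nvec;
}
```

With this the driver would no longer call irq_set_affinity_hint() for
the queue vectors, since the core assigns their affinity.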

Thanks,
Yu
>
> Thanks,
>
> tglx