On Mon, 28 May 2018, Song Liu wrote:
This doesn't fix the issue with bnxt. Here is a trace with this patch:
 ^^^^^^^^^^^^
Thanks for providing the data!
> 19d... 1610359248us : vector_deactivate: irq=31 is_managed=0 can_reserve=1 reserve=0
> 19d... 1610359248us : vector_clear: irq=31 vector=33 cpu=20 prev_vector=0 prev_cpu=2
> 19d... 1610359249us : irq_matrix_free: bit=33 cpu=20 online=1 avl=201 alloc=0 managed=0 online_maps=56 global_avl=11241, global_rsvd=25, total_alloc=15
Here IRQ 31 is shutdown and the vector freed.
> 19d... 1610359249us : irq_matrix_reserve: online_maps=56 global_avl=11241, global_rsvd=26, total_alloc=15
> 19d... 1610359249us : vector_reserve: irq=31 ret=0
> 19d... 1610359249us : vector_config: irq=31 vector=239 cpu=0 apicdest=0x00000000
And set to the magic reservation vector 239 to catch spurious interrupts.
> 20dN.. 1610366654us : vector_activate: irq=31 is_managed=0 can_reserve=1 reserve=0
> 20dN.. 1610366654us : vector_alloc: irq=31 vector=4294967268 reserved=1 ret=0
> 20dN.. 1610366655us : irq_matrix_alloc: bit=33 cpu=9 online=1 avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16
> 20dN.. 1610366655us : vector_update: irq=31 vector=33 cpu=9 prev_vector=0 prev_cpu=20
> 20dN.. 1610366656us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
> 20dN.. 1610366656us : vector_config: irq=31 vector=33 cpu=9 apicdest=0x00000014
So here it gets initialized again and targets CPU9 now.
> 20dN.. 1610366662us : irq_matrix_alloc: bit=33 cpu=20 online=1 avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16
> 20dN.. 1610366662us : vector_update: irq=31 vector=33 cpu=20 prev_vector=33 prev_cpu=9
> 20dN.. 1610366662us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
> 20dN.. 1610366662us : vector_config: irq=31 vector=33 cpu=20 apicdest=0x0000002c
Here it is reconfigured to CPU 20. Now that update schedules vector 33 on
CPU9 for cleanup.
> 20dN.. 1610366666us : irq_matrix_alloc: bit=34 cpu=2 online=1 avl=199 alloc=2 managed=0 online_maps=56 global_avl=11239, global_rsvd=28, total_alloc=17
> 20dN.. 1610366666us : vector_update: irq=31 vector=34 cpu=2 prev_vector=33 prev_cpu=20
> 20dN.. 1610366666us : vector_alloc: irq=31 vector=34 reserved=1 ret=0
> 20dN.. 1610366666us : vector_config: irq=31 vector=34 cpu=2 apicdest=0x00000004
So here the shit hits the fan because that update schedules vector 33 on
CPU20 for cleanup while the previous cleanup for CPU9 has not been done
yet. Cute. or not so cute....
> 20dNh. 1610366669us : vector_free_moved: irq=31 cpu=20 vector=33 is_managed=0
> 20dNh. 1610366670us : irq_matrix_free: bit=33 cpu=20 online=1 avl=201 alloc=0 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16
And frees the CPU 20 vector
> 9d.h. 1610366696us : vector_free_moved: irq=31 cpu=20 vector=0 is_managed=0
And then CPU9 claims that it's queued for cleanup. Bah.
I'm still working on a fix as the elegant solution refused to work.
Thanks,
tglx