Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt

From: John Garry
Date: Wed Dec 11 2019 - 12:09:23 EST

Next message: Marco Felsch: "Re: [PATCH v3 3/6] dt-bindings: mfd: da9062: add regulator voltage selection documentation"
Previous message: Mircea CIRJALIU - MELIU: "RE: [RFC PATCH v1 3/4] thp: fix huge page zapping for special PMDs"
In reply to: Marc Zyngier: "Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt"
Next in thread: Ming Lei: "Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 10/12/2019 01:43, Ming Lei wrote:

For when the interrupt is managed, allow the threaded part to run on all
cpus in the irq affinity mask.

I remembered that performance drop is observed by this approach in some
test.

From checking the thread about the NVMe interrupt swamp, just switching to
threaded handler alone degrades performance. I didn't see any specific
results for this change from Long Li -https://lkml.org/lkml/2019/8/21/128

Hi Ming,

I am pretty clear the reason for Azure, which is caused by aggressive interrupt
coalescing, and this behavior shouldn't be very common, and it can be
addressed by the following patch:

I am running some NVMe perf tests with Marc's patch.

I see this almost always eventually (with or without that patch):

[ 66.018140] rcu: INFO: rcu_preempt self-detected stall on CPU2% done] [5058MB/0KB/0KB /s] [1295K/0/0 iops] [eta 01m:39s]
[ 66.023885] rcu: 12-....: (5250 ticks this GP) idle=182/1/0x4000000000000004 softirq=517/517 fqs=2529
[ 66.033306] (t=5254 jiffies g=733 q=2241)
[ 66.037394] Task dump for CPU 12:
[ 66.040696] fio R running task 0 798 796 0x00000002
[ 66.047733] Call trace:
[ 66.050173] dump_backtrace+0x0/0x1a0
[ 66.053823] show_stack+0x14/0x20
[ 66.057126] sched_show_task+0x164/0x1a0
[ 66.061036] dump_cpu_task+0x40/0x2e8
[ 66.064686] rcu_dump_cpu_stacks+0xa0/0xe0
[ 66.068769] rcu_sched_clock_irq+0x6d8/0xaa8
[ 66.073027] update_process_times+0x2c/0x50
[ 66.077198] tick_sched_handle.isra.14+0x30/0x50
[ 66.081802] tick_sched_timer+0x48/0x98
[ 66.085625] __hrtimer_run_queues+0x120/0x1b8
[ 66.089968] hrtimer_interrupt+0xd4/0x250
[ 66.093966] arch_timer_handler_phys+0x28/0x40
[ 66.098398] handle_percpu_devid_irq+0x80/0x140
[ 66.102915] generic_handle_irq+0x24/0x38
[ 66.106911] __handle_domain_irq+0x5c/0xb0
[ 66.110995] gic_handle_irq+0x5c/0x148
[ 66.114731] el1_irq+0xb8/0x180
[ 66.117858] efi_header_end+0x94/0x234
[ 66.121595] irq_exit+0xd0/0xd8
[ 66.124724] __handle_domain_irq+0x60/0xb0
[ 66.128806] gic_handle_irq+0x5c/0x148
[ 66.132542] el0_irq_naked+0x4c/0x54
[ 97.152870] rcu: INFO: rcu_preempt self-detected stall on CPU8% done] [4736MB/0KB/0KB /s] [1212K/0/0 iops] [eta 01m:08s]
[ 97.158616] rcu: 8-....: (1 GPs behind) idle=08e/1/0x4000000000000002 softirq=462/505 fqs=2621
[ 97.167414] (t=5253 jiffies g=737 q=5507)
[ 97.171498] Task dump for CPU 8:
[pu_task+0x40/0x2e8
[ 97.198705] rcu_dump_cpu_stacks+0xa0/0xe0
[ 97.202788] rcu_sched_clock_irq+0x6d8/0xaa8
[ 97.207046] update_process_times+0x2c/0x50
[ 97.211217] tick_sched_handle.isra.14+0x30/0x50
[ 97.215820] tick_sched_timer+0x48/0x98
[ 97.219644] __hrtimer_run_queues+0x120/0x1b8
[ 97.223989] hrtimer_interrupt+0xd4/0x250
[ 97.227987] arch_timer_handler_phys+0x28/0x40
[ 97.232418] handle_percpu_devid_irq+0x80/0x140
[ 97.236935] generic_handle_irq+0x24/0x38
[ 97.240931] __handle_domain_irq+0x5c/0xb0
[ 97.245015] gic_handle_irq+0x5c/0x148
[ 97.248751] el1_irq+0xb8/0x180
[ 97.251880] find_busiest_group+0x18c/0x9e8
[ 97.256050] load_balance+0x154/0xb98
[ 97.259700] rebalance_domains+0x1cc/0x2f8
[ 97.263783] run_rebalance_domains+0x78/0xe0
[ 97.268040] efi_header_end+0x114/0x234
[ 97.271864] run_ksoftirqd+0x38/0x48
[ 97.275427] smpboot_thread_fn+0x16c/0x270
[ 97.279511] kthread+0x118/0x120
[ 97.282726] ret_from_fork+0x10/0x18
[ 97.286289] Task dump for CPU 12:
[ 97.289591] kworker/12:1 R running task 0 570 2 0x0000002a
[ 97.296634] Workqueue: 0x0 (mm_percpu_wq)
[ 97.300718] Call trace:
[ 97.303152] __switch_to+0xbc/0x218
[ 97.306632] page_wait_table+0x1500/0x1800

Would this be the same interrupt "swamp" issue?

http://lists.infradead.org/pipermail/linux-nvme/2019-November/028008.html

What is the status of these patches? I did not see them in mainline.

Then please share your lockup story, such as, which HBA/drivers, test steps,
if you complete IOs from multiple disks(LUNs) on single CPU, if you have
multiple queues, how many active LUNs involved in the test, ...

Thanks,
John

Next message: Marco Felsch: "Re: [PATCH v3 3/6] dt-bindings: mfd: da9062: add regulator voltage selection documentation"
Previous message: Mircea CIRJALIU - MELIU: "RE: [RFC PATCH v1 3/4] thp: fix huge page zapping for special PMDs"
In reply to: Marc Zyngier: "Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt"
Next in thread: Ming Lei: "Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]