RE: Observing Softlockup's while running heavy IOs
From: Elliott, Robert (Persistent Memory)
Date: Thu Aug 18 2016 - 21:51:44 EST
> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Sreekanth Reddy
> Sent: Thursday, August 18, 2016 12:56 AM
> Subject: Observing Softlockup's while running heavy IOs
>
> Problem statement:
> Observing softlockups while running heavy IOs on 8 SSD drives
> connected behind our LSI SAS 3004 HBA.
>
...
> Observing a loop in the IO path, i.e. only one CPU is busy with
> processing the interrupts and the other CPUs (in the affinity_hint
> mask) are busy with sending the IOs (these CPUs never receive any
> interrupts). For example, only CPU6 is busy with processing the
> interrupts from IRQ 219, and the remaining CPUs, i.e. CPUs 7, 8, 9,
> 10 & 11, are just busy with pumping IOs and never process any IO
> interrupts from IRQ 219. So we are observing softlockups due to the
> existence of this loop in the IO path.
>
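The imbalance you describe is visible directly in /proc/interrupts:
each column in the row for IRQ 219 is one CPU's delivery count, so a
single hot column confirms that only CPU6 is servicing the vector. A
minimal sketch that prints that row (the IRQ number is a command line
argument; 219 is taken from your example):

/* show_irq.c - print the /proc/interrupts row for one IRQ, e.g.
 * "./show_irq 219".  Each column is one CPU's delivery count.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
	char line[4096], prefix[32];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <irq>\n", argv[0]);
		return EXIT_FAILURE;
	}
	f = fopen("/proc/interrupts", "r");
	if (!f) {
		perror("/proc/interrupts");
		return EXIT_FAILURE;
	}
	snprintf(prefix, sizeof(prefix), "%s:", argv[1]);
	while (fgets(line, sizeof(line), f)) {
		char *p = line;

		while (*p == ' ')	/* rows are right-justified */
			p++;
		if (!strncmp(p, prefix, strlen(prefix)))
			fputs(line, stdout);
	}
	fclose(f);
	return EXIT_SUCCESS;
}
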
> We might not observe these softlockups if irqbalance had balanced
> the interrupts among the CPUs enabled in the particular irq's
> affinity_hint mask, so that all the CPUs were equally busy with
> sending IOs and processing the interrupts. I am not sure how
> irqbalance balances the load among the CPUs, but here I see that
> only one CPU from the irq's affinity_hint mask is busy with
> interrupts, and the remaining CPUs never receive any interrupts
> from this IRQ.
>
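Until the imbalance is sorted out, you can also steer the vector by
hand; writing a hex CPU mask to /proc/irq/<n>/smp_affinity is the
same interface irqbalance uses. A minimal sketch, assuming IRQ 219
and a mask covering CPUs 6-11 (0xfc0) from your example; note that on
many x86 platforms an MSI-X vector is still delivered to only one CPU
out of that mask:

/* set_irq_affinity.c - steer IRQ 219 with a hex CPU bitmask.
 * The IRQ number (219) and the mask (0xfc0 = CPUs 6-11) are
 * assumptions taken from the example above; run as root.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	FILE *f = fopen("/proc/irq/219/smp_affinity", "w");

	if (!f) {
		perror("/proc/irq/219/smp_affinity");
		return EXIT_FAILURE;
	}
	fprintf(f, "fc0\n");	/* hex bitmask of allowed CPUs */
	if (fclose(f) != 0) {	/* write errors surface at close */
		perror("fclose");
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}
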
> Please help me with any suggestions/recommendations to solve/limit
> these kinds of softlockups. Also please let me know if I have
> missed any setting in irqbalance.
>
The CPUs need to be forced to self-throttle by processing interrupts
for their own submissions, which cuts into the time they have to
submit more IOs. See https://lkml.org/lkml/2014/9/9/931 for
discussion of this problem when blk-mq was added.
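One existing knob along these lines is the block layer's rq_affinity
attribute: writing 2 forces each request's completion work onto the
CPU that submitted it, so every CPU pays for its own IOs instead of
pushing all completion work onto CPU6. A minimal sketch (the device
name is an argument; run it as root for each of the eight SSDs):

/* set_rq_affinity.c - force completions onto the submitting CPU.
 * rq_affinity=2 sets QUEUE_FLAG_SAME_FORCE, so each submitter
 * processes its own completion work and self-throttles. The device
 * name (e.g. "sda") is an example argument; run as root.
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char path[256];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <block-dev>\n", argv[0]);
		return EXIT_FAILURE;
	}
	snprintf(path, sizeof(path), "/sys/block/%s/queue/rq_affinity",
		 argv[1]);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return EXIT_FAILURE;
	}
	fprintf(f, "2\n");
	if (fclose(f) != 0) {
		perror("fclose");
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}
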
---
Robert Elliott, HPE Persistent Memory