Re: [RFC 1/1] pmem: Add cond_resched() in bio_for_each_segment loop in pmem_make_request

From: Ritesh Harjani
Date: Mon Aug 03 2020 - 03:14:18 EST




On 8/3/20 4:31 AM, Dave Chinner wrote:
On Wed, Jul 29, 2020 at 02:15:18PM +0530, Ritesh Harjani wrote:
For systems which do not have CONFIG_PREEMPT set and
if there is a heavy multi-threaded load/store operation happening
on pmem + sometimes along with device latencies, softlockup warnings like
this could trigger. This was seen on Power where pagesize is 64K.

To avoid softlockup, this patch adds a cond_resched() in this path.

<...>
watchdog: BUG: soft lockup - CPU#31 stuck for 22s!
<...>
CPU: 31 PID: 15627 <..> 5.3.18-20
<...>
NIP memcpy_power7+0x43c/0x7e0
LR memcpy_flushcache+0x28/0xa0

Call Trace:
memcpy_power7+0x274/0x7e0 (unreliable)
memcpy_flushcache+0x28/0xa0
write_pmem+0xa0/0x100 [nd_pmem]
pmem_do_bvec+0x1f0/0x420 [nd_pmem]
pmem_make_request+0x14c/0x370 [nd_pmem]
generic_make_request+0x164/0x400
submit_bio+0x134/0x2e0
submit_bio_wait+0x70/0xc0
blkdev_issue_zeroout+0xf4/0x2a0
xfs_zero_extent+0x90/0xc0 [xfs]
xfs_bmapi_convert_unwritten+0x198/0x230 [xfs]
xfs_bmapi_write+0x284/0x630 [xfs]
xfs_iomap_write_direct+0x1f0/0x3e0 [xfs]
xfs_file_iomap_begin+0x344/0x690 [xfs]
dax_iomap_pmd_fault+0x488/0xc10
__xfs_filemap_fault+0x26c/0x2b0 [xfs]
__handle_mm_fault+0x794/0x1af0
handle_mm_fault+0x12c/0x220
__do_page_fault+0x290/0xe40
do_page_fault+0x38/0xc0
handle_page_fault+0x10/0x30

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
Signed-off-by: Ritesh Harjani <riteshh@xxxxxxxxxxxxx>
---
drivers/nvdimm/pmem.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 2df6994acf83..fcf7af13897e 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -214,6 +214,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
bio->bi_status = rc;
break;
}
+ cond_resched();

There are already cond_resched() calls between submitted bios in
blkdev_issue_zeroout() via both __blkdev_issue_zero_pages() and
__blkdev_issue_write_zeroes(), so I'm kinda wondering where the
problem is coming from here.

This problem is coming from that bio call- submit_bio()


Just how big is the bio being issued here that it spins for 22s
trying to copy it?

It's 256 (due to BIO_MAX_PAGES) * 64KB (pagesize) = 16MB.
So this is definitely not an easy trigger as per tester was mainly seen
on a VM.

Looking at the cond_resched() inside dax_writeback_mapping_range()
in xas_for_each_marked() loop, I thought it should be good to have a
cond_resched() in the above path as well.

Hence an RFC for discussion.


And, really, if the system is that bound on cacheline bouncing that
it prevents memcpy() from making progress, I think we probably
should be issuing a soft lockup warning like this... >
Cheers,

Dave.