On 11/20/2015 04:28 PM, Ewan Milne wrote:
On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
And indeed, it doesn't.
Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.
Yes. I'm not the one primarily looking at it, and we don't have a
reproducer in-house. We just have the one dump right now.
I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...
That might not fix it if it is a problem with the merge code, though.
Seems I finally found the culprit.
What happens is this:
We have two paths, with these seg_boundary_masks:
path-1: seg_boundary_mask = 65535,
path-2: seg_boundary_mask = 4294967295,
consequently the DM request queue has this:
md-1: seg_boundary_mask = 65535,
What happens now is that a request is built and sent to path 2.
During submission req->nr_phys_segments is computed using the
limits of path 2, arriving at a count of 3.
The request then gets retried on path 1, but because the NOMERGE
request flag is set, req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores the cached counters and just walks the
bi_vec directly; with the limits of path 1 that results in a count
of 4, one more than the stale count of 3 -> boom.
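To illustrate how the same buffers can yield 3 segments under one mask and 4 under the other, here is a minimal userspace sketch. It is a simplified model and not the kernel's merge code: `toy_vec`, `toy_count_segments` and the address layout in `toy_vecs` are all hypothetical, and the model only looks at physical contiguity and the segment boundary mask (it ignores maximum segment size and every other queue limit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bio vector: a physical address range. */
struct toy_vec {
	uint64_t addr;
	uint32_t len;
};

/*
 * Simplified model: a vector extends the current segment only if it is
 * physically contiguous with the previous vector AND the previous
 * vector's start and this vector's last byte fall in the same
 * boundary window. Anything else starts a new segment.
 */
static unsigned int toy_count_segments(const struct toy_vec *v, size_t n,
				       uint64_t seg_boundary_mask)
{
	unsigned int segs = 0;
	size_t i;

	assert(v != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (i > 0) {
			const struct toy_vec *prev = &v[i - 1];
			uint64_t last = v[i].addr + v[i].len - 1;
			int contiguous = (prev->addr + prev->len == v[i].addr);
			int same_window = ((prev->addr | seg_boundary_mask) ==
					   (last | seg_boundary_mask));

			if (contiguous && same_window)
				continue;	/* extends the current segment */
		}
		segs++;			/* starts a new segment */
	}
	return segs;
}

/*
 * Made-up layout: three contiguous 4 KiB pages that straddle a 64 KiB
 * boundary, followed by two isolated pages.
 */
static const struct toy_vec toy_vecs[] = {
	{ 0x0000E000, 0x1000 },
	{ 0x0000F000, 0x1000 },
	{ 0x00010000, 0x1000 },
	{ 0x00200000, 0x1000 },
	{ 0x00400000, 0x1000 },
};
```

In this model, `toy_count_segments(toy_vecs, 5, 4294967295ULL)` gives 3, while `toy_count_segments(toy_vecs, 5, 65535ULL)` gives 4, because the third page can no longer be merged across the 64 KiB boundary.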
So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
->blk_insert_cloned_request()
->blk_rq_check_limits()
If the above assessment is correct, the following patch should
fix it:
diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
  */
 int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 {
-	if (!rq_mergeable(rq))
+	if (rq->cmd_type != REQ_TYPE_FS)
 		return 0;
 
 	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
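
The effect of changing the early-return condition can be sketched with a toy model. This is not kernel code: `toy_request` and both `toy_check_limits_*` functions are hypothetical, and the sketch assumes (as the description above implies) that the limit check is the place where the cached segment count would be refreshed for the new queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bits of struct request. */
struct toy_request {
	bool nomerge;			/* models the NOMERGE request flag */
	bool is_fs;			/* models cmd_type == REQ_TYPE_FS */
	unsigned int cached_segs;	/* models req->nr_phys_segments */
};

/*
 * Models the check before the patch: a non-mergeable request returns
 * early, so its cached count is never refreshed for this queue.
 */
static void toy_check_limits_old(struct toy_request *rq,
				 unsigned int segs_on_this_queue)
{
	assert(rq != NULL);
	if (rq->nomerge)
		return;
	rq->cached_segs = segs_on_this_queue;
}

/*
 * Models the check after the patch: only non-FS requests return early,
 * so an FS request is recounted even when NOMERGE is set.
 */
static void toy_check_limits_patched(struct toy_request *rq,
				     unsigned int segs_on_this_queue)
{
	assert(rq != NULL);
	if (!rq->is_fs)
		return;
	rq->cached_segs = segs_on_this_queue;
}
```

With a request that has NOMERGE set and a cached count of 3 from path 2, the old variant leaves the count at 3 when the request is retried on a queue where it maps to 4 segments; the patched variant updates it to 4.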
Mike? Jens?
Can you comment on it?