Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert

From: Neil Brown
Date: Wed Apr 25 2007 - 05:38:04 EST


On Wednesday April 25, jens.axboe@xxxxxxxxxx wrote:
>
> That's pretty close to where I think the problem is (the front merging
> and cfq_reposition_rq_rb()). The issue with that is that you'd only get
> aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
> here. Given that front merges are equally not very likely, I'd be
> surprised if something like that has ever happened.

Well it certainly doesn't happen very often....
And I can imagine a filesystem genuinely wanting to read the same
block twice - maybe a block that contained packed tails of two
different files.
>
> BUT... That may explain why we are only seeing it on md. Would md
> ever be issuing such requests that trigger this condition?

Can someone remind me which raid level(s) was/were involved?

I think one was raid0 - that just passes requests on from the
filesystem, so md would only issue requests like that if the
filesystem did.
I guess it could happen with raid4/5/6. A read request that was
properly aligned (and we do encourage proper alignment) will be passed
directly to the underlying device. A concurrent write elsewhere could
require the same block to be read into the stripe-cache to enable a
parity calculation. So you could get two reads at the same block
address.
Getting a front-merge would probably require two stripe-heads to be
processed in reverse-sector order, and raid5 tends to preserve the
order of incoming requests (though that isn't firmly enforced).
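
To make the scenario concrete, here is a rough toy model of it. This is
only a sketch and not the cfq code: the names toy_rq, toy_queue,
toy_add, toy_del and toy_front_merge are all made up for illustration,
and the "queue" is a plain array kept in start-sector order. It shows
how a front merge moves a request's start sector backwards, so that on
repositioning it can land on a request that already starts there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy request: a contiguous run of sectors. */
struct toy_rq {
	unsigned long sector;	/* first sector of the request */
	unsigned long nr;	/* number of sectors */
};

#define TOY_MAX 16

/* Hypothetical queue, kept sorted by start sector. */
struct toy_queue {
	struct toy_rq *rq[TOY_MAX];
	size_t n;
};

/* Return the queued request starting at 'sector', or NULL. */
static struct toy_rq *toy_find(struct toy_queue *q, unsigned long sector)
{
	for (size_t i = 0; i < q->n; i++)
		if (q->rq[i]->sector == sector)
			return q->rq[i];
	return NULL;
}

/*
 * Insert in sector order.  If another request already starts at the
 * same sector, do not insert; return that alias instead.
 */
static struct toy_rq *toy_add(struct toy_queue *q, struct toy_rq *rq)
{
	struct toy_rq *alias = toy_find(q, rq->sector);
	size_t i;

	if (alias)
		return alias;

	assert(q->n < TOY_MAX);
	i = q->n;
	while (i > 0 && q->rq[i - 1]->sector > rq->sector) {
		q->rq[i] = q->rq[i - 1];
		i--;
	}
	q->rq[i] = rq;
	q->n++;
	return NULL;
}

/* Remove 'rq' from the queue if it is present. */
static void toy_del(struct toy_queue *q, struct toy_rq *rq)
{
	for (size_t i = 0; i < q->n; i++) {
		if (q->rq[i] == rq) {
			for (; i + 1 < q->n; i++)
				q->rq[i] = q->rq[i + 1];
			q->n--;
			return;
		}
	}
}

/*
 * Front merge: grow 'rq' backwards by 'nr' sectors, then reposition
 * it.  Returns the alias found at the new start sector, or NULL.
 */
static struct toy_rq *toy_front_merge(struct toy_queue *q,
				      struct toy_rq *rq, unsigned long nr)
{
	assert(nr <= rq->sector);
	toy_del(q, rq);
	rq->sector -= nr;
	rq->nr += nr;
	return toy_add(q, rq);
}
```

With a direct read queued at sectors 8-15 and a stripe-cache read at
16-23, a front merge of 8 more sectors onto the second request makes
toy_front_merge() return the first request as an alias.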

raid1 is much like raid0 (with totally different code) in that the
request pattern seen by the underlying device matches the request
pattern generated by the filesystem.

If I read the debugging output correctly, the request which I
hypothesise was the subject of a front-merge is a 'sync' request.
raid5 does not generate sync requests to fill the stripe cache (maybe
it should?) so I really think it must have come directly from the
filesystem.

(just checked previous email for more detail of when it hits)
The fact that it hits degraded arrays more easily is interesting.
Maybe we try to read a block on the missing device and so schedule
reads for the other devices. Then we try to read a block on a good
device and issue a request for exactly the same block that raid5 asked
for. That still doesn't explain the 'sync' and the 'front merge'.
But that is quite possible, just not common maybe.
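
For anyone not familiar with why a degraded read fans out like that:
the missing block is the XOR of the corresponding blocks on all the
surviving devices (data and parity), so every surviving device gets a
read at that offset. A minimal sketch, where toy_reconstruct and its
parameters are hypothetical names and not the md/raid5 code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Rebuild the block of the 'missing' device by XORing the blocks read
 * from every other device in the stripe.  blocks[missing] is never
 * dereferenced, so it may be NULL.
 */
static void toy_reconstruct(unsigned char *out,
			    unsigned char *const blocks[],
			    size_t ndisks, size_t missing, size_t len)
{
	assert(missing < ndisks);
	memset(out, 0, len);
	for (size_t d = 0; d < ndisks; d++) {
		if (d == missing)
			continue;	/* nothing to read from the failed device */
		for (size_t i = 0; i < len; i++)
			out[i] ^= blocks[d][i];
	}
}
```

A later filesystem read aimed at one of the good devices can then ask
for a block that one of these reconstruction reads already covers.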

It doesn't help us understand the raid0 example though. Maybe it is
just a 'can happen, but only rarely' thing.

NeilBrown