Re: [SUGGESTION]: drop virtual merge accounting in I/O requests

From: FUJITA Tomonori
Date: Sun Jul 13 2008 - 22:20:37 EST


On Sun, 13 Jul 2008 17:41:19 -0700 (PDT)
David Miller <davem@xxxxxxxxxxxxx> wrote:

> From: Andi Kleen <andi@xxxxxxxxxxxxxx>
> Date: Sun, 13 Jul 2008 22:13:17 +0200
>
> > David Miller wrote:
> > > From: Andi Kleen <andi@xxxxxxxxxxxxxx>
> > > Date: Sun, 13 Jul 2008 15:50:55 +0200
> > >
> > >> Still I would expect that modern IO controllers are typically fast
> > >> enough at processing SG lists that it shouldn't matter much.
> > >
> > > I know it matters a lot on sparc64 ESP scsi controllers.
> > >
> > > You can only have one address/len pair DMA'ing at a time and you have
> > > to service an interrupt to load in the next DMA sg elements into
> > > the chip's registers.
> > >
> > > Merging is essentially a must for performance on those cards.
> >
> > Well right now your setup breaks all controllers with "weird requirements"
> > like 64k DMA or similar. You'll need to find some way to turn off BIO
> > merge for those at least.
> >
> > Perhaps this needs to be really a block queue attribute instead of a global?
>
> Like I said, that code was written at a time when none of the block
> segment check stuff existed, and therefore worked perfectly fine in
> the environment in which it was created.
>
> Someone added the segmenting code, but didn't bother to add proper
> checking to the merging bits.
>
> Usually we revert code that breaks things like that, right?
>
> So I find it unusual for people to talk about turning off the
> code that was working perfectly fine previously in situations
> like this.

It seems that there is some confusion.

It's not likely that the DMA boundary restriction causes this issue
because we set it to 4G by default.
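To illustrate why a 4G boundary rarely matters, here is a minimal
user-space sketch of a boundary-crossing check. The function name and
the mask convention (boundary size minus one) are my assumptions for
illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the region [addr, addr + len - 1]
 * spans a boundary described by boundary_mask (boundary size - 1),
 * e.g. 0xffffffff for a 4G boundary or 0xffff for a 64K boundary.
 */
static int crosses_boundary(uint64_t addr, uint64_t len,
                            uint64_t boundary_mask)
{
    uint64_t last;

    assert(len > 0);
    last = addr + len - 1;

    /* Any differing bit above the mask means two different windows. */
    return ((addr ^ last) & ~boundary_mask) != 0;
}
```

With a 4G mask, only a segment that straddles a 4G-aligned address is
rejected, while a 64K mask rejects any segment that straddles a
64K-aligned address.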

As Mikulas pointed out, this problem existed on SPARC64 (and other
architectures that set BIO_VMERGE_BOUNDARY to a non-zero value) before
my IOMMU work, because IOMMUs can't guarantee that they merge sg
segments.

I think that we now hit this problem because of max_segment_size.

In the past, IOMMUs merged sg entries aggressively. They ignored the
block layer's max_segment_size (64K by default), merged sg segments,
and created a large segment. But most LLDs can handle a segment larger
than 64K, so everything was fine.

Now IOMMUs don't ignore max_segment_size, so we hit this problem.
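The difference between the old and the new behaviour can be sketched
in user space as follows. This is only a simplified model under my own
assumptions: struct sg_ent and merge_sg() are hypothetical names, and
real IOMMU code also has to deal with mapping and alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sg entry: bus address and length. */
struct sg_ent {
    unsigned long addr;
    unsigned long len;
};

/*
 * Merge entries in place. An entry is folded into the previous one
 * when it is contiguous in bus space and the merged length stays
 * within max_seg. max_seg == 0 means "ignore the limit", which models
 * the old aggressive merging. Returns the number of segments left.
 */
static size_t merge_sg(struct sg_ent *sg, size_t n, unsigned long max_seg)
{
    size_t out = 0;
    size_t i;

    assert(sg != NULL || n == 0);
    if (n == 0)
        return 0;

    for (i = 1; i < n; i++) {
        struct sg_ent *cur = &sg[out];
        int contiguous = (cur->addr + cur->len == sg[i].addr);
        int fits = (max_seg == 0 || cur->len + sg[i].len <= max_seg);

        if (contiguous && fits)
            cur->len += sg[i].len;   /* extend the current segment */
        else
            sg[++out] = sg[i];       /* start a new segment */
    }
    return out + 1;
}
```

For 32 contiguous 4K entries, merging without a limit leaves a single
128K segment, while a 64K limit leaves two 64K segments. So a request
whose segment count was computed on the assumption of full merging can
end up with more segments than expected.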

It's right that IOMMUs don't ignore max_segment_size. I guess that if
the A100u2w driver sets max_segment_size to a larger value, the
problem will be fixed. However, as we discussed, IOMMUs can't
guarantee that they merge sg segments, so it's possible that we will
still hit this problem.

We tell SCSI driver developers that their drivers never get more
segments than the limit they give the SCSI subsystem. If we keep the
virtual merge concept, we need to fix this first.
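A rough user-space sketch of why the promise can be broken: the
optimistic count assumes that the IOMMU merges every pair of entries
that meet at an IOMMU page boundary, while the worst case merges only
physically contiguous entries. All names here and the 8K
VMERGE_BOUNDARY value are assumptions for illustration, not the block
layer's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IOMMU page size used for the virtual merge test. */
#define VMERGE_BOUNDARY 8192UL

struct seg {
    unsigned long phys;
    unsigned long len;
};

/* Optimistic count: entries that meet at an IOMMU page boundary are
 * assumed to be merged into one segment by the IOMMU. */
static size_t count_virt_segments(const struct seg *s, size_t n)
{
    size_t count = n ? 1 : 0;
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 1; i < n; i++) {
        unsigned long end = s[i - 1].phys + s[i - 1].len;

        if ((end | s[i].phys) & (VMERGE_BOUNDARY - 1))
            count++;
    }
    return count;
}

/* Worst case: the IOMMU merges nothing, so only physically
 * contiguous entries share a segment. */
static size_t count_phys_segments(const struct seg *s, size_t n)
{
    size_t count = n ? 1 : 0;
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 1; i < n; i++)
        if (s[i - 1].phys + s[i - 1].len != s[i].phys)
            count++;
    return count;
}

/* Does the given count respect the limit the driver announced? */
static int fits_driver_limit(size_t nsegs, size_t driver_max)
{
    return nsegs <= driver_max;
}
```

Four scattered but 8K-aligned entries count as one segment under the
optimistic rule and as four in the worst case. A driver that announced
a limit of two segments would accept the request based on the first
count and then receive four segments if the IOMMU fails to merge.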


Sorry about this problem. As I said, it existed before my IOMMU work,
but I should have taken care of it.