Re: [SUGGESTION]: drop virtual merge accounting in I/O requests

From: Mikulas Patocka
Date: Fri Jul 11 2008 - 16:23:33 EST


On Fri, 11 Jul 2008, David Miller wrote:

From: FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>
Date: Fri, 11 Jul 2008 20:15:52 +0900

On Fri, 11 Jul 2008 06:52:09 -0400 (EDT)
Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:

On Fri, 11 Jul 2008, FUJITA Tomonori wrote:

Yeah, IOMMUs can't guarantee that. The majority of architectures set
BIO_VMERGE_BOUNDARY to 0 so they don't hit this, I think.

Yes, the architectures without IOMMU don't hit this problem.

I meant that even some architectures that do have IOMMUs set
BIO_VMERGE_BOUNDARY to 0.

Keep in mind that these settings were added long before
we supported segment boundary restrictions.

Someone added code to handle segment boundaries, but didn't
fix any of the block I/O layer infrastructure :-)

Several platforms that have an IOMMU but set these values to zero
actually did so for another reason. They considered being
required to always merge page-adjacent mappings virtually too
strong a requirement to meet 100% of the time.
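Roughly, the virtual-merge accounting under discussion works like this: when one bio_vec ends and the next begins on a BIO_VMERGE_BOUNDARY-aligned address, the block layer counts the pair as a single hardware segment, trusting the IOMMU to map them back to back in bus-address space. Setting the boundary to 0 opts out. The sketch below is a user-space illustration only, modeled loosely on the shape of the kernel's BIOVEC_VIRT_MERGEABLE test rather than copied from it; struct seg and virt_mergeable are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one bio_vec: physical start address and length. */
struct seg {
	uint64_t phys;
	uint32_t len;
};

/*
 * Should the block layer count a and b as one hardware segment, on the
 * assumption that the IOMMU will map them back to back?  Only when a ends
 * and b starts on a vmerge_boundary-aligned address.  A boundary of 0
 * means "never assume a virtual merge".
 */
static bool virt_mergeable(const struct seg *a, const struct seg *b,
			   uint64_t vmerge_boundary)
{
	if (vmerge_boundary == 0)
		return false;
	/* the mask test below only works for a power-of-two boundary */
	assert((vmerge_boundary & (vmerge_boundary - 1)) == 0);
	return (((a->phys + a->len) | b->phys) & (vmerge_boundary - 1)) == 0;
}
```

With a 4096-byte boundary, a page ending at 0x11000 followed by a page at 0x50000 qualifies, while a buffer starting at 0x50200 does not. The scheme is only sound if every IOMMU that claims a nonzero boundary really does merge every such pair, which is the guarantee being questioned here.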

It is broken on Sparc64 even without boundary restrictions: if the allocator skips over an already-allocated entry in the IOMMU table, the mappings don't get merged either.
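To make the Sparc64 point concrete: if the IOMMU hands out table entries by scanning for the next free slot, the second page of a pair can land after an entry that another mapping already holds, so the two pages end up non-adjacent in bus-address space even though the block layer counted them as one segment. The toy allocator below is a hypothetical illustration of that effect only. It is not the sparc64 IOMMU code; the table size, the rotating first-fit policy and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOMMU_ENTRIES 16 /* hypothetical table size, one page per entry */

struct toy_iommu {
	bool used[IOMMU_ENTRIES];
	size_t hint; /* next slot to try, like a rotating allocator */
};

/* Allocate one entry, skipping any that are already taken.
 * Returns the entry index, or -1 if the table is full. */
static int toy_alloc(struct toy_iommu *t)
{
	for (size_t n = 0; n < IOMMU_ENTRIES; n++) {
		size_t i = (t->hint + n) % IOMMU_ENTRIES;
		if (!t->used[i]) {
			t->used[i] = true;
			t->hint = (i + 1) % IOMMU_ENTRIES;
			return (int)i;
		}
	}
	return -1;
}

/* Map two pages one at a time and report whether their entries are
 * adjacent, i.e. whether the IOMMU really produced a single segment. */
static bool toy_map_pair_merged(struct toy_iommu *t)
{
	int first = toy_alloc(t);
	int second = toy_alloc(t);
	assert(first >= 0 && second >= 0);
	return second == first + 1;
}
```

With an empty table the two pages land in entries 0 and 1; if entry 1 is already held by another mapping, the second page goes to entry 2 and the pair becomes two segments, regardless of any boundary setting.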

I'd just drop it, because these requirements seem to me too brittle to maintain. It is too easy to introduce a bug here and too hard to check for one. Basically, there are several independent pieces of code (the block I/O layer and the arch-specific IOMMU code) that attempt the same calculation, and if their results differ, the driver will crash. Even if we managed to fix it, someone would likely break it again in a year or two :-(
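The failure mode can be pictured as two segment counts that are supposed to agree: the block layer's prediction, which bounds how many scatter-gather entries a driver must be able to hold, and the number the IOMMU actually produces after mapping. The functions below are hypothetical and deliberately simplified (the table is sized exactly to the prediction); the only point is that a driver trusting the first number overruns its table when the second is larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count DMA segments for nsegs pieces.  joins has nsegs - 1 entries;
 * joins[i] is true when piece i + 1 is glued onto piece i and therefore
 * does not start a new segment.
 */
static size_t count_segments(const bool *joins, size_t nsegs)
{
	assert(nsegs < 2 || joins != NULL);
	if (nsegs == 0)
		return 0;
	size_t count = 1;
	for (size_t i = 0; i + 1 < nsegs; i++)
		if (!joins[i])
			count++;
	return count;
}

/*
 * predicted_joins: what the block layer assumed would merge.
 * actual_joins:    what the IOMMU really merged (separate code path).
 * The driver's table is sized from the prediction; if the IOMMU merges
 * less than predicted, it hands back more entries than the table holds.
 */
static bool sg_table_overflows(const bool *predicted_joins,
			       const bool *actual_joins, size_t nsegs)
{
	size_t table_size = count_segments(predicted_joins, nsegs);
	size_t needed = count_segments(actual_joins, nsegs);
	return needed > table_size;
}
```

For four pages predicted to merge into one segment, a single missed merge anywhere in the IOMMU (as in the allocator case above) yields two segments and overflows a one-entry table.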

Would that mean the nr_hw_segments field in bio and request could be dropped too? Or is it used for some other purpose?


BTW: what's the reason that, by default (without any driver intervention), device DMA is not allowed to cross a 64k boundary?
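For reference, a segment-boundary restriction is usually expressed as a mask: a segment is acceptable only if its first and last byte fall in the same mask-aligned window. The sketch below uses a 64k window (mask 0xffff) because that is the restriction asked about; it does not claim what the kernel's actual default is, which is the open question. The function name is hypothetical, though the OR-with-mask comparison has the same shape as the block layer's boundary checks as far as I know.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [addr, addr + len) crosses a boundary described by mask,
 * e.g. mask = 0xffff for 64 KiB windows.  Both ends are OR-ed with the
 * mask; the results are equal only when both ends sit in the same
 * aligned window.
 */
static bool crosses_boundary(uint64_t addr, uint64_t len, uint64_t mask)
{
	assert(len > 0);
	return (addr | mask) != ((addr + len - 1) | mask);
}
```

An 8 KiB segment starting at 0x1f000 crosses the 0x20000 line, so with mask 0xffff it would have to be split in two; with mask 0xffffffff it would not. Any such split is one more way the IOMMU's segment count can exceed what the block layer predicted.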

Mikulas