Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi:Fix address translation failure of HighMem pages used by sg list)

From: Boaz Harrosh
Date: Mon Jul 30 2012 - 04:57:08 EST


On 07/30/2012 10:12 AM, Paolo Bonzini wrote:

> On 30/07/2012 01:50, Rusty Russell wrote:
>>> Also, being the first user of chained scatterlist doesn't exactly give
>>> me warm fuzzies.
>>
>> We're far from the first user: they've been in the kernel for well over
>> 7 years. They were introduced for the block layer, but they tended to
>> ignore private uses of scatterlists like this one.
>
> Yeah, but sg_chain has no users in drivers, only a private one in
> lib/scatterlist.c. The internal API could be changed to something else
> and leave virtio-scsi screwed...
>
>> Yes, we should do this. But note that this means an iteration, so we
>> might as well combine the loops :)
>
> I'm really bad at posting pseudo-code, but you can count the number of
> physically-contiguous entries at the beginning of the list only. So if
> everything is contiguous, you use a single non-indirect buffer and save
> a kmalloc. If you use indirect buffers, I suspect it's much less
> effective to collapse physically-contiguous entries. More elaborate
> heuristics do need a loop, though.
>


[All the below with a grain of salt, from my senile memory]

Don't forget some facts about the scatterlist received here at the LLD.
It has already been DMA mapped and locked by the generic layer.
Which means that the DMA engine has already collapsed physically-contiguous
entries. The entries you get here are already physically distinct.
(There were bugs in the past where this was not true; please complain
if you find them again.)

A scatterlist is two different lists taking the same space, but with two
different lengths.
- One list is the PAGE pointers plus offset && length. It is longer than or
equal in length to the 2nd list. The end marker corresponds to this list.

This list is the input into the DMA engine.

- The second list is the list of physical DMA addresses, with their
physical lengths. Offset is not needed because it is incorporated in the
DMA address.

This list is the output from the DMA engine.

The 2nd list can be shorter because the DMA engine tries to minimize the
number of physical scatter-list entries, which are usually a limited HW
resource.

This list might follow chains, but its end is determined by the sg_count
returned from the DMA engine, not by the end marker.
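
To make the "two lists in one array" point concrete, here is a rough
userspace model, from memory and NOT kernel code. struct model_sg and all
the model_* helpers are made-up names for illustration; the real thing is
struct scatterlist plus the DMA mapping call. The toy mapper just assumes
an identity mapping and merges physically adjacent entries, which is only
an approximation of what a real DMA layer may do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scatterlist: both views share one slot. */
struct model_sg {
	/* Input view (page list); the end marker belongs to this view. */
	uintptr_t page_addr;      /* stands in for the page's physical base */
	unsigned int offset;
	unsigned int length;
	int is_last;
	/* Output view (DMA list), filled in by the mapping step. */
	uint64_t dma_address;     /* offset is already folded in here */
	unsigned int dma_length;
};

/* Length of the input list: walk until the end marker. */
static size_t model_count_pages(const struct model_sg *sgl)
{
	size_t n = 0;

	while (!sgl[n].is_last)
		n++;
	return n + 1;
}

/*
 * Toy mapping step: identity-map each entry and merge it into the
 * previous DMA entry when the two are physically adjacent.
 * Returns the number of DMA entries, which may be smaller than the
 * number of page entries.
 */
static size_t model_map_sg(struct model_sg *sgl)
{
	size_t out = 0;   /* index of the DMA entry being built */
	size_t i;
	int have = 0;     /* set once the first DMA entry exists */

	for (i = 0; ; i++) {
		uint64_t phys = (uint64_t)sgl[i].page_addr + sgl[i].offset;

		if (have &&
		    sgl[out].dma_address + sgl[out].dma_length == phys) {
			sgl[out].dma_length += sgl[i].length;
		} else {
			if (have)
				out++;
			sgl[out].dma_address = phys;
			sgl[out].dma_length = sgl[i].length;
			have = 1;
		}
		if (sgl[i].is_last)
			break;
	}
	assert(out <= i); /* the DMA list never outgrows the page list */
	return out + 1;
}

/* Walk the DMA view: bounded by the mapped count, not the end marker. */
static uint64_t model_dma_bytes(const struct model_sg *sgl, size_t mapped)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < mapped; i++)
		total += sgl[i].dma_length;
	return total;
}

/* Three page entries; the first two are physically adjacent. */
static void model_fill_example(struct model_sg sgl[3])
{
	sgl[0] = (struct model_sg){ .page_addr = 0x1000, .length = 4096 };
	sgl[1] = (struct model_sg){ .page_addr = 0x2000, .length = 4096 };
	sgl[2] = (struct model_sg){ .page_addr = 0x8000, .offset = 512,
				    .length = 1024, .is_last = 1 };
}
```

With the example above the page view still has three entries, while the
DMA view has only two. If I have it right, a consumer that walked the DMA
fields up to the end marker would read a stale third slot, which is why
the count returned by the mapping step is the one to use.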

At the time my opinion, and I think Rusty agreed, was that the scatterlist
should be split in two. The input page-ptr list is just the BIO, and the
output of the DMA engine should just be the physical part of the sg_list,
as a separate parameter. But all this was buried under too many APIs and
the noise was too strong for any single brave soul.

So I'd just blindly trust the sg_count returned from the DMA engine; it is
already optimized. I THINK

> Paolo


Boaz