Re: [PATCH] scatterlist: don't overflow length field

From: David Dillow
Date: Mon Feb 06 2017 - 13:31:37 EST


+Jens, Christoph, and Ming based on off-list suggestion

On Fri, Feb 3, 2017 at 11:57 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 1, 2017 at 1:29 PM, David Dillow <dillow@xxxxxxxxxx> wrote:
> > When called with a region of contiguous pages totaling > 4 GB of memory,
> > sg_alloc_table_from_pages() will overflow the length field, leading to a
> > corrupt scatter list. Fix this by tracking the number of pages we've
> > merged and start a new chunk when we would overflow.
>
> So what allows these things to be built in the first place?
>
> We limit IO sizes to fit in a signed int (so just below 2GB) not only
> because it's often an effective denial of service, but also because
> we've had issues with various drivers (and filesystems) getting
> int/long wrong.
>
> So nothing should be building those kinds of scatterlists, and it
> something is able to, it might result in other problems downstreams..

This isn't from normal read/write IO -- some applications want to
access large amounts
of userspace memory directly from hardware, and it is cleaner for them
to manage one
mapping than multiple 1GB or 2GB mappings -- assuming the hardware can even
support multiple mappings. If they have room in their container to
allocate and pin the
memory, we'd like to allow it.

There's definitely potential for problems downstream, even without
going through the
filesystems and block layers -- we noticed this potential issue while
tracking down an
bug in the IOMMU code when an entry in the list was over 1GB. We still
see a benefit
from building the large entries, though -- it allows superpages in the
IOMMU mapping
which helps the IOTLB cache.

We currently use sg_alloc_table_from_pages() to build the scatterlist
for dma_map_sg()
but we could do it ourselves if you'd rather add a length limit to the
more general code.