Re: [PATCH v2] vfio/type1: Limit DMA mappings per container

From: Alex Williamson
Date: Wed Apr 03 2019 - 16:00:45 EST


On Wed, 3 Apr 2019 15:24:26 -0400
Jerome Glisse <jglisse@xxxxxxxxxx> wrote:

> On Tue, Apr 02, 2019 at 10:15:38AM -0600, Alex Williamson wrote:
> > Memory backed DMA mappings are accounted against a user's locked
> > memory limit, including multiple mappings of the same memory. This
> > accounting bounds the number of such mappings that a user can create.
> > However, DMA mappings that are not backed by memory, such as DMA
> > mappings of device MMIO via mmaps, do not make use of page pinning
> > and therefore do not count against the user's locked memory limit.
> > These mappings still consume memory, but the memory is not well
> > associated to the process for the purpose of oom killing a task.
> >
> > To add bounding on this use case, we introduce a limit to the total
> > number of concurrent DMA mappings that a user is allowed to create.
> > This limit is exposed as a tunable module option where the default
> > value of 64K is expected to be well in excess of any reasonable use
> > case (a large virtual machine configuration would typically only make
> > use of tens of concurrent mappings).
> >
> > This fixes CVE-2019-3882.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
>
> Have you tested with GPU passthrough? GPUs have huge BARs, from
> hundreds of megabytes to gigabytes (some drivers resize them
> to cover the whole GPU memory). Drivers need to map those to
> work properly. I am not sure what path is taken on the host when
> a guest mmaps an MMIO BAR, but I just thought I would
> point that out.

The limit introduced is the number of mappings that a user can have
outstanding, not the size of the mappings. We don't try to estimate
the overhead of a mapping based on the mapping size since IOMMU
super-page support can make a 1GB mapping comparable in overhead to a
4KB mapping. QEMU will generally try to map a BAR with a single
mapping, unless it's split by something like an MSI-X vector table or
quirks, which still results in a low single-digit number of mappings per
BAR. This does not affect how the guest drivers use the device; BARs
cannot be partially enabled from a DMA address space perspective.
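
As a rough illustration of count-based (rather than size-based)
accounting, here is a minimal userspace sketch. It is not the patch
itself: the struct, the function names, and the exact value 65535 used
for the "64K" default are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed stand-in for the "64K" default; the real value may differ. */
#define DMA_ENTRY_LIMIT 65535u

/* Hypothetical per-container state, not the kernel's actual struct. */
struct container {
    unsigned int dma_avail;   /* mapping slots still available */
};

void container_init(struct container *c)
{
    c->dma_avail = DMA_ENTRY_LIMIT;
}

/*
 * Every mapping consumes exactly one slot. The size is deliberately
 * ignored: a 1GB mapping and a 4KB mapping cost the same here.
 */
int container_map(struct container *c, size_t size)
{
    (void)size;
    if (c->dma_avail == 0)
        return -ENOSPC;       /* limit on outstanding mappings reached */
    c->dma_avail--;
    return 0;
}

/* Removing a mapping returns its slot to the container. */
void container_unmap(struct container *c)
{
    assert(c->dma_avail < DMA_ENTRY_LIMIT);
    c->dma_avail++;
}
```

With this model, a container that maps one 4KB region and one 1GB BAR
has used two slots, and only the number of outstanding mappings can
ever cause a failure.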

If a userspace driver were trying to map a large GPU BAR with separate
4K mappings, it could indeed hit the limit, but that's far from the
common or expected use case and the module tunable could be used to
raise the limit if it were really necessary.
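
To put numbers on that corner case, the arithmetic can be sketched as
below. The helper name is hypothetical and the 1GB BAR size is only an
example figure.

```c
#include <stdint.h>

/*
 * Number of separate mappings needed to cover a BAR of bar_bytes when
 * each mapping covers chunk_bytes (rounded up). chunk_bytes must be
 * non-zero.
 */
uint64_t mappings_needed(uint64_t bar_bytes, uint64_t chunk_bytes)
{
    return (bar_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

For a 1GB BAR, mappings_needed(1ULL << 30, 4096) is 262144, about four
times a 64K limit, while mapping the same BAR in one piece needs a
single entry.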

There's really no support for resizable BARs through vfio-pci right now;
we get the device in its base configuration, QEMU maps that and exposes
a rather fixed device to the VM. If this is something we need to
address for GPU assignment, let's talk. Thanks,

Alex