Re: [PATCH kernel v9 28/32] powerpc/mmu: Add userspace-to-physical addresses translation cache

From: Paul Mackerras
Date: Thu Apr 30 2015 - 04:25:41 EST


On Thu, Apr 30, 2015 at 04:34:55PM +1000, David Gibson wrote:
> On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> > We are adding support for DMA memory pre-registration to be used in
> > conjunction with VFIO. The idea is that the userspace which is going to
> > run a guest may want to pre-register a userspace memory region so
> > that it all gets pinned once and never goes away. Once this is done,
> > the hypervisor will not have to pin/unpin pages on every DMA map/unmap
> > request. This is going to help with multiple pinning of the same memory
> > and in-kernel acceleration of DMA requests.
> >
> > This adds a list of memory regions to mm_context_t. Each region consists
> > of a header and a list of physical addresses. This adds API to:
> > 1. register/unregister memory regions;
> > 2. do final cleanup (which puts all pre-registered pages);
> > 3. do userspace to physical address translation;
> > 4. manage a mapped pages counter; when it is zero, it is safe to
> > unregister the region.
> >
> > Multiple registration of the same region is allowed; a kref is used to
> > track the number of registrations.
>
> [snip]
> > +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> > +		struct mm_iommu_table_group_mem_t **pmem)
> > +{
> > +	struct mm_iommu_table_group_mem_t *mem;
> > +	long i, j;
> > +	struct page *page = NULL;
> > +
> > +	list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> > +			next) {
> > +		if ((mem->ua == ua) && (mem->entries == entries))
> > +			return -EBUSY;
> > +
> > +		/* Overlap? */
> > +		if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> > +				(ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> > +			return -EINVAL;
> > +	}
> > +
> > +	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > +	if (!mem)
> > +		return -ENOMEM;
> > +
> > +	mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> > +	if (!mem->hpas) {
> > +		kfree(mem);
> > +		return -ENOMEM;
> > +	}
>
> So, I've thought more about this and I'm really confused as to what
> this is supposed to be accomplishing.
>
> I see that you need to keep track of what regions are registered, so
> you don't double lock or unlock, but I don't see what the point of
> actually storing the translations in hpas is.
>
> I had assumed it was so that you could later on get to the
> translations in real mode when you do in-kernel acceleration. But
> that doesn't make sense, because the array is vmalloc()ed, so it can't be
> accessed in real mode anyway.

We can access vmalloc'd arrays in real mode using real_vmalloc_addr().
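
For illustration, here is a minimal sketch of the kind of real-mode lookup
that makes possible. real_vmalloc_addr() is the existing helper in
arch/powerpc/kvm/book3s_hv_rm_mmu.c; the function name rm_read_hpa and
anything else not in Alexey's patch is hypothetical:

/*
 * Sketch only: re-resolve the vmalloc'd hpas array through the
 * kernel page table so a real-mode hcall handler can read it.
 */
static unsigned long rm_read_hpa(struct mm_iommu_table_group_mem_t *mem,
		unsigned long entry)
{
	unsigned long *hpa;

	if (entry >= mem->entries)
		return 0;

	hpa = (unsigned long *) real_vmalloc_addr(&mem->hpas[entry]);
	if (!hpa)
		return 0;

	return *hpa;
}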

> I can't think of a circumstance in which you can use hpas where you
> couldn't just walk the page tables anyway.

The problem with walking the page tables is that there is no guarantee
that the page you find that way is the page that was returned by the
gup_fast() we did earlier. Storing the hpas means that we know for
sure that the page we're doing DMA to is one that we have an elevated
page count on.
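
(As a sketch of what that buys us: the registration path pins every page
of the region and records its host physical address, so the translation
used at DMA time always refers to a page we hold a reference on. The
helper name below is hypothetical and the 2015-era get_user_pages_fast()
signature is assumed.)

/*
 * Sketch only: pin each page of the registered region and cache its
 * host physical address in mem->hpas[].
 */
static long mm_iommu_pin_sketch(struct mm_iommu_table_group_mem_t *mem,
		unsigned long ua, unsigned long entries)
{
	unsigned long i;
	struct page *page;

	for (i = 0; i < entries; ++i) {
		if (get_user_pages_fast(ua + (i << PAGE_SHIFT),
				1 /* nr_pages */, 1 /* write */, &page) != 1) {
			/* drop the pins taken so far */
			while (i--)
				put_page(pfn_to_page(mem->hpas[i] >> PAGE_SHIFT));
			return -EFAULT;
		}
		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
	}

	return 0;
}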

Also, there are various points where a Linux PTE is made temporarily
invalid. If we happened to do an H_PUT_TCE on one CPU while another
CPU was doing that, we'd get a spurious failure back from the
H_PUT_TCE.
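
(A sketch of the lookup the cached array allows; names other than the
patch's fields are hypothetical. H_PUT_TCE can translate a registered
userspace address purely from hpas[], so a PTE that is transiently
invalid on another CPU never causes a spurious failure.)

static long ua_to_hpa_cached(struct mm_iommu_table_group_mem_t *mem,
		unsigned long ua, unsigned long *hpa)
{
	unsigned long entry;

	/* The address must fall inside the registered region */
	if (ua < mem->ua ||
			ua >= mem->ua + (mem->entries << PAGE_SHIFT))
		return -EFAULT;

	entry = (ua - mem->ua) >> PAGE_SHIFT;
	*hpa = mem->hpas[entry] | (ua & ~PAGE_MASK);

	return 0;
}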

Paul.