Re: [Intel IOMMU 04/10] IOVA allocation and management routines
From: Keshavamurthy, Anil S
Date: Tue Jun 26 2007 - 12:21:17 EST
On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:05 -0700 "Keshavamurthy, Anil S" <anil.s.keshavamurthy@xxxxxxxxx> wrote:
>
> All the inlines in this code are pretty pointless: all those functions have
> a single callsite so the compiler inlines them anyway. If we later add
> more callsites for these functions, they're too big to be inlined.
>
> inline is usually wrong: don't do it!
Yup, I agree and will follow in future.
> > +
> > +/**
> > + * find_iova - find's an iova for a given pfn
> > + * @iovad - iova domain in question.
> > + * pfn - page frame number
> > + * This function finds and returns an iova belonging to the
> > + * given doamin which matches the given pfn.
> > + */
> > +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > + unsigned long flags;
> > + struct rb_node *node;
> > +
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + node = iovad->rbroot.rb_node;
> > + while (node) {
> > + struct iova *iova = container_of(node, struct iova, node);
> > +
> > + /* If pfn falls within iova's range, return iova */
> > + if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + return iova;
> > + }
> > +
> > + if (pfn < iova->pfn_lo)
> > + node = node->rb_left;
> > + else if (pfn > iova->pfn_lo)
> > + node = node->rb_right;
> > + }
> > +
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + return NULL;
> > +}
>
> So we take the lock, look up an item, then drop the lock then return the
> item we just found. We took no refcount on it and we didn't do anything to
> keep this object alive.
>
> Is that a bug, or does the (afacit undocumented) lifecycle management of
> these things take care of it in some manner? If yes, please reply via an
> add-a-comment patch.
Nope, this is not a bug. Adding a comment patch which explains the same.
>
>
> > +/**
> > + * __free_iova - frees the given iova
> > + * @iovad: iova domain in question.
> > + * @iova: iova in question.
> > + * Frees the given iova belonging to the giving domain
> > + */
> > +void
> > +__free_iova(struct iova_domain *iovad, struct iova *iova)
> > +{
> > + unsigned long flags;
> > +
> > + if (iova) {
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + __cached_rbnode_delete_update(iovad, iova);
> > + rb_erase(&iova->node, &iovad->rbroot);
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + free_iova_mem(iova);
> > + }
> > +}
>
> Can this really be called with NULL? If so, under what circumstances?
> (This reader couldn't work it out from a brief look at the code, so perhaps
> others will not be able to either. Perhaps a comment is needed)
It was getting called from only one place free_iova().
Below patch address your concern.
>
> > +/**
> > + * free_iova - finds and frees the iova for a given pfn
> > + * @iovad: - iova domain in question.
> > + * @pfn: - pfn that is allocated previously
> > + * This functions finds an iova for a given pfn and then
> > + * frees the iova from that domain.
> > + */
> > +void
> > +free_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > + struct iova *iova = find_iova(iovad, pfn);
> > + __free_iova(iovad, iova);
> > +
> > +}
> > +
> > +/**
> > + * put_iova_domain - destroys the iova doamin
> > + * @iovad: - iova domain in question.
> > + * All the iova's in that domain are destroyed.
> > + */
> > +void put_iova_domain(struct iova_domain *iovad)
> > +{
> > + struct rb_node *node;
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + node = rb_first(&iovad->rbroot);
> > + while (node) {
> > + struct iova *iova = container_of(node, struct iova, node);
> > + rb_erase(node, &iovad->rbroot);
> > + free_iova_mem(iova);
> > + node = rb_first(&iovad->rbroot);
> > + }
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > +}
>
> Right, so I suspect what's happening here is that all iova's remain valid
> until their entire domain is destroyed, yes?
Nope. IOVA are valid only for the duration of DMA MAP and DMA UNMAP calls.
In case of Intel-iommu driver, the iova's are valid only for the duration
of __intel_map_singl() and __intel_unmap_single() calls.
>
> What is the upper bound to the memory consumpotion here, and what provides
> it?
As explained above, iova are freed and reused again during the DMA map calls.
>
> Again, some code comments about these design issues are appropriate.
>
> > +/*
> > + * We need a fixed PAGE_SIZE of 4K irrespective of
> > + * arch PAGE_SIZE for IOMMU page tables.
> > + */
> > +#define PAGE_SHIFT_4K (12)
> > +#define PAGE_SIZE_4K (1UL << PAGE_SHIFT_4K)
> > +#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
> > +#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
>
> Am still wondering why we cannot use PAGE_SIZE, PAGE_SHIFT, etc here.
VT-d hardware(a.k.a Intel IOMMU hardware) page table size is always
4K irrespective of the OS PAGE_SIZE. We want to use the same code for
IA64 too where the OS PAGE_SIZE may not be 4K size and hence
had to define here for IOMMU.
>
> > +#define IOVA_START_ADDR (0x1000)
>
> What determined that address? (Needs comment)
Fixed. Please see bleow patch.
>
> > +#define IOVA_START_PFN (IOVA_START_ADDR >> PAGE_SHIFT_4K)
> > +
> > +#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K)
>
> So I'm looking at this and wondering "what type does addr have"?
>
> If it's unsigned long then perhaps we have a problem on x86_32 PAE. Maybe
> we don't support x86_32 PAE, but still, I'd have thought that the
> appropriate type here is dma_addr_t.
Yup that is correct. We don;t support x86_32.
>
> But alas, it was needlessly implemented as a macro, so the reader cannot
> tell.
I am in the process of making the same code base to work for
IA64 architecure too and in this process I will do more cleanup.
Thanks.
Please apply the below patch as a fix the existing patch.
signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@xxxxxxxxx>
---
drivers/pci/iova.c | 22 ++++++++++++++--------
drivers/pci/iova.h | 4 ++--
2 files changed, 16 insertions(+), 10 deletions(-)
Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.c 2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c 2007-06-26 08:25:23.000000000 -0700
@@ -166,6 +166,7 @@
unsigned long flags;
struct rb_node *node;
+ /* Take the lock so that no other thread is manipulating the rbtree */
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
node = iovad->rbroot.rb_node;
while (node) {
@@ -174,6 +175,12 @@
/* If pfn falls within iova's range, return iova */
if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+ /* We are not holding the lock while this iova
+ * is referenced by the caller as the same thread
+ * which called this function also calls __free_iova()
+ * and it is by desing that only one thread can possibly
+ * reference a particular iova and hence no conflict.
+ */
return iova;
}
@@ -198,13 +205,11 @@
{
unsigned long flags;
- if (iova) {
- spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
- __cached_rbnode_delete_update(iovad, iova);
- rb_erase(&iova->node, &iovad->rbroot);
- spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
- free_iova_mem(iova);
- }
+ spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+ __cached_rbnode_delete_update(iovad, iova);
+ rb_erase(&iova->node, &iovad->rbroot);
+ spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+ free_iova_mem(iova);
}
/**
@@ -218,7 +223,8 @@
free_iova(struct iova_domain *iovad, unsigned long pfn)
{
struct iova *iova = find_iova(iovad, pfn);
- __free_iova(iovad, iova);
+ if (iova)
+ __free_iova(iovad, iova);
}
Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.h 2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.h 2007-06-26 08:28:05.000000000 -0700
@@ -24,8 +24,8 @@
#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
-#define IOVA_START_ADDR (0x1000)
-#define IOVA_START_PFN (IOVA_START_ADDR >> PAGE_SHIFT_4K)
+/* IO virtual address start page frame number */
+#define IOVA_START_PFN (1)
#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K)
#define DMA_32BIT_PFN IOVA_PFN(DMA_32BIT_MASK)
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/