Re: [PATCH kernel v9 21/32] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

From: David Gibson
Date: Thu Apr 30 2015 - 00:39:12 EST


On Wed, Apr 29, 2015 at 07:26:28PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 02:45 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:45PM +1000, Alexey Kardashevskiy wrote:
> >>This is a part of moving DMA window programming to an iommu_ops
> >>callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
> >>a first parameter (not pnv_ioda_pe) as it is going to be used as
> >>a callback for VFIO DDW code.
> >>
> >>This adds pnv_pci_ioda2_tvt_invalidate() to invalidate TVT as it is
> >>a good thing to do.
> >
> >What's the TVT and why is invalidating it a good thing?
>
>
> "TCE Validation Table". Yeah, I need to rephrase it. Will do.
>
>
> >Also, it looks like it didn't add it, just move it.
>
> Agrh. Lost it in rebases. Will fix.
>
>
> >>It does not have immediate effect now as the table
> >>is never recreated after reboot but it will in the following patches.
> >>
> >>This should cause no behavioural change.
> >>
> >>Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> >>Reviewed-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> >
> >Really? I don't remember this one.
>
>
> Message-ID: <20150416064351.GK3632@xxxxxxxxxxxxxxx>
> :)
>
> But I believe it did not have TVT stuff then so I should have removed your
> RB from here.

Yeah, that's probably why I didn't recognize it.

>
> >
> >>---
> >>Changes:
> >>v9:
> >>* initialize pe->table_group.tables[0] at the very end when
> >>tbl is fully initialized
> >>* moved pnv_pci_ioda2_tvt_invalidate() from earlier patch
> >>---
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 67 +++++++++++++++++++++++--------
> >> 1 file changed, 51 insertions(+), 16 deletions(-)
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index b9b3773..59baa15 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1960,6 +1960,52 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
> >> __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
> >> }
> >>
> >>+static long pnv_pci_ioda2_set_window(struct iommu_table_group *table_group,
> >>+ struct iommu_table *tbl)
> >>+{
> >>+ struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> >>+ table_group);
> >>+ struct pnv_phb *phb = pe->phb;
> >>+ int64_t rc;
> >>+ const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
> >>+ const __u64 win_size = tbl->it_size << tbl->it_page_shift;
> >>+
> >>+ pe_info(pe, "Setting up window at %llx..%llx "
> >>+ "pgsize=0x%x tablesize=0x%lx\n",
> >>+ start_addr, start_addr + win_size - 1,
> >>+ 1UL << tbl->it_page_shift, tbl->it_size << 3);
> >>+
> >>+ tbl->it_table_group = &pe->table_group;
> >>+
> >>+ /*
> >>+ * Map TCE table through TVT. The TVE index is the PE number
> >>+ * shifted by 1 bit for 32-bits DMA space.
> >>+ */
> >>+ rc = opal_pci_map_pe_dma_window(phb->opal_id,
> >>+ pe->pe_number,
> >>+ pe->pe_number << 1,
> >>+ 1,
> >>+ __pa(tbl->it_base),
> >>+ tbl->it_size << 3,
> >>+ 1ULL << tbl->it_page_shift);
> >>+ if (rc) {
> >>+ pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
> >>+ goto fail;
> >>+ }
> >>+
> >>+ pnv_pci_ioda2_tvt_invalidate(pe);
> >>+
> >>+ /* Store fully initialized *tbl (may be external) in PE */
> >>+ pe->table_group.tables[0] = *tbl;
> >
> >Hrm, a non-atomic copy of a whole structure into the array. Is that
> >really what you want?
>
>
> set_window is called from VFIO (protected by mutex there) and the platform
> code which I believe is not racy (or hotplug takes care of it anyway). Or I
> am missing something else?

Sorry, I wasn't clear. It's not that I actually think the copy is
going to race with anything now.

It's more that copying whole structures about is a rather odd way of
doing things, and makes it much less obvious how object lifetimes
interact.

From what I've seen of the rest of the series it seems like the
following scheme would make more sense:

* struct iommu_table has identical lifetime to the actual tables
allocated under it.
* So, the "create" function both allocates the header structure,
all the actual TCE tables under it, and fills in the header
with the details of same (size, levelsize, levels etc.)
* table_group would have an array of pointers to iommu_table
structs, rather than embedding an array of iommu_table structs
* This pointers would be optionally populated
* set_window function would populate the table_group array with
a previously "create"ed iommu_table
* unset window would clear the pointer, and unref the iommu_table
* "free" and "reset" for a single table would be rolled back into a
single function

Unless there's some reason I've missed that you want to embed the
whole array of iommu_table structs.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: pgp9rI_xlqcAG.pgp
Description: PGP signature