Re: [PATCH] CHROMIUM: iommu: rockchip: Make sure that page table state is coherent

From: Tomasz Figa
Date: Mon Jan 24 2022 - 00:37:41 EST


Hi Dafna,

On Fri, Dec 10, 2021 at 12:18 AM Dafna Hirschfeld
<dafna.hirschfeld@xxxxxxxxxxxxx> wrote:
>
>
>
> On 23.03.15 10:38, Tomasz Figa wrote:
> > Sorry, I had to dig my way out of my backlog.
> >
> > On Tue, Mar 3, 2015 at 10:36 PM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
> >> On Mon, Feb 09, 2015 at 08:19:21PM +0900, Tomasz Figa wrote:
> >>> Even though the code uses the dt_lock spin lock to serialize mapping
> >>> operations from different threads, it does not protect from IOMMU
> >>> accesses that might already be taking place and thus altering the
> >>> state of the IOTLB. This means that the current mapping code, which
> >>> first zaps the page table and only then updates it with the new
> >>> mapping, is prone to the mentioned race.
> >>
> >> Could you elaborate a bit on the race and why it is sufficient to zap
> >> only the first and the last iova? From the description and the comments
> >> in the patch this is not clear to me.
> >
> > Let's start with why it's sufficient to zap only first and last iova.
> >
> > While unmapping, the driver zaps all iovas belonging to the mapping,
> > so the page tables not used by any mapping won't be cached. Now when
> > the driver creates a mapping it might end up occupying several page
> > tables. However, since the mapping area is virtually contiguous, only
> > the first and last page table can be shared with different mappings.
> > This means that only the first and last iovas can already be cached. In
> > fact, we could detect whether the first and last page tables are shared
> > and skip zapping entirely, but this wouldn't really optimize much. Why
> > invalidating one iova is enough to invalidate the whole page table is
> > unclear to me as well, but it seems to be the correct way on this
> > hardware.
>
> Hi,
> It seems to me that actually each mapping needs exactly one page table,
> since (as the inline doc in rk_iommu_map states) the pgsize_bitmap
> makes sure that an iova mapping fits exactly into one page table,
> as the mapping size is at most 4M.
>
> This actually means that if rk_dte_get_page_table does not allocate a
> new page table but returns one that is already partially used by previous
> mappings, then two page tables might be required, but I think the iova
> allocation somehow makes sure that this will not be the case.

Yes, it was exactly for that case. Note that the zap operation is
per IO page, not per IO page table, and there is some prefetching
going on in the TLB of this IOMMU, so neighboring mappings can
interfere with each other.

>
> If that were the case, then the code would be buggy, because it means
> that the loop in rk_iommu_map_iova would write beyond the page table
> returned by rk_dte_get_page_table (which we didn't allocate)

Sorry, I don't see how it could write beyond the page table. Could you
give me an example?

>
> So it seems to me that calling 'rk_iommu_zap_iova(rk_domain, iova, SPAGE_SIZE);'
> as done before this patch is still needed, but it should be moved from
> rk_dte_get_page_table to where rk_iommu_zap_iova_first_last is now
>
> Thanks,
> Dafna
>
> >
> > As for the race, it's also kind of explained by the above. The already
> > running hardware can trigger page table look-ups in the IOMMU, and thus
> > caching of the page table, between our zapping and our update of its
> > contents. With this patch, zapping is performed after updating the page
> > table, so the race is gone.
> >
> > Best regards,
> > Tomasz
> >
> > Date: Mon, 23 Mar 2015 13:29:10 +0100
> > From: Joerg Roedel <joro@xxxxxxxxxx>
> > Subject: Re: [PATCH] CHROMIUM: iommu: rockchip: Make sure that page table
> > state is coherent
> >
> > Hi Tomasz,
> >
> > On Mon, Mar 23, 2015 at 05:38:45PM +0900, Tomasz Figa wrote:
> >> While unmapping, the driver zaps all iovas belonging to the mapping,
> >> so the page tables not used by any mapping won't be cached. Now when
> >> the driver creates a mapping it might end up occupying several page
> >> tables. However, since the mapping area is virtually contiguous, only
> >> the first and last page table can be shared with different mappings.
> >> This means that only the first and last iovas can already be cached. In
> >> fact, we could detect whether the first and last page tables are shared
> >> and skip zapping entirely, but this wouldn't really optimize much. Why
> >> invalidating one iova is enough to invalidate the whole page table is
> >> unclear to me as well, but it seems to be the correct way on this
> >> hardware.
> >>
> >> As for the race, it's also kind of explained by the above. The already
> >> running hardware can trigger page table look-ups in the IOMMU, and thus
> >> caching of the page table, between our zapping and our update of its
> >> contents. With this patch, zapping is performed after updating the page
> >> table, so the race is gone.
> >
> > Okay, this makes sense. Can you add this information to the patch
> > changelog and resend please?
> >
> > Thanks,
> >
> > Joerg
> >
> >
> > From: Tomasz Figa <tfiga@xxxxxxxxxxxx>
> > Subject: [PATCH] CHROMIUM: iommu: rockchip: Make sure that page table state is
> > coherent
> > Date: Mon, 9 Feb 2015 20:19:21 +0900
> >
> > Even though the code uses the dt_lock spin lock to serialize mapping
> > operations from different threads, it does not protect from IOMMU
> > accesses that might already be taking place and thus altering the
> > state of the IOTLB. This means that the current mapping code, which
> > first zaps the page table and only then updates it with the new
> > mapping, is prone to the mentioned race.
> >
> > In addition, the current code assumes that mappings are always at least
> > 4 MiB (which translates to 1024 PTEs) and so would always occupy entire
> > page tables. This is not true for mappings created by the V4L2
> > Videobuf2 DMA contig allocator.
> >
> > This patch changes the mapping code to always zap the page table
> > after it is updated, which avoids the aforementioned race, and also
> > zaps the last page of the mapping to make sure that stale data is
> > not cached from an already existing mapping.
> >
> > Signed-off-by: Tomasz Figa <tfiga@xxxxxxxxxxxx>
> > Reviewed-by: Daniel Kurtz <djkurtz@xxxxxxxxxxxx>
> > ---
> > drivers/iommu/rockchip-iommu.c | 23 +++++++++++++++++------
> > 1 file changed, 17 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> > index 6a8b1ec..b06fe76 100644
> > --- a/drivers/iommu/rockchip-iommu.c
> > +++ b/drivers/iommu/rockchip-iommu.c
> > @@ -544,6 +544,15 @@ static void rk_iommu_zap_iova(struct rk_iommu_domain *rk_domain,
> > spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);
> > }
> >
> > +static void rk_iommu_zap_iova_first_last(struct rk_iommu_domain *rk_domain,
> > + dma_addr_t iova, size_t size)
> > +{
> > + rk_iommu_zap_iova(rk_domain, iova, SPAGE_SIZE);
> > + if (size > SPAGE_SIZE)
> > + rk_iommu_zap_iova(rk_domain, iova + size - SPAGE_SIZE,
> > + SPAGE_SIZE);
> > +}
> > +
> > static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain,
> > dma_addr_t iova)
> > {
> > @@ -568,12 +577,6 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain,
> > rk_table_flush(page_table, NUM_PT_ENTRIES);
> > rk_table_flush(dte_addr, 1);
> >
> > - /*
> > - * Zap the first iova of newly allocated page table so iommu evicts
> > - * old cached value of new dte from the iotlb.
> > - */
> > - rk_iommu_zap_iova(rk_domain, iova, SPAGE_SIZE);
> > -
> > done:
> > pt_phys = rk_dte_pt_address(dte);
> > return (u32 *)phys_to_virt(pt_phys);
> > @@ -623,6 +626,14 @@ static int rk_iommu_map_iova(struct rk_iommu_domain *rk_domain, u32 *pte_addr,
> >
> > rk_table_flush(pte_addr, pte_count);
> >
> > + /*
> > + * Zap the first and last iova to evict from iotlb any previously
> > + * mapped cachelines holding stale values for its dte and pte.
> > + * We only zap the first and last iova, since only they could have
> > + * dte or pte shared with an existing mapping.
> > + */
> > + rk_iommu_zap_iova_first_last(rk_domain, iova, size);
> > +
> > return 0;
> > unwind:
> > /* Unmap the range of iovas that we just mapped */
> >
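
For reference, a quick worked example of the new zap path (numbers made
up for illustration): with SPAGE_SIZE = 4 KiB, mapping 1 MiB at iova
0x00500000 ends up doing

        rk_iommu_zap_iova_first_last(rk_domain, 0x00500000, 0x100000);
        /* -> rk_iommu_zap_iova(rk_domain, 0x00500000, SPAGE_SIZE); */
        /* -> rk_iommu_zap_iova(rk_domain, 0x005FF000, SPAGE_SIZE); */

i.e. only the first and the last IO page of the range are invalidated,
and only after the PTEs have been written and flushed.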