Re: IO_PAGE_FAULTs on unity mapped regions during amd_iommu_init() in Linux 3.4

From: Shuah Khan
Date: Fri Feb 01 2013 - 13:32:11 EST


On Fri, 2013-02-01 at 14:00 +0100, Joerg Roedel wrote:
> Hi Shuah,
>
> On Thu, Jan 31, 2013 at 11:33:30AM -0700, Shuah Khan wrote:
> > Access to these ranges continues to work with no errors until AMD IOMMU
> > driver disables and re-enables IOMMU in enable_iommus(). These faults
> > don't persist and appear between the enable_iommus() call and before
> > amd_iommu_init() gets done printing "AMD-Vi: Lazy IO/TLB flushing
> > enabled" message.
>
> Hmm, okay. I had a look into the v3.4 sources. This looks like a race
> condition. The IOMMUs are enabled in amd_iommu_init_hardware() but the
> unity-mapped regions are created later in amd_iommu_init_dma_ops(). This
> leaves a small window where the page-faults happen that you see.
>
> But I am not sure why this doesn't hit on 3.7 and above. The race is
> still there. Anyway, definitely something that needs to be fixed.
>

Hi Joerg,

Yes, 3.7 has the same window of opportunity for this race condition,
but I couldn't figure out why it doesn't happen there. On 3.7 the
window between amd_iommu_init_hardware() and amd_iommu_init_dma_ops()
might actually be wider than the window in 3.4.

I think understanding why it doesn't happen on 3.7 is probably key. On
3.6, I experimented with back-porting two changes from 3.7: your Split
device table initialization patch
(33f28c59e18d83fd2aeef258d211be66b9b80eb3) and the patch that moved
iommu_init from subsys_initcall() to arch_initcall(). That solved the
problem on 3.6. I am attaching those patches. I can't easily back-port
either one of those to 3.4 though.

That experiment made me think that this problem has something to do with
when device_table gets initialized relative to when dma_ops are
initialized. However, there is no difference between 3.4 and 3.7 in when
the unity mapped regions are created.

If you look at the 3.4 initialization sequence closely, you will notice
that init_device_table() gets called before init_iommu_all() and
init_memory_definitions() have completed.

Another big difference is that the 3.4 init_device_table() sets the
DEV_ENTRY_VALID and DEV_ENTRY_TRANSLATION bits much earlier than 3.7
does. In 3.7 these bits are set in init_device_table_dma(), which is
called much later.
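
To make the ordering concern concrete, here is a toy model in C. It is
only a sketch, not kernel code: every toy_* name is hypothetical, and it
assumes that a device table entry without the valid/translation bits
lets DMA pass through, while an entry with the bits set and no unity
mapping faults. The "late bits" ordering is a hypothetical contrast
case, not a confirmed description of what 3.7 does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical init steps; these are not the kernel's function names. */
enum toy_step { TOY_SET_DTE_BITS, TOY_ENABLE_IOMMU, TOY_MAP_UNITY };

struct toy_state {
    bool dte_bits_set;   /* stands in for DEV_ENTRY_VALID + DEV_ENTRY_TRANSLATION */
    bool iommu_enabled;  /* stands in for the IOMMU being switched on */
    bool unity_mapped;   /* stands in for the unity mapping being installed */
};

/* Assumption: a DMA access faults only when the IOMMU is on, the entry
 * demands translation, and no unity mapping exists yet. */
static bool toy_dma_faults(const struct toy_state *s)
{
    if (!s->iommu_enabled)
        return false;    /* IOMMU off: access is not translated */
    if (!s->dte_bits_set)
        return false;    /* assumed pass-through without the bits */
    return !s->unity_mapped;
}

/* Apply the steps in order and probe one DMA access after each step. */
static int toy_count_faults(const enum toy_step *order, size_t n)
{
    struct toy_state s = { false, false, false };
    int faults = 0;

    for (size_t i = 0; i < n; i++) {
        switch (order[i]) {
        case TOY_SET_DTE_BITS: s.dte_bits_set = true;  break;
        case TOY_ENABLE_IOMMU: s.iommu_enabled = true; break;
        case TOY_MAP_UNITY:    s.unity_mapped = true;  break;
        }
        if (toy_dma_faults(&s))
            faults++;
    }
    return faults;
}

/* 3.4-like ordering as described above: bits first, mapping last. */
static const enum toy_step toy_early_bits[] = {
    TOY_SET_DTE_BITS, TOY_ENABLE_IOMMU, TOY_MAP_UNITY
};

/* Hypothetical contrast: bits are set only after the mapping exists. */
static const enum toy_step toy_late_bits[] = {
    TOY_ENABLE_IOMMU, TOY_MAP_UNITY, TOY_SET_DTE_BITS
};
```

In this model, toy_count_faults(toy_early_bits, 3) reports a fault in
the window between enabling the IOMMU and creating the mapping, while
toy_count_faults(toy_late_bits, 3) reports none. Whether the real
hardware and the 3.7 sequence behave this way is exactly the open
question.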

init_unity_mappings_for_device() has a strong dependency on the PCI
subsystem having been initialized. Is it possible to move it up closer
to amd_iommu_init_hardware()?

I have a system on which I can reproduce the problem easily, and I have
tried making a few changes to the initialization sequence, with no
results. Any thoughts on what other changes I should be looking at to
solve the problem, besides the ones I already tried?

Thanks,
-- Shuah