Re: Problems with alpha/pci + radeon/ttm

From: FUJITA Tomonori
Date: Sun Jun 27 2010 - 00:21:36 EST


On Thu, 24 Jun 2010 21:51:40 +1200
Michael Cree <mcree@xxxxxxxxxxxx> wrote:

> >> Is this a regression (what kernel version worked)?
> >>
> >> Seems that the IOMMU can't find 128 pages. It's likely due to:
> >>
> >> - out of the IOMMU space (possibly someone doesn't free the IOMMU
> >> space).
> >>
> >> or
> >>
> >> - the mapping parameters (such as align) aren't appropriate so the
> >> IOMMU can't find space.
> >
> > I don't think KMS drivers have ever worked on alpha so it's not a
> > regression, they are working fine on x86 + powerpc and sparc has been
> > run at least once.
>
> KMS on the console boot up has worked since about 2.6.32, but starting
> up the X server has always failed and, in my case, the system becomes
> unstable and eventually OOPs.
>
> > I suspect we are simply hitting the limits of the iommu, how big an
> > address space does it handle? since generally graphics drivers try to
> > bind a lot of things to the GART.
>
> No idea on the address space limit. I applied the patch of Fujita that
> logs all IOMMU allocations, and also inserted some extra printks in the
> ttm kernel code so that I could see which routines failed and the error
> code returned. Running the radeon test on boot exhibits the following:
>
> [ 238.712768] [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x1a312000
> [ 239.281127] [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x1a412000
> [ 239.281127] ttm_tt_bind belched -12
> [ 239.282104] ttm_bo_handle_move_mem belched -12
> [ 239.282104] ttm_bo_move_buffer belched -12
> [ 239.282104] ttm_bo_validate belched -12
> [ 239.282104] radeon 0000:01:00.0: object_init failed for (1048576, 0x00000002) err=-12
> [ 239.282104] [drm:radeon_test_moves] *ERROR* Failed to create GTT object 419
> [ 239.399291] Error while testing BO move.
>
> Note that no IOMMU allocations are printed while radeon_test_moves is
> running, so iommu_arena_alloc doesn't appear to be called. Also, the
> error code returned up to radeon_test_moves is -12, which is ENOMEM. So
> there does appear to be some memory limit.

Hmm, so this isn't related to the IOMMU? It looks like ttm_tt_populate
could return ENOMEM too. Can we locate where we hit ENOMEM first?
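
To make the question concrete, here is a rough sketch of picking the
first failure out of the debug output quoted above. It assumes the
ad-hoc "<function> belched <code>" format from those extra printks; the
helper name first_failure and the SAMPLE text are made up for
illustration and are not part of any kernel tool.

```python
import errno
import re

# Matches the ad-hoc debug lines from the quoted log, e.g.
# "[ 239.281127] ttm_tt_bind belched -12". The "belched" wording is
# specific to the extra printks in this thread, not a kernel convention.
FAIL_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\w+) belched (-\d+)")


def first_failure(log_text):
    """Return (timestamp, function, errno name) of the first logged failure, or None."""
    for line in log_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            ts, func, code = m.groups()
            # Kernel functions return negative errno values, so flip the sign.
            name = errno.errorcode.get(-int(code), "unknown")
            return float(ts), func, name
    return None


# Hypothetical input: a few lines copied from the log quoted above.
SAMPLE = """\
[ 239.281127] [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x1a412000
[ 239.281127] ttm_tt_bind belched -12
[ 239.282104] ttm_bo_handle_move_mem belched -12
"""

print(first_failure(SAMPLE))
```

This only reports the first function that has a printk in it. With the
log above that is ttm_tt_bind, so anything it calls (ttm_tt_populate
being the candidate) would need its own printk to narrow things down
further.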