Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy

From: Jerome Glisse
Date: Fri Dec 02 2016 - 10:17:37 EST


On Fri, Dec 02, 2016 at 11:26:02AM +0100, Lucas Stach wrote:
> On Thursday, 01.12.2016 at 15:11 +0100, Michal Hocko wrote:
> > Let's also CC Marek
> >
> > On Thu 01-12-16 08:43:40, Vlastimil Babka wrote:
> > > On 12/01/2016 08:21 AM, Michal Hocko wrote:
> > > > Forgot to CC Joonsoo. The email thread starts more or less here
> > > > http://lkml.kernel.org/r/20161130092239.GD18437@xxxxxxxxxxxxxx
> > > >
> > > > On Thu 01-12-16 08:15:07, Michal Hocko wrote:
> > > > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote:
> > > > > [...]
> > > > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
> > > > >
> > > > > Huh, do I get it right that the request was for a _single_ page? Why do
> > > > > we need CMA for that?
> > >
> > > Ugh, good point. I assumed that was just the PFNs that it failed to migrate
> > > away, but it seems that's indeed the whole requested range. Yeah, sounds like
> > > some part of the dma-cma chain could be smarter and attempt CMA only for e.g.
> > > costly orders.
> >
> > Is there any reason why the DMA API doesn't try the page allocator first
> > before falling back to CMA? I simply have a hard time seeing why CMA
> > should be used (and fragmented) for small request sizes.
>
> On x86 that is true, but on ARM CMA is the only (low memory) region that
> can change the memory attributes, by being excluded from the lowmem
> section mapping. Changing the memory attributes to
> uncached/writecombined for DMA is crucial on ARM to fulfill the
> requirement that there aren't any conflicting mappings of the same
> physical page.
>
> On ARM we can possibly do the optimization of asking the page allocator,
> but only if we can request _only_ highmem pages.
>

So this memory allocation strategy should apply only to ARM, not to x86. We
already had fallout a couple of years ago when Ubuntu decided to enable CMA on
x86, where it does not make sense: I don't think there is a single device we
care about that is not behind an IOMMU, and a device behind an IOMMU does not
require contiguous memory allocation.

The DMA API should use CMA only on architectures where it is necessary, not
on all of them.
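
To make the idea concrete, here is a rough user-space sketch of the kind of
policy discussed in this thread. It is not kernel code and not a proposed
patch: pick_alloc_source(), enum alloc_source, COSTLY_ORDER and
SKETCH_MAX_ORDER are hypothetical names made up for illustration. The only
value borrowed from the kernel is the costly-order threshold (3, as in
PAGE_ALLOC_COSTLY_ORDER).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER (3). */
#define COSTLY_ORDER 3
/* Assumption: arbitrary upper bound used only to sanity-check the input. */
#define SKETCH_MAX_ORDER 11

/* Hypothetical: where a DMA buffer of a given order would come from. */
enum alloc_source { SRC_PAGE_ALLOCATOR, SRC_CMA };

/*
 * Hypothetical policy helper, not an existing kernel function.
 *
 * arch_needs_cma_for_remap: true where CMA is the only low-memory region
 *     whose memory attributes can be changed (the ARM case described by
 *     Lucas above).
 * behind_iommu: true when the device sits behind an IOMMU and therefore
 *     does not need physically contiguous memory.
 * order: allocation order, i.e. the request covers 2^order pages.
 */
static enum alloc_source pick_alloc_source(bool arch_needs_cma_for_remap,
                                           bool behind_iommu,
                                           unsigned int order)
{
    assert(order < SKETCH_MAX_ORDER);

    /* Architectures that must remap attributes keep using CMA. */
    if (arch_needs_cma_for_remap)
        return SRC_CMA;

    /* An IOMMU can stitch scattered pages together; no CMA needed. */
    if (behind_iommu)
        return SRC_PAGE_ALLOCATOR;

    /* Otherwise reserve CMA for costly (large) orders only. */
    return order > COSTLY_ORDER ? SRC_CMA : SRC_PAGE_ALLOCATOR;
}
```

With something like this, the single-page request from the original report
(order 0, x86, device behind an IOMMU) would go to the page allocator and
never touch CMA, while the ARM case keeps its current behaviour.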

Cheers,
Jérôme