Re: [PATCH v2 00/21] Refine memblock API

From: Mike Rapoport
Date: Fri Oct 04 2019 - 13:29:16 EST


On Fri, Oct 04, 2019 at 10:27:27AM +0100, Russell King - ARM Linux admin wrote:
> On Thu, Oct 03, 2019 at 02:30:10PM +0300, Mike Rapoport wrote:
> > On Thu, Oct 03, 2019 at 09:49:14AM +0100, Russell King - ARM Linux admin wrote:
> > > On Thu, Oct 03, 2019 at 08:34:52AM +0300, Mike Rapoport wrote:
> > > > (trimmed the CC)
> > > >
> > > > On Wed, Oct 02, 2019 at 06:14:11AM -0500, Adam Ford wrote:
> > > > > On Wed, Oct 2, 2019 at 2:36 AM Mike Rapoport <rppt@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > >
> > > > > Before the patch:
> > > > >
> > > > > # cat /sys/kernel/debug/memblock/memory
> > > > > 0: 0x10000000..0x8fffffff
> > > > > # cat /sys/kernel/debug/memblock/reserved
> > > > > 0: 0x10004000..0x10007fff
> > > > > 34: 0x2fffff88..0x3fffffff
> > > > >
> > > > >
> > > > > After the patch:
> > > > > # cat /sys/kernel/debug/memblock/memory
> > > > > 0: 0x10000000..0x8fffffff
> > > > > # cat /sys/kernel/debug/memblock/reserved
> > > > > 0: 0x10004000..0x10007fff
> > > > > 36: 0x80000000..0x8fffffff
> > > >
> > > > I'm still not convinced that the memblock refactoring didn't uncovered an
> > > > issue in etnaviv driver.
> > > >
> > > > Why moving the CMA area from 0x80000000 to 0x30000000 makes it fail?
> > >
> > > I think you have that the wrong way round.
> >
> > I'm relying on Adam's reports of working and non-working versions.
> > According to that etnaviv works when CMA area is at 0x80000000 and does not
> > work when it is at 0x30000000.
> >
> > He also sent logs a few days ago [1], they also confirm that.
> >
> > [1] https://lore.kernel.org/linux-mm/CAHCN7xJEvS2Si=M+BYtz+kY0M4NxmqDjiX9Nwq6_3GGBh3yg=w@xxxxxxxxxxxxxx/
>
> Sorry, yes, you're right. Still, I've reported this same regression
> a while back, and it's never gone away.
>
> > > > BTW, the code that complained about "command buffer outside valid memory
> > > > window" has been removed by the commit 17e4660ae3d7 ("drm/etnaviv:
> > > > implement per-process address spaces on MMUv2").
> > > >
> > > > Could be that recent changes to MMU management of etnaviv resolve the
> > > > issue?
> > >
> > > The iMX6 does not have MMUv2 hardware, it has MMUv1. With MMUv1
> > > hardware requires command buffers within the first 2GiB of physical
> > > RAM.
> >
> > I've mentioned that patch because it removed the check for cmdbuf address
> > for MMUv1:
> >
> > @@ -785,15 +768,7 @@ int etnaviv_gpu_init(struct etnaviv_gpu *gpu)
> > PAGE_SIZE);
> > if (ret) {
> > dev_err(gpu->dev, "could not create command buffer\n");
> > - goto unmap_suballoc;
> > - }
> > -
> > - if (!(gpu->identity.minor_features1 & chipMinorFeatures1_MMU_VERSION) &&
> > - etnaviv_cmdbuf_get_va(&gpu->buffer, &gpu->cmdbuf_mapping) > 0x80000000) {
> > - ret = -EINVAL;
> > - dev_err(gpu->dev,
> > - "command buffer outside valid memory window\n");
> > - goto free_buffer;
> > + goto fail;
> > }
> >
> > /* Setup event management */
> >
> >
> > I really don't know how etnaviv works, so I hoped that people who
> > understand it would help.
>
> From what I can see, removing that check is a completely insane thing
> to do, and I note that these changes are _not_ described in the commit
> message. The problem was known about _before_ (June 22) the patch was
> created (July 5).

The memblock refactoring went in in 5.1 which was May 5, and likely it
caused the regression.

Unless I'm missing something, before the memblock refactoring the CMA
reservation could use the entire physical memory because
memblock_phys_alloc() didn't enforce memblock.current_limit.

Since memblock default is to allocate from top, cma_declare_contiguous()
could grab the memory close to the end of DRAM and thus have physical
address close enough to the virtual address to fit in the 2G limit.

When I've made memblock_phys* limit the memblock allocations to
memblock.current_limit the CMA area moved too far away down and the gap
became larger than 2G.

It does not seem like dealing with this in etnaviv driver and DMA and CMA
APIs would happen fast and the "revert" of the memblock changes I've sent
earlier in this thread does fix the problem.

Andrew, would you like me to resend the patch in a separate e-mail?

> Lucas, please can you explain why removing the above check, which is
> well known to correctly trigger on various platforms to prevent
> incorrect GPU behaviour, is safe?
>
> Thanks.
>

--
Sincerely yours,
Mike.