Re: [PATCH v2 00/21] Refine memblock API

From: Mike Rapoport
Date: Fri Oct 04 2019 - 13:10:30 EST


On Fri, Oct 04, 2019 at 03:21:03PM +0200, Lucas Stach wrote:
> On Friday, 04.10.2019 at 10:27 +0100, Russell King - ARM Linux admin
> wrote:
> > On Thu, Oct 03, 2019 at 02:30:10PM +0300, Mike Rapoport wrote:
> > > On Thu, Oct 03, 2019 at 09:49:14AM +0100, Russell King - ARM Linux admin wrote:
> > > > On Thu, Oct 03, 2019 at 08:34:52AM +0300, Mike Rapoport wrote:
> > > > > (trimmed the CC)
> > > > >
> > > > > On Wed, Oct 02, 2019 at 06:14:11AM -0500, Adam Ford wrote:
> > > > > > On Wed, Oct 2, 2019 at 2:36 AM Mike Rapoport <rppt@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Before the patch:
> > > > > >
> > > > > > # cat /sys/kernel/debug/memblock/memory
> > > > > > 0: 0x10000000..0x8fffffff
> > > > > > # cat /sys/kernel/debug/memblock/reserved
> > > > > > 0: 0x10004000..0x10007fff
> > > > > > 34: 0x2fffff88..0x3fffffff
> > > > > >
> > > > > >
> > > > > > After the patch:
> > > > > > # cat /sys/kernel/debug/memblock/memory
> > > > > > 0: 0x10000000..0x8fffffff
> > > > > > # cat /sys/kernel/debug/memblock/reserved
> > > > > > 0: 0x10004000..0x10007fff
> > > > > > 36: 0x80000000..0x8fffffff
> > > > >
> > > > > I'm still not convinced that the memblock refactoring didn't
> > > > > uncover an issue in the etnaviv driver.
> > > > >
> > > > > Why does moving the CMA area from 0x80000000 to 0x30000000 make
> > > > > it fail?
> > > >
> > > > I think you have that the wrong way round.
> > >
> > > I'm relying on Adam's reports of working and non-working versions.
> > > According to those reports, etnaviv works when the CMA area is at
> > > 0x80000000 and does not work when it is at 0x30000000.
> > >
> > > He also sent logs a few days ago [1], which confirm that.
> > >
> > > [1] https://lore.kernel.org/linux-mm/CAHCN7xJEvS2Si=M+BYtz+kY0M4NxmqDjiX9Nwq6_3GGBh3yg=w@xxxxxxxxxxxxxx/
> >
> > Sorry, yes, you're right. Still, I reported this same regression
> > a while back, and it has never gone away.
> >
> > > > > BTW, the code that complained about "command buffer outside valid
> > > > > memory window" was removed by commit 17e4660ae3d7 ("drm/etnaviv:
> > > > > implement per-process address spaces on MMUv2").
> > > > >
> > > > > Could it be that recent changes to etnaviv's MMU management
> > > > > resolve the issue?
> > > >
> > > > The iMX6 does not have MMUv2 hardware; it has MMUv1. With MMUv1,
> > > > the hardware requires command buffers to be within the first 2GiB
> > > > of physical RAM.
> > >
> > > I mentioned that patch because it removed the cmdbuf address check
> > > for MMUv1:
> > >
> > > @@ -785,15 +768,7 @@ int etnaviv_gpu_init(struct etnaviv_gpu *gpu)
> > >                                   PAGE_SIZE);
> > >          if (ret) {
> > >                  dev_err(gpu->dev, "could not create command buffer\n");
> > > -                goto unmap_suballoc;
> > > -        }
> > > -
> > > -        if (!(gpu->identity.minor_features1 & chipMinorFeatures1_MMU_VERSION) &&
> > > -            etnaviv_cmdbuf_get_va(&gpu->buffer, &gpu->cmdbuf_mapping) > 0x80000000) {
> > > -                ret = -EINVAL;
> > > -                dev_err(gpu->dev,
> > > -                        "command buffer outside valid memory window\n");
> > > -                goto free_buffer;
> > > +                goto fail;
> > >          }
> > >
> > >          /* Setup event management */
> > >
> > >
> > > I really don't know how etnaviv works, so I hoped that people who
> > > understand it would help.
> >
> > From what I can see, removing that check is a completely insane thing
> > to do, and I note that these changes are _not_ described in the commit
> > message. The problem was known about _before_ (June 22) the patch was
> > created (July 5).
> >
> > Lucas, please can you explain why removing the above check, which is
> > well known to correctly trigger on various platforms to prevent
> > incorrect GPU behaviour, is safe?
>
> It isn't. It's a pretty big oversight in this commit to remove this
> check. It can't be done at the same spot in the code anymore, as we
> don't have a mapping context at this point, but it should have been
> moved into etnaviv_iommu_context_init(). I'll send a patch to fix this
> up.

Lucas, can you make the check use SZ_2G instead of 0x80000000 and add a
comment about the 2G limitation of the aperture window?
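For illustration only, a minimal userspace sketch of the kind of check suggested above: the literal is replaced by SZ_2G and a comment explains the window. The helper name cmdbuf_va_ok() and its is_mmuv1 parameter are hypothetical; this is not the fix Lucas is sending. It assumes the value tested is the command buffer's GPU address, as in the removed code, and keeps that code's comparison (only the start address is checked, and anything strictly above 0x80000000 is rejected).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for SZ_2G from the kernel's <linux/sizes.h>, so the
 * sketch builds outside the kernel.
 */
#define SZ_2G 0x80000000ULL

/* The named constant must equal the literal used by the removed check. */
static_assert(SZ_2G == 0x80000000ULL, "SZ_2G must equal 0x80000000");

/*
 * Hypothetical helper (not etnaviv code): returns true if a command buffer
 * at GPU address cmdbuf_va can be used.
 *
 * MMUv1 cores (chipMinorFeatures1_MMU_VERSION clear in minor_features1)
 * can only fetch command buffers through the 2G aperture window, so an
 * address beyond SZ_2G is rejected. Like the removed code, only the start
 * address is compared, using the same "> 0x80000000" test.
 */
static bool cmdbuf_va_ok(bool is_mmuv1, uint64_t cmdbuf_va)
{
	if (!is_mmuv1)
		return true;	/* the 2G window limit is only enforced for MMUv1 */

	return !(cmdbuf_va > SZ_2G);
}
```

In the driver, the MMUv1 flag would come from testing chipMinorFeatures1_MMU_VERSION in gpu->identity.minor_features1, as the removed code did, and a false result would map to -EINVAL.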

> Regards,
> Lucas
>

--
Sincerely yours,
Mike.