Re: [RFC] arm64: mm: Do not use both DMA zones when 30-bit address space unavailable
From: Nicolas Saenz Julienne
Date: Mon Sep 07 2020 - 13:51:07 EST
Hi Catalin, sorry for the late reply, I was on vacation.
On Fri, 2020-08-28 at 18:43 +0100, Catalin Marinas wrote:
> Hi Nicolas,
>
> On Wed, Aug 19, 2020 at 08:24:33PM +0200, Nicolas Saenz Julienne wrote:
> > There is no benefit in splitting the 32-bit address space into two
> > distinct DMA zones when the 30-bit address space isn't even available on
> > a device. If that is the case, default to one big ZONE_DMA spanning the
> > whole 32-bit address space.
> >
> > This will help reduce some of the issues we've seen with big crash
> > kernel allocations.
> >
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@xxxxxxx>
> > ---
> >
> > With this patch, on an 8GB RPi4 the setup looks like this:
> >
> > DMA [mem 0x0000000000000000-0x000000003fffffff]
> > DMA32 [mem 0x0000000040000000-0x00000000ffffffff]
> > Normal [mem 0x0000000100000000-0x00000001ffffffff]
> >
> > And stock 8GB virtme/qemu:
> >
> > DMA [mem 0x0000000040000000-0x00000000ffffffff]
> > DMA32 empty
> > Normal [mem 0x0000000100000000-0x000000023fffffff]
> >
> > arch/arm64/mm/init.c | 29 +++++++++++++++++++++++++----
> > 1 file changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index b6881d61b818..857a62611d7a 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -183,13 +183,20 @@ static void __init reserve_elfcorehdr(void)
> >
> > /*
> > * Return the maximum physical address for a zone with a given address size
> > - * limit. It currently assumes that for memory starting above 4G, 32-bit
> > - * devices will use a DMA offset.
> > + * limit or zero if memory starts from an address higher than the zone targeted.
> > + * It currently assumes that for memory starting above 4G, 32-bit devices will
> > + * use a DMA offset.
> > */
> > static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> > {
> > - phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
> > - return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> > + phys_addr_t base = memblock_start_of_DRAM();
> > + phys_addr_t offset = base & GENMASK_ULL(63, 32);
> > + s64 zone_size = (1ULL << zone_bits) - (base & DMA_BIT_MASK(32));
> > +
> > + if (zone_size <= 0)
> > + return 0;
> > +
> > + return min(base + zone_size + offset, memblock_end_of_DRAM());
> > }
>
> OK, so we can still get some ZONE_DMA if DRAM starts in the first GB.
>
> I don't think it entirely solves the problem.
Agree. Didn't mean to imply it.
> It just happens that the
> other affected SoCs don't have memory in the first GB. With this patch,
> we go by the assumption that ZONE_DMA/DMA32 split is only needed if
> there is memory in the low 1GB and such <32-bit devices don't have a DMA
> offset.
The way I understand it is: "we may have 30 bit DMA limited devices, the rest
can deal with 32 bit."
On top of that, I believe it makes little sense to use an offset in the
physical address space below the 32bit mark. You'd be limiting the amount of
memory available to the system. So, if you're going to support DMA limited devices
on your otherwise RAM hungry SoC, you'll have to have that physical address
space directly available, or at least part of it.
All in all, I believe that assuming no 30 bit DMA limited devices exist in the
system if the physical addresses don't exist is fairly safe.
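To make the arithmetic concrete, here is a userspace sketch of the patched max_zone_phys(). The macro definitions and the dram_layout struct are stand-ins I made up for illustration (the kernel version reads the bounds from memblock), and it only exercises layouts where DRAM starts below 4GB, so 'offset' is zero:

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel helpers; the names mirror the patch,
 * but these definitions are assumptions made for illustration. */
#define DMA_BIT_MASK(n)  (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
#define UPPER_32_MASK    (~0ULL << 32)  /* stands in for GENMASK_ULL(63, 32) */

/* Hypothetical description of the DRAM bounds (end is exclusive). */
struct dram_layout {
	uint64_t start;
	uint64_t end;
};

/* Same arithmetic as the patched function, with DRAM bounds passed in. */
static uint64_t max_zone_phys(const struct dram_layout *dram,
			      unsigned int zone_bits)
{
	uint64_t base = dram->start;
	uint64_t offset = base & UPPER_32_MASK;
	int64_t zone_size = (int64_t)((1ULL << zone_bits) -
				      (base & DMA_BIT_MASK(32)));
	uint64_t limit;

	/* DRAM starts at or above the zone's limit: no such zone. */
	if (zone_size <= 0)
		return 0;

	limit = base + (uint64_t)zone_size + offset;
	return limit < dram->end ? limit : dram->end;
}
```

With DRAM at 0x0 (the RPi4 case) a 30 bit zone ends at 0x40000000. With DRAM starting at 0x40000000 (the qemu case) the 30 bit query returns 0, and the 32 bit one returns 0x100000000.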
Also note the usage of 'zone_dma_bits' in the DMA code, which assumes that
ZONE_DMA's physical address space never extends beyond
(1 << zone_dma_bits) - 1.
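As a rough illustration of that dependency, this is a userspace sketch loosely modelled on how the direct DMA code picks a zone from a device's physical address limit. The enum and the pick_zone() helper are hypothetical names, not the kernel's:

```c
#include <stdint.h>

#define DMA_BIT_MASK(n)  (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-ins for the GFP zone modifiers. */
enum zone_choice { PICK_ZONE_DMA, PICK_ZONE_DMA32, PICK_ZONE_NORMAL };

/*
 * Pick the zone for a device that can address up to phys_limit.
 * ZONE_DMA is only chosen when the limit fits under zone_dma_bits, so
 * that value has to describe where ZONE_DMA really ends.
 */
static enum zone_choice pick_zone(uint64_t phys_limit,
				  unsigned int zone_dma_bits)
{
	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
		return PICK_ZONE_DMA;
	if (phys_limit <= DMA_BIT_MASK(32))
		return PICK_ZONE_DMA32;
	return PICK_ZONE_NORMAL;
}
```

In this model a 32 bit limited device lands in ZONE_DMA32 when zone_dma_bits is 30, and in ZONE_DMA when zone_dma_bits is 32, which is what a single big ZONE_DMA would need.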
> Adding Rob H (it's easier to ask him than grep'ing the DT files ;)), we
> may be ok with this assumption on current SoCs.
From what I've personally grep'd there are no new devices with odd ranges in
sight.
> An alternative (and I think we had a patch at some point) is to make it
> generic and parse the dma-range in the DT to identify the minimum mask
> and set ZONE_DMA accordingly. But this doesn't solve ACPI, so if Linux
> can boot with ACPI on RPi4 it would still be broken.
ACPI is being worked on by, among others, Jeremy Linton (one of your colleagues
I believe).
We could always use sane defaults for ACPI and be smarter with DT. Yet,
implementing this entails translating nested dma-ranges with the only help of
libfdt, which isn't trivial (see drivers/of/address.c). IIRC RobH said that it
wasn't worth the effort just for a board.
Regards,
Nicolas