Re: [PATCH v6 3/4] arm64: use both ZONE_DMA and ZONE_DMA32

From: John Stultz
Date: Tue Dec 03 2019 - 00:39:20 EST


On Mon, Dec 2, 2019 at 9:08 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
>
> On Wed, Sep 11, 2019 at 11:26 AM Nicolas Saenz Julienne
> <nsaenzjulienne@xxxxxxx> wrote:
> > So far all arm64 devices have supported 32 bit DMA masks for their
> > peripherals. This is not true anymore for the Raspberry Pi 4 as most of
> > it's peripherals can only address the first GB of memory on a total of
> > up to 4 GB.
> >
> > This goes against ZONE_DMA32's intent, as it's expected for ZONE_DMA32
> > to be addressable with a 32 bit mask. So it was decided to re-introduce
> > ZONE_DMA in arm64.
> >
> > ZONE_DMA will contain the lower 1G of memory, which is currently the
> > memory area addressable by any peripheral on an arm64 device.
> > ZONE_DMA32 will contain the rest of the 32 bit addressable memory.
> >
> > Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@xxxxxxx>
> > Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx>
>
> Hey Nicolas,
> Testing the db845c with linus/master, I found a regression causing
> system hangs in early boot:
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x517f803c]
> [ 0.000000] Linux version 5.4.0-mainline-10675-g957a03b9e38f
> (docker@a4ec90a1e72c) (gcc version 7.4.0 (Ubuntu/Linaro
> 7.4.0-1ubuntu1~18.04.1)) #1209 SMP PREEMPT Tue Dec 3 00:23:15 UTC 2019
> [ 0.000000] Machine model: Thundercomm Dragonboard 845c
> [ 0.000000] earlycon: qcom_geni0 at MMIO 0x0000000000a84000
> (options '115200n8')
> [ 0.000000] printk: bootconsole [qcom_geni0] enabled
> [ 0.000000] efi: Getting EFI parameters from FDT:
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] cma: Reserved 16 MiB at 0x00000000ff000000
> [ 0.000000] psci: probing for conduit method from DT.
> [ 0.000000] psci: PSCIv1.1 detected in firmware.
> [ 0.000000] psci: Using standard PSCI v0.2 function IDs
> [ 0.000000] psci: MIGRATE_INFO_TYPE not supported.
> [ 0.000000] psci: SMC Calling Convention v1.0
> [ 0.000000] psci: OSI mode supported.
> [ 0.000000] percpu: Embedded 31 pages/cpu s87512 r8192 d31272 u126976
> [ 0.000000] Detected VIPT I-cache on CPU0
> [ 0.000000] CPU features: detected: GIC system register CPU interface
> [ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
> [ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
> [ 0.000000] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
> [ 0.000000] CPU features: detected: Hardware dirty bit management
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: -188245
> [ 0.000000] Kernel command line: earlycon
> firmware_class.path=/vendor/firmware/ androidboot.hardware=db845c
> init=/init androidboot.boot_devices=soc/1d84000.ufshc
> printk.devkmsg=on buildvariant=userdebug root=/dev/sda2
> androidboot.bootdevice=1d84000.ufshc androidboot.serialno=c4e1189c
> androidboot.baseband=sda
> msm_drm.dsi_display0=dsi_lt9611_1080_video_display:
> androidboot.slot_suffix=_a skip_initramfs rootwait ro init=/init
>
> <hangs indefinitely here>
>
> I bisected the issue down to this patch (1a8e1cef7603 upstream - the
> previous patch a573cdd7973d works though I need to apply the
> arm64_dma_phys_limit bit from this one as the previous patch doesn't
> build on its own).
>
> In the above log:
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: -188245
> looks the most suspect, and going back to the working a573cdd7973d +
> build fix I see:
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 957419
>
> Do you have any suggestions for what might be going wrong?

Digging further, it seems the error is found in calculate_node_totalpages()
real_size = size - zone_absent_pages_in_node(pgdat->node_id, i,
node_start_pfn, node_end_pfn,
zholes_size);

Where for zone DMA32 size is 262144, but real_size is calculated as -883520.

I've not traced through to figure out why zone_absent_pages_in_node is
coming up with such a large number yet, but I'm about to crash so I
wanted to share.

thanks
-john