Re: [PATCH v4 3/5] memory: tegra186-emc: Support non-bpmp icc scaling
From: Jon Hunter
Date: Thu Dec 11 2025 - 02:47:37 EST
On 10/12/2025 22:41, Aaron Kling wrote:
On Wed, Dec 10, 2025 at 3:24 PM Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
On 10/12/2025 18:32, Aaron Kling wrote:
On Wed, Dec 10, 2025 at 9:04 AM Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
On 10/12/2025 05:06, Aaron Kling wrote:
...
Let me try to iterate the potential issues I've seen stated here. If
I'm missing anything, please fill in the blanks.
1) If this change is applied without the related dt change and the
pcie drvier is loaded, the emc clock can become stuck at the lowest
rate. This is caused by the pcie driver providing icc data, but
nothing else is. So the very low requested bandwidth results in the
emc clock being set very low. I'm not sure there is a 'fix' for this,
beyond making sure the dt change is merged to ensure that the cpufreq
driver provides bandwidth info, causing the emc driver to select a
more reasonable emc clock rate. This is a similar situation to what's
currently blocking the tegra210 actmon series. I don't think there is
a way for the drivers to know if icc data is missing/wrong. The
scaling is doing exactly what it's told based on the icc routing given
in the dt.
So this is the fundamental issue with this that must be fixed. We can't
allow the PCIe driver to slow the system down. I think that Krzysztof
suggested we need some way to determine if the necessary ICC clients are
present/registered for ICC to work. Admittedly, I have no idea if there
is a simple way to do this, but we need something like that.
I'm not sure I understand how checking clients would work. Is there a
mechanism for the emc driver to know if cpufreq is registered to icc
in a way that works with probe deferrals, but also allows for it to be
optional?
I am not sure if such a mechanism exists either, but it seems that we
need something like this.
Alternatively if there is not, can we just accept the abi break and
have this and the dt change depend on each other? I know it's not
desirable or the first choice, but if the other option is to rewrite
part of the icc system, then perhaps it should be an option.
I am not sure it is an ABI break, but the default performance might be
worse. I am not sure if you are proposing a way to enforce the
dependency or just saying that there is a dependency. We can't do the
latter, but if there is a way for the kernel to check the dependency and
make the right choice, then that should work.
So we can't accept that older dt's will run slower on a newer kernel
and say that a newer dt is needed for full performance?
If that's not an option, then I have no idea how to resolve this. I'm
not greatly knowledgeable about the icc subsystem. I can try to look
into options, but I'm not greatly optimistic about me finding one. If
someone could suggest a concept on how to make it work, I could
implement it. But I'm not even seeing the concept right now.
2) Jon, you report that even with both this change and the related dt
change, that the issue is still not fixed. But then posted a log
showing that the emc rate is set to max. If the issue is that emc rate
is too low, then how can debugfs report that the rate is max? For
reference, everything scales as expected for me given this change plus
the dt change on both p2771 and p3636+p3509.
To clarify, this broke the boot test on Tegra194 because the boot was
too slow. However, this also broke the EMC test on Tegra186 because
setting the frequency from the debugfs failed. So two different failures
on two different devices. I am guessing the EMC test would also fail on
Tegra194, but given that it does not boot, we did not get that far.
So you're saying that even with the dt changes, this change on
tegra194 still does not boot before the regression test framework
times out? If so, I need some more details about this. I have not seen
issues on p2972 or p3518. For example, if I boot to android recovery
where I set the cpufreq governor to performance, I see emc clock rate
set to 2133 MHz and 1600 MHz respectively. And boot time from kernel
start to pixels on display is 15 seconds, give or take a couple
seconds. This is using the boot stack from l4t r32.7.6.
Yes. The boot failure here is not a hard boot failure, but the device
takes too long to boot and the boot test times out. And no we will not
increase the timeout as it is there for a reason. It could well be
because the default governor is not set to performance. If you boot with
just using the stock 'defconfig' for ARM64 without setting the governor
does it take longer?
So, I checked out next-20251210, then b4 shazam'ed this series and the
matching dt series,
20251021-tegra186-icc-p3-v3-0-68184ee8a89c@xxxxxxxxx. Then built with
LLVM=1 ARCH=arm64 make defconfig
LLVM=1 ARCH=arm64 make -j33 Image nvidia/tegra194-p2972-0000.dtb
I packaged them into an android boot image using a lightly modified
copy of Gnurou's bbinitramfs which just drops to a busybox shell. Note
that this includes no modules, and since the pcie driver is =m in
defconfig, it is not included. Then I flashed that with the l4t
r32.7.6 boot stack to p2972. I got the shell on uart after 4.275
seconds in the kernel. Per sysfs, the cpufreq governor is schedutil
and all policies are idling at min freq, 115200. And per debugfs, the
emc clock is 800000000. All this looks to be as expected.
I have no idea why the regression test setup is timing out. I have not
seen the issue through any of my testing. On pure mainline as per the
above paragraph, or with the patches on the android common kernel, as
per my target use case. I don't know what to do if I can't replicate
the issue. I don't suppose the flash package for the regression test
setup is something that could be released?
I thought we already concluded that you did not see this because you did not have the PCIe module present in your testing? From the above its sounds like you still don't have that driver present and so you don't see the issue. I guess I am not surprised by that but I am not sure why you are now saying you have no idea why this is timing out? I thought this was understood.
--
nvpublic