Re: [PATCH v4 3/5] memory: tegra186-emc: Support non-bpmp icc scaling

From: Jon Hunter
Date: Wed Dec 10 2025 - 10:04:56 EST



On 10/12/2025 05:06, Aaron Kling wrote:

...

Let me try to iterate the potential issues I've seen stated here. If
I'm missing anything, please fill in the blanks.

1) If this change is applied without the related dt change and the
pcie driver is loaded, the emc clock can become stuck at the lowest
rate. This is caused by the pcie driver providing icc data, but
nothing else is. So the very low requested bandwidth results in the
emc clock being set very low. I'm not sure there is a 'fix' for this,
beyond making sure the dt change is merged to ensure that the cpufreq
driver provides bandwidth info, causing the emc driver to select a
more reasonable emc clock rate. This is a similar situation to what's
currently blocking the tegra210 actmon series. I don't think there is
a way for the drivers to know if icc data is missing/wrong. The
scaling is doing exactly what it's told based on the icc routing given
in the dt.

So this is the fundamental issue that must be fixed. We can't allow the PCIe driver to slow the system down. I think Krzysztof suggested that we need some way to determine whether the necessary ICC clients are present/registered for ICC to work. Admittedly, I have no idea if there is a simple way to do this, but we need something like that.
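
To make the failure mode in 1) concrete, here is a minimal userspace sketch in C. It is not kernel code and does not use the real ICC API. Every name in it (icc_vote, emc_opp, pick_rate_khz, the require_all flag) is hypothetical, and the guard is only one possible shape for a "are all expected clients registered" check, assuming the set of expected clients is known up front.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one ICC client and its bandwidth vote. */
struct icc_vote {
	const char *name;
	bool registered;	/* has this client registered with ICC? */
	uint64_t avg_kbps;	/* average bandwidth it requested */
};

/* Hypothetical EMC operating point: a rate and the bandwidth it can serve. */
struct emc_opp {
	unsigned long rate_khz;
	uint64_t capacity_kbps;
};

/* Sum the average bandwidth of the clients that have registered. */
static uint64_t aggregate_kbps(const struct icc_vote *v, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		if (v[i].registered)
			sum += v[i].avg_kbps;
	return sum;
}

/* Hypothetical guard: true only if every expected client is registered. */
static bool all_clients_present(const struct icc_vote *v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!v[i].registered)
			return false;
	return true;
}

/*
 * Pick the lowest operating point that covers the aggregate bandwidth.
 * opp[] must be non-empty and sorted by ascending rate. With require_all
 * set, a missing client means "do not scale": stay at the highest rate.
 */
static unsigned long pick_rate_khz(const struct emc_opp *opp, size_t nopp,
				   const struct icc_vote *v, size_t nvotes,
				   bool require_all)
{
	uint64_t need;

	if (require_all && !all_clients_present(v, nvotes))
		return opp[nopp - 1].rate_khz;

	need = aggregate_kbps(v, nvotes);
	for (size_t i = 0; i < nopp; i++)
		if (opp[i].capacity_kbps >= need)
			return opp[i].rate_khz;
	return opp[nopp - 1].rate_khz;
}
```

In this model, if PCIe is the only registered client and require_all is false, the small PCIe vote alone selects the lowest operating point, which matches the behaviour described in 1). With require_all set, the rate stays at the top of the table until every expected client has registered.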

2) Jon, you report that even with both this change and the related dt
change, the issue is still not fixed. But then posted a log
showing that the emc rate is set to max. If the issue is that emc rate
is too low, then how can debugfs report that the rate is max? For
reference, everything scales as expected for me given this change plus
the dt change on both p2771 and p3636+p3509.

To clarify, this broke the boot test on Tegra194 because the boot was too slow. However, this also broke the EMC test on Tegra186 because setting the frequency via debugfs failed. So these are two different failures on two different devices. I am guessing the EMC test would also fail on Tegra194, but given that it does not boot, we did not get that far.

3) If icc is requesting enough bandwidth to set the emc clock to a
high value, then a user tries to set debugfs max_freq to a lower
value, this code will reject the change. I do not believe this is an
issue unique to this code. tegra20-emc, tegra30-emc, and tegra124-emc
all have this same flow. And so does my proposed change to
tegra210-emc-core in the actmon series. This is why I asked if
tegra124 ran this test, to see if the failure was unique. If this is
not a unique failure, then I'd argue that all instances need changed,
not just this one causing diverging results depending on the soc being
utilized. A lot of the work I'm doing is to try to bring unity and
feature parity to all the tegra socs I'm working on. I don't want to
cause even more divergence.

Yes, that is a fair point. However, we need to detect this in the tegra-tests so that we know that this will not work. It would be nice if we could disable ICC from userspace and then run the test.
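
For illustration, the rejection described in 3) can be modelled as min/max arbitration between requesters. This is a simplified userspace sketch in C under that assumption, not the code of any of the drivers named above. The names emc_model, rate_req and emc_model_request are hypothetical.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical requesters: the debugfs knobs and the ICC path. */
enum req_type { REQ_DEBUGFS, REQ_ICC, REQ_MAX };

struct rate_req {
	unsigned long min_rate;
	unsigned long max_rate;
};

struct emc_model {
	struct rate_req req[REQ_MAX];
};

/* Start with no constraints from any requester. */
static void emc_model_init(struct emc_model *emc)
{
	for (int i = 0; i < REQ_MAX; i++) {
		emc->req[i].min_rate = 0;
		emc->req[i].max_rate = ULONG_MAX;
	}
}

/*
 * Apply a new [min, max] window for one requester. The effective window
 * is the highest min and the lowest max across all requesters. If that
 * window is empty, reject the request and leave the state unchanged.
 */
static int emc_model_request(struct emc_model *emc, enum req_type type,
			     unsigned long new_min, unsigned long new_max)
{
	unsigned long min_rate = 0, max_rate = ULONG_MAX;

	for (int i = 0; i < REQ_MAX; i++) {
		unsigned long lo = (i == (int)type) ? new_min : emc->req[i].min_rate;
		unsigned long hi = (i == (int)type) ? new_max : emc->req[i].max_rate;

		if (lo > min_rate)
			min_rate = lo;
		if (hi < max_rate)
			max_rate = hi;
	}

	if (min_rate > max_rate)
		return -ERANGE;

	emc->req[type].min_rate = new_min;
	emc->req[type].max_rate = new_max;
	return 0;
}
```

In this model, once the ICC requester holds a high minimum rate, a debugfs request for a lower maximum returns -ERANGE. The same debugfs request succeeds once the ICC minimum is dropped, which is why being able to disable ICC from userspace would let the test run.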

Bottom line here is that #1 is the problem that needs to be fixed.

Jon

--
nvpublic