On Wed, Jan 08, 2025 at 09:38:19PM +0530, Manaf Meethalavalappu Pallikunhi wrote:Both firmware and software passive mitigations (CPU/GPU) are not permitted for Safe SoC. As mentioned in the commit, it is the responsibility of the safety subsystem to take corrective action for high temperatures. One of the safety requirements (not a functional requirement) for Safe SoC is to avoid scaling of CPUs and bus DCVS, as this could impact safety-critical workloads. Therefore, Safe SoC will operate at maximum frequency with the performance governor. Enabling a cooling device for the CPU would expose the cooling device sysfs interface, allowing userspace to request different cooling states via the cooling device cur_state sysfs, which could potentially lower the frequency and violate the safety requirement.
Hi Dmitry,THere are a lot of interfaces exported to the userspace.
On 1/8/2025 6:16 PM, Dmitry Baryshkov wrote:
On Wed, Jan 08, 2025 at 05:57:06PM +0530, Manaf Meethalavalappu Pallikunhi wrote:From the perspective of a safe SoC, any software mitigation that compromises
Hi Dmitry,"is only necessary" is fine. It means that it is an optional part which
On 1/3/2025 11:21 AM, Dmitry Baryshkov wrote:
On Tue, Dec 31, 2024 at 05:31:41PM +0530, Manaf Meethalavalappu Pallikunhi wrote:As noted in the commit, thermal cooling mitigation is only necessary for
Hi Dmitry,"we don't want" is not a proper justification. So, no.
On 12/30/2024 9:10 PM, Dmitry Baryshkov wrote:
On Sun, Dec 29, 2024 at 08:53:32PM +0530, Wasim Nazir wrote:Sure, I will move idle injection node for each cpu to sa8775p.dtsi and keep
From: Manaf Meethalavalappu Pallikunhi <quic_manafm@xxxxxxxxxxx>Does anything prevent us from having this config as a part of the basic
In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
does corrective action for each subsystem based on sensor violation
to comply safety standards. But as QCS9075 is non-safe SoC it
requires conventional thermal mitigation to control thermal for
different subsystems.
The cpu frequency throttling for different cpu tsens is enabled in
hardware as first defense for cpu thermal control. But QCS9075 SoC
has higher ambient specification. During high ambient condition, even
lowest frequency with multi cores can slowly build heat over the time
and it can lead to thermal run-away situations. This patch restrict
cpu cores during this scenario helps further thermal control and
avoids thermal critical violation.
Add cpu idle injection cooling bindings for cpu tsens thermal zones
as a mitigation for cpu subsystem prior to thermal shutdown.
Add cpu frequency cooling devices that will be used by userspace
thermal governor to mitigate skin thermal management.
sa8775p.dtsi setup? If HW is present in the base version but it is not
accessible for whatever reason, please move it the base device config
and use status "disabled" or "reserved" to the respective board files.
it disabled state. #cooling cells property for CPU, still wanted to keep it
in board files as we don't want to enable any cooling device in base DT.
non-safe SoCs. Adding this cooling cell property to the CPU node in the base
DT (sa8775p.dtsi), which is shared by both safe and non-safe SoCs, would
violate the requirements for safe SoCs. Therefore, we will include it only
in non-safe SoC boards.
is going to be unused / ignored / duplicate functionality on the "safe"
SoCs. What kind of requirement is going to be violated in this way?
the safety subsystem’s compliance should not be allowed. Enabling the
cooling device also opens up the sysfs interface for userspace, which we may
not fully control.
Userspace apps or partner apps might inadvertently useHow can thermal mitigation interface lead to safety failure? Userspace
it. Therefore, we believe it is better not to expose such an interface, as
it is not required for that SoC and helps to avoid opening up an interface
that could potentially lead to a safety failure.
can possibly lower trip points, but it can not override existing
firmware-based mitigation.
And if there is a known problem with the interface, it should be fixed
instead.
Best Regards,
Manaf