Re: [PATCH v2 2/4] arm64: dts: rockchip: enable temperature driven fan control on Rock 5B

From: Dragan Simic
Date: Fri Feb 02 2024 - 15:15:19 EST


Hello Alexey,

On 2024-02-02 15:42, Alexey Charkov wrote:
On Thu, Feb 1, 2024 at 11:43 PM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
On 2024-02-01 20:31, Dragan Simic wrote:
On 2024-02-01 20:15, Alexey Charkov wrote:
On Thu, Feb 1, 2024 at 9:34 PM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
On 2024-02-01 15:26, Chen-Yu Tsai wrote:
Is there any reason this can't be enabled by default in the .dtsi file?
The thermal sensor doesn't depend on anything external, so there should
be no reason to push this down to the board level.

Actually, there is a reason. Different boards can handle the
critical overheating differently, by letting the CRU or the PMIC
handle it. This was also the case for the RK3399.

Please, have a look at the following DT properties, which are
consumed by drivers/thermal/rockchip_thermal.c:
- "rockchip,hw-tshut-mode"
- "rockchip,hw-tshut-polarity"

See also page 1,372 of the RK3588 TRM v1.0.

This has also reminded me to check how is the Rock 5B actually wired,
just to make sure. We actually need to provide the two DT properties
listed above, at least to avoid emitting the warnings.

Well the defaults are already provided in rk3588s.dtsi, so there won't
be any warnings (see lines 2222-2223 in Linus' master version), and
according to the vendor kernel those are also what Rock 5B uses.

Yes, I noticed the same a couple of minutes after sending my last
message, but didn't want to make more noise about it. :) I would've
mentioned it in my next message, of course.

Just checked the Rock 5B schematic and it expects the CRU to be used
to perform the hardware reset in case of a thermal runaway, so the
defaults in the RK3588s dtsi are fine. I had to double-check it. :)

I've just looked at Rock 5B, Rock 5A and NanoPC T6 schematics, and
they all seem to have the TSADC_SHUT line connected to RESET_L.

Ah, I see it now in the Rock 5B schematic, thank you for the
correction. I somehow managed to miss it initially; here's
the link to a screenshot from the Rock 5B schematic v1.423, for
future reference: https://i.imgur.com/IGAPPgl.png .

At the same time, Radxa's device tree uses the default
CRU-based option.

Here's the link to a screenshot from the RK3588 Hardware
Design Guide v1.0, which shows the recommended reset signal
paths for RK3588 boards: https://i.imgur.com/DNqhjfP.png .

As visible in the Rock 5B schematic, it expectedly follows
this recommendation from Rockchip, so we should actually use
GPIO-based handling for the thermal runaways on the Rock 5B,
to have the PMIC reset as well. Here's the link to another
screenshot from the Rock 5B schematic v1.423, for future
reference: https://i.imgur.com/BdgZ30C.png .

Of course, it should be tested to make sure that the thermal
runaway resets work fine.

It isn't uncommon for downstream DTs to sometimes contain
some small mistakes that somehow remained undetected.

To me this seems to imply that the CRU option should always work, by
the virtue of CRU being on-chip. At the same time, if the right GPIOs
are wired to the PMIC reset line for a particular board, the board
may also choose to use the GPIO option - or stick with CRU.

If that interpretation is correct, then I suggest we enable TSADC by
default in the .dtsi, and let it handle both throttling and CRU-based
critical resets on all boards. Those who know what they are doing may
then elect to switch their board to GPIO-PMIC based reset.

What do you think?

I think that, if we choose to enable CRU-based thermal runaway
resets without going into too much detail for each board (but
we should go into the publicly available board schematics, as
also noted in my last comment in this message), we should do
that in the board dts files, instead of just enabling the TSADC
in the RK3588(s) SoC dtsi.

That way, we would clearly indicate the TSADC to be a board-
specific feature, and hopefully gain more attention from the
people interested in the boards, to perform some additional
testing, etc.

However, now I have some open questions related to interrupt-driven
operation. I'll research it further and come back with an update.

This made me think however: what if a board doesn't enable TSADC, but
has OPPs in place for higher voltage and frequency states? There won't
be any throttling (as there won't be any thermal monitoring) and there
might not be a critical shutdown at all if it heats up - possibly even
causing hardware damage. In this case it seems that having TSADC
enabled by default would at least trigger passive cooling, hopefully
avoiding the critical shutdown altogether and making those properties
irrelevant in 99% cases.

Those are very good questions. Thumbs up!

The trouble is that the boards can use different wiring to handle the
thermal runaways, by expecting the PMIC to handle it or not. Thus,
it's IMHO better to simply leave that to be tested and enabled on a
board-by-board basis, whenever a new RK3588(s)-based board is added.

Thus, the only right way at this point would be to merge the addition
of the OPPs and the enabling of the TSADC for all currently supported
RK3588(s)-based boards at once, instead of just for the Rock 5B.

If we can agree on a workable 'default-on' configuration for TSADC to
be included in the .dtsi I think that would be preferable, because it
would enable all boards to benefit from higher OPPs and throttling.

Please, see my other comments. I hope we can agree on that.

This would also save us from a scenario when OPPs get included in the
default .dtsi, but TSADC is off by default, and then some poor soul
tries to add a new board with a minimal .dts, forgetting to enable
TSADC and having their SoC fried under high load...

Well, those poor souls can't escape the need to know what are
they doing. :) Also, I think it's much more likely that adding
a new RK3588-based board would actually start by editing the
board dts of some already supported RK3588 board, which the way
I propose this to be handled would already have the TSADC enabled,
eliminating the risks yout pointed out.

Please note that the TSADC has been disabled in the RK3399 SoC
dtsi and enabled on a per-RK3399-board-dtsi basis, so we'd also
have some consistency by following the same approach with the
RK3588(s) SoC dtsi. Consistency is good, if you agree.

I can handle the required changes for the QuartzPro64 dts file. For
other supported RK3588(s)-based boards, if there are no people having
access to them and willing to perform the dts changes and the testing,
I'd be willing to go through the board schematics, to enable the
TSADC for them as well.

Please, let me know are you fine with the above-described approach.

I believe it's great if we can go through the schematics no matter
what! Although if we agree that CRU is an always-working default
option for all, then why don't we just enable TSADC for all, and leave
the conversion to GPIO-PMIC resets for later and for where it's
needed?

Great! We can surely go through the supported RK3588(s) boards
that make their schematics publicly available, and enable the
TSADC in their board dts files accordingly.

For the remaining RK3588(s) boards that remain "black boxes" to
us, we can enable the TSADC in their board dts files with the
let-the-CRU-handle-thermal-runaway defaults, and leave any future
refinements to the people interested in those boards.

That would be a rather clean approach, if you agree.