Re: [PATCH v2] thermal: Add support for device tree thermal zones consumers

From: AngeloGioacchino Del Regno
Date: Tue Dec 05 2023 - 08:48:51 EST


On 01/12/23 15:18, Daniel Lezcano wrote:

Hi Angelo,

On 01/12/2023 10:52, AngeloGioacchino Del Regno wrote:
On 30/11/23 14:22, Daniel Lezcano wrote:

Hi Angelo,

thanks for your proposal

On 15/11/2023 15:48, AngeloGioacchino Del Regno wrote:
Add helpers to support retrieving thermal zones from device tree nodes:
this will allow a device tree consumer to specify phandles to specific
thermal zone(s), including support for specifying thermal-zone-names.
This is useful, for example, for smart voltage scaling drivers that
need to adjust CPU/GPU/other voltages based on temperature, and for
battery charging drivers that need to scale current based on various
aggregated temperature sensor readings which are board-dependent.

IMO these changes try to solve something from the DT perspective while adding more confusion between phandles, names, types, etc., and they do not really help AFAICT.


I honestly don't see how assigning thermal zones to a node (like we're doing for
other consumers like clocks, etc.) can be confusing.
To me, it looks like a pattern that is repeating over and over in device tree, for
multiple types of consumers.

Because there is no need to add anything. Everything is already available.

Add a phandle in the device node that wants to access the thermal zone, get the name of the thermal zone device node it points to, and use thermal_zone_device_get_by_name(), but see below ...


Overall I'm a bit reluctant to add more API to thermal.h. From my POV, we should try to remove as many functions as possible from there.


Cleaning up the API is always something that makes sense, but I don't see why this
should prevent useful additions...

That said, the name of a thermal zone does not really exist and there is confusion in the code between a name and a type (the type being assumed to be a name).

That depends on how you see it. What my brain ticks around is:
A thermal zone is a physical zone on the PCB, or a physical zone on a chip,
which has its own "real name", as in, it can be physically identified.

What I meant is that the thermal framework does not really have a thermal zone name, just a type. So it is possible to find several thermal zones with the same type, like "acpitz".

Example: The "Skin area" of a laptop is something "reachable" from the user as an
externally exposed part. This area's temperature is read by thermistor EXTERNAL_1,
not by thermistor "SKIN0".

Same goes for "big cluster area", "little cluster area", "cpu complex area", etc.

Today that is solved with a configuration file mapping a specific thermal zone to a name, but that is still fragile as we can have duplicate thermal zone types.

There could be several thermal zones with the same type for non-DT descriptions. However, the documentation says we should create a unique type in the DT, and that is what happens when registering a thermal zone from the DT [1], as we use the node name.

From an external driver, it is possible to get the np->name from the phandles and call thermal_zone_get_by_name(np->name).


That'd still require you to pass a thermal zone phandle to the node(driver) though?

Yes

The hardening change which may make sense is to check in thermal_of.c that a thermal zone with the same name is not already registered, by verifying that thermal_zone_get_by_name() fails before registering it.
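
As a rough illustration of that hardening idea, here is a minimal userspace sketch in C. It is not kernel code: struct zone, zone_get_by_name() and zone_register_from_dt() are hypothetical stand-ins invented for this example, and the only behaviour modelled is "refuse to register when a lookup on the same string already succeeds".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ZONES 8

/* Hypothetical stand-in for a registered thermal zone. */
struct zone {
	const char *type;	/* the DT node name ends up stored here */
};

static struct zone zones[MAX_ZONES];
static size_t nr_zones;

/* Return the first zone whose type matches, or NULL if none does. */
static struct zone *zone_get_by_name(const char *name)
{
	assert(name != NULL);

	for (size_t i = 0; i < nr_zones; i++)
		if (strcmp(zones[i].type, name) == 0)
			return &zones[i];
	return NULL;
}

/*
 * Register a DT-described zone only if the lookup fails first.
 * Returns 0 on success, -1 on a duplicate or a full table.
 */
static int zone_register_from_dt(const char *node_name)
{
	if (zone_get_by_name(node_name) != NULL)
		return -1;	/* same string already registered */
	if (nr_zones == MAX_ZONES)
		return -1;

	zones[nr_zones++].type = node_name;
	return 0;
}
```

With this model, registering "cpu0-thermal" twice succeeds the first time and is rejected the second time, which is the uniqueness guarantee the DT path would then rely on.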


Yes, we can harden that, but I don't see how this is relevant to thermal zone
device tree consumers (proposed in this patch).

Setting aside the fact that the change you propose is not relevant, as everything is already there, my comment is about the current state of the thermal framework.


I don't really understand this assertion, and I'm afraid that I'm underestimating
something so, in case, please help me to understand what I am missing here.

The way I see it, the thermal framework doesn't have any "somewhat standardized"
helper like the one(s) that I'm introducing with this patch (thermal_of_get_zone(),
thermal_of_get_zone_by_index()), and this is the exact reason why I'm proposing
this patch.

Then again - I mean no disrespect - it's just that I don't understand (yet) why you
are saying that "everything is already available", because I really don't see it.

 - A thermal zone does not have a name but a type

 - We use the thermal zone DT node name to register as a name but it is a type from the thermal framework point of view

This is something that I didn't realize before. Thanks for that.

...and yes, we're registering a "name" from DT as a "type" in the framework, this
is highly confusing and needs to be cleaned up.


 - We can register several thermal zones with the same type (so we can have duplicate names if we use type as name)


...which makes sense, after realizing that we're registering a TYPE and not a NAME,
and I agree with the logic by which multiple zones can be of the same type.

 - We use thermal_zone_device_get_by_name() but actually it checks against the type and as we can have multiple identical types, the function returns the first one found


The first thing that comes to mind is to rename thermal_zone_device_get_by_name()
to thermal_zone_device_get_by_type(), but then I'd be reintroducing the former and
this gives me concerns about OOT drivers using that and developers getting highly
confused (name->type, but name exists again, so they might erroneously just fix the
call to xxx_by_name() instead of changing it to xxx_by_type()).

Should I *not* be concerned about that? Any suggestion?


I'd be glad to go on and "make it clear" that we're doing type comparison and not
name comparison (with that rename, or similar), because (again) I see how confusing
that is. I was confused by that as well, so... :-)
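
To make the name/type confusion concrete, here is a minimal userspace sketch in C of a lookup that compares against the type and returns the first match. It models only the behaviour described in this thread, not the kernel's actual implementation; struct zone, the zones[] table and zone_get_by_type() are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a thermal zone: an id plus a type string. */
struct zone {
	int id;
	const char *type;
};

/* Two zones share the "acpitz" type, as can happen outside DT. */
static const struct zone zones[] = {
	{ 0, "acpitz" },
	{ 1, "acpitz" },
	{ 2, "gpu0-thermal" },
};

#define NR_ZONES (sizeof(zones) / sizeof(zones[0]))

/*
 * A "by name" lookup that really compares types: with duplicate
 * types, only the first matching zone is ever reachable.
 */
static const struct zone *zone_get_by_type(const char *type)
{
	assert(type != NULL);

	for (size_t i = 0; i < NR_ZONES; i++)
		if (strcmp(zones[i].type, type) == 0)
			return &zones[i];
	return NULL;
}
```

In this model a lookup for "acpitz" always yields the zone with id 0, so the zone with id 1 cannot be found through the string at all, whatever the helper ends up being called.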

All this is a bit fuzzy and confusing. So if you add these mappings between thermal zone nodes and names, that will be even more confusing.


IMO, not really. The thermal-zone-names are "local to a driver", not to the thermal
framework itself... it's like for clocks, interrupts, etc.: you want to get a TZ
that is declared with name "xyz", but it doesn't matter what the real name of the
TZ actually is.

Since I'm not sure I expressed myself in the best possible way, I'm referring to
the following example:

clock-names = "main";

...but the "real name" for the clock in the clk framework is "mfg_bg3d".

That's the same with what I'm introducing here (forget for just one moment that
there is this name<->type issue):

thermal-zone-names = "xyz";

...but the "real name" for the TZ in the thermal framework is "gpu0-thermal".
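
The consumer-local naming can be sketched in userspace C as two parallel lists, the same way clock-names pairs up with clocks. Everything here is an assumption made for illustration: struct consumer_node, consumer_tz_index() and consumer_tz_target() are hypothetical, and the phandle targets are modelled as plain node-name strings rather than real device tree references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical consumer node holding two parallel lists. */
struct consumer_node {
	const char *const *tz_names;	/* models thermal-zone-names */
	const char *const *tz_targets;	/* models the phandle targets */
	size_t count;
};

/* Sample consumer: local name "xyz" refers to node "gpu0-thermal". */
static const char *const gpu_tz_names[] = { "xyz" };
static const char *const gpu_tz_targets[] = { "gpu0-thermal" };
static const struct consumer_node gpu_node = {
	gpu_tz_names, gpu_tz_targets, 1
};

/* Map a consumer-local name to its index, or -1 if it is absent. */
static int consumer_tz_index(const struct consumer_node *np, const char *name)
{
	assert(np != NULL && name != NULL);

	for (size_t i = 0; i < np->count; i++)
		if (strcmp(np->tz_names[i], name) == 0)
			return (int)i;
	return -1;
}

/* Resolve a consumer-local name to the zone node it points at. */
static const char *consumer_tz_target(const struct consumer_node *np,
				      const char *name)
{
	int idx = consumer_tz_index(np, name);

	return idx < 0 ? NULL : np->tz_targets[idx];
}
```

The point of the model is that "xyz" only has meaning inside the consumer: asking gpu_node for "xyz" resolves to "gpu0-thermal", while asking it for "gpu0-thermal" finds nothing, because that string is not one of the consumer's local names.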

Ideally, it would make more sense to clean this up in order to have something like an enum type describing the thermal zone (battery, cpu, npu, gpu, dsp, ...) which would be used as the type of the thermal zone, and then a unique name (cpu0, cpu1, modem0, modem1, gpu-bottom, gpu-top, gpu-center, skin, ...).
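
One possible shape for that split, as a minimal userspace sketch in C: the type becomes an enum that many zones may share, and the name becomes a separate string that is expected to be unique. The enum values, struct zone and both helpers are hypothetical and only illustrate the idea; nothing like this exists in the thermal framework as discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical zone categories; several zones may share one. */
enum tz_type {
	TZ_TYPE_BATTERY,
	TZ_TYPE_CPU,
	TZ_TYPE_GPU,
	TZ_TYPE_SKIN,
};

/* Hypothetical zone with a shared type and a unique name. */
struct zone {
	enum tz_type type;
	const char *name;
};

static const struct zone zones[] = {
	{ TZ_TYPE_CPU,  "cpu0" },
	{ TZ_TYPE_CPU,  "cpu1" },
	{ TZ_TYPE_GPU,  "gpu-top" },
	{ TZ_TYPE_GPU,  "gpu-bottom" },
	{ TZ_TYPE_SKIN, "skin" },
};

#define NR_ZONES (sizeof(zones) / sizeof(zones[0]))

/* Names are unique, so at most one zone can match. */
static const struct zone *zone_get_by_name(const char *name)
{
	assert(name != NULL);

	for (size_t i = 0; i < NR_ZONES; i++)
		if (strcmp(zones[i].name, name) == 0)
			return &zones[i];
	return NULL;
}

/* Types are shared, so a type query counts zones instead. */
static size_t zone_count_by_type(enum tz_type type)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_ZONES; i++)
		if (zones[i].type == type)
			n++;
	return n;
}
```

With the two concepts separated, a by-name lookup is unambiguous ("cpu1" is exactly one zone), while a by-type query naturally returns a set (two GPU zones in this table).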


This might get more complicated than it looks, but would actually make sense
as well: the concern would be about how we cleanly declare that (for example in
DT; ACPI is the worst case, as ACPI tables are a "set and forget" type of thing,
shipped with BIOSes/EFI and almost never modified).

Cheers,
Angelo