Re: [PATCH 1/4] dt-bindings: misc: tmr-manager: Add device-tree binding for TMR Manager

From: Michal Simek
Date: Thu Jun 30 2022 - 06:59:25 EST




On 6/30/22 12:07, Krzysztof Kozlowski wrote:
On 29/06/2022 14:37, Rao, Appana Durga Kedareswara wrote:
Hi,


On 29/06/22 5:29 pm, Michal Simek wrote:


On 6/29/22 13:45, Krzysztof Kozlowski wrote:
On 29/06/2022 13:23, Michal Simek wrote:


On 6/29/22 12:07, Krzysztof Kozlowski wrote:
On 28/06/2022 07:43, Appana Durga Kedareswara rao wrote:
This commit adds documentation for the Triple Modular Redundancy (TMR)
Manager IP. The TMR Manager is responsible for handling the TMR
subsystem state, including fault detection and error recovery, and
provides soft error detection, correction and recovery features.

Signed-off-by: Appana Durga Kedareswara rao <appana.durga.rao@xxxxxxxxxx>
---
.../bindings/misc/xlnx,tmr-manager.yaml | 48 +++++++++++++++++++

This is not a misc device. Find appropriate subsystem for it. It's not
EDAC, right?

We were thinking about where to put it, but it is not an EDAC driver.
If you have a better suggestion for a subsystem, please let us know.

I don't know what the device is about. The description does not help:

"TMR Manager is responsible for TMR subsystem state..."

OK, let's improve the commit message in v2.

Sure, will improve the commit message in v2.

TMR - Triple Modular Redundancy.

You design the system with one CPU, which is the default microblaze
configuration with an interrupt controller, timer and other IPs.

And then you say you want to make it triple redundant with all that
voting, etc. If you want to get all the details you can take a look at
this guide:

https://www.xilinx.com/content/dam/xilinx/support/documents/ip_documentation/tmr/v1_0/pg268-tmr.pdf


In short, the TMR manager is servicing all 3 cores and making sure that
they are all running in sync. If not, it has the capability to recover
the system. It means the cpu gets to the break handler (it is part of
the microblaze series) and it restarts all cpus.
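
To make the "voting" part concrete, here is a minimal sketch of a
2-of-3 majority vote in plain C. It only illustrates the general idea;
the struct and function names are hypothetical and it is not the actual
TMR IP logic (see the guide above for that).

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of comparing the outputs of the three redundant cores. */
struct tmr_vote {
	uint32_t value;	/* majority value, meaningful only if ok is true */
	int faulty;	/* index 0..2 of the disagreeing core, -1 if none */
	bool ok;	/* false when all three outputs differ */
};

/* Hypothetical 2-of-3 majority voter, for illustration only. */
static struct tmr_vote tmr_vote3(uint32_t a, uint32_t b, uint32_t c)
{
	struct tmr_vote r = { .value = 0, .faulty = -1, .ok = true };

	if (a == b && b == c) {
		r.value = a;		/* all cores agree */
	} else if (a == b) {
		r.value = a;
		r.faulty = 2;		/* core 2 is out of sync */
	} else if (a == c) {
		r.value = a;
		r.faulty = 1;		/* core 1 is out of sync */
	} else if (b == c) {
		r.value = b;
		r.faulty = 0;		/* core 0 is out of sync */
	} else {
		r.ok = false;		/* no majority at all */
	}
	return r;
}
```

A single disagreeing core is outvoted and can be identified; when no
two cores agree there is no majority to trust.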

And the TMR inject driver is a module which is capable of injecting an
error into internal memory to cause the exception and exercise that
recovery code.

Kedar: Feel free to correct me or add more details.

Thanks Michal for the detailed explanation.

The Triple Modular Redundancy (TMR) subsystem has three Microblaze
processor instances. If any one of the Microblaze processors goes to an
unknown state due to fault injection, the break handler will get called,
which in turn calls the tmr manager driver API to perform recovery.
Like Michal said, the TMR inject driver is capable of injecting an error
into internal memory to cause a fault in one of the Microblaze processors.
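
The flow described above (break handler calls into the manager, which
performs recovery) could be sketched roughly as a registered callback.
This is a plain C illustration under assumptions: every name below is
hypothetical and none of it is taken from the actual driver in this
series.

```c
#include <stddef.h>

/* Hypothetical recovery callback type. */
typedef void (*tmr_recover_fn)(void *ctx);

static tmr_recover_fn recover_cb;
static void *recover_ctx;

/* Register the recovery routine once; returns 0 on success, -1 on error. */
static int tmr_register_recovery(tmr_recover_fn fn, void *ctx)
{
	if (!fn || recover_cb)
		return -1;
	recover_cb = fn;
	recover_ctx = ctx;
	return 0;
}

/* Stand-in for the break handler: hand control to the registered routine. */
static int tmr_break_handler(void)
{
	if (!recover_cb)
		return -1;	/* nothing registered, cannot recover */
	recover_cb(recover_ctx);
	return 0;
}

/* Example callback: just counts how many recoveries were requested. */
static void count_recovery(void *ctx)
{
	(*(int *)ctx)++;
}
```

Here count_recovery() stands in for whatever the real recovery step
does; the point is only that the break handler itself stays small and
delegates to the manager.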

@Krzysztof: please let me know if more information is required about
this TMR subsystem and I will provide it.

Some features sound like watchdog.

A watchdog needs to be kept alive, which is not the case here. These systems are designed for safety or space applications and I am quite sure that there are going to be a couple of regular watchdogs wired too.

If it was ARM, I would suggest putting
it under "soc". Is the term System-on-Chip applicable to Microblaze?

I have never seen microblaze in connection with SOC. You can make an SOC based on a microblaze cpu (using hard cores) but in most cases microblaze as a soft cpu is loaded to an fpga or to programmable logic.

Other
option is to store it under microblaze (although for ARM and RISC-V this
is actually discouraged in favor of soc).

Exactly. That was my main concern too: adding it to microblaze is likely not the best location, which is why we wanted to add it somewhere else.

Thanks,
Michal