RE: [PATCH v2] EDAC/versalnet: Report PFN and page offset for DDR errors
From: Zhuo, Qiuxu
Date: Fri May 08 2026 - 09:38:30 EST
Hi Shubhrajyoti,
> From: Shubhrajyoti Datta <shubhrajyoti.datta@xxxxxxx>
> Sent: Friday, May 8, 2026 6:42 PM
> To: linux-kernel@xxxxxxxxxxxxxxx; linux-edac@xxxxxxxxxxxxxxx
> Cc: git@xxxxxxx; shubhrajyoti.datta@xxxxxxxxx; Borislav Petkov
> <bp@xxxxxxxxx>; Luck, Tony <tony.luck@xxxxxxxxx>; Sai Krishna Potthuri
> <sai.krishna.potthuri@xxxxxxx>; Shubhrajyoti Datta
> <shubhrajyoti.datta@xxxxxxx>
> Subject: [PATCH v2] EDAC/versalnet: Report PFN and page offset for DDR
> errors
>
> Populate the EDAC location fields when reporting DDRMC CE/UE events.
>
> Compute the physical address from the decoded error info and pass the PFN
> along with the offset-within-page to edac_mc_handle_error(). This allows
> user space (mc_event consumers) to correlate memory errors with a real
> address.
>
> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xxxxxxx>
> ---
>
> Changes in v2:
> Refactor the common code
>
> drivers/edac/versalnet_edac.c | 34 ++++++++++++++++------------------
> 1 file changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c
> index ec1315582414..85d1a829a8aa 100644
> --- a/drivers/edac/versalnet_edac.c
> +++ b/drivers/edac/versalnet_edac.c
> @@ -429,6 +429,7 @@ static unsigned long convert_to_physical(struct
> mc_priv *priv, static void handle_error(struct mc_priv *priv, struct ecc_status
> *stat,
> int ctl_num, int *error_data)
> {
> + enum hw_event_mc_err_type type;
> union ecc_error_info pinf;
> struct mem_ctl_info *mci;
> unsigned long pa;
> @@ -440,29 +441,26 @@ static void handle_error(struct mc_priv *priv, struct
> ecc_status *stat,
>
> mci = priv->mci[ctl_num];
>
> - if (stat->error_type == MC5_ERR_TYPE_CE) {
> + if (stat->error_type == MC5_ERR_TYPE_UE) {
> + pinf = stat->ueinfo[stat->channel];
> + type = HW_EVENT_ERR_UNCORRECTED;
> + } else {
> pinf = stat->ceinfo[stat->channel];
> - snprintf(priv->message, sizeof(priv->message),
> - "Error type:%s Controller %d Addr at %lx\n",
> - "CE", ctl_num, convert_to_physical(priv, pinf,
> ctl_num, error_data));
> -
> - edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> - 1, 0, 0, 0, 0, 0, -1,
> - priv->message, "");
> + type = HW_EVENT_ERR_CORRECTED;
> }
>
> - if (stat->error_type == MC5_ERR_TYPE_UE) {
> - pinf = stat->ueinfo[stat->channel];
> - snprintf(priv->message, sizeof(priv->message),
> - "Error type:%s controller %d Addr at %lx\n",
> - "UE", ctl_num, convert_to_physical(priv, pinf,
> ctl_num, error_data));
> + pa = convert_to_physical(priv, pinf, ctl_num, error_data);
I noticed that the return comment of convert_to_physical() says "physical address of the
DDR memory" [1]. I'm just curious if this refers to:
- the DDR address space (DRAM-only, without MMIO holes), or
- the system physical address space (including MMIO holes)
If it's the former, it may need to convert it to the latter one by properly adding
back the MMIO hole(s), since the OS uses system physical addresses.
[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/versal_edac.c#n363
> + pfn = PHYS_PFN(pa);
[...]