Re: [PATCH V10 RESEND] EDAC: Add EDAC driver for loongson memory controller

From: Zhao Qunqin
Date: Mon Dec 16 2024 - 21:26:22 EST



在 2024/12/16 下午7:55, Borislav Petkov 写道:
On Mon, Dec 16, 2024 at 09:33:51AM +0800, Zhao Qunqin wrote:
+LOONGSON EDAC DRIVER
+M: Zhao Qunqin <zhaoqunqin@xxxxxxxxxxx>
+L: linux-edac@xxxxxxxxxxxxxxx
+S: Maintained
+F: drivers/edac/loongson_edac.c
If you add yourself as a maintainer, I'd expect you to review and/or ack
patches for your driver so that I can pick them up.
OK. I can review the patches for this driver.

+config EDAC_LOONGSON
+ tristate "Loongson Memory Controller"
+ depends on (LOONGARCH && ACPI) || COMPILE_TEST
The COMPILE_TEST thing would mean that you'll make sure this driver always
builds on other arches and it doesn't break randconfig builds of people. If it
happens too often and no one is fixing it, I'll remove the COMPILE_TEST.
There have indeed been build errors on other arches. So i will remove it.
+ help
+ Support for error detection and correction on the Loongson
+ family memory controller. This driver reports single bit
+ errors (CE) only. Loongson-3A5000/3C5000/3D5000/3A6000/3C6000
+ are compatible.
endif # EDAC
+static int read_ecc(struct mem_ctl_info *mci)
+{
+ struct loongson_edac_pvt *pvt = mci->pvt_info;
+ u64 ecc;
+ int cs;
+
+ if (!pvt->ecc_base)
When can that even happen? You're initializing it properly in pvt_init().
Will remove this check.
+ return pvt->last_ce_count;
+
+ ecc = readq(pvt->ecc_base + ECC_CS_COUNT_REG);
+ /* cs0 -- cs3 */
+ cs = ecc & 0xff;
+ cs += (ecc >> 8) & 0xff;
+ cs += (ecc >> 16) & 0xff;
+ cs += (ecc >> 24) & 0xff;
+
+ return cs;
+}
+
+static void edac_check(struct mem_ctl_info *mci)
+{
+ struct loongson_edac_pvt *pvt = mci->pvt_info;
+ int new, add;
+
+ new = read_ecc(mci);
+ add = new - pvt->last_ce_count;
+ pvt->last_ce_count = new;
That last_ce_count is just silly. Kill it.

Then  I  can't calculate the error count added since the last check,  cause

what record in Loongson's ECC register is  the error count  from reset of

the memory controller.

+ if (add <= 0)
+ return;
+
+ edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, add,
+ 0, 0, 0, 0, 0, -1, "error", "");
+ edac_mc_printk(mci, KERN_INFO, "add: %d", add);
"add"? What are you adding? Error count?

No need.
Will remove this printk.
+static int edac_probe(struct platform_device *pdev)
+{
+ struct edac_mc_layer layers[2];
+ struct mem_ctl_info *mci;
+ void __iomem *vbase;
+ int ret;
+
+ vbase = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(vbase))
+ return PTR_ERR(vbase);
+
+ /* allocate a new MC control structure */
+ layers[0].type = EDAC_MC_LAYER_CHANNEL;
+ layers[0].size = 1;
+ layers[0].is_virt_csrow = false;
+ layers[1].type = EDAC_MC_LAYER_SLOT;
+ layers[1].size = 1;
+ layers[1].is_virt_csrow = true;
+ mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers,
+ sizeof(struct loongson_edac_pvt));
+ if (mci == NULL)
+ return -ENOMEM;
+
+ mci->mc_idx = edac_device_alloc_index();
+ mci->mtype_cap = MEM_FLAG_RDDR4;
+ mci->edac_ctl_cap = EDAC_FLAG_NONE;
+ mci->edac_cap = EDAC_FLAG_NONE;
+ mci->mod_name = "loongson_edac.c";
+ mci->ctl_name = "loongson_edac_ctl";
+ mci->dev_name = "loongson_edac_dev";
+ mci->ctl_page_to_phys = NULL;
+ mci->pdev = &pdev->dev;
+ mci->error_desc.grain = 8;
+ /* Set the function pointer to an actual operation function */
Remove that obvious comment.

OK

Thanks for taking the time to review.

Best regards