Re: [PATCH] Raise maximum number of memory controllers

From: Luck, Tony
Date: Thu Sep 27 2018 - 17:46:06 EST


On Thu, Sep 27, 2018 at 06:52:44AM +0200, Borislav Petkov wrote:
> On Wed, Sep 26, 2018 at 04:02:57PM -0700, Luck, Tony wrote:
> > But ... we are at -rc5. Not sure that we'll figure out, write, test & debug
> > the proper solution in the next 3-4 weeks. So perhaps we should apply
> >
> > -#define EDAC_MAX_MCS 16
> > +#define EDAC_MAX_MCS 64
> >
> > as a temporary band-aid to get HPE's 32-socket machine running while
> > we work on the proper fix?
>
> Yeah, after sleeping on it I see it the same way - band-aid it now and
> clean it up properly later.

The problem with your patch that gets rid of EDAC_MAX_MCS is creating
the device links under /sys/bus/edac, as hinted at in some of the
code your patch deleted:

-	/*
-	 * The memory controller needs its own bus, in order to avoid
-	 * namespace conflicts at /sys/bus/edac.
-	 */
-	name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
-	if (!name)
-		return -ENOMEM;
-
-	mci->bus->name = name;
-
-	edac_dbg(0, "creating bus %s\n", mci->bus->name);
-
-	err = bus_register(mci->bus);
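To make the namespace problem concrete, here is a minimal user-space sketch (not kernel code). It assumes a single flat directory of device names, as /sys/bus/edac/devices would be once every controller shares one bus. add_link() and register_all() are hypothetical helpers: with plain "csrow%d" names the second controller's entries collide, while prefixing each name with the controller index keeps them unique.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one flat per-bus directory such as
 * /sys/bus/edac/devices: every device name in it must be unique. */
#define MAX_ENTRIES 64
#define NAME_LEN 32

static char names[MAX_ENTRIES][NAME_LEN];
static int n_names;

/* Returns 0 on success, -1 if the name already exists (roughly what
 * sysfs does with a duplicate name) or the table is full. */
static int add_link(const char *name)
{
	for (int i = 0; i < n_names; i++)
		if (strcmp(names[i], name) == 0)
			return -1;
	if (n_names == MAX_ENTRIES)
		return -1;
	snprintf(names[n_names++], NAME_LEN, "%s", name);
	return 0;
}

/* Register csrow0..csrow(ncsrows-1) for mc_count controllers, optionally
 * prefixing each name with the controller index. Returns the number of
 * names that could not be added. */
static int register_all(int mc_count, int ncsrows, int prefixed)
{
	char name[NAME_LEN];
	int failures = 0;

	n_names = 0;	/* start from an empty directory each time */
	for (int mc = 0; mc < mc_count; mc++) {
		for (int row = 0; row < ncsrows; row++) {
			if (prefixed)
				snprintf(name, sizeof(name), "mci%d_csrow%d", mc, row);
			else
				snprintf(name, sizeof(name), "csrow%d", row);
			if (add_link(name) != 0)
				failures++;
		}
	}
	return failures;
}
```

Here register_all(2, 2, 0) reports 2 failures (csrow0 and csrow1 from the second controller), while register_all(2, 2, 1) reports none.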

Just to see if there was anything else wrong, I added a patch to
make the names unique:


diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 2ca2012f2857..6ec6d8a2adb8 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -410,7 +410,7 @@ static int edac_create_csrow_object(struct mem_ctl_info *mci,
 	device_initialize(&csrow->dev);
 	csrow->dev.parent = &mci->dev;
 	csrow->mci = mci;
-	dev_set_name(&csrow->dev, "csrow%d", index);
+	dev_set_name(&csrow->dev, "mci%d_csrow%d", mci->mc_idx, index);
 	dev_set_drvdata(&csrow->dev, csrow);

 	edac_dbg(0, "creating (virtual) csrow node %s\n",
@@ -641,9 +641,9 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,

 	dimm->dev.parent = &mci->dev;
 	if (mci->csbased)
-		dev_set_name(&dimm->dev, "rank%d", index);
+		dev_set_name(&dimm->dev, "mci%d_rank%d", mci->mc_idx, index);
 	else
-		dev_set_name(&dimm->dev, "dimm%d", index);
+		dev_set_name(&dimm->dev, "mci%d_dimm%d", mci->mc_idx, index);
 	dev_set_drvdata(&dimm->dev, dimm);
 	pm_runtime_forbid(&mci->dev);


which seemed to work. But then I began wondering: what are the ABI
expectations of applications that read the EDAC /sys files?

Is this the current source repository? https://github.com/grondo/edac-utils

This code doesn't seem to know about the "dimm*" directories below the
"mc*" level. It just looks for the csrow* entries.
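A hypothetical name matcher shows why the rename could be an ABI break. It assumes a tool recognises csrow entries by a literal "csrow" prefix followed by a number; csrow_index() and count_csrows() are illustrative names only, not edac-utils code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical matcher in the style of a user-space tool that scans an
 * mc* directory for csrow entries by name. Returns the csrow index, or
 * -1 if the entry name is not exactly "csrow<N>". */
static int csrow_index(const char *entry)
{
	int idx;
	char tail;

	if (strncmp(entry, "csrow", 5) != 0)
		return -1;
	/* Only a number may follow the prefix: exactly one conversion must
	 * succeed, so trailing characters (matched by %c) are rejected. */
	if (sscanf(entry + 5, "%d%c", &idx, &tail) != 1)
		return -1;
	return idx;
}

/* Count how many directory entries the matcher would accept. */
static int count_csrows(const char *const *entries, int n)
{
	int found = 0;

	for (int i = 0; i < n; i++)
		if (csrow_index(entries[i]) >= 0)
			found++;
	return found;
}
```

With the old names, count_csrows() finds two entries in { "csrow0", "csrow1", "dimm0", "power" }; with the prefixed names { "mci0_csrow0", "mci0_csrow1", "mci0_dimm0", "power" } it finds none, so such a tool would silently report no csrows at all.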

-Tony