Re: [PATCH] EDAC, ghes: Fix locking and memory barrier issues

From: Robert Richter
Date: Tue Nov 05 2019 - 09:18:04 EST


On 05.11.19 12:05:11, Borislav Petkov wrote:
> On Fri, Oct 25, 2019 at 09:13:14PM +0000, Robert Richter wrote:
> > The ghes registration and refcount is broken in several ways:
> >
> > * ghes_edac_register() returns with success for a 2nd instance even
> > if a first instance is still running.
>
> How?
>
> There's
>
> if (atomic_inc_return(&ghes_init) > 1)
> return 0;
>
> there. How would a second instance bypass this?
>
> > * The refcount was increased even if a registration failed. This
> > leads to stale counters preventing the device from being released.
>
> That I see - the return path should dec ghes_init.

Here the bypass:

1) Entering ghes_edac_register(), instance #1

2) ghes_init atomicly set to 1, instance #1

3) Entering ghes_edac_register(), instance #2

4) ghes_init atomicly set to 2, instance #2

5) Leaving ghes_edac_register() with success, instance #2, note: First
instance not yet done.

6) Leaving ghes_edac_register() with error, instance #1.

There are the following issues with this now:

1) Refcount is 2, though it should be zero as init failed with an
error.

ghes_edac_register(), instance #2 returned already with success even
though:

2) ghes initialization was not yet done, and

3) ghes initialization is going to fail.

>
> > * The ghes refcount may not be decremented properly on
> > unregistration. Always decrement the refcount once
> > ghes_edac_unregister() is called to keep the refcount sane.
>
> Right.
>
> > * The ghes_pvt pointer is handed to the irq handler before
> > registration finished.
> >
> > * The mci structure could be freed while the irq handler is running.
> >
> > Fix this by adding a mutex to ghes_edac_register(). This mutex
> > serializes instances to register and unregister. The refcount is only
> > increased if the registration succeeded. This makes sure the refcount
> > is in a consistent state after registering or unregistering a device.
> > Note: A spinlock cannot be used here as the code section may sleep.
> >
> > The ghes_pvt is protected by ghes_lock now.
>
> This better be documented in the driver with a comment above the
> ghes_pvt thing.

Ok, will add a comment there.

>
> I'm assuming the support for multiple instances is going ontop of this?
> If so, ghes_pvt needs to be an array or so. Also, if you do that, I
> think you should use mc_devices - see edac_mc_find() et al - instead of
> growing a special one just for this driver.

Right, there will be a parent device created for each instance. But an
array is not required as this also can be part of the private data.
Only some sort of list head is needed to collect them all.

>
> > This ensures the pointer
> > is not updated before registration was finished or while the irq
> > handler is running. It is unset before unregistering the device
> > including necessary (implicit) memory barriers making the changes
> > visible to other cpus. Thus, the device can not be used anymore by an
> > interrupt.
> >
> > A refcount is needed. There can be multiple GHES structures being
> > defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error
> > Source, "Some platforms may describe multiple Generic Hardware Error
> > Source structures with different notification types, ...").
> >
> > Another approach to use the mci's device refcount (get_device()) and
> > have a release function does not work here. A release function will be
> > called only for device_release() with the last put_device() call. The
> > device must be deleted *before* that with device_del(). This is only
> > possible by maintaining an own refcount.
> >
> > Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
> > Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path")
> > Signed-off-by: Borislav Petkov <bp@xxxxxxx>
> > Signed-off-by: James Morse <james.morse@xxxxxxx>
> > Signed-off-by: Robert Richter <rrichter@xxxxxxxxxxx>
> > ---
> > drivers/edac/ghes_edac.c | 78 ++++++++++++++++++++++++++++------------
> > 1 file changed, 56 insertions(+), 22 deletions(-)
>
> ...
>
> > @@ -457,10 +461,12 @@ static struct acpi_platform_list plat_list[] = {
> > int ghes_edac_register(struct ghes *ghes, struct device *dev)
> > {
> > bool fake = false;
> > - int rc, num_dimm = 0;
> > + int rc = 0, num_dimm = 0;
> > struct mem_ctl_info *mci;
> > + struct ghes_edac_pvt *pvt;
> > struct edac_mc_layer layers[1];
> > struct ghes_edac_dimm_fill dimm_fill;
> > + unsigned long flags;
> > int idx = -1;
> >
> > if (IS_ENABLED(CONFIG_X86)) {
> > @@ -472,11 +478,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> > idx = 0;
> > }
> >
> > + /* finish another registration/unregistration instance first */
> > + mutex_lock(&ghes_reg_mutex);
> > +
> > /*
> > * We have only one logical memory controller to which all DIMMs belong.
> > */
> > - if (atomic_inc_return(&ghes_init) > 1)
> > - return 0;
> > + if (atomic_inc_not_zero(&ghes_init))
>
> That should probably be called ghes_instances now to make it obvious
> what it is.

I already thought of renaming this to ghes_refcount.

>
> Also, you can make it a normal variable now since it is being modified
> under the mutex only.

I would rather avoid an own implementation of reference counting and
also better switch from atomic_* to refcount_* which also provides
range checks. There is no benefit doing this our own.

Any objections here for the renaming and using the refcount API?

>
> > + goto unlock;
> >
> > /* Get the number of DIMMs */
> > dmi_walk(ghes_edac_count_dimms, &num_dimm);
> > @@ -494,12 +503,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> > mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
> > if (!mci) {
> > pr_info("Can't allocate memory for EDAC data\n");
> > - return -ENOMEM;
> > + rc = -ENOMEM;
> > + goto unlock;
> > }
> >
> > - ghes_pvt = mci->pvt_info;
> > - ghes_pvt->ghes = ghes;
> > - ghes_pvt->mci = mci;
> > + pvt = mci->pvt_info;
> > + pvt->ghes = ghes;
> > + pvt->mci = mci;
> >
> > mci->pdev = dev;
> > mci->mtype_cap = MEM_FLAG_EMPTY;
> > @@ -541,23 +551,47 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> > if (rc < 0) {
> > pr_info("Can't register at EDAC core\n");
> > edac_mc_free(mci);
> > - return -ENODEV;
> > + rc = -ENODEV;
>
> This needs to "goto unlock".

Umm, correct.

>
> > }
> > - return 0;
> > +
> > + spin_lock_irqsave(&ghes_lock, flags);
> > + ghes_pvt = pvt;
> > + spin_unlock_irqrestore(&ghes_lock, flags);
> > +
> > + /* only increment on success */
> > + atomic_inc(&ghes_init);
> > +
> > +unlock:
> > + mutex_unlock(&ghes_reg_mutex);
> > +
> > + return rc;
> > }

Will, repost. Thanks for review.

-Robert