Re: [PATCH 17/63] edac_mce: Add an interface driver to report mceerrors via edac

From: Mauro Carvalho Chehab
Date: Fri Sep 25 2009 - 08:12:20 EST


Boris,

Em Fri, 25 Sep 2009 11:48:55 +0200
Borislav Petkov <borislav.petkov@xxxxxxx> escreveu:

> > entry = rcu_dereference(mcelog.next);
> > for (;;) {
> > /*
> > + * If edac_mce is enabled, it will check the error type
> > + * and will process it, if it is a known error.
> > + * Otherwise, the error will be sent through mcelog
> > + * interface
> > + */
> > + if (edac_mce_parse(mce))
> > + return;
>
> for the third time (!): this may run in NMI context and as such does not
> obey to normal kernel locking rules and you cannot safely use almost any
> kernel resources involving locking. This way, your hook calls into a
> module, which is a very bad idea. Please remove that hook and put in the
> polling routine or somewhere more appropriate.

I had answered you already, but let me give a more complete explanation.

For sure all the code called at this point should be carefully analyzed. So,
let's see the complete implementation:

1) edac_mce is not a module (see patch 18). So, just calling a routine on
edac_mce should be safe, even at NMI;

2) While there's nobody registered on edac_mce, it will just return 0, letting
the code proceed. There's no lock there. The called code is as simple as:

int edac_mce_parse(struct mce *mce)
{
struct edac_mce *edac_mce;

list_for_each_entry(edac_mce, &edac_mce_list, list) {
if (edac_mce->check_error(edac_mce->priv, mce))
return 1;
}

/* Nobody queued the error */
return 0;
}

3) i7core_edac will only start handling mce events after being loaded on memory
and registered on edac_mce. If an error occurs before it, normal mce handling
will happen;

4) after registered, edac_mce will call this hook, at i7core_edac:

static int i7core_mce_check_error(void *priv, struct mce *mce)
{
struct mem_ctl_info *mci = priv;
struct i7core_pvt *pvt = mci->pvt_info;
unsigned long flags;

/*
* Just let mcelog handle it if the error is
* outside the memory controller
*/
if (((mce->status & 0xffff) >> 7) != 1)
return 0;

/* Bank 8 registers are the only ones that we know how to handle */
if (mce->bank != 8)
return 0;

/* Only handle if it is the right mc controller */
if (cpu_data(mce->cpu).phys_proc_id != pvt->i7core_dev->socket) {
debugf0("mc%d: ignoring mce log for socket %d. "
"Another mc should get it.\n",
pvt->i7core_dev->socket,
cpu_data(mce->cpu).phys_proc_id);
return 0;
}

spin_lock_irqsave(&pvt->mce_lock, flags);
if (pvt->mce_count < MCE_LOG_LEN) {
memcpy(&pvt->mce_entry[pvt->mce_count], mce, sizeof(*mce));
pvt->mce_count++;
}
spin_unlock_irqrestore(&pvt->mce_lock, flags);

/* Handle fatal errors immediately */
if (mce->mcgstatus & 1)
i7core_check_error(mci);

/* Advice mcelog that the error were handled */
return 1;
}

It will basically check if the error is related to the MC, returning if not. If
the error is for the mc, it will copy the error inside a buffer on i7core_mce.

In this part, it is currently using a spinlock. It might have used an RCU code
instead, but I opted to use a simpler approach, since, on all tests, this worked fine.

Iff the error is a fatal error (Uncorrected Error), it will call the EDAC UE routine
to to display the error. For all Corrected Errors, like the mce driver, the code that
will parse the error will be called outside NMI, by edac polling code.

5) there are very few calls to kernel routines here: list_for_each_entry(),
spin_lock_irqsave(), in_unlock_irqrestore(), memcpy(). I don't think any of them will
have any problems on working even inside NMI context.

--

Cheers,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/