[PATCH 09/10] IOCHK interface for I/O error handling/detecting

From: Hidetoshi Seto
Date: Thu Jun 09 2005 - 08:23:09 EST


[This is 9 of 10 patches, "iochk-09-cpeh.patch"]

- SAL behavior doesn't affect only MCA. There are other
chances to call SAL_GET_STATE_INFO, that's when CMC, CPE,
and INIT is happen.

- CMC(Corrected Machine Check) is for non-fatal, processor
local errors. Fortunately, calling SAL_GET_STATE_INFO for
CMC only collect data from a processor issued it, without
touching any bridge and its status. So, this is safe.

- CPE(Corrected Platform Error) is for non-fatal, platform
related errors. Even it says corrected, but calling
SAL procedure for CPE touchs every bridge on the platform,
and "correct" bridge status that's bad for iochk works.

- INIT is a kind of system reset request, as far as I know.
So restarting from INIT is out of design, also iochk after
INIT is not required at this time.

In short, only MCA and CPE have the problem of SAL behavior.
One of the difference from MCA is that SAL will not gather
data before OS actually request it.

MCA:
1) SAL gathers data and keep it internally
2) OS gets control
3) if OS requests, SAL returns data gathered at beginning.
CPE:
1) OS gets control
2) OS request to SAL
3) SAL gathers data and return it to OS

Therefore, we can make CPE handler to care bridge states,
to check states before calling SAL procedure.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@xxxxxxxxxxxxxx>

---

arch/ia64/kernel/mca.c | 21 +++++++++++++++++++++
arch/ia64/lib/iomap_check.c | 17 +++++++++++++++++
2 files changed, 38 insertions(+)

Index: linux-2.6.11.11/arch/ia64/lib/iomap_check.c
===================================================================
--- linux-2.6.11.11.orig/arch/ia64/lib/iomap_check.c
+++ linux-2.6.11.11/arch/ia64/lib/iomap_check.c
@@ -19,6 +19,7 @@ static int have_error(struct pci_dev *de

void notify_bridge_error(struct pci_dev *bridge);
void clear_bridge_error(struct pci_dev *bridge);
+void save_bridge_error(void);

void iochk_init(void)
{
@@ -143,6 +144,22 @@ void clear_bridge_error(struct pci_dev *
}
}

+void save_bridge_error(void)
+{
+ iocookie *cookie;
+
+ if (list_empty(&iochk_devices))
+ return;
+
+ /* mark devices if its root bus bridge have errors */
+ list_for_each_entry(cookie, &iochk_devices, list) {
+ if (cookie->error)
+ continue;
+ if (have_error(cookie->host))
+ notify_bridge_error(cookie->host);
+ }
+}
+
EXPORT_SYMBOL(iochk_read);
EXPORT_SYMBOL(iochk_clear);
EXPORT_SYMBOL(iochk_devices); /* for MCA driver */
Index: linux-2.6.11.11/arch/ia64/kernel/mca.c
===================================================================
--- linux-2.6.11.11.orig/arch/ia64/kernel/mca.c
+++ linux-2.6.11.11/arch/ia64/kernel/mca.c
@@ -80,6 +80,8 @@
#ifdef CONFIG_IOMAP_CHECK
#include <linux/pci.h>
extern void notify_bridge_error(struct pci_dev *bridge);
+extern void save_bridge_error(void);
+extern spinlock_t iochk_lock;
#endif

#if defined(IA64_MCA_DEBUG_INFO)
@@ -288,11 +290,30 @@ ia64_mca_cpe_int_handler (int cpe_irq, v
IA64_MCA_DEBUG("%s: received interrupt vector = %#x on CPU %d\n",
__FUNCTION__, cpe_irq, smp_processor_id());

+#ifndef CONFIG_IOMAP_CHECK
+
/* SAL spec states this should run w/ interrupts enabled */
local_irq_enable();

/* Get the CPE error record and log it */
ia64_mca_log_sal_error_record(SAL_INFO_TYPE_CPE);
+#else
+ /*
+ * Because SAL_GET_STATE_INFO for CPE might clear bridge states
+ * in process of gathering error information from the system,
+ * we should check the states before clearing it.
+ * While OS and SAL are handling bridge status, we have to protect
+ * the states from changing by any other I/Os running simultaneously,
+ * so this should be handled w/ lock and interrupts disabled.
+ */
+ spin_lock(&iochk_lock);
+ save_bridge_error();
+ ia64_mca_log_sal_error_record(SAL_INFO_TYPE_CPE);
+ spin_unlock(&iochk_lock);
+
+ /* Rests can go w/ interrupt enabled as usual */
+ local_irq_enable();
+#endif

spin_lock(&cpe_history_lock);
if (!cpe_poll_enabled && cpe_vector >= 0) {

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/