[PATCH 2.6.13-rc1 06/10] IOCHK interface for I/O error handling/detecting

From: Hidetoshi Seto
Date: Wed Jul 06 2005 - 01:49:24 EST


[This is 6 of 10 patches, "iochk-06-mcanotify.patch"]

- This is a headache:
When ia64 hits a hardware problem, the OS can ask
SAL (System Abstraction Layer: the ia64 firmware) to gather
system status by calling the SAL_GET_STATE_INFO procedure.

However (depending, hopefully, on the platform's SAL
implementation), while gathering this status SAL also checks
every host bridge and its status, and afterwards resets
the state...

So we have to take this reset by SAL into account.

Handling MCA (Machine Check Abort) is one situation where
we have to take care. MCA was originally designed as a
critical interruption, so when an MCA arrives, SAL gathers
system status without any request from the OS, before the
OS gets control. Since the bridge states have already been
reset on entry to the MCA handler, the OS should notify all
"check-in" contexts of the "loss of state" by setting their
error flag, iocookie->error.
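
As a rough illustration of the two notification modes (global
for MCA, local for a single host bridge), here is a minimal
user-space sketch. It is not the kernel code: struct
sketch_bridge, struct sketch_cookie, sketch_check_in() and
sketch_notify() are hypothetical stand-ins for struct pci_dev,
the iocookie, the check-in step and notify_bridge_error(), and
a plain singly linked list stands in for iochk_devices.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev (a host bridge). */
struct sketch_bridge {
    const char *name;
};

/* Hypothetical stand-in for the iocookie of one check-in context. */
struct sketch_cookie {
    struct sketch_bridge *host;  /* host bridge this context runs under */
    int error;                   /* set when the bridge state may be lost */
    struct sketch_cookie *next;
};

/* Models the iochk_devices list: all contexts currently checked in. */
static struct sketch_cookie *sketch_checked_in;

/* Register a context under a host bridge with a clean error flag. */
static void sketch_check_in(struct sketch_cookie *c, struct sketch_bridge *host)
{
    c->host = host;
    c->error = 0;
    c->next = sketch_checked_in;
    sketch_checked_in = c;
}

/*
 * bridge == NULL: global notify (e.g. MCA), mark every context.
 * bridge != NULL: local notify, mark only contexts under that bridge.
 */
static void sketch_notify(struct sketch_bridge *bridge)
{
    struct sketch_cookie *c;

    for (c = sketch_checked_in; c != NULL; c = c->next) {
        if (bridge == NULL || c->host == bridge)
            c->error = 1;
    }
}
```

With two contexts checked in under different bridges, a local
notify for one bridge marks only its own context, while a
notify with NULL marks both, which is what the MCA path needs
because SAL has already reset every bridge.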

There would be a better way if the OS could learn the bridge
state from the data SAL gathered, but for the time being I
just take the simple way.

A PCI parity error is one of the causes of an MCA, OK?
Next, "data poisoning" helps us... see the next patch (7 of 10).

Changes from previous one for 2.6.11.11:
- (none)

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@xxxxxxxxxxxxxx>

---

arch/ia64/kernel/mca.c | 13 +++++++++++++
arch/ia64/lib/iomap_check.c | 7 ++++++-
2 files changed, 19 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc1/arch/ia64/kernel/mca.c
===================================================================
--- linux-2.6.13-rc1.orig/arch/ia64/kernel/mca.c
+++ linux-2.6.13-rc1/arch/ia64/kernel/mca.c
@@ -77,6 +77,11 @@
#include <asm/irq.h>
#include <asm/hw_irq.h>

+#ifdef CONFIG_IOMAP_CHECK
+#include <linux/pci.h>
+extern void notify_bridge_error(struct pci_dev *bridge);
+#endif
+
#if defined(IA64_MCA_DEBUG_INFO)
# define IA64_MCA_DEBUG(fmt...) printk(fmt)
#else
@@ -893,6 +898,14 @@ ia64_mca_ucmc_handler(void)
sal_log_record_header_t *rh = IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA);
rh->severity = sal_log_severity_corrected;
ia64_sal_clear_state_info(SAL_INFO_TYPE_MCA);
+
+#ifdef CONFIG_IOMAP_CHECK
+ /*
+ * SAL already reads and clears error bits on bridge registers,
+ * so we should have all running transactions to retry.
+ */
+ notify_bridge_error(0);
+#endif
}
/*
* Wakeup all the processors which are spinning in the rendezvous
Index: linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
===================================================================
--- linux-2.6.13-rc1.orig/arch/ia64/lib/iomap_check.c
+++ linux-2.6.13-rc1/arch/ia64/lib/iomap_check.c
@@ -111,7 +111,12 @@ void notify_bridge_error(struct pci_dev
return;

/* notify error to all transactions using this host bridge */
- if (bridge) {
+ if (!bridge) {
+ /* global notify, ex. MCA */
+ list_for_each_entry(cookie, &iochk_devices, list) {
+ cookie->error = 1;
+ }
+ } else {
/* local notify, ex. Parity, Abort etc. */
list_for_each_entry(cookie, &iochk_devices, list) {
if (cookie->host == bridge)
