> Hi,
>
> > I think that unless we come up with some ideas on how to implement this,
> > we should drop it.
> >
> > 1. How are we going to implement issues like timings? Think about the
> > recent TCP-stall problem for one.
>
> Timings and race conditions are difficult to test. Another problem is
> that we'll hardly be able to cover all the possible cases. If
> you try to check things such as the handling of hardware
> errors (r/w errors on disks, ...), you'll have to add special code to
> "inject" errors.

Since any untested code has every chance of not working properly,
"inject errors" techniques are often used to perform basic
error recovery testing in code that deals with the hardware.

Until now, I have removed the "inject errors" code before releasing
each version of the BSD-ported ncr driver. This interesting thread, and
especially your posting, made me decide to release driver versions
with this code embedded from now on, though obviously not compiled by
default.

I have just added some documentation about the "inject errors" code, and
here is a pre-patch of the next version of the driver I will release.
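
To illustrate the general idea, here is a minimal user-space sketch of
compile-time gated error injection. It is not taken from the driver:
DEMO_INJECT_ERRORS, struct demo_dev, demo_hw_read() and demo_read() are
hypothetical names made up for this example, and the "hardware" is just
a function that always succeeds unless an error has been requested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch for this sketch; the driver's own switch is
 * SCSI_NCR_DEBUG_ERROR_RECOVERY and is left undefined by default. */
#define DEMO_INJECT_ERRORS

struct demo_dev {
	int retries;		/* recoveries performed so far */
#ifdef DEMO_INJECT_ERRORS
	int inject_next;	/* non-zero: fail the next hardware read */
#endif
};

/* Stand-in for a hardware access; returns 0 on success, -1 on error. */
static int demo_hw_read(struct demo_dev *dev)
{
	assert(dev != NULL);
#ifdef DEMO_INJECT_ERRORS
	if (dev->inject_next) {
		dev->inject_next = 0;	/* one-shot: only this read fails */
		return -1;
	}
#endif
	return 0;
}

/* The recovery path under test: retry a failed read up to max_tries times. */
static int demo_read(struct demo_dev *dev, int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		if (demo_hw_read(dev) == 0)
			return 0;
		dev->retries++;
	}
	return -1;
}
```

With the switch undefined, the injection field and the test branch
disappear from the build, so the released code path is unchanged. With
it defined, setting inject_next before a call to demo_read() forces one
trip through the retry path.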
-- Gerard.

(This patch is against driver version 1.18d)

--- linux/drivers/scsi/ChangeLog.ncr53c8xx.1.18d	Sun Apr 13 17:45:46 1997
+++ linux/drivers/scsi/ChangeLog.ncr53c8xx	Mon Apr 14 08:20:13 1997
@@ -1,3 +1,14 @@
+Sun Apr 14 9:00 1997 Gerard Roudier (groudier@club-internet.fr)
+	* revision 1.18e
+	- incorporate in the driver, the code used for error recovery
+	  testing. This code is normally not compiled; have to define
+	  SCSI_NCR_DEBUG_ERROR_RECOVERY in order to compile it.
+	- reset all when an unexpected data cycle is detected while
+	  disconnecting.
+	- make changes to abort() ans reset() functions according to
+	  Leonard's documentation.
+	- small fix in some message for hard errors.
+
 Sat Apr 5 13:00 1997 Gerard Roudier (groudier@club-internet.fr)
 	* revision 1.18d
 	- Probe NCR pci device ids in reverse order if asked by user from
--- linux/drivers/scsi/README.ncr53c8xx.1.18d	Sun Apr 13 17:46:14 1997
+++ linux/drivers/scsi/README.ncr53c8xx	Mon Apr 14 08:19:50 1997
@@ -1,10 +1,10 @@
-The linux NCR53C8XX driver README file
+The Linux NCR53C8XX driver README file
 
 Written by Gerard Roudier <groudier@club-internet.fr>
 21 Rue Carnot
 95170 DEUIL LA BARRE - FRANCE
 
-6 April 1997
+14 April 1997
 ===============================================================================
 
 1. Introduction
@@ -22,6 +22,7 @@
       8.5 Set debug mode
       8.6 Clear profile counters
       8.7 Set flag (no_sync)
+      8.8 Debug error recovery
 9. Configuration parameters
 10. Boot setup commands
       10.1 Syntax
@@ -203,7 +204,6 @@
   IO port address 0x6000, IRQ number 10
   Using memory mapped IO at virtual address 0x282c000
   Synchronous transfer period 25, max commands per lun 4
-
   Profiling information:
     num_trans = 18014
     num_kbytes = 671314
@@ -390,6 +390,28 @@
     - setflag all
     will allow disconnection for all devices on the SCSI bus.
 
+
+8.8 Debug error recovery
+
+    debug_error_recovery <error to trigger>
+
+    Available error type to trigger:
+        sge: SCSI gross error
+        abort: abort command from the middle-level driver
+        reset: reset command from the middle-level driver
+        none: restore driver normal behaviour
+
+    The code corresponding to this feature is normally not compiled.
+    Its purpose is driver testing only. In order to compile the code
+    that allows to trigger error recovery you must define at compile time
+    SCSI_NCR_DEBUG_ERROR_RECOVERY.
+    If you have compiled the driver with this option, nothing will happen
+    as long as you donnot use the control command 'debug_error_recovery'
+    with sge, abort or reset as argument.
+    If you select an error type, it will be triggered by the driver every
+    30 seconds.
+
+
 9. Configuration parameters
 
 If the firmware of all your devices is perfect enough, all the
@@ -736,9 +758,9 @@
 
 You must untar the distribution with the following command:
 
-	tar zxvf ncrBsd2Linux-1.18d-src.tar.gz
+	tar zxvf ncrBsd2Linux-1.18e-src.tar.gz
 
-The sub-directory ncr53c8xx-1.18d will be created. Change to this directory.
+The sub-directory ncr53c8xx-1.18e will be created. Change to this directory.
 
 
 12.2 Installation procedure
--- linux/drivers/scsi/ncr53c8xx.h.1.18d	Sat Apr 5 12:38:51 1997
+++ linux/drivers/scsi/ncr53c8xx.h	Sun Apr 13 18:39:50 1997
@@ -45,7 +45,7 @@
 /*
 **	Name and revision of the driver
 */
-#define SCSI_NCR_DRIVER_NAME "ncr53c8xx - revision 1.18d"
+#define SCSI_NCR_DRIVER_NAME "ncr53c8xx - revision 1.18e"
 
 /*
 **	If SCSI_NCR_SETUP_SPECIAL_FEATURES is defined,
--- linux/drivers/scsi/ncr53c8xx.c.1.18d	Sun Apr 6 21:08:15 1997
+++ linux/drivers/scsi/ncr53c8xx.c	Sun Apr 13 18:41:31 1997
@@ -40,7 +40,7 @@
 */
 
 /*
-**	6 April 1997, version 1.18d
+**	12 April 1997, version 1.18e
 **
 **	Supported SCSI-II features:
 **	    Synchronous negotiation
@@ -537,7 +537,29 @@
 #define DEBUG_FLAGS SCSI_NCR_DEBUG_FLAGS
 #endif
 
-
+/*
+**	Embedded error recovery debug code.
+**
+**	This code is conditionned by SCSI_NCR_DEBUG_ERROR_RECOVERY.
+**	It only can be enabled after boot-up with a control command.
+**
+**	Every 30 seconds the timer handler of the driver decides to
+**	change the behaviour of the driver in order to trigger errors.
+**
+**	If last command was "debug_error_recovery sge", the driver sets sync
+**	offset of all targets that use sync transfers to 2, and so will get
+**	a SCSI gross error at the next read operation.
+**
+**	If last command was "debug_error_recovery abort", the driver does
+**	not signal new scsi commands to the script processor, until it is asked
+**	to abort or reset a command by the mid-level driver.
+**
+**	If last command was "debug_error_recovery reset", the driver does
+**	not signal new scsi commands to the script processor, until it is asked
+**	to reset a command by the mid-level driver.
+**
+**	The command "debug_error_recovery none" makes the driver behave normaly.
+*/
 
 /*==========================================================
 **
@@ -798,6 +820,10 @@
 #define UC_SETFLAG 15
 #define UC_CLEARPROF 16
 
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+#define UC_DEBUG_ERROR_RECOVERY 17
+#endif
+
 #define UF_TRACE (0x01)
 #define UF_NODISC (0x02)
 
@@ -1347,6 +1373,10 @@
 					/* that we can't put into the squeue */
 	u_long settle_time; /* Reset in progess */
 	u_char release_stage; /* Synchronisation stage on release */
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	u_char stalling;
+	u_char debug_error_recovery;
+#endif
 
 	/*-----------------------------------------------
 	**	Added field to support differences
@@ -4463,6 +4493,9 @@
 	**	Script processor may be waiting for reselect.
 	**	Wake it up.
 	*/
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	if (!np->stalling)
+#endif
 	OUTB (nc_istat, SIGP);
 
 	/*
@@ -4520,7 +4553,7 @@
 **
 **==========================================================
 */
-int ncr_reset_bus (Scsi_Cmnd *cmd)
+int ncr_reset_bus (Scsi_Cmnd *cmd, int sync_reset)
 {
 	struct Scsi_Host *host = cmd->host;
 /*	Scsi_Device *device = cmd->device; */
@@ -4530,6 +4563,11 @@
 	u_long flags;
 	int found;
 
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	if (np->stalling)
+		np->stalling = 0;
+#endif
+
 	save_flags(flags); cli();
 /*
  * Return immediately if reset is in progress.
@@ -4572,11 +4610,12 @@
  */
 	ncr_wakeup(np, HS_RESET);
 /*
- * If the involved command was not in a driver queue, and is
- * not in the waiting list, complete it with DID_RESET status,
+ * If the involved command was not in a driver queue, and the
+ * scsi driver told us reset is synchronous, and the command is not
+ * currently in the waiting list, complete it with DID_RESET status,
  * in order to keep it alive.
  */
-	if (!found && cmd && !remove_from_waiting_list(np, cmd)) {
+	if (!found && sync_reset && !retrieve_from_waiting_list(0, np, cmd)) {
 		cmd->result = ScsiResult(DID_RESET, 0);
 		cmd->scsi_done(cmd);
 	}
@@ -4606,6 +4645,11 @@
 	int found;
 	int retv;
 
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	if (np->stalling == 2)
+		np->stalling = 0;
+#endif
+
 	save_flags(flags); cli();
 /*
  * First, look for the scsi command in the waiting list
@@ -4662,6 +4706,9 @@
 	**	processor will sleep on SEL_WAIT_RESEL.
 	**	Let's wake it up, since it may have to work.
 	*/
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	if (!np->stalling)
+#endif
 	OUTB (nc_istat, SIGP);
 
 	restore_flags(flags);
@@ -5928,6 +5975,11 @@
 	case UC_CLEARPROF:
 		bzero(&np->profile, sizeof(np->profile));
 		break;
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	case UC_DEBUG_ERROR_RECOVERY:
+		np->debug_error_recovery = np->user.data;
+		break;
+#endif
 	}
 	np->user.cmd=0;
 }
@@ -5977,6 +6029,54 @@
 	add_timer(&np->timer);
 
 	/*
+	**	If SCSI_NCR_DEBUG_ERROR_RECOVERY is defined, we want to
+	**	simulate common errors in order to test error recovery.
+	*/
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	do {
+		static u_long start = 0l;
+		static u_long last = 0l;
+
+		if (!np->debug_error_recovery)
+			break;
+		if (!start) {
+			start = jiffies;
+			last = start;
+		}
+		if (jiffies < last + 30*HZ)
+			break;
+		last = jiffies;
+		/*
+		 * This one triggers SCSI gross errors.
+		 */
+		if (np->debug_error_recovery == 1) {
+			int i;
+			printf("%s: testing error recovery from SCSI gross error...\n", ncr_name(np));
+			for (i = 0 ; i < MAX_TARGET ; i++) {
+				if (np->target[i].sval & 0x1f) {
+					np->target[i].sval &= ~0x1f;
+					np->target[i].sval += 2;
+				}
+			}
+		}
+		/*
+		 * This one triggers abort from the mid-level driver.
+		 */
+		else if (np->debug_error_recovery == 2) {
+			printf("%s: testing error recovery from mid-level driver abort()...\n", ncr_name(np));
+			np->stalling = 2;
+		}
+		/*
+		 * This one triggers reset from the mid-level driver.
+		 */
+		else if (np->debug_error_recovery == 3) {
+			printf("%s: testing error recovery from mid-level driver reset()...\n", ncr_name(np));
+			np->stalling = 3;
+		}
+	} while (0);
+#endif
+
+	/*
 	**	If we are resetting the ncr, wait for settle_time before
 	**	clearing it. Then command processing will be resumed.
 	*/
@@ -6083,6 +6183,9 @@
 		*/
 		ncr_complete (np, cp);
 
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+		if (!np->stalling)
+#endif
 		OUTB (nc_istat, SIGP);
 	}
 	restore_flags(flags);
@@ -6137,6 +6240,7 @@
 {
 	u_int32 dsp;
 	int script_ofs;
+	int script_size;
 	char *script_name;
 	u_char *script_base;
 	int i;
@@ -6145,11 +6249,13 @@
 
 	if (dsp > np->p_script && dsp <= np->p_script + sizeof(struct script)) {
 		script_ofs = dsp - np->p_script;
+		script_size = sizeof(struct script);
 		script_base = (u_char *) np->script;
 		script_name = "script";
 	}
 	else {
 		script_ofs = dsp - np->p_scripth;
+		script_size = sizeof(struct scripth);
 		script_base = (u_char *) np->scripth;
 		script_name = "scripth";
 	}
@@ -6161,7 +6267,7 @@
 		(unsigned)INL (nc_dbc));
 
 	if (((script_ofs & 3) == 0) &&
-	    (unsigned)script_ofs < sizeof(struct script)) {
+	    (unsigned)script_ofs < script_size) {
 		printf ("%s: script cmd = %08x\n", ncr_name(np),
 			(int) *(ncrcmd *)(script_base + script_ofs));
 	}
@@ -6192,6 +6298,11 @@
 	*/
 	while ((istat = INB (nc_istat)) & INTF) {
 		if (DEBUG_FLAGS & DEBUG_TINY) printf ("F ");
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+		if (np->stalling)
+			OUTB (nc_istat, INTF);
+		else
+#endif
 		OUTB (nc_istat, (istat & SIGP) | INTF);
 		np->profile.num_fly++;
 		ncr_wakeup (np, 0);
@@ -6363,27 +6474,13 @@
 	    ((INL(nc_dbc) & 0xf8000000) == SCR_WAIT_DISC)) {
 		/*
 		**	Unexpected data cycle while waiting for disconnect.
-		*/
-		if (INB(nc_sstat2) & LDSC) {
-			/*
-			**	It's an early reconnect.
-			**	Let's continue ...
-			*/
-			OUTONB (nc_dcntl, (STD|NOCOM));
-			/*
-			**	info message
-			*/
-			printf ("%s: INFO: LDSC while IID.\n",
-				ncr_name (np));
-			return;
-		};
-		printf ("%s: target %d doesn't release the bus.\n",
-			ncr_name (np), (int)INB (nc_ctest0)&0x0f);
-		/*
-		**	return without restarting the NCR.
-		**	timeout will do the real work.
-		*/
-		return;
+		**	LDSC and CON bits may help in order to understand
+		**	what really happened. Print some info message and let
+		**	the reset function reset the BUS and the NCR.
+		*/
+		printf("%s:%d: data cycle while waiting for disconnect, LDSC=%d CON=%d\n",
+			ncr_name (np), (int)(INB(nc_ctest0)&0x0f),
+			(0!=(INB(nc_sstat2)&LDSC)), (0!=(INB(nc_scntl1)&ISCON)));
 	};
 
 	/*----------------------------------------
@@ -8617,25 +8714,95 @@
 **   Linux entry point of reset() function
 */
 
-#if LINUX_VERSION_CODE >= LinuxVersionCode(1,3,98)
+#if defined SCSI_RESET_SYNCHRONOUS && defined SCSI_RESET_ASYNCHRONOUS
+
 int ncr53c8xx_reset(Scsi_Cmnd *cmd, unsigned int reset_flags)
+{
+	int sts;
+	unsigned long flags;
+
+	printk("ncr53c8xx_reset: pid=%lu reset_flags=%x serial_number=%ld serial_number_at_timeout=%ld\n",
+		cmd->pid, reset_flags, cmd->serial_number, cmd->serial_number_at_timeout);
+
+	save_flags(flags); cli();
+#if 1
+	/*
+	 * We have to just ignore reset requests in some situations.
+	 * Seems the mid-level driver sometimes is stuttering too much.
+	 */
+#if defined SCSI_RESET_NOT_RUNNING
+	if (cmd->serial_number != cmd->serial_number_at_timeout) {
+		sts = SCSI_RESET_NOT_RUNNING;
+		goto out;
+	}
+#endif
+	/*
+	 * If the mid-level driver told us reset is synchronous, it seems
+	 * that we must call the done() callback for the involved command,
+	 * even if this command was not queued to the low-level driver,
+	 * before returning SCSI_RESET_SUCCESS.
+	 */
+#endif
+	sts = ncr_reset_bus(cmd,
+	(reset_flags & (SCSI_RESET_SYNCHRONOUS | SCSI_RESET_ASYNCHRONOUS)) == SCSI_RESET_SYNCHRONOUS);
+	/*
+	 * Since we always reset the controller, when we return success,
+	 * we add this information to the return code.
+	 */
+#if defined SCSI_RESET_HOST_RESET
+	if (sts == SCSI_RESET_SUCCESS)
+		sts |= SCSI_RESET_HOST_RESET;
+#endif
+
+out:
+	restore_flags(flags);
+	return sts;
+}
 #else
 int ncr53c8xx_reset(Scsi_Cmnd *cmd)
-#endif
 {
 	printk("ncr53c8xx_reset: command pid %lu\n", cmd->pid);
 	return ncr_reset_bus(cmd);
 }
+#endif
 
 /*
 **   Linux entry point of abort() function
 */
 
+#if defined SCSI_RESET_SYNCHRONOUS && defined SCSI_RESET_ASYNCHRONOUS
+
+int ncr53c8xx_abort(Scsi_Cmnd *cmd)
+{
+	int sts;
+	unsigned long flags;
+
+	printk("ncr53c8xx_abort: pid=%lu serial_number=%ld serial_number_at_timeout=%ld\n",
+		cmd->pid, cmd->serial_number, cmd->serial_number_at_timeout);
+
+	save_flags(flags); cli();
+#if 1
+	/*
+	 * We have to just ignore abort requests in some situations.
+	 * Seems the mid-level driver sometimes is stuttering too much.
+	 */
+	if (cmd->serial_number != cmd->serial_number_at_timeout) {
+		sts = SCSI_ABORT_NOT_RUNNING;
+		goto out;
+	}
+#endif
+	sts = ncr_abort_command(cmd);
+out:
+	restore_flags(flags);
+	return sts;
+}
+#else
 int ncr53c8xx_abort(Scsi_Cmnd *cmd)
 {
 	printk("ncr53c8xx_abort: command pid %lu\n", cmd->pid);
 	return ncr_abort_command(cmd);
 }
+#endif
 
 #ifdef MODULE
 int ncr53c8xx_release(struct Scsi_Host *host)
@@ -8873,6 +9040,10 @@
 		uc->cmd = UC_SETFLAG;
 	else if ((arg_len = is_keyword(ptr, len, "clearprof")) != 0)
 		uc->cmd = UC_CLEARPROF;
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	else if ((arg_len = is_keyword(ptr, len, "debug_error_recovery")) != 0)
+		uc->cmd = UC_DEBUG_ERROR_RECOVERY;
+#endif
 	else
 		arg_len = 0;
 
@@ -8973,6 +9144,22 @@
 			ptr += arg_len; len -= arg_len;
 		}
 		break;
+#ifdef SCSI_NCR_DEBUG_ERROR_RECOVERY
+	case UC_DEBUG_ERROR_RECOVERY:
+		SKIP_SPACES(1);
+		if ((arg_len = is_keyword(ptr, len, "sge")))
+			uc->data = 1;
+		else if ((arg_len = is_keyword(ptr, len, "abort")))
+			uc->data = 2;
+		else if ((arg_len = is_keyword(ptr, len, "reset")))
+			uc->data = 3;
+		else if ((arg_len = is_keyword(ptr, len, "none")))
+			uc->data = 0;
+		else
+			return -EINVAL;
+		ptr += arg_len; len -= arg_len;
+		break;
+#endif
 	default:
 		break;
 	}
----------------------------- Cut Here ----------------------------------
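
For readers who want to follow the control flow without the driver
sources at hand, here is a small user-space sketch of the two pieces the
patch adds: mapping the control command argument to an error code, and
firing at most once per period from a timer handler. All demo_* names
are hypothetical. The numeric codes mirror the patch (none=0, sge=1,
abort=2, reset=3). The elapsed-time test is an assumption of this
sketch: it uses unsigned subtraction instead of the patch's
"jiffies < last + 30*HZ" comparison, so that it stays correct if the
tick counter wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the values mirror those stored in uc->data. */
enum demo_error { DEMO_NONE = 0, DEMO_SGE = 1, DEMO_ABORT = 2, DEMO_RESET = 3 };

/* Returns the error code for a keyword, or -1 if it is not recognised. */
static int demo_parse_error_type(const char *arg)
{
	static const struct { const char *name; int code; } table[] = {
		{ "none",  DEMO_NONE  },
		{ "sge",   DEMO_SGE   },
		{ "abort", DEMO_ABORT },
		{ "reset", DEMO_RESET },
	};
	size_t i;

	assert(arg != NULL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(arg, table[i].name) == 0)
			return table[i].code;
	return -1;
}

struct demo_timer {
	unsigned long last;	/* tick count of the last trigger */
	int armed;		/* 0 until the first call has been seen */
};

/*
 * Returns 1 when at least `period` ticks have elapsed since the last
 * trigger. The first call only records the start time and returns 0.
 */
static int demo_period_elapsed(struct demo_timer *t, unsigned long now,
			       unsigned long period)
{
	if (!t->armed) {
		t->armed = 1;
		t->last = now;
		return 0;
	}
	if (now - t->last < period)	/* unsigned difference: wrap-safe */
		return 0;
	t->last = now;
	return 1;
}
```

In the driver the equivalent of demo_period_elapsed() runs inside the
timer handler with a period of 30*HZ, and the selected code decides
which error is provoked.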