Re: problem: kernel BUG at ll_rw_blk.c:937!

From: Norbert Scheibner (scno@gmx.net)
Date: Tue Jul 08 2003 - 10:07:36 EST


It seems I have found a workaround.
The backup script ejects the medium with "eject /dev/sda". Immediately
afterwards, the script uses "dd" to check whether there is a new medium
in the drive, and this is exactly the point where the "dd" hangs.
I inserted a "sleep 30s" between the "eject" and the "dd", and now it
works.
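
For anyone who wants to try the same thing, here is a minimal sketch of
the workaround, not the actual backup script. The function and variable
names are made up for illustration, and the label check and the backup
write are left out. It only shows the ordering: eject, wait, and only
then probe the drive with "dd" again.

```shell
#!/bin/sh
# Sketch only: all names below are hypothetical, not from the real script.
DEVICE="${DEVICE:-/dev/sda}"             # the MO drive from the report
SETTLE_SECONDS="${SETTLE_SECONDS:-30}"   # delay that made the hang go away here
POLL_SECONDS="${POLL_SECONDS:-180}"      # the script polls every 3 minutes

# Returns 0 if one byte can be read from the device, non-zero otherwise.
medium_present() {
    dd if="$DEVICE" of=/dev/null bs=1 count=1 2>/dev/null
}

# Eject the medium, then wait before anything touches the drive again.
eject_and_settle() {
    eject "$DEVICE"
    sleep "$SETTLE_SECONDS"
}

# Poll until a medium shows up.
wait_for_medium() {
    until medium_present; do
        sleep "$POLL_SECONDS"
    done
}
```

A backup run would then call "wait_for_medium", write the backup, and
finish with "eject_and_settle" before the next "wait_for_medium". The 30
seconds are simply the value that worked on this machine; I have not
tested shorter delays.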

On Sun, 09 Feb 2003 11:56:10 +0100, Norbert Scheibner <scno@gmx.net>
wrote:

> Since changing to kernel version 2.4, a "dd" or a "mount" that tries to
> access a Fujitsu SCSI 1.3 GB MO drive with a 1.3 GB medium (HW sector
> size = 2048 bytes) sometimes leaves the accessing process in an
> uninterruptible sleep.
>
> After that has happened, the system still works and logins via console
> or ssh are still possible, but the only way to reboot the system is a
> hard reset. Any "shutdown -r now", "lilo" or "hdparm" also ends up in
> an uninterruptible sleep.
>
> Every day I start a backup script which checks every 3 minutes for a
> medium in the MO drive with "dd if=/dev/sda of=/dev/null bs=1 count=1".
> If one is found, the script checks for the label "backup", writes the
> backup volume to the medium and then ejects it. The mount, and the
> umount after a timeout of 20 seconds, are done by autofs with no fs or
> blocksize specified (the line from the autofs config file is
> "mo -fstype=auto :/dev/sda"). Sometimes the script keeps trying for
> several hours until somebody inserts a new medium. The error occurs
> with the next successful "dd" or "mount" after a lot of useless tests
> with no medium inserted.
> I could not find a better way to trigger this behavior. Sometimes the
> error occurs after a week, sometimes the next day.
>
> I tried every stable kernel from 2.4.13 or so to 2.4.20. Currently I use
> 2.4.21-pre2. The 2.4.20 was the first version which wrote a log output.
> I tried an NCR810 and a Dawicontrol DC-2976UW SCSI host adapter, as
> well as an AMD K6-2 on an ALI5 chipset and an Athlon on a VIA KT133
> chipset. None of these changes made a difference.
>
> Portion of the logfile:
> ------------------------------------------------------------------------
> Jan 1 13:56:34 server automount[974]: attempting to mount entry
> /.autofs/mo
> Jan 1 13:56:34 server automount[13779]: expired /.autofs/mo
> Jan 1 13:56:48 server Device not ready. Make sure there is a disc in
> the drive.
> Jan 1 13:56:48 server VFS: busy inodes on changed media.
> Jan 1 13:56:48 server sda : READ CAPACITY failed.
> Jan 1 13:56:48 server sda : status = 1, message = 00, host = 0, driver
> = 08
> Jan 1 13:56:48 server Current sd00:00: sense key Not Ready
> Jan 1 13:56:48 server Additional sense indicates Medium not present
> Jan 1 13:56:48 server sda : block size assumed to be 512 bytes, disk
> size 1GB.
> Jan 1 13:56:48 server sda: unknown partition table
> Jan 1 13:56:58 server kernel BUG at ll_rw_blk.c:937!
> Jan 1 13:56:58 server invalid operand: 0000
> Jan 1 13:56:58 server CPU: 0
> Jan 1 13:56:58 server EIP: 0010:[<c019f026>] Not tainted
> Jan 1 13:56:58 server EFLAGS: 00010246
> Jan 1 13:56:58 server eax: 00002000 ebx: 00000001 ecx: 00000800
> edx: 00000000
> Jan 1 13:56:58 server esi: c1780d40 edi: 00000004 ebp: 000fffff
> esp: c6799e94
> Jan 1 13:56:58 server ds: 0018 es: 0018 ss: 0018
> Jan 1 13:56:58 server Process umount (pid: 13798, stackpage=c6799000)
> Jan 1 13:56:58 server Stack: 00000800 c1780d40 00000000 000fffff
> c7f33e00 00002000 c624fa40 00000113
> Jan 1 13:56:58 server c7fbb7dc 00000000 00000000 c0123675 c019f68c
> c624fa18 00000001 c1780d40
> Jan 1 13:56:58 server 00000004 00000001 00000001 00000002 c019f6ee
> 00000001 c1780d40 c1780d40
> Jan 1 13:56:58 server Call Trace: [<c0123675>] [<c019f68c>]
> [<c019f6ee>] [<c019f847>] [<c88f97ec>]
> Jan 1 13:56:58 server [<c88f88d9>] [<c88fb440>] [<c0136251>]
> [<c014542e>] [<c0139d77>] [<c0145aaf>]
> Jan 1 13:56:58 server [<c0124995>] [<c0145acc>] [<c0106d03>]
> Jan 1 13:56:58 server
> Jan 1 13:56:58 server Code: 0f 0b a9 03 62 9e 21 c0 0f b6 46 15 0f b7
> 4e 14 8b 14 85 a0
> Jan 1 13:57:16 server <4>
> ------------------------------------------------------------------------
>
> More log files and additional info at
> http://www-user.tu-chemnitz.de/~scno/bugreport/
> bug.log - 2 log outputs from 2.4.20 and 2 from 2.4.21-pre2
> config - .config file from the 2.4.21-pre2
> copies of cpuinfo, iomem, ioports, ksyms, modules, scsi and the output
> from a "ps ax" and "lspci -vv" captured on the last occurrence of the
> error with the 2.4.21-pre2
> boot - compiled kernel and symbol map
> lib/modules - compiled modules
>
> Current system:
> Athlon 750 MHz
> Abit KT133 - VIA KT133 Chipset
> Dawicontrol DC-2976UW
> Fujitsu SCSI 1.3 GigaMO Drive
> 2 SCSI CDROM drives
>
> Less important components:
> 128 MB SDRAM
> 4 Port Dec/Tulip Network Adapter
> Orinoco Gold WLAN Adapter
> 37 GB IDE hard disk
>
> One thing I just noticed is that the kernel appears to wrongly detect
> a block size of 512 bytes instead of 2048 bytes when the error
> occurs.
>
> Regards
> Norbert



