Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert

From: Roland Kuhn
Date: Tue Apr 24 2007 - 09:04:00 EST


Hi Jens!

On 24 Apr 2007, at 14:32, Jens Axboe wrote:

On Tue, Apr 24 2007, Roland Kuhn wrote:
Hi Jens!

[I made a typo in the Cc: list so that lkml is only included as of
now. Actually I copied the typo from you ;-) ]

Well no, you started the typo, I merely propagated it and forgot to fix
it up :-)

Actually, I copied it from your printk() ;-) (thinking helps...)

Sure. You might want to include NFS file access into your tests,
since we've not triggered this with locally accessing the disks.
BTW:

How are you exporting the directory (what exports options) - how
is it
mounted by the client(s)? What chunksize is your raid6 using?

And what are the nature of the files on the raid (huge, small, ?) and
what are the client(s) doing? Just approximately, I know these things
can be hard/difficult/impossible to specify.

The files are 100-400MB in size and the client is merging them into a
new file in the same directory using the ROOT library, which does in
essence alternating sequences of

_llseek(somewhere)
read(n bytes)
_llseek(somewhere+n)
read(m bytes)
...

and then

_llseek(somewhere)
rt_sigaction(ignore INT)
write(n bytes)
rt_sigaction(INT->DFL)
time()
_llseek(somewhere+n)
...

where n is of the the order of 30kB. The input files are treated
sequentially, not randomly.

Ok, I'll see if I can reproduce it. No luck so far, I'm afraid.

Too bad.

BTW: the machine just stopped dead, no sign whatsoever on console or
netconsole, so I rebooted with elevator=deadline
(need to get some work done besides ;-) )

Unfortunately expected, if we can race and lose an update to - >next_rq,
we can race and corrupt some of the internal data structures as well. If
you have the time and inclination, it would be interesting to see if you
can reproduce with some debugging options enabled:

- Enable all preempt, spinlock and lockdep debugging measures
- Possibly slab poisoning, although that may slow you down somewhat

Kernel compilation under way, will report back.

Are you using 4kb stacks?

No idea, 'grep -i stack .config' gives no indication, but ISTR that 4k was made the default some time back?

Ciao,
Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little
security will deserve neither and lose both. - Benjamin Franklin
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL++++ P+++ L+++ E(+) W+ !N K- w--- M + !V Y+
PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++>++++ h---- y+++
------END GEEK CODE BLOCK------


Attachment: smime.p7s
Description: S/MIME cryptographic signature

Attachment: PGP.sig
Description: This is a digitally signed message part