Re: linux v2.2.15pre13 filesystem/kernel race and klog printk data loss. (fwd)

From: M Sweger (mikesw@whiterose.net)
Date: Fri Mar 10 2000 - 09:47:29 EST


On Thu, 9 Mar 2000, Alan Cox wrote:

> > messages into every UMSDOS function call upon entry into
> > the function. When I ran this, the Oops disappeared!!!!
>
> That sniffs of a compiler bug

This is a possiblity. I will gry egcs v1.1.2. However, I've used
egcs v1.0.3 on the old linux 2.0.39 kernels and linux 2.2.1 without
any problems.

I saw linux 2.2.15pre14 came out last night. There were two fixes
that may apply or a along the lines of what I'm seeing.

        a). a semaphore deadlock fix.
        b). a fix in the directory race in ext2. Note: I have UMSDOS

Perhaps this is what is causing my "Resource Busy"/dir race problem. A
wild guess. I'll try this one today.

BTW, did you get rid of the "noise" in the soundmodule file for these
pre releases? It'd would be nice to clean the compile stuff up: however
minor. :)

>
> > printk's will have a partial message written or a
> > printk will overwrite the previous printk in the middle
> > of the first printk message.
>
> yes.. we choose to get something out regardless otherwise we might deadlock
> and not see the critical text

I assume that if the printk's backlog in kernel memory buffer, that due
to the filesystem being slow and trying to catch up that the kernel will
hang and cause deadlock.

If the kernel GFP's (Oops') then this should have a higher priority of
being written out to syslog to catch it per the syslog priority stuff.

In my case, I'm using KERN_INFO and just tracing what the s/w is doing
and related variables. It isn't crashing and thus I think it should be
able to record to disk every message "in the order output" and completely
without the loss of messages. Otherwise, what good is it to record
information when one can't get all of the information, especially for
debugging purposes. It makes the job very hard.

I would be able to excuse one printk() message loss (maybe) but 50-100
and more (if I put more in)? To give an example, if a printk() is put
in the dcache.c file for every function entered and a printk() put in
every function in the UMSDOS routines, all I get is the dcache.c ones's
and only a couple for UMSDOS. If the dcache.c printk()'s disabled, then
the UMSDOS come out alot more although there are a few that don't come
out when they should. I.E. rmdir -rf include/config just recursively
executes the same code over and over. However, not all printk()'s in
that section of code come out. Thus I get the impression that the file/dir
didn't get deleted/unlinked when it actually did. However, there is no
printk() for it even though the printk() is *right* before every unlink()
call.

Thus, I'm stuck trouble shooting it to narrow the problem down.
 
>
> > rm: cannot remove directory `include/config': Device or resource busy
> > make: *** [mrproper] Error 1
>
> Ok that is one for the UMSDOS maintainer.
>
> > Note: the UMSDOS maintainer has been notified of the above problems
> > but there is nothing he can do, since it all seems to be pointing
> > towards the VFS/kernel code IMHO.
>
> Its stuff nobody else is seeing from reports so far. You might want to suggest
> the UMSDOS maintainers have a chat with Al Viro <aviro@redhat.com> as he is
> an expert on the dcache stuff - Im not
>
> Alan
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Mar 15 2000 - 21:00:18 EST