Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

From: Petr Vandrovec (VANDROVE@vc.cvut.cz)
Date: Mon Oct 23 2000 - 15:35:18 EST


On 23 Oct 00 at 16:13, Alexander Viro wrote:
> On Mon, 23 Oct 2000, Linus Torvalds wrote:
>
> > Al, any ideas? I have this feeling that the simplest fix is just to leave
> > the race open, and make truncate_complete_page() just leave such a "racy"
> > page in the page cache. It will still race, and the invalid page will
> > still exist, but the end result should be harmless.
>
> Provided that we clean it - why the hell do we want to take it out of
> the pagecache? I don't see any fundamental reasons to prohibit pages
> past the ->i_size being hashed. Methods _must_ check for ->i_size,
> but they do it anyway. All race-prevention is based on page locks and
> ->i_sem.
>
> Yes, filemap_nopage() should check for i_size at the very end and fail if
> the page became off-limits. But that's completely unrelated issue - it's
> mmap semantics, not pagecache one.

Yes. Bad news: nothing was caught in filemap_nopage, but one page
(of 57000) was dirty and had page->mapping == NULL... (maybe there was
only one because this was just after bootup, with plenty of memory).
Maybe I should look at the readahead code? Although, to be clear, I do
not know why. Unless there is a bug in the logic of the test program, it
should first dirty the pages, and only AFTER that truncate, unmap and
exit, without ever touching the pages of the mapping again... My first
testcases did have this race (and used raw devices), but then I found
(by removing more and more code) that neither the race nor raw devices
are required...
 
> The point being: we should _never_ drop ->mapping unless the page is
> irrevocably going away. We can (and probably should) drop the off-limits
> page as soon as ->count hits zero, but we should not do it before that.

In the case of truncate it is going away irrevocably. Accesses after
truncate should give you SIGBUS (and sometimes do)...
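
To make the expected SIGBUS behaviour concrete, here is a minimal
userspace sketch that is not taken from the thread. It maps one page of
a scratch file MAP_SHARED, dirties it, truncates the file to zero and
touches the page again. The name touch_after_truncate(), the /tmp path
and the signal handler are my own illustration. On a current Linux
kernel I would expect the access to raise SIGBUS; a kernel with the
problem discussed here may not always do so.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* Jump back into touch_after_truncate() instead of retrying the access. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Hypothetical helper: returns 1 if touching the mapping after the
 * truncate raised SIGBUS, 0 if the access went through, -1 on a
 * setup error. */
int touch_after_truncate(void)
{
    char path[] = "/tmp/sigbus_sketch_XXXXXX";   /* assumed scratch location */
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    int result;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                                /* file lives only via fd */
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }

    volatile char *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    map[0] = 1;                                  /* dirty the page */
    if (ftruncate(fd, 0) != 0) {                 /* page is now past i_size */
        munmap((void *)map, (size_t)page);
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    if (sigsetjmp(bus_env, 1) == 0) {
        map[0] = 2;                              /* access after truncate */
        result = 0;                              /* no signal was raised */
    } else {
        result = 1;                              /* SIGBUS was delivered */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)map, (size_t)page);
    close(fd);
    return result;
}
```

A return value of 0 here would correspond to the "sometimes" case, where
the stale page is still reachable through the mapping.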

             total       used       free     shared    buffers     cached
Mem:        255768      42208     213560          0        496      18420
-/+ buffers/cache:      23292     232476
Swap:       530136      13200     516936

Strace of another run:
                                                
1688 22:23:41.748438 execve("./oopsdemo", ["./oopsdemo"], [/* 18 vars */]) = 0
1688 22:23:41.749058 brk(0) = 0x8049ae8
1688 22:23:41.749399 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40016000
1688 22:23:41.749641 open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
1688 22:23:41.749861 open("/etc/ld.so.cache", O_RDONLY) = 4
1688 22:23:41.750011 fstat(4, {st_mode=S_IFREG|0644, st_size=43818, ...}) = 0
1688 22:23:41.750253 old_mmap(NULL, 43818, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40017000
1688 22:23:41.750408 close(4) = 0
1688 22:23:41.750538 open("/lib/libc.so.6", O_RDONLY) = 4
1688 22:23:41.750676 fstat(4, {st_mode=S_IFREG|0755, st_size=1057576, ...}) = 0
1688 22:23:41.750878 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\224\314"..., 4096) = 4096
1688 22:23:41.751173 old_mmap(NULL, 1072484, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40022000
1688 22:23:41.751327 mprotect(0x4011e000, 40292, PROT_NONE) = 0
1688 22:23:41.751441 old_mmap(0x4011e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0xfb000) = 0x4011e000
1688 22:23:41.751633 old_mmap(0x40124000, 15716, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40124000
1688 22:23:41.751809 close(4) = 0
1688 22:23:41.753291 munmap(0x40017000, 43818) = 0
1688 22:23:41.753554 getpid() = 1688
1688 22:23:41.753785 fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(4, 1), ...}) = 0
1688 22:23:41.753993 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
1688 22:23:41.754162 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
1688 22:23:41.754430 write(1, "Go\n", 3) = 3
1688 22:23:41.754727 open("ram0", O_RDWR|O_CREAT, 0600) = 4
1688 22:23:41.754949 unlink("ram0") = 0
1688 22:23:41.755103 ftruncate(4, 234881024) = 0
1688 22:23:41.755228 old_mmap(NULL, 234881024, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x40128000
1688 22:23:41.755700 pipe([5, 6]) = 0
1688 22:23:41.755847 pipe([7, 8]) = 0
1688 22:23:41.756024 fork() = 1689
1689 22:23:41.756735 close(6 <unfinished ...>
1688 22:23:41.756798 close(5 <unfinished ...>
1689 22:23:41.756844 <... close resumed> ) = 0
1688 22:23:41.756894 <... close resumed> ) = 0
1689 22:23:41.756949 close(7 <unfinished ...>
1688 22:23:41.756997 close(8 <unfinished ...>
1689 22:23:41.757041 <... close resumed> ) = 0
1688 22:23:41.757089 <... close resumed> ) = 0
1689 22:23:41.757139 close(4 <unfinished ...>
1688 22:23:41.757188 write(6, "\0", 1 <unfinished ...>
1689 22:23:41.757250 <... close resumed> ) = 0
1688 22:23:41.757301 <... write resumed> ) = 1
1689 22:23:41.757355 read(5, <unfinished ...>
1688 22:23:41.757405 read(7, <unfinished ...>
1689 22:23:41.757450 <... read resumed> "\0", 1) = 1
1689 22:23:49.260756 write(8, "\0", 1) = 1
1689 22:23:49.436204 read(5, <unfinished ...>
1688 22:23:49.442969 <... read resumed> "\0", 1) = 1
1688 22:23:49.450799 write(6, "\0", 1 <unfinished ...>
1689 22:23:49.466563 <... read resumed> "\0", 1) = 1
1689 22:23:49.497699 munmap(0x40017000, 4096) = 0
1689 22:23:49.592644 _exit(0) = ?
1688 22:23:49.652744 <... write resumed> ) = 1
1688 22:23:49.675277 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
1688 22:23:49.675537 rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
1688 22:23:49.675661 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
1688 22:23:49.675783 nanosleep({5, 0}, {5, 0}) = 0
1688 22:23:54.676754 ftruncate(4, 0) = 0
1688 22:23:57.622852 --- SIGCHLD (Child exited) ---
1688 22:23:57.660982 munmap(0x40128000, 234881024) = 0
1688 22:23:57.661289 munmap(0x40017000, 4096) = 0
1688 22:23:57.661475 _exit(0) = ?

The first page->mapping == NULL entry in syslog is dated 22:23:58, but a
couple of entries were lost before it (I should probably print only a '.'
for each such page; in this run there were more than 100 such pages).
Another question is why SIGCHLD was delivered to the parent AFTER
ftruncate, although exit was invoked a couple of seconds earlier - maybe
it syncs the child's address space to disk?
                                            Petr Vandrovec
                                            vandrove@vc.cvut.cz
                                                                                                            