NFS mmap problem in kernel 1.3.94

Kevin Layer (layer@franz.com)
Tue, 07 May 1996 09:43:33 -0700


(We are also running Red Hat 3.0.3 with sysvinit 2.60.)

I also tried this patch:

>> In the file linux/mm/filemap.c, function filemap_sync(), there is a line
>> that says:
>>
>> dir = pgd_offset(current->mm, address);
>>
>> but it should be
>>
>> dir = pgd_offset(vma->vm_mm, address);
>>
>> (ie change "current->mm" into "vma->vm_mm"). Does that fix the problem
>> for you?
>>
>> Linus

and it did not fix the problem described below.

Also note that although the mmapped file is on a local disk, it was
accessed through an NFS mount because of the way I cd'd to the
directory: via AMD (the automounter). When I cd to the same directory
directly, bypassing NFS, the dump works.

This transcript shows what I believe to be a bug in mmap(), in
which pages that were written to but not yet flushed to the file get
lost when a write to a new page causes a page fault.

The context is this: we are doing something similar to unexec() in
GNU Emacs, which we call dumplisp. The source file (the original
executable) and the destination file are both mapped into memory. The
destination file is created and mapped with these calls:
    if ((dl_dst_fd = open(dst_file, O_RDWR|O_CREAT, 0777)) < 0) {
        dumplisp_return("can't create output file: %s", dst_file);
    }

and

    dl_dst_base = mmap(0, dst_file_size, PROT_READ|PROT_WRITE, MAP_SHARED,
                       dl_dst_fd, 0);

dl_dst_base happens to be 0x4036e000 in the transcript below. In the
example, we do three memcpy calls. The first two write the ehdr and the
phdr, both of which land on the same memory page. At various points I
print the first part of the destination file (actually, its in-memory
image), and it looks like a good ELF header until the first write to a
different page, which happens via the movsl instruction in memcpy.
(Since the addresses are aligned to a 4-byte boundary, ecx is zero when
the movsb instruction is reached, so it copies no data.)
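
The alignment remark can be checked with a little arithmetic. The
sketch below is not from our source; it only mirrors the negl/andl/shrl
sequence visible in the memcpy disassembly in the transcript. The
function names and the 4 KB page size are assumptions made for
illustration.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumption: 4 KB pages on i386 */

/* Offset of a mapped address within the file, given the mapping base. */
static uint32_t file_offset(uint32_t base, uint32_t addr)
{
    return addr - base;
}

/* Index of the page (relative to the mapping base) that addr falls in. */
static uint32_t page_index(uint32_t base, uint32_t addr)
{
    return file_offset(base, addr) / PAGE_SIZE;
}

/* Bytes the leading "repz movsb" copies to reach 4-byte alignment:
 * the same value as "negl %eax; andl $0x3,%eax" applied to dst. */
static uint32_t head_bytes(uint32_t dst)
{
    return (0u - dst) & 3u;
}

/* Count loaded into ecx for "repz movsl": remaining length >> 2. */
static uint32_t dword_count(uint32_t dst, uint32_t n)
{
    return (n - head_bytes(dst)) >> 2;
}
```

With base 0x4036e000, dst 0x403bf410 and length 0x370, these give a
file offset of 0x51410, page index 0x51 (so not the page holding the
ehdr and phdr), zero leading bytes, and a dword count of 0xdc, which
matches ecx in the register dump taken just before the movsl.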

My theory is that the page fault handler is not properly preserving
the physical page(s) that have already been allocated and are
currently mapped in, so either the mapping is lost or a zero-filled
mapping is forced (which has the same effect of losing the original
physical pages).
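
One way to test this theory outside of Lisp might be a small standalone
program along the following lines. This is only a sketch, not code from
dumplisp: the function name check_shared_mmap is hypothetical, and
sizing the file with ftruncate() before mapping is an assumption, since
the calls shown above do not say how the destination file gets its
length.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Dirty the first page of a MAP_SHARED mapping, then write to a later
 * page and see whether the first page still holds its data.
 * Returns 0 if the first page survived, 1 if it was lost,
 * and -1 on a setup error or bad arguments.
 */
static int check_shared_mmap(const char *path, size_t size, size_t far_offset)
{
    static const char magic[] = "\177ELF";
    const size_t magic_len = sizeof magic - 1;
    int result = -1;
    int fd;

    if (size < magic_len || far_offset > size || size - far_offset < 4)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* Assumption: the file is extended to its full size before mapping. */
    if (ftruncate(fd, (off_t)size) == 0) {
        char *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base != MAP_FAILED) {
            memcpy(base, magic, magic_len);        /* dirty page 0 */
            memset(base + far_offset, 0xAA, 4);    /* touch another page */
            result = memcmp(base, magic, magic_len) == 0 ? 0 : 1;
            munmap(base, size);
        }
    }
    close(fd);
    return result;
}
```

Calling it with a path that goes through the NFS mount and again with
the direct local path, using a far_offset such as 0x51410, would show
whether the loss depends on how the file is reached.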

I would appreciate any help you can offer.

Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave, Ste 275 Berkeley, CA 94704 uunet!franz!duane (uucp)
Phone: (510) 548-3600; FAX: (510) 548-8253 duane@Franz.COM (internet)

% gdb cl
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15.1 (i586-unknown-linux),
Copyright 1995 Free Software Foundation, Inc...(no debugging symbols found)...
(gdb) break elf_dumplisp
Breakpoint 1 at 0x800c4ea
(gdb) run
Starting program: /a/fabi/root/wow/linuxscm/4.3.linux/src/cl
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...Allegro CL 4.3 [Linux/X86; R1] (5/1/96 0:13)
Copyright (C) 1985-1996, Franz Inc., Berkeley, CA, USA. All Rights Reserved.
Loading home .clinit.cl
; Loading #p"/a/fabi/root/wow/linuxscm/4.3.linux/src/.clinit.cl"
; Loading src:;comp;debugstructs.cl
; (/a/fabi/root/wow/linuxscm/4.3.linux/src/comp/debugstructs.cl)
;; Optimization settings: safety 1, space 1, speed 1, debug 2.
;; For a complete description of all compiler switches given the current
;; optimization settings evaluate (explain-compiler-settings).
user(1): (dumplisp :name "dl1_mt.foo" :libfasl-warning nil)
gc: E=72% N=50168 O+=48232 pfu=708+704 pfg=22+257
gc: E=0% N=49448 O+=720 pfu=1+1 pfg=0+20

Breakpoint 1, 0x800c4ea in elf_dumplisp ()
(gdb) set debug_dumplisp=1
(gdb) break memcpy
Breakpoint 2 at 0x40049f78
(gdb) c
Continuing.
src mapped: 0x400d9000 to 0x4036d660
phdr: type=6 offset= 0x34 paddr=0x8000034 memsz= 0xa0
phdr: type=3 offset= 0xd4 paddr=0x80000d4 memsz= 0x13
phdr: type=1 offset= 0x0 paddr=0x8000000 memsz= 0x4357c
phdr: type=1 offset= 0x43580 paddr=0x8044580 memsz= 0x13b58
phdr: type=2 offset= 0x510f8 paddr=0x80520f8 memsz= 0x98
txtpi=0x2 ehdr_in_text=1 ehdr_offset= 0x34
area1: 0x806f000 to 0x81e9e10 (size: 0x17ae10, rounded: 0x17b000); pad 0x5e1f0
area1: 0x8248000 to 0x82ca000 (size: 0x82000, rounded: 0x82000); pad 0x0
area1: 0x82ca000 to 0x82d2e00 (size: 0x8e00, rounded: 0x9000); pad 0x6f200
area1: 0x8342000 to 0x8350000 (size: 0xe000, rounded: 0xe000); pad 0x86000
dst_file_size += 0x34 bytes (ehdr)
dst_file_size += 0x370 bytes (shdr)

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) c
Continuing.
dst_file_size += 0x4357c bytes (text segment)

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) c
Continuing.
dst_file_size += 0xdc10 bytes (data segment--filesz only)
dst_file_size += 0x214000 bytes (heap)
xtra_size is 0
add 0x1e6 to xtra_size
add 0x9a to xtra_size
add 0x5150 to xtra_size
hole: add 0x370 to xtra_size
add 0x435c to xtra_size

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) c
Continuing.
dst_file_size += 0x9a9c bytes (not-in-core stuff)

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) c
Continuing.
dst_file_size += 0x4004 bytes (struct dumplisp_info)

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) c
Continuing.
new dst_file_size is 2799200 (0x2ab660)
dst mapped: 0x4036e000 to 0x40619660
memcpy(0x4036e000, 0x400d9000, 0x34)
dst in file: 0x0 to 0x34
src in file: 0x0

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) c
Continuing.
memcpy(0x4036e034, 0x400d9034, 0xa0)
dst in file: 0x34 to 0xd4
src in file: 0x34

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) x/20x 0x4036e000
0x4036e000 <ypall_foreach+2708576>: 0x464c457f 0x00010101 0x00000000 0x00000000
0x4036e010 <ypall_foreach+2708592>: 0x00030002 0x00000001 0x0800bc80 0x00000034
0x4036e020 <ypall_foreach+2708608>: 0x00051410 0x00000000 0x00200034 0x00280005
0x4036e030 <ypall_foreach+2708624>: 0x00130016 0x00000000 0x00000000 0x00000000
0x4036e040 <ypall_foreach+2708640>: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) x/s 0x4036e000
0x4036e000 <ypall_foreach+2708576>: "\177ELF\001\001\001"
(gdb) c
Continuing.
memcpy(0x403bf410, 0x4012a410, 0x370)
dst in file: 0x51410 to 0x51780
src in file: 0x51410

Breakpoint 2, 0x40049f78 in memcpy ()
(gdb) x/20x 0x4036e000
0x4036e000 <ypall_foreach+2708576>: 0x464c457f 0x00010101 0x00000000 0x00000000
0x4036e010 <ypall_foreach+2708592>: 0x00030002 0x00000001 0x0800bc80 0x00000034
0x4036e020 <ypall_foreach+2708608>: 0x00051410 0x00000000 0x00200034 0x00280005
0x4036e030 <ypall_foreach+2708624>: 0x00130016 0x00000006 0x00000034 0x08000034
0x4036e040 <ypall_foreach+2708640>: 0x08000034 0x000000a0 0x000000a0 0x00000005
(gdb) display/i $pc
2: x/i $eip 0x40049f78 <memcpy>: pushl %ebp
(gdb) si
0x40049f79 in memcpy ()
2: x/i $eip 0x40049f79 <memcpy+1>: pushl %edi
(gdb)
0x40049f7a in memcpy ()
2: x/i $eip 0x40049f7a <memcpy+2>: pushl %esi
(gdb)
0x40049f7b in memcpy ()
2: x/i $eip 0x40049f7b <memcpy+3>: movl 0x10(%esp,1),%ebp
(gdb)
0x40049f7f in memcpy ()
2: x/i $eip 0x40049f7f <memcpy+7>: movl 0x18(%esp,1),%edx
(gdb)
0x40049f83 in memcpy ()
2: x/i $eip 0x40049f83 <memcpy+11>: movl %ebp,%edi
(gdb)
0x40049f85 in memcpy ()
2: x/i $eip 0x40049f85 <memcpy+13>: movl 0x14(%esp,1),%esi
(gdb)
0x40049f89 in memcpy ()
2: x/i $eip 0x40049f89 <memcpy+17>: cmpl $0x7,%edx
(gdb)
0x40049f8c in memcpy ()
2: x/i $eip 0x40049f8c <memcpy+20>: jbe 0x40049fa9 <memcpy+49>
(gdb)
0x40049f8e in memcpy ()
2: x/i $eip 0x40049f8e <memcpy+22>: movl %ebp,%eax
(gdb)
0x40049f90 in memcpy ()
2: x/i $eip 0x40049f90 <memcpy+24>: negl %eax
(gdb)
0x40049f92 in memcpy ()
2: x/i $eip 0x40049f92 <memcpy+26>: andl $0x3,%eax
(gdb)
0x40049f95 in memcpy ()
2: x/i $eip 0x40049f95 <memcpy+29>: subl %eax,%edx
(gdb)
0x40049f97 in memcpy ()
2: x/i $eip 0x40049f97 <memcpy+31>: movl %eax,%ecx
(gdb)
0x40049f99 in memcpy ()
2: x/i $eip 0x40049f99 <memcpy+33>: cld
(gdb)
0x40049f9a in memcpy ()
2: x/i $eip 0x40049f9a <memcpy+34>: repz movsb %ds:(%esi),%es:(%edi)
(gdb) p/x $ecx
$1 = 0x0
(gdb) si
0x40049f9c in memcpy ()
2: x/i $eip 0x40049f9c <memcpy+36>: movl %edx,%eax
(gdb)
0x40049f9e in memcpy ()
2: x/i $eip 0x40049f9e <memcpy+38>: shrl $0x2,%eax
(gdb)
0x40049fa1 in memcpy ()
2: x/i $eip 0x40049fa1 <memcpy+41>: movl %eax,%ecx
(gdb)
0x40049fa3 in memcpy ()
2: x/i $eip 0x40049fa3 <memcpy+43>: cld
(gdb)
0x40049fa4 in memcpy ()
2: x/i $eip 0x40049fa4 <memcpy+44>: repz movsl %ds:(%esi),%es:(%edi)
(gdb) x/20x 0x4036e000
0x4036e000 <ypall_foreach+2708576>: 0x464c457f 0x00010101 0x00000000 0x00000000
0x4036e010 <ypall_foreach+2708592>: 0x00030002 0x00000001 0x0800bc80 0x00000034
0x4036e020 <ypall_foreach+2708608>: 0x00051410 0x00000000 0x00200034 0x00280005
0x4036e030 <ypall_foreach+2708624>: 0x00130016 0x00000006 0x00000034 0x08000034
0x4036e040 <ypall_foreach+2708640>: 0x08000034 0x000000a0 0x000000a0 0x00000005
(gdb) info registers
eax 0xdc 220
ecx 0xdc 220
edx 0x370 880
ebx 0x403bf410 1077670928
esp 0xbfff80c4 0xbfff80c4
ebp 0x403bf410 0x403bf410
esi 0x4012a410 1074963472
edi 0x403bf410 1077670928
eip 0x40049fa4 0x40049fa4
ps 0x312 786
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x2b 43
gs 0x2b 43
(gdb) si
0x40049fa4 in memcpy ()
2: x/i $eip 0x40049fa4 <memcpy+44>: repz movsl %ds:(%esi),%es:(%edi)
(gdb) x/20x 0x4036e000
0x4036e000 <ypall_foreach+2708576>: 0x00000000 0x00000000 0x00000000 0x00000000
0x4036e010 <ypall_foreach+2708592>: 0x00000000 0x00000000 0x00000000 0x00000000
0x4036e020 <ypall_foreach+2708608>: 0x00000000 0x00000000 0x00000000 0x00000000
0x4036e030 <ypall_foreach+2708624>: 0x00000000 0x00000000 0x00000000 0x00000000
0x4036e040 <ypall_foreach+2708640>: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) info registers
eax 0xdc 220
ecx 0xdb 219
edx 0x370 880
ebx 0x403bf410 1077670928
esp 0xbfff80c4 0xbfff80c4
ebp 0x403bf410 0x403bf410
esi 0x4012a414 1074963476
edi 0x403bf414 1077670932
eip 0x40049fa4 0x40049fa4
ps 0x10312 66322
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x2b 43
gs 0x2b 43
(gdb)
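
For reading the x/20x dumps above: gdb prints each 32-bit word in
little-endian order, so the first word 0x464c457f is the byte sequence
7f 'E' 'L' 'F'. The helper names below are hypothetical and only
illustrate how the "good ELF header" check can be made mechanically;
they are not part of dumplisp.

```c
#include <stdint.h>
#include <string.h>

/* Unpack a 32-bit word, as shown by gdb's x/x on little-endian i386,
 * into the four bytes as they sit in memory. */
static void word_to_bytes_le(uint32_t w, unsigned char out[4])
{
    out[0] = (unsigned char)(w & 0xff);
    out[1] = (unsigned char)((w >> 8) & 0xff);
    out[2] = (unsigned char)((w >> 16) & 0xff);
    out[3] = (unsigned char)((w >> 24) & 0xff);
}

/* Returns 1 if the buffer starts with the ELF magic "\177ELF", else 0. */
static int looks_like_elf(const unsigned char *p, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= 4 && memcmp(p, magic, 4) == 0;
}
```

Applied to the first word of the dumps, 0x464c457f passes the check,
while the all-zero word seen after the single step of movsl does not.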