Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

From: Michal Hocko
Date: Thu Feb 01 2018 - 08:10:21 EST


[CC Kees and Linus - for your background, we are talking about the failures
discussed in http://lkml.kernel.org/r/20180107090229.GB24862@xxxxxxxxxxxxxx
and introduced by http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@xxxxxxxxxx
Debugging has shown that load_elf_binary tries to map an ELF segment over
an existing brk - see below.]

On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
[...]
> [ 9.295990] vma c000001fc8137c80 start 0000000010030000 end 0000000010040000
> next c000001fc81378c0 prev c000001fc8137680 mm c000001fc8108200
> prot 8000000000000104 anon_vma (null) vm_ops (null)
> pgoff 1003 file (null) private_data (null)
> flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> [ 9.296351] CPU: 47 PID: 7537 Comm: sed Not tainted 4.14.0-00006-g4bd92fe-dirty #162
> [ 9.296450] Call Trace:
> [ 9.296482] [c000001fc70db9b0] [c000000000b180e0] dump_stack+0xb0/0xf0 (unreliable)
> [ 9.296588] [c000001fc70db9f0] [c0000000002db0b8] do_brk_flags+0x2d8/0x440
> [ 9.296674] [c000001fc70dbac0] [c0000000002db4d0] vm_brk_flags+0x80/0x130
> [ 9.296751] [c000001fc70dbb20] [c0000000003d2998] set_brk+0x80/0xe8
> [ 9.296824] [c000001fc70dbb60] [c0000000003d2518] load_elf_binary+0x12f8/0x1580
> [ 9.296910] [c000001fc70dbc80] [c00000000035d9e0] search_binary_handler+0xd0/0x270
> [ 9.296999] [c000001fc70dbd10] [c00000000035f938] do_execveat_common.isra.31+0x658/0x890
> [ 9.297089] [c000001fc70dbdf0] [c00000000035ff80] SyS_execve+0x40/0x50
> [ 9.297162] [c000001fc70dbe30] [c00000000000b220] system_call+0x58/0x6c
>
> But coming back to when it failed with MAP_FIXED_NOREPLACE, looking into ELF
> section details (readelf -aW /usr/bin/sed), there was a PT_LOAD segment with
> p_memsz > p_filesz which might be causing set_brk() to be called.
>
>
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> ...
> LOAD 0x020328 0x0000000010030328 0x0000000010030328 0x000384 0x0094a0 RW 0x10000
>
> which can be confirmed by just dumping elf_brk/elf_bss for this particular
> instance. (elf_brk > elf_bss)

Hmm, interesting. So the above is not a regular brk. The check was
added in 2001 by "v2.4.10.1 -> v2.4.10.2" but the changelog is not
revealing at all.

Btw. my /bin/ls also has MemSiz>FileSiz
LOAD 0x01ade0 0x000000000061ade0 0x000000000061ade0 0x00079c 0x001520 RW 0x200000
113: 000000000061b57c 0 NOTYPE GLOBAL DEFAULT ABS __bss_start

and I do not see any problem there. So this is more likely a problem of
elf_brk being placed at a wrong address. But I am desperately lost in this
code so I might be completely off.

> $dmesg | grep elf_brk
> [ 9.571192] elf_brk 10030328 elf_bss 10030000

Hmm, these are on the same page. Is this really expected?

> static int load_elf_binary(struct linux_binprm *bprm)
> ---------------------
>
> if (unlikely (elf_brk > elf_bss)) {
> unsigned long nbyte;
>
> /* There was a PT_LOAD segment with p_memsz > p_filesz
> before this one. Map anonymous pages, if needed,
> and clear the area. */
> retval = set_brk(elf_bss + load_bias,
> elf_brk + load_bias,
> bss_prot);
>
>
> ---------------------
> So is not there a chance that subsequent file mapping might be overlapping
> with these anon mappings ? I mean may be thats how ELF loading might be
> happening right now.

I will study the code more but it would be really great if
somebody more familiar with this area could help me out a
bit. Why do we add this brk at all, and why doesn't it matter that
we map over it with a real file mapping? As per the previous email
http://lkml.kernel.org/r/20180130094205.GS21609@xxxxxxxxxxxxxx a new
brk will be established later.

--
Michal Hocko
SUSE Labs