Re: [RFC5 PATCH v6 00/21] ILP32 for ARM64
From: Yury Norov
Date: Fri Mar 18 2016 - 12:46:57 EST
On Fri, Mar 18, 2016 at 04:55:26PM +0100, Alexander Graf wrote:
>
>
> On 18.03.16 16:49, Yury Norov wrote:
> > On Fri, Mar 18, 2016 at 06:28:29PM +0800, Zhangjian (Bamvor) wrote:
> >>
> >> For the glibc part, I found that there are 11 ilp32 patches at the top,
> >> but the original 28 ilp32 patches are not at the top; there are more
> >> than 900 patches between them (see the list below). Are you
> >> willing to rebase all the ilp32-related patches? It would be very useful
> >> for reviewing and debugging. I saw Andrew request an account in glibc;
> >> maybe that is already in process?
> >>
> >
> > I already said it's a mess there, and I'd prefer to make things work
> > first and then do the cleanup.
>
> So how is progress going overall? The last submission I've seen was
> already 2 months ago. Are there particular bits holding you up?
>
>
> Alex
Hi Alexander,
Lately I have mostly been working on the library, as it needs substantial
rework. But yes, there's one serious bug puzzling me.
Tests like umount or pathconf fail, but I see no major problem there,
as it's most probably a structure padding mismatch between the kernel and
glibc. But there's (at least) one major problem I do see.
Float tests fail due to a NULL dereference (address 0x14, actually) at
pthread_join(). It calls tgkill(), and after that the child thread crashes.
See the strace and dmesg output at the end.
A minimal test reproducing it is attached. A similar test, where the
parent forks a child and then kills it, works fine (attached too).
I see that in the pthread case much more is cloned.
Otherwise the two calls look similar.
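For readers without the attachments, here is a rough sketch of the two cases. It is an assumption reconstructed from the description above and the strace output at the end, not the attached source; the names worker(), run_thread_variant() and run_fork_variant() are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: announce itself, then block until cancelled.
 * pause() is a cancellation point, so a deferred cancel lands here. */
static void *worker(void *arg)
{
	(void)arg;
	printf("started\n");
	fflush(stdout);
	for (;;)
		pause();
	return NULL;
}

/* pthread case (hypothetical helper): create a thread, cancel it, join it.
 * Returns 0 if the thread was cancelled cleanly, -1 otherwise. */
static int run_thread_variant(void)
{
	pthread_t t;
	void *res;

	if (pthread_create(&t, NULL, worker, NULL) != 0)
		return -1;
	sleep(1);
	if (pthread_cancel(t) != 0)
		return -1;
	if (pthread_join(t, &res) != 0)
		return -1;
	return res == PTHREAD_CANCELED ? 0 : -1;
}

/* fork case (hypothetical helper): fork a child, kill it, reap it.
 * Returns 0 if the child was terminated by SIGTERM, -1 otherwise. */
static int run_fork_variant(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		for (;;)
			pause();
	}
	sleep(1);
	if (kill(pid, SIGTERM) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM) ? 0 : -1;
}
```

Calling both helpers from main() and building with -pthread should be enough; on a working system both are expected to return 0.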
pthread_create():
clone(child_stack=0xb953cea0, flags=CLONE_VM|CLONE_FS|CLONE_FILES
|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS
|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0xb953d398, tls=0xb953d7c0, child_tidptr=0xb953d398) = 1650
fork():
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xe5af6278) = 30537
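To make the difference between the two calls explicit, the flag sets can be compared mechanically. This is only a sketch: the two masks are copied from the strace lines above, and thread_only_flags() and print_flags() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>

/* Flags strace reported for the pthread_create() clone */
static const unsigned long thread_flags =
	CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD |
	CLONE_SYSVSEM | CLONE_SETTLS | CLONE_PARENT_SETTID |
	CLONE_CHILD_CLEARTID;

/* Flags strace reported for the fork() clone */
static const unsigned long fork_flags =
	CLONE_CHILD_CLEARTID | CLONE_CHILD_SETTID | SIGCHLD;

/* Flags set in the pthread case but not in the fork case */
static unsigned long thread_only_flags(void)
{
	return thread_flags & ~fork_flags;
}

/* Print the symbolic name of every known flag set in mask */
static void print_flags(unsigned long mask)
{
	static const struct { unsigned long bit; const char *name; } names[] = {
		{ CLONE_VM, "CLONE_VM" },
		{ CLONE_FS, "CLONE_FS" },
		{ CLONE_FILES, "CLONE_FILES" },
		{ CLONE_SIGHAND, "CLONE_SIGHAND" },
		{ CLONE_THREAD, "CLONE_THREAD" },
		{ CLONE_SYSVSEM, "CLONE_SYSVSEM" },
		{ CLONE_SETTLS, "CLONE_SETTLS" },
		{ CLONE_PARENT_SETTID, "CLONE_PARENT_SETTID" },
		{ CLONE_CHILD_CLEARTID, "CLONE_CHILD_CLEARTID" },
		{ CLONE_CHILD_SETTID, "CLONE_CHILD_SETTID" },
	};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (mask & names[i].bit)
			printf("%s\n", names[i].name);
}
```

print_flags(thread_only_flags()) lists every flag from the pthread call except CLONE_CHILD_CLEARTID, which both calls share.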
So this most probably means that the ilp32 code doesn't handle one of the
cloned items properly. I have already discovered a bug where child processes
used the parent's TLS, so maybe this is something similar...
Apart from this, I think the ILP32 series is looking pretty good, at least
the kernel part.
If you have any ideas or suggestions, I'd really appreciate them.
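Since CLONE_SETTLS is one of the flags that differs, one cheap thing to rule out is a thread ending up with the parent's TLS block. The probe below is only a sketch of such a check, not something from the series; tls_marker, probe() and check_thread_tls() are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

/* One copy per thread; every new thread should start with the value 1 */
static _Thread_local int tls_marker = 1;

struct tls_probe {
	uintptr_t addr;	/* address of tls_marker as seen by the thread */
	int seen;	/* value of tls_marker as seen by the thread */
};

static void *probe(void *arg)
{
	struct tls_probe *p = arg;

	p->addr = (uintptr_t)&tls_marker;
	p->seen = tls_marker;
	tls_marker = 7;	/* must not leak into the parent's copy */
	return NULL;
}

/* Returns 0 if the thread had its own TLS copy, nonzero otherwise:
 * 1 = same address as the parent, 2 = saw the parent's value,
 * 3 = the thread's write changed the parent's copy, -1 = setup error. */
static int check_thread_tls(void)
{
	struct tls_probe p = { 0, 0 };
	pthread_t t;

	tls_marker = 42;
	if (pthread_create(&t, NULL, probe, &p) != 0)
		return -1;
	if (pthread_join(t, NULL) != 0)
		return -1;
	if (p.addr == (uintptr_t)&tls_marker)
		return 1;
	if (p.seen != 1)
		return 2;
	if (tls_marker != 42)
		return 3;
	return 0;
}
```

A nonzero return would point at TLS setup; a zero return only rules out this one failure mode and says nothing about the other cloned items.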
Yury.
strace -f ./trigo
[...]
clone(child_stack=0xdbbfb000,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND
|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS
|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0xdbbfb4f8, tls=0xdbbfb920, child_tidptr=0xdbbfb4f8) = 32030
rt_sigprocmask(SIG_BLOCK, [CHLD], Process 32030 attached [], 8) = 0
[pid 32029] rt_sigaction(SIGCHLD, NULL, <unfinished ...>
[pid 32030] set_robust_list(0xdbbfb504, 12 <unfinished ...>
[pid 32029] <... rt_sigaction resumed> {SIG_DFL, [ILL ABRT SEGV URG], 0}, 8) = 0
[pid 32030] <... set_robust_list resumed> ) = 0
[pid 32029] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 32030] write(1, "started\n", 8started
<unfinished ...>
[pid 32029] nanosleep({1, 65536}, <unfinished ...>
[pid 32030] <... write resumed> ) = 8
[pid 32030] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid 32030] rt_sigsuspend([] <unfinished ...>
[pid 32029] <... nanosleep resumed> 0xfff9fd98) = 0
[pid 32029] write(1, "stoping...\n", 11stoping...) = 11
[pid 32029] openat(AT_FDCWD, "/root/sys-root/libilp32/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
[pid 32029] read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0 \0\0004\0\0\0"..., 512) = 512
[pid 32029] fstat(3, {st_mode=S_IFREG|0644, st_size=429138, ...}) = 0
[pid 32029] mmap(NULL, 135104, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xdb3db000
[pid 32029] mprotect(0xdb3ec000, 61440, PROT_NONE) = 0
[pid 32029] mmap(0xdb3fb000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0xdb3fb000
[pid 32029] close(3) = 0
[pid 32029] tgkill(32029, 32030, SIGRTMIN) = 0
[pid 32030] <... rt_sigsuspend resumed> ) = ? ERESTARTNOHAND (To be restarted if no handler)
[pid 32029] write(1, "pthread_cancel == 0\n", 20pthread_cancel == 0) = 20
[pid 32030] --- SIGRTMIN {si_signo=SIGRTMIN, si_code=SI_TKILL, si_pid=32029, si_uid=0} ---
[pid 32029] write(1, "stopped\n", 8stopped
<unfinished ...>
[pid 32030] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x14} ---
[pid 32029] <... write resumed> ) = ? <unavailable>
[pid 32030] +++ killed by SIGSEGV +++
+++ killed by SIGSEGV +++
Segmentation fault
dmesg:
trigo[32246]: unhandled level 2 translation fault (11) at 0x00000014, esr 0x90000006
pgd = ffffffc009335000
[00000014] *pgd=000000007917c003, *pud=000000007917c003, *pmd=0000000000000000
CPU: 2 PID: 32246 Comm: trigo Not tainted 4.5.0+ #91
Hardware name: linux,dummy-virt (DT)
task: ffffffc00900e400 ti: ffffffc009078000 task.ti: ffffffc009078000
PC is at 0xda6853f0
LR is at 0xda6d5440
pc : [<00000000da6853f0>] lr : [<00000000da6d5440>] pstate: 60000000
sp : 00000000da511bc0
x29: 00000000da512e10 x28: 00000000da6a7000
x27: 0000000000000000 x26: 00000000da513490
x25: 0000000000000000 x24: 0000000000400820
x23: 00000000da6a9000 x22: 00000000ff869acb
x21: 00000000da6a9000 x20: 00000000da512e50
x19: 0000000000000000 x18: 0000000000000001
x17: 0000000000410bd8 x16: 00000000da691138
x15: 0000000000000000 x14: 0000000000000000
x13: 00000000da535970 x12: 0000000000000038
x11: 0000000000000028 x10: 0101010101010101
x9 : ff63647371607372 x8 : 0000000000000085
x7 : 0000000000007df5 x6 : 00000000da512e1c
x5 : 00000000da513518 x4 : 0000000000000002
x3 : 00000000da513920 x2 : 0000000000000000
x1 : 0000000000000008 x0 : 00000000da513490
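For reference, the esr value in the dmesg output can be split into its fields as in the sketch below. The field positions follow the ARMv8 ESR_ELx layout for data aborts; struct esr_fields and decode_esr() are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

struct esr_fields {
	unsigned ec;	/* exception class, bits [31:26] */
	unsigned il;	/* instruction length bit, bit 25 */
	unsigned wnr;	/* write-not-read, bit 6 (data aborts only) */
	unsigned dfsc;	/* data fault status code, bits [5:0] */
};

/* Split a 32-bit ESR value into the fields relevant to a data abort */
static struct esr_fields decode_esr(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 0x1;
	f.wnr = (esr >> 6) & 0x1;
	f.dfsc = esr & 0x3f;
	return f;
}

/* Print the decoded fields in a compact form */
static void print_esr(uint32_t esr)
{
	struct esr_fields f = decode_esr(esr);

	printf("ec=0x%02x il=%u wnr=%u dfsc=0x%02x\n",
	       f.ec, f.il, f.wnr, f.dfsc);
}
```

For 0x90000006 this gives EC 0x24 (data abort from a lower exception level), WnR 0 (a read) and DFSC 0x06 (translation fault, level 2), which matches the kernel message. A read at 0x14 would be consistent with loading a field at a small offset from a NULL pointer, though that is only a guess from the address.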
Attachment:
mykill.tar.gz
Description: application/gzip
Attachment:
trigo.tar.gz
Description: application/gzip