Re: Multiple potential races on vma->vm_flags

From: Oleg Nesterov
Date: Fri Sep 25 2015 - 10:29:28 EST


On 09/23, Sasha Levin wrote:
>
> Another similar trace where we see a problem during process exit:
>
> [1922964.887922] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [1922964.890234] Modules linked in:
> [1922964.890844] CPU: 1 PID: 21477 Comm: trinity-c161 Tainted: G W 4.3.0-rc2-next-20150923-sasha-00079-gec04207-dirty #2569
> [1922964.892584] task: ffff880251858000 ti: ffff88009f258000 task.ti: ffff88009f258000
> [1922964.893723] RIP: acct_collect (kernel/acct.c:542)

530 void acct_collect(long exitcode, int group_dead)
531 {
532 struct pacct_struct *pacct = &current->signal->pacct;
533 cputime_t utime, stime;
534 unsigned long vsize = 0;
535
536 if (group_dead && current->mm) {
537 struct vm_area_struct *vma;
538
539 down_read(&current->mm->mmap_sem);
540 vma = current->mm->mmap;
541 while (vma) {
542 vsize += vma->vm_end - vma->vm_start; // !!!!!!!!!!!!
543 vma = vma->vm_next;
544 }
545 up_read(&current->mm->mmap_sem);
546 }


> [1922964.895105] RSP: 0000:ffff88009f25f908 EFLAGS: 00010207
> [1922964.895935] RAX: dffffc0000000000 RBX: 2ce0ffffffffffff RCX: 0000000000000000
> [1922964.897008] RDX: ffff2152b153ffff RSI: 059c200000000000 RDI: 2ce1000000000007
> [1922964.898091] RBP: ffff88009f25f9e8 R08: 0000000000000001 R09: 00000000000003ef
> [1922964.899178] R10: ffffed014d7a3a01 R11: 0000000000000001 R12: ffff880082b485c0
> [1922964.901643] R13: ffff2152b153ffff R14: 1ffff10013e4bf24 R15: ffff88009f25f9c0
...
> 0: 02 00 add (%rax),%al
> 2: 0f 85 9d 05 00 00 jne 0x5a5
> 8: 48 8b 1b mov (%rbx),%rbx
> b: 48 85 db test %rbx,%rbx
> e: 0f 84 7b 05 00 00 je 0x58f

Probably "mov (%rbx),%rbx" is "vma = mm->mmap",

> 14: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> 1b: fc ff df
> 1e: 31 d2 xor %edx,%edx
> 20: 48 8d 7b 08 lea 0x8(%rbx),%rdi

and this loads the addr of vma->vm_end for kasan,

> 24: 48 89 fe mov %rdi,%rsi
> 27: 48 c1 ee 03 shr $0x3,%rsi
> 2b:* 80 3c 06 00 cmpb $0x0,(%rsi,%rax,1) <-- trapping instruction

which reporst the error. But in this case this is not NULL-deref,
note that $rbx = 2ce0ffffffffffff and this is below __PAGE_OFFSET
(but above TASK_SIZE_MAX). It seems it is not even canonical. In
any case this odd value can't be valid.

Again, looks like mm->mmap pointer was corrupted. Perhaps you can
re-test with the stupid patch below. But unlikely it will help. If
mm was freed we would probably see something else.

Oleg.
---

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -672,6 +672,7 @@ struct mm_struct *mm_alloc(void)
void __mmdrop(struct mm_struct *mm)
{
BUG_ON(mm == &init_mm);
+ BUG_ON(atomic_read(&mm->mm_users));
mm_free_pgd(mm);
destroy_context(mm);
mmu_notifier_mm_destroy(mm);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/