Re: [PATCH] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

From: Herton R. Krzesinski
Date: Tue Aug 11 2015 - 12:48:40 EST


On Mon, Aug 10, 2015 at 09:02:11PM +0200, Manfred Spraul wrote:
> Hi Herton,
>
> On 08/10/2015 05:31 PM, Herton R. Krzesinski wrote:
> >Well, without the synchronize_rcu() and with the semid list loop fix, I was
> >still able to get issues, and I thought the problem was related to racing with
> >IPC_RMID in freeary() again. This is one scenario I would imagine:
> >
> >                A                                  B
> >
> >freeary()
> >  list_del(&un->list_id)
> >  spin_lock(&un->ulp->lock)
> >  un->semid = -1
> >  list_del_rcu(&un->list_proc)
> >    __list_del_entry(&un->list_proc)
> >      __list_del(entry->prev, entry->next)     exit_sem()
> >        next->prev = prev;                     ...
> >        prev->next = next;                     ...
> >  ...                                          un = list_entry_rcu(ulp->list_proc.next...)
> >  (&un->list_proc)->prev = LIST_POISON2        if (&un->list_proc == &ulp->list_proc)
> >                                                 <true, last un removed by thread A>
> >  ...                                          kfree(ulp)
> >  spin_unlock(&un->ulp->lock) <---- bug
> >
> >Now that is a very tight window, but I had problems even with the first patch
> >I tried:
> >
> >(...)
> >-		if (&un->list_proc == &ulp->list_proc)
> >-			semid = -1;
> >-		else
> >-			semid = un->semid;
> >+		if (&un->list_proc == &ulp->list_proc) {
> >+			rcu_read_unlock();
> What about:
> +			spin_unlock_wait(&ulp->lock);

spin_unlock_wait() works, thanks (both in theory and in practice; I tested it).

> >+			break;
> >+		}
> >+		spin_lock(&ulp->lock);
> >+		semid = un->semid;
> >+		spin_unlock(&ulp->lock);
> >
> >+		/* exit_sem raced with IPC_RMID, nothing to do */
> > 		if (semid == -1) {
> > 			rcu_read_unlock();
> >-			break;
> >+			synchronize_rcu();
> >+			continue;
> > 		}
> >(...)
> >
> >So even with the bad/unneeded synchronize_rcu() which I had placed there, I
> >could still get issues (however, the testing of the patch above was on an
> >older kernel than latest upstream, from RHEL 6; I can test without
> >synchronize_rcu() on latest upstream, but the affected code is the same).
> >That's when I thought of the scenario above. I was able to get this oops:
> Adding sleep() usually helps, too. But it is ugly, so let's try to understand
> the race and to fix it.

Yes, the race should be what I described earlier, and it's understood. I also
now have proof in practice that, without synchronization before exit, I'm able
to trigger a crash on the latest upstream kernel as well. This is the change I
tested with a stock 4.2-rc6:

diff --git a/ipc/sem.c b/ipc/sem.c
index bc3d530..3b8b66b 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2074,17 +2074,26 @@ void exit_sem(struct task_struct *tsk)
 		rcu_read_lock();
 		un = list_entry_rcu(ulp->list_proc.next,
 				    struct sem_undo, list_proc);
-		if (&un->list_proc == &ulp->list_proc)
-			semid = -1;
-		else
-			semid = un->semid;
+		if (&un->list_proc == &ulp->list_proc) {
+			/* we must wait for freeary() before freeing this ulp,
+			 * in case we raced with the last sem_undo: there is a
+			 * small window where we exit while freeary() has not
+			 * yet finished unlocking sem_undo_list */
+			/*spin_unlock_wait(&ulp->lock);*/
+			rcu_read_unlock();
+			break;
+		}
+		spin_lock(&ulp->lock);
+		semid = un->semid;
+		spin_unlock(&ulp->lock);
 
+		/* exit_sem raced with IPC_RMID, nothing to do */
 		if (semid == -1) {
 			rcu_read_unlock();
-			break;
+			continue;
 		}
 
-		sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid);
+		sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid);
 		/* exit_sem raced with IPC_RMID, nothing to do */
 		if (IS_ERR(sma)) {
 			rcu_read_unlock();


I commented out the synchronization (the spin_unlock_wait() call) for testing.
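
To make the pairing explicit: spin_unlock_wait() spins until the lock is no
longer held, without ever acquiring it. The freeary() side it pairs with looks
like this (excerpted and lightly abridged from ipc/sem.c as of 4.2-rc6):

	list_for_each_entry_safe(un, tu, &sma->list_id, list_id) {
		list_del(&un->list_id);
		spin_lock(&un->ulp->lock);
		un->semid = -1;
		list_del_rcu(&un->list_proc);	/* exiting task may now see an
						 * empty list and free ulp... */
		spin_unlock(&un->ulp->lock);	/* ...while this still touches
						 * ulp: the use after free */
		kfree_rcu(un, rcu);
	}

Once exit_sem() finds its list empty, the only remaining access freeary() can
make to ulp is that final spin_unlock(), so waiting for ulp->lock to be
released before kfree(ulp) closes the window.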

With the spin_unlock_wait() call commented out, I'm then able to trigger the
following crash:

[ 587.768363] BUG: spinlock wrong owner on CPU#2, test/419
[ 587.768434] general protection fault: 0000 [#1] SMP
[ 587.768454] Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_rng virtio_balloon virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore qxl ttm drm_kms_helper drm crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
[ 587.768712] CPU: 2 PID: 419 Comm: test Not tainted 4.2.0-rc6+ #1
[ 587.768732] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 587.768762] task: ffff88003ad68340 ti: ffff88003a0ec000 task.ti: ffff88003a0ec000
[ 587.768785] RIP: 0010:[<ffffffff810d60b3>] [<ffffffff810d60b3>] spin_dump+0x53/0xc0
[ 587.768818] RSP: 0018:ffff88003a0efd68 EFLAGS: 00010202
[ 587.768836] RAX: 000000000000002c RBX: ffff88003d016f08 RCX: 0000000000000000
[ 587.768859] RDX: ffff88003fd11518 RSI: ffff88003fd0eb68 RDI: ffff88003fd0eb68
[ 587.768882] RBP: ffff88003a0efd78 R08: 0000000000000000 R09: 000000006b6b6b6b
[ 587.768904] R10: 000000000000025f R11: ffffc90000b30020 R12: 6b6b6b6b6b6b6b6b
[ 587.768927] R13: ffff88003a5e5c40 R14: ffff88003d1eb340 R15: ffff88003a5e5c40
[ 587.768949] FS: 00007fc2fcd89700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 587.768981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 587.769002] CR2: 00007fc2fcb84dd4 CR3: 000000003d230000 CR4: 00000000000406e0
[ 587.769004] Stack:
[ 587.769004] ffff88003d016f08 ffffffff81a4f4ad ffff88003a0efd98 ffffffff810d6150
[ 587.769004] ffff88003d016f08 ffff88003a5e5cb8 ffff88003a0efdb8 ffffffff810d61e2
[ 587.769004] ffff88003a5e5c40 ffff88003a5e5c90 ffff88003a0efdc8 ffffffff8175d50e
[ 587.769004] Call Trace:
[ 587.769004] [<ffffffff810d6150>] spin_bug+0x30/0x40
[ 587.769004] [<ffffffff810d61e2>] do_raw_spin_unlock+0x82/0xa0
[ 587.769004] [<ffffffff8175d50e>] _raw_spin_unlock+0xe/0x10
[ 587.769004] [<ffffffff812f45f2>] freeary+0x82/0x2a0
[ 587.769004] [<ffffffff8175d58e>] ? _raw_spin_lock+0xe/0x10
[ 587.769004] [<ffffffff812f5f2e>] semctl_down.clone.0+0xce/0x160
[ 587.769004] [<ffffffff810633fa>] ? __do_page_fault+0x19a/0x430
[ 587.769004] [<ffffffff8112e5e8>] ? __audit_syscall_entry+0xa8/0x100
[ 587.769004] [<ffffffff812f61f6>] SyS_semctl+0x236/0x2c0
[ 587.769004] [<ffffffff810215ce>] ? syscall_trace_leave+0xde/0x130
[ 587.769004] [<ffffffff8175d7ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 587.769004] Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2d a4 81 31 c0 65 8b 15 8b 40 f3 7e e8 21 33 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
[ 587.769004] RIP [<ffffffff810d60b3>] spin_dump+0x53/0xc0
[ 587.769004] RSP <ffff88003a0efd68>
[ 587.769563] ---[ end trace 844d186084062ba4 ]---
[ 612.022002] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:446]
[ 612.022002] Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_rng virtio_balloon virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore qxl ttm drm_kms_helper drm crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
[ 612.022002] CPU: 3 PID: 446 Comm: test Tainted: G D 4.2.0-rc6+ #1
[ 612.022002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 612.022002] task: ffff88003e3ab040 ti: ffff88003ac20000 task.ti: ffff88003ac20000
[ 612.022002] RIP: 0010:[<ffffffff8101cb36>] [<ffffffff8101cb36>] native_read_tsc+0x6/0x20
[ 612.022002] RSP: 0018:ffff88003ac23c08 EFLAGS: 00000206
[ 612.022002] RAX: 00000000e0df8c7d RBX: 800000002060b065 RCX: 00000000e0df8c5f
[ 612.022002] RDX: 00000000000001a4 RSI: 0000000000000000 RDI: 0000000000000001
[ 612.022002] RBP: ffff88003ac23c08 R08: 0000000000000000 R09: ffff88003aea5cc0
[ 612.022002] R10: 00007ffdf0e89830 R11: 0000000000000008 R12: ffffffff8106a938
[ 612.022002] R13: ffff88003ac23b98 R14: ffff88003a295300 R15: ffff88003a295000
[ 612.022002] FS: 00007fc2fcd89700(0000) GS:ffff88003fd80000(0000) knlGS:0000000000000000
[ 612.022002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 612.022002] CR2: 00007fc2fc8dffd0 CR3: 000000003905d000 CR4: 00000000000406e0
[ 612.022002] Stack:
[ 612.022002] ffff88003ac23c38 ffffffff8138bd30 ffff88003a5e5c40 00000000ac762d50
[ 612.022002] 000000003662aefc 0000000000000001 ffff88003ac23c48 ffffffff8138bcaf
[ 612.022002] ffff88003ac23c78 ffffffff810d6386 ffff88003a5e5c40 ffff880037a0f880
[ 612.022002] Call Trace:
[ 612.022002] [<ffffffff8138bd30>] delay_tsc+0x40/0x70
[ 612.022002] [<ffffffff8138bcaf>] __delay+0xf/0x20
[ 612.022002] [<ffffffff810d6386>] do_raw_spin_lock+0x96/0x140
[ 612.022002] [<ffffffff8175d58e>] _raw_spin_lock+0xe/0x10
[ 612.022002] [<ffffffff812f4cb1>] sem_lock_and_putref+0x11/0x70
[ 612.022002] [<ffffffff812f59df>] SYSC_semtimedop+0x7bf/0x960
[ 612.022002] [<ffffffff811becd6>] ? handle_mm_fault+0xbf6/0x1880
[ 612.022002] [<ffffffff8175d50e>] ? _raw_spin_unlock+0xe/0x10
[ 612.022002] [<ffffffff81194e16>] ? free_pcppages_bulk+0x436/0x5a0
[ 612.022002] [<ffffffff810633fa>] ? __do_page_fault+0x19a/0x430
[ 612.022002] [<ffffffff811e4866>] ? kfree_debugcheck+0x16/0x40
[ 612.022002] [<ffffffff810633fa>] ? __do_page_fault+0x19a/0x430
[ 612.022002] [<ffffffff8112e5e8>] ? __audit_syscall_entry+0xa8/0x100
[ 612.022002] [<ffffffff81021686>] ? do_audit_syscall_entry+0x66/0x70
[ 612.022002] [<ffffffff810217c9>] ? syscall_trace_enter_phase1+0x139/0x160
[ 612.022002] [<ffffffff812f5b8e>] SyS_semtimedop+0xe/0x10
[ 612.022002] [<ffffffff812f36f0>] SyS_semop+0x10/0x20
[ 612.022002] [<ffffffff8175d7ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 612.022002] Code: c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 55 48 89 e5 0f 31 <89> c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9 c3 66 2e 0f 1f 84
[ 612.022002] Kernel panic - not syncing: softlockup: hung tasks
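
Note the register dump in the first oops: R12 = 6b6b6b6b6b6b6b6b is the slab
POISON_FREE pattern (0x6b), i.e. spin_dump() is dereferencing already-freed,
poisoned memory; that is exactly the use after free described above.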

Then, in another build with the spin_unlock_wait() call in place, I left the
test case running for some hours and saw no issues. In theory, too, I expect
no more problems.
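
For reference, a minimal userspace reproducer of this shape (a sketch of the
idea; the actual test program differed) has children register a SEM_UNDO
adjustment and exit while the parent keeps destroying the set with IPC_RMID,
so exit_sem() races against freeary():

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <sys/ipc.h>
	#include <sys/sem.h>

	int main(void)
	{
		for (;;) {
			int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
			if (semid < 0) {
				perror("semget");
				exit(1);
			}
			pid_t pid = fork();
			if (pid == 0) {
				/* child: create a sem_undo entry, then exit
				 * so exit_sem() walks the undo list */
				struct sembuf op = { 0, 1, SEM_UNDO };
				semop(semid, &op, 1);
				_exit(0);
			}
			/* parent: remove the set while the child may be
			 * exiting, forcing the freeary()/exit_sem() race */
			semctl(semid, 0, IPC_RMID);
			waitpid(pid, NULL, 0);
		}
		return 0;
	}

Running several instances in parallel (roughly one per CPU) widens the race
window considerably.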

I'll submit a new version of the patch; it's the same diff as above, but with
the spin_unlock_wait() call enabled rather than commented out.

>
> Best regards,
> Manfred

Thanks,
--
[]'s
Herton