Re: BUG in alloc_workqueue (linux-next)
From: Pavel Skripkin
Date: Fri Jul 09 2021 - 02:58:03 EST
On Fri, 9 Jul 2021 11:59:01 +0800
Lai Jiangshan <jiangshanlai@xxxxxxxxx> wrote:
> Hello, Pavel
> Thanks for the report.
>
> Huawei (CC-ed) is also dealing with the problem:
> https://lore.kernel.org/lkml/20210708093136.2195752-1-yangyingliang@xxxxxxxxxx/t/#u
>
>
> Could you have a try on the fix, please?
>
> Thanks
> Lai
>
Hi, Lai!
I am going to apply this patch to my local tree and let syzbot test the
fix for a day. Will reply to this email with results tomorrow :)
With regards,
Pavel Skripkin
> On Thu, Jul 8, 2021 at 9:24 PM Pavel Skripkin <paskripkin@xxxxxxxxx> wrote:
>
> >
> > I've spent some time trying to come up with a fix, but I gave
> > up :( But! I have an idea about what's happening; maybe it will help
> > somehow...
> >
> >
> > So, all 3 reports have the same stack trace: alloc_workqueue() in
> > loop_configure(). I skimmed through syzbot's log and found that
> > syzbot injected a failure into alloc_unbound_pwq() in all 3 cases:
> >
> > FAULT_INJECTION: forcing a failure.
> > name failslab, interval 1, probability 0, space 0, times 0
> > CPU: 1 PID: 17986 Comm: syz-executor.0 Tainted: G W 5.13.0-next-20210706 #9
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> > Call Trace:
> >  dump_stack_lvl (lib/dump_stack.c:106 (discriminator 4))
> >  should_fail.cold (lib/fault-inject.c:52 lib/fault-inject.c:146)
> >  should_failslab (mm/slab_common.c:1327)
> >  kmem_cache_alloc_node (mm/slab.h:487 mm/slub.c:2902 mm/slub.c:3017)
> >  ? alloc_unbound_pwq (kernel/workqueue.c:3813)
> >  alloc_unbound_pwq (kernel/workqueue.c:3813)
> >  apply_wqattrs_prepare (kernel/workqueue.c:3963)
> >  apply_workqueue_attrs_locked (kernel/workqueue.c:4041)
> >  alloc_workqueue (kernel/workqueue.c:4078 kernel/workqueue.c:4201 kernel/workqueue.c:4309)
> >
> >
> > So, if alloc_unbound_pwq() fails, apply_wqattrs_prepare() will jump
> > to this code:
> >
> > out_free:
> >         free_workqueue_attrs(tmp_attrs);
> >         free_workqueue_attrs(new_attrs);
> >         apply_wqattrs_cleanup(ctx);  <----|
> >         return NULL;                      |
> >                                           |
> >         put_pwq_unlocked() -> put_pwq() -> schedule_work(&pwq->unbound_release_work);
> >
> >
> > and apply_wqattrs_cleanup() will schedule
> > pwq_unbound_release_workfn() [2], but alloc_workqueue() will free the
> > workqueue_struct in case of an alloc_unbound_pwq() error [1]. The
> > release work runs asynchronously and still dereferences pwq->wq, so
> > we get a UAF in pwq_unbound_release_workfn(), like in the 3rd report.
> >
> >
> > Does what I wrote above make any sense? :)
> >
> >
> >
> > With regards,
> > Pavel Skripkin