Re: [PATCH] net: sched: use-after-free in tcf_action_destroy

From: Alex Young
Date: Sat Aug 17 2024 - 05:27:45 EST


Hi Jamal,

Thanks your mention. I have reviewed the latest kernel code.
I understand why these two tc function threads can enter the kernel at the same
time. It's because the request_module[2] function in tcf_action_init_1. When the
tc_action_init_1 function to add a new action, it will load the action
module. It will
call rtnl_unlock to let the Thread2 into the kernel space.

Thread1 Thread2
rtnetlink_rcv_msg rtnetlink_rcv_msg
rtnl_lock();
tcf_action_init
for(i;i<TCA_ACT_MAX_PRIO;i++)
act=tcf_action_init_1 //[1]
if (rtnl_held)
rtnl_unlock(); //[2]
request_module("act_%s", act_name);

tcf_del_walker

idr_for_each_entry_ul(idr,p,id)

__tcf_idr_release(p,false,true)

free_tcf(p) //[3]
if (rtnl_held)
rtnl_lock();

if(IS_ERR(act))
goto err
actions[i] = act

err:
tcf_action_destroy
a=actions[i]
ops = a->ops //[4]
I know this time window is small, but it can indeed cause the bug. And
in the latest
kernel, it have fixed the bug. But version 4.19.x is still a
maintenance version.
Is there anyone who can introduce this change into version 4.19.

Best regards,
Alex

Jamal Hadi Salim <jhs@xxxxxxxxxxxx> 于2024年8月16日周五 23:04写道:

>
> On Fri, Aug 16, 2024 at 12:06 AM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote:
> >
> > On Thu, Aug 15, 2024 at 9:54 PM yangzhuorao <alex000young@xxxxxxxxx> wrote:
> > >
> > > There is a uaf bug in net/sched/act_api.c.
> > > When Thread1 call [1] tcf_action_init_1 to alloc act which saves
> > > in actions array. If allocation failed, it will go to err path.
> > > Meanwhile thread2 call tcf_del_walker to delete action in idr.
> > > So thread 1 in err path [3] tcf_action_destroy will cause
> > > use-after-free read bug.
> > >
> > > Thread1 Thread2
> > > tcf_action_init
> > > for(i;i<TCA_ACT_MAX_PRIO;i++)
> > > act=tcf_action_init_1 //[1]
> > > if(IS_ERR(act))
> > > goto err
> > > actions[i] = act
> > > tcf_del_walker
> > > idr_for_each_entry_ul(idr,p,id)
> > > __tcf_idr_release(p,false,true)
> > > free_tcf(p) //[2]
> > > err:
> > > tcf_action_destroy
> > > a=actions[i]
> > > ops = a->ops //[3]
> > >
> > > We add lock and unlock in tcf_action_init and tcf_del_walker function
> > > to aviod race condition.
> > >
>
> Hi Alex,
>
> Thanks for your valiant effort, unfortunately there's nothing to fix
> here for the current kernels.
> For your edification:
>
> This may have been an issue on the 4.x kernels you ran but i dont see
> a valid codepath that would create the kernel parallelism scenario you
> mentioned above (currently or ever actually). Kernel entry is
> syncronized from user space via the rtnetlink lock - meaning you cant
> have two control plane threads (as you show in your nice diagram above
> in your commit) entering from user space in parallel to trigger the
> bug.
>
> I believe the reason it happens in 4.x is there is likely a bug(hand
> waving here) where within a short window upon a) creating a batch of
> actions of the same kind b) followed by partial updates of said action
> you then c) flush that action kind. Theory is the flush will trigger
> the bug. IOW, it is not parallel but rather a single entry. I didnt
> have time to look but whatever this bug is was most certainly fixed at
> some point - perhaps nobody bothered to backport. If this fix is so
> important to you please dig into newer kernels and backport.
>
> There are other technical issues with your solution but I hope the above helps.
> The repro doesnt cause any issues in newer kernels - but please verify
> for yourself as well.
>
> So from me, I am afraid, this is a nack
>
> cheers,
> jamal