Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

From: Xu, Yanfei
Date: Tue May 25 2021 - 22:23:47 EST




On 5/25/21 10:28 PM, Paul E. McKenney wrote:
[Please note: This e-mail is from an EXTERNAL e-mail address]

On Tue, May 25, 2021 at 06:24:10PM +0800, Xu, Yanfei wrote:


On 5/25/21 11:33 AM, Paul E. McKenney wrote:
[Please note: This e-mail is from an EXTERNAL e-mail address]

On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:


On 5/25/21 6:46 AM, Paul E. McKenney wrote:
[Please note: This e-mail is from an EXTERNAL e-mail address]

On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
On Fri, May 21, 2021 at 7:29 PM syzbot
<syzbot+7b2b13f4943374609532@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+7b2b13f4943374609532@xxxxxxxxxxxxxxxxxxxxxxxxx

This looks rcu-related. +rcu mailing list

I think I see a possible cause for this, and will say more after some
testing and after becoming more awake Monday morning, Pacific time.

No joy. From what I can see, within RCU Tasks Trace, the calls to
get_task_struct() are properly protected (either by RCU or by an earlier
get_task_struct()), and the calls to put_task_struct() are balanced by
those to get_task_struct().

I could of course have missed something, but at this point I am suspecting
an unbalanced put_task_struct() has been added elsewhere.

As always, extra eyes on this code would be a good thing.

If it were reproducible, I would of course suggest bisection. :-/

Thanx, Paul

Hi Paul,

Could it be?

CPU1 CPU2
trc_add_holdout(t, bhp)
//t->usage==2
release_task
put_task_struct_rcu_user
delayed_put_task_struct
......
put_task_struct(t)
//t->usage==1

check_all_holdout_tasks_trace
->trc_wait_for_one_reader
->trc_del_holdout
->put_task_struct(t)
//t->usage==0 and task_struct freed
READ_ONCE(t->trc_reader_checked)
//ops, t had been freed.

So, after excuting trc_wait_for_one_reader(), task might had been removed
from holdout list and the corresponding task_struct was freed.
And we shouldn't do READ_ONCE(t->trc_reader_checked).

I was suspicious of that call to trc_del_holdout() from within
trc_wait_for_one_reader(), but the only time it executes is in the
context of the current running task, which means that CPU 2 had better
not be invoking release_task() on it just yet.

Or am I missing your point?

Two times.
1. the task is current.

trc_wait_for_one_reader
->trc_del_holdout

This one should be fine because the task cannot be freed until it
actually exits, and the grace-period kthread never exits. But it
could also be removed without any problem that I see. >

Agree, current task's task_struct should be high probably safe. If you think it is safe to remove, I prefer to remove it. Because it can make trc_wait_for_one_reader's behavior about deleting task from holdout more unified. And there should be a very small racy that the task is checked as a current and then turn into a exiting task before its task_struct is accessed in trc_wait_for_one_reader or check_all_holdout_tasks_trace.(or I misunderstand something about rcu tasks)

2. task isn't current.

trc_wait_for_one_reader
->get_task_struct
->try_invoke_on_locked_down_task(trc_inspect_reader)
->trc_del_holdout
->put_task_struct

Ah, this one is more interesting, thank you!

Yes, it is safe from the list's viewpoint to do the removal in the
trc_inspect_reader() callback, but you are right that the grace-period
kthread may touch the task structure after return, and there might not
be anything else holding that task structure in place.

Of course, if you can reproduce it, the following patch might be

Sorry...I can't reproduce it, just analyse syzbot's log. :(

Well, if it could be reproduced, that would mean that it was too easy,
wouldn't it? ;-)

Ha ;-)

How about the (untested) patch below, just to make sure that we are
talking about the same thing? I have started testing, but then
again, I have not yet been able to reproduce this, either.

Thanx, Paul


Yes! we are talking the same thing, Should I send a new patch?

Thanks,
Yanfei

------------------------------------------------------------------------

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index efb8127f3a36..8b25551d10db 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -957,10 +957,9 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
in_qs = likely(!t->trc_reader_nesting);
}

- // Mark as checked. Because this is called from the grace-period
- // kthread, also remove the task from the holdout list.
+ // Mark as checked so that the grace-period kthread will
+ // remove it from the holdout list.
t->trc_reader_checked = true;
- trc_del_holdout(t);

if (in_qs)
return true; // Already in quiescent state, done!!!