Re: [PATCH 1/1] rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale

From: Joel Fernandes
Date: Tue Mar 14 2023 - 19:30:12 EST


On Tue, Mar 14, 2023 at 7:02 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Mon, Mar 13, 2023 at 04:04:03PM +0800, Qiuxu Zhuo wrote:
> > When running the 'kfree_rcu_test' test case with commands [1] the call
> > trace [2] was thrown. This was because the kfree_scale_thread thread(s)
> > still run after unloading rcuscale and torture modules. Fix the call
> > trace by invoking kfree_scale_cleanup() when removing the rcuscale module.
>
> Good catch, thank you!
>
> > [1] modprobe torture
> > modprobe rcuscale kfree_rcu_test=1
>
> Given that loading the rcuscale kernel module automatically pulls in
> the rcuscale kernel module, why the "modprobe torture"? Is doing the
> modprobes separately like this necessary to reproduce this bug?
>
> If it reproduces without manually loading and unloading the "torture"
> kernel module, could you please update the commit log to show that
> smaller reproducer?
>
> > // After some time
> > rmmod rcuscale
> > rmmod torture
> >
> > [2] BUG: unable to handle page fault for address: ffffffffc0601a87
> > #PF: supervisor instruction fetch in kernel mode
> > #PF: error_code(0x0010) - not-present page
> > PGD 11de4f067 P4D 11de4f067 PUD 11de51067 PMD 112f4d067 PTE 0
> > Oops: 0010 [#1] PREEMPT SMP NOPTI
[..]
> > Call Trace:
> > <TASK>
> > ? kvfree_call_rcu+0xf0/0x3a0
> > ? kthread+0xf3/0x120
> > ? kthread_complete_and_exit+0x20/0x20
> > ? ret_from_fork+0x1f/0x30
> > </TASK>
> > Modules linked in: rfkill sunrpc ... [last unloaded: torture]
> > CR2: ffffffffc0601a87
> > ---[ end trace 0000000000000000 ]---
> >
> > Fixes: e6e78b004fa7 ("rcuperf: Add kfree_rcu() performance Tests")
> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
> > ---
> > kernel/rcu/rcuscale.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
> > index 91fb5905a008..5e580cd08c58 100644
> > --- a/kernel/rcu/rcuscale.c
> > +++ b/kernel/rcu/rcuscale.c
> > @@ -522,6 +522,8 @@ rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
> > scale_type, tag, nrealreaders, nrealwriters, verbose, shutdown);
> > }
> >
> > +static void kfree_scale_cleanup(void);
> > +
>
> I do applaud minmimizing the size of the patch, but in this case could you
> please pull the kfree_scale_cleanup() function ahead of its first use?

The only trouble with moving the function like that is, the file is
mostly split across kfree and non-kfree functions. So moving a kfree
function to be among the non-kfree ones would look a bit weird.
Perhaps a better place for the function declaration could be a new
"rcuscale.h". But I am really Ok with Paul's suggestion as well.

Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>

thanks,

- Joel


>
> Thanx, Paul
>
> > static void
> > rcu_scale_cleanup(void)
> > {
> > @@ -542,6 +544,11 @@ rcu_scale_cleanup(void)
> > if (gp_exp && gp_async)
> > SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
> >
> > + if (kfree_rcu_test) {
> > + kfree_scale_cleanup();
> > + return;
> > + }
> > +
> > if (torture_cleanup_begin())
> > return;
> > if (!cur_ops) {
> > --
> > 2.17.1
> >