Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs

From: Paul E. McKenney
Date: Wed Jun 17 2020 - 23:18:45 EST


On Wed, Jun 17, 2020 at 05:52:14PM -0700, Matthew Wilcox wrote:
> On Wed, Jun 17, 2020 at 04:46:09PM -0700, Paul E. McKenney wrote:
> > > +	// Handle two first channels.
> > > +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> > > +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> > > +			bnext = bkvhead[i]->next;
> > > +			debug_rcu_bhead_unqueue(bkvhead[i]);
> > > +
> > > +			rcu_lock_acquire(&rcu_callback_map);
> > > +			if (i == 0) { // kmalloc() / kfree().
> > > +				trace_rcu_invoke_kfree_bulk_callback(
> > > +					rcu_state.name, bkvhead[i]->nr_records,
> > > +					bkvhead[i]->records);
> > > +
> > > +				kfree_bulk(bkvhead[i]->nr_records,
> > > +					bkvhead[i]->records);
> > > +			} else { // vmalloc() / vfree().
> > > +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > > +					trace_rcu_invoke_kfree_callback(
> > > +						rcu_state.name,
> > > +						bkvhead[i]->records[j], 0);
> > > +
> > > +					vfree(bkvhead[i]->records[j]);
> > > +				}
> > > +			}
> > > +			rcu_lock_release(&rcu_callback_map);
> >
> > Not an emergency, but did you look into replacing this "if" statement
> > with an array of pointers to functions implementing the legs of the
> > "if" statement? If nothing else, this would greatly reduce indentation.
>
> I don't think that replacing direct function calls with indirect function
> calls is a great suggestion with the current state of play around branch
> prediction.
>
> I'd suggest:
>
> 	rcu_lock_acquire(&rcu_callback_map);
> 	trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> 		bkvhead[i]->nr_records, bkvhead[i]->records);
> 	if (i == 0) {
> 		kfree_bulk(bkvhead[i]->nr_records,
> 			bkvhead[i]->records);
> 	} else {
> 		for (j = 0; j < bkvhead[i]->nr_records; j++) {
> 			vfree(bkvhead[i]->records[j]);
> 		}
> 	}
> 	rcu_lock_release(&rcu_callback_map);
>
> But I'd also suggest a vfree_bulk be added. There are a few things
> which would be better done in bulk as part of the vfree process
> (we batch them up already, but I'm sure we could do better).

I suspect that he would like to keep the tracing.

It might be worth trying the branches, given that they would be constant
and indexed by "i". The compiler might well remove the indirection.

The compiler guys brag about doing so, which of course might or might
not have any correlation to a given compiler actually doing so. :-/
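
For concreteness, here is roughly what I have in mind, as a userspace
sketch rather than kernel code. Everything in it is made up for
illustration: struct ptr_block, the two *_leg() functions, the free_leg[]
table, and the counters are hypothetical, and plain free() stands in for
both kfree_bulk() and vfree(). Whether a given compiler turns the indexed
call into direct calls is exactly the part that would need checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N_CHANNELS 2	/* stand-in for FREE_N_CHANNELS */

/* Hypothetical stand-in for one block of queued pointers. */
struct ptr_block {
	size_t nr_records;
	void *records[8];
};

/* Counters exist only so the sketch has something observable. */
static size_t bulk_freed, single_freed;

/* Channel 0: stand-in for the kfree_bulk() leg. */
static void free_bulk_leg(struct ptr_block *b)
{
	for (size_t j = 0; j < b->nr_records; j++)
		free(b->records[j]);
	bulk_freed += b->nr_records;
	b->nr_records = 0;
}

/* Channel 1: stand-in for the per-pointer vfree() leg. */
static void free_single_leg(struct ptr_block *b)
{
	for (size_t j = 0; j < b->nr_records; j++) {
		free(b->records[j]);
		single_freed++;
	}
	b->nr_records = 0;
}

/* Constant table indexed by channel number, replacing the "if". */
static void (* const free_leg[N_CHANNELS])(struct ptr_block *) = {
	free_bulk_leg,
	free_single_leg,
};

/* One indirect call per block, selected by the channel index "i". */
static void drain_channel(size_t i, struct ptr_block *b)
{
	assert(i < N_CHANNELS);
	free_leg[i](b);
}
```

The table is const and the index is the loop variable, so a compiler that
unrolls the two-iteration outer loop has what it needs to resolve each
call. The tracing would move into the legs, which keeps it.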

Having a vfree_bulk() might well be useful, but I would feel more
confident in that if there were other callers of kfree_bulk().
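
To be clear about what is being discussed, the simplest possible shape
for such a function is sketched below in userspace C. This is an
assumption on my part, not an existing interface: sketch_vfree_bulk() and
its call counter are hypothetical, the argument order merely mirrors
kfree_bulk(size, p), and free() stands in for vfree(). A real version
would presumably batch work internally instead of just looping, which is
the whole point of having one.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts individual frees so the sketch has something observable. */
static size_t sketch_vfree_calls;

/*
 * Hypothetical bulk-free wrapper: release "size" pointers from array "p".
 * NULL entries are skipped, as free(NULL) would be a no-op anyway.
 */
static void sketch_vfree_bulk(size_t size, void **p)
{
	for (size_t j = 0; j < size; j++) {
		if (!p[j])
			continue;
		free(p[j]);	/* stand-in for vfree(p[j]) */
		sketch_vfree_calls++;
	}
}
```

With something of this shape, the "else" leg in the patch would collapse
to a single call taking nr_records and records, matching the kfree_bulk()
leg.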

But again, either way, future work as far as this series is concerned.

Thanx, Paul