Re: [PATCH v2 01/17] add support for Clang CFI

From: Paul E. McKenney
Date: Fri Mar 19 2021 - 13:04:53 EST


On Fri, Mar 19, 2021 at 09:17:14AM -0700, Sami Tolvanen wrote:
> On Fri, Mar 19, 2021 at 6:52 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Fri, Mar 19, 2021 at 01:26:59PM +0100, Peter Zijlstra wrote:
> > > On Thu, Mar 18, 2021 at 04:48:43PM -0700, Sami Tolvanen wrote:
> > > > On Thu, Mar 18, 2021 at 3:29 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Mar 18, 2021 at 10:10:55AM -0700, Sami Tolvanen wrote:
> > > > > > > +static void update_shadow(struct module *mod, unsigned long base_addr,
> > > > > > > +			  update_shadow_fn fn)
> > > > > > > +{
> > > > > > > +	struct cfi_shadow *prev;
> > > > > > > +	struct cfi_shadow *next;
> > > > > > > +	unsigned long min_addr, max_addr;
> > > > > > > +
> > > > > > > +	next = vmalloc(SHADOW_SIZE);
> > > > > > > +
> > > > > > > +	mutex_lock(&shadow_update_lock);
> > > > > > > +	prev = rcu_dereference_protected(cfi_shadow,
> > > > > > > +					 mutex_is_locked(&shadow_update_lock));
> > > > > > > +
> > > > > > > +	if (next) {
> > > > > > > +		next->base = base_addr >> PAGE_SHIFT;
> > > > > > > +		prepare_next_shadow(prev, next);
> > > > > > > +
> > > > > > > +		min_addr = (unsigned long)mod->core_layout.base;
> > > > > > > +		max_addr = min_addr + mod->core_layout.text_size;
> > > > > > > +		fn(next, mod, min_addr & PAGE_MASK, max_addr & PAGE_MASK);
> > > > > > > +
> > > > > > > +		set_memory_ro((unsigned long)next, SHADOW_PAGES);
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	rcu_assign_pointer(cfi_shadow, next);
> > > > > > > +	mutex_unlock(&shadow_update_lock);
> > > > > > > +	synchronize_rcu_expedited();
> > > > >
> > > > > expedited is BAD(tm), why is it required and why doesn't it have a
> > > > > comment?
> > > >
> > > > Ah, this uses synchronize_rcu_expedited() because we have a case where
> > > > synchronize_rcu() hangs here with a specific SoC family after the
> > > > vendor's cpu_pm driver powers down CPU cores.
> > >
> > > Broken vendor drivers seem like an exceedingly poor reason for this.
> >
> > The vendor is supposed to make sure that RCU sees the CPU cores as either
> > deep idle or offline before powering them down. My guess is that the
> > CPU is powered down, but RCU (and probably much else in the system)
> > thinks that the CPU is still up and running. So I bet that you are
> > seeing other issues as well.
> >
> > I take it that the IPIs from synchronize_rcu_expedited() have the effect
> > of momentarily powering up those CPUs?
>
> I suspect you're correct. I'll change this to use synchronize_rcu() in v3.

You might also suggest to the vendor that they look for a missing
rcu_idle_enter(), rcu_irq_exit(), or similar on the code path that the
outgoing CPUs follow before getting powered down. That way, they won't
be wasting power from irrelevant IPIs. You see, RCU will eventually
send IPIs to non-responding CPUs for normal grace periods.
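
To make the suspected failure mode concrete, here is a toy userspace
model of it. This is only a sketch under loud assumptions: every toy_*
name is made up for illustration, nothing here is a kernel API, and the
bookkeeping is a drastic simplification of what RCU really does. It
shows a grace period that cannot end while a CPU is powered down but
still watched, and an IPI (assumed, per the guess above, to wake the
CPU momentarily) unblocking it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state. Hypothetical; not the kernel's data structures. */
struct toy_cpu {
	bool powered;      /* hardware state: can this CPU run code? */
	bool rcu_watching; /* RCU's view: does it expect a report from this CPU? */
	bool reported_qs;  /* has this CPU reported a quiescent state? */
};

/* Start a new toy grace period with all CPUs up and watched. */
void toy_init(struct toy_cpu *cpus, int n)
{
	assert(cpus && n > 0);
	for (int i = 0; i < n; i++) {
		cpus[i].powered = true;
		cpus[i].rcu_watching = true;
		cpus[i].reported_qs = false;
	}
}

/* Correct order: tell RCU first (stands in for rcu_idle_enter() or offlining). */
void toy_power_down_good(struct toy_cpu *cpu)
{
	cpu->rcu_watching = false;
	cpu->powered = false;
}

/* Suspected bug: power is cut while RCU still expects a report. */
void toy_power_down_buggy(struct toy_cpu *cpu)
{
	cpu->powered = false;
}

/* Every powered, watched CPU passes through a quiescent state. */
void toy_run_cpus(struct toy_cpu *cpus, int n)
{
	for (int i = 0; i < n; i++)
		if (cpus[i].powered && cpus[i].rcu_watching)
			cpus[i].reported_qs = true;
}

/* The grace period can end only once every watched CPU has reported. */
bool toy_gp_can_end(const struct toy_cpu *cpus, int n)
{
	for (int i = 0; i < n; i++)
		if (cpus[i].rcu_watching && !cpus[i].reported_qs)
			return false;
	return true;
}

/* Assumed IPI effect: the CPU wakes briefly, reports, and powers down again. */
void toy_ipi(struct toy_cpu *cpu)
{
	cpu->powered = true;
	if (cpu->rcu_watching)
		cpu->reported_qs = true;
	cpu->powered = false;
}
```

With the buggy power-down, toy_gp_can_end() stays false until toy_ipi()
is delivered. With the good power-down, no IPI is needed at all, which
is the power saving mentioned above.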

Thanx, Paul